How to install Kafka using Docker
Install and configure a Docker environment to run Apache Kafka locally
Introducing the Apache Kafka ecosystem
Apache Kafka¹ is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
Apache Zookeeper² is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It is an essential part of Kafka.
This tutorial provides the means to run Kafka in a distributed architecture, with a 3-node Broker³ cluster. Zookeeper is also configured in Replicated mode⁴ - called an ensemble - to take advantage of the distributed architecture.
The code used to create this tutorial is available in this repository, so you can follow along.
Containerising Apache Kafka
Prerequisites
In order to run this environment, you'll need Docker installed, as well as Kafka's CLI tools.
- This tutorial was tested using Docker Desktop⁵ for macOS, Engine version 20.10.2.
- The CLI tools can be downloaded here as a tar or zip. After the download is complete and the files are extracted, you'll need to add them to your PATH environment variable.
- If using bash, edit the ~/.bash_profile file and add the following: PATH=$PATH:~/bin/confluent-6.0.1/bin (assuming the files were extracted to the ~/bin location; any other location will work as long as the PATH environment variable targets the right path).
- If using zsh, you'll need to edit ~/.zshrc instead.
- Update your local hosts file: execute the update-hosts.sh script to update /etc/hosts so that the container names resolve.
bash update-hosts.sh
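The repository's script is the source of truth, but the idea behind it can be sketched as follows. This is a hypothetical approximation that works on a local copy of the file, since editing /etc/hosts itself requires sudo:

```shell
# Hypothetical sketch of what an update-hosts.sh could do: map each
# container name to 127.0.0.1. This version works on a local copy;
# the repository's actual script may differ and would need sudo.
cp /etc/hosts hosts.local
for name in zk-1 zk-2 zk-3 kafka-1 kafka-2 kafka-3; do
  grep -q "[[:space:]]$name\$" hosts.local || echo "127.0.0.1 $name" >> hosts.local
done
cat hosts.local
```

After the real script runs, names such as zk-1 or kafka-1 resolve to your local machine, which is what the CLI tools rely on later in this tutorial.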
Docker Image
It was decided to use the Bitnami⁶ images for their ready-to-use approach, well-known reliability, and non-root container⁷ design.
Consult Bitnami official documentation for more information:
Docker Compose
A docker-compose file⁸ is provided to create the local development environment.
To build the images and spin up the environment, execute:
docker-compose -f ./build/docker-compose.yml build
docker-compose -f ./build/docker-compose.yml up -d
Zookeeper Configuration
The containers only expose their client ports, one per container:
- zk-1: 12181
- zk-2: 22181
- zk-3: 32181
Zookeeper Environment Variables
- ALLOW_ANONYMOUS_LOGIN: Accept connections from unauthenticated users.
- ZOO_SERVER_ID: ID of the server in the ensemble.
- ZOO_SERVERS: Comma-, space-, or semicolon-separated list of servers.
- ZOO_PORT_NUMBER: ZooKeeper client port.
- ZOO_TICK_TIME: Basic time unit in milliseconds used by ZooKeeper for heartbeats.
- ZOO_INIT_LIMIT: Limits the length of time the ZooKeeper servers in quorum have to connect to a leader.
- ZOO_SYNC_LIMIT: How far out of date a server can be from a leader.
- ZOO_AUTOPURGE_PURGEINTERVAL: Time interval in hours at which the purge task is triggered.
- ZOO_AUTOPURGE_SNAPRETAINCOUNT: Number of most recent snapshots, and the corresponding transaction logs in the dataDir and dataLogDir, to keep.
- ZOO_MAX_CLIENT_CNXNS: Limits the number of concurrent connections that a single client may make to a single member of the ZooKeeper ensemble.
- ZOO_HEAP_SIZE: Size in MB for the Java heap options (Xmx and Xms).
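To make these variables concrete, here is an illustrative docker-compose fragment for the first Zookeeper node. The image tag and exact values are assumptions for illustration; the repository's actual docker-compose.yml is the authoritative version:

```yaml
# Illustrative only - the repository's docker-compose.yml may differ.
zk-1:
  image: bitnami/zookeeper:latest
  ports:
    - "12181:12181"          # expose only the client port
  environment:
    - ALLOW_ANONYMOUS_LOGIN=yes
    - ZOO_SERVER_ID=1
    - ZOO_SERVERS=zk-1:2888:3888,zk-2:2888:3888,zk-3:2888:3888
    - ZOO_PORT_NUMBER=12181
    - ZOO_TICK_TIME=2000
```

The other two nodes would follow the same shape, changing only ZOO_SERVER_ID and the client port.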
Kafka Configuration
The containers only expose the following ports:
- INTERNAL 9092: for intra-cluster communication
- EXTERNAL: for connections from the local computer:
  - kafka-1: 19093
  - kafka-2: 29093
  - kafka-3: 39093
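Note the pattern in the exposed ports: each node's external port is its ID prefixed to a base port (2181 for Zookeeper clients, 9093 for Kafka's EXTERNAL listener). A quick shell check of that convention:

```shell
# External port = node ID * 10000 + base port
# (base 2181 for Zookeeper clients, 9093 for Kafka's EXTERNAL listener).
for id in 1 2 3; do
  echo "zk-$id -> $((id * 10000 + 2181))   kafka-$id -> $((id * 10000 + 9093))"
done
```

This convention makes it easy to tell, from a port number alone, which node you are connecting to.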
Kafka Broker Environment Variables
- KAFKA_CFG_ZOOKEEPER_CONNECT: Comma-separated host:port pairs, each corresponding to a Zookeeper server.
- ALLOW_PLAINTEXT_LISTENER: Allow use of the PLAINTEXT listener.
- KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: Map between listener names and security protocols.
- KAFKA_CFG_LISTENERS: Comma-separated list of URIs to listen on and their listener names.
- KAFKA_CFG_ADVERTISED_LISTENERS: Listeners to publish to ZooKeeper for clients to use, if different from the listeners config property.
- KAFKA_INTER_BROKER_LISTENER_NAME: Name of the listener used for communication between brokers.
- KAFKA_CFG_NUM_PARTITIONS: The default number of log partitions per topic.
- KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: Allow automatic topic creation on the broker when subscribing to or assigning a topic.
- KAFKA_CFG_DEFAULT_REPLICATION_FACTOR: Default replication factor for automatically created topics.
- KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR: The replication factor for the offsets topic.
- KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: The replication factor for the transaction topic.
- KAFKA_HEAP_OPTS: Kafka's Java heap size.
- KAFKA_CFG_BROKER_ID: Kafka broker's custom ID.
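As with Zookeeper, an illustrative docker-compose fragment shows how these variables fit together for the first broker. The image tag and exact listener values are assumptions; consult the repository's docker-compose.yml for the real configuration:

```yaml
# Illustrative only - the repository's docker-compose.yml may differ.
kafka-1:
  image: bitnami/kafka:latest
  ports:
    - "19093:19093"          # only the EXTERNAL listener is published
  environment:
    - KAFKA_CFG_ZOOKEEPER_CONNECT=zk-1:12181,zk-2:22181,zk-3:32181
    - ALLOW_PLAINTEXT_LISTENER=yes
    - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
    - KAFKA_CFG_LISTENERS=INTERNAL://:9092,EXTERNAL://:19093
    - KAFKA_CFG_ADVERTISED_LISTENERS=INTERNAL://kafka-1:9092,EXTERNAL://kafka-1:19093
    - KAFKA_INTER_BROKER_LISTENER_NAME=INTERNAL
    - KAFKA_CFG_BROKER_ID=1
```

The split between INTERNAL and EXTERNAL listeners is what lets brokers talk to each other on 9092 inside the Docker network while your local CLI tools connect on the published 19093/29093/39093 ports.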
Validate Environment
- Confirm that the containers are up and running:
docker-compose -f build/docker-compose.yml ps
It should return output similar to:
Name Command State Ports
--------------------------------------------------------------------
build_kafka-ui_1 /kafdrop.sh Up 0.0.0.0:8080->8080/tcp
kafka-1 /opt/bitnami/scripts/kafka ... Up 0.0.0.0:19093->19093/tcp, 9092/tcp
kafka-2 /opt/bitnami/scripts/kafka ... Up 0.0.0.0:29093->29093/tcp, 9092/tcp
kafka-3 /opt/bitnami/scripts/kafka ... Up 0.0.0.0:39093->39093/tcp, 9092/tcp
zk-1 /opt/bitnami/scripts/zooke ... Up 0.0.0.0:12181->12181/tcp, 2181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
zk-2 /opt/bitnami/scripts/zooke ... Up 2181/tcp, 0.0.0.0:22181->22181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
zk-3 /opt/bitnami/scripts/zooke ... Up 2181/tcp, 2888/tcp, 0.0.0.0:32181->32181/tcp, 3888/tcp, 8080/tcp
- Verify that communication can be established with Zookeeper:
zookeeper-shell zk-1:12181 get /zookeeper/config
It should return output similar to:
Connecting to zk-1:12181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
server.1=0.0.0.0:2888:3888:participant;0.0.0.0:12181
server.2=zookeeper2:2888:3888:participant;0.0.0.0:12181
server.3=zookeeper3:2888:3888:participant;0.0.0.0:12181
version=0
- List the Kafka Brokers registered with Zookeeper by executing:
zookeeper-shell zk-1:12181 ls /brokers/ids
It should return output similar to:
Connecting to zk-1:12181
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[1001, 1002, 1003]
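With the brokers registered, you can optionally smoke-test the cluster end to end using the Kafka CLI tools installed earlier. The topic name smoke-test below is arbitrary, and this assumes the containers are up and /etc/hosts has been updated:

```shell
# Create a fully replicated topic across the 3 brokers.
kafka-topics --bootstrap-server kafka-1:19093 --create \
  --topic smoke-test --partitions 3 --replication-factor 3

# Produce one message, then consume it back.
echo "hello" | kafka-console-producer --bootstrap-server kafka-1:19093 --topic smoke-test
kafka-console-consumer --bootstrap-server kafka-1:19093 --topic smoke-test \
  --from-beginning --max-messages 1
```

If the consumer prints hello, producing and consuming work across the external listeners.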
Kafdrop
Kafdrop⁹ is a web UI for viewing Kafka topics and browsing consumer groups. The tool displays information such as brokers, topics, partitions, consumers, and lets you view messages.
To use Kafdrop, access http://localhost:8080. The application provides a graphical interface to:
- View brokers, topics, and settings
- View all data on topics
- View messages sent
- View consumer groups
- Create topics (with a few configuration options)
Conclusion
This tutorial presented a Docker environment for running a 3-node Apache Kafka cluster locally. With this local environment up and running, you can begin your adventures in the world of distributed systems.
I strongly recommend that you take a look at the Apache Kafka Quickstart as your next stop to get more hands-on experience.
References
1. https://kafka.apache.org
2. https://zookeeper.apache.org
3. https://kafka.apache.org/intro#intro_nutshell
4. https://zookeeper.apache.org/doc/r3.5.4-beta/zookeeperOver.html
5. https://www.docker.com/products/docker-desktop
6. https://bitnami.com/stack/kafka/containers
7. https://docs.bitnami.com/tutorials/work-with-non-root-containers/
8. https://docs.docker.com/compose/compose-file/
9. https://github.com/obsidiandynamics/kafdrop