Table of Contents
- 1: Overview of Kafka and its Key Terminology
- 2: Managing Topics
- 2.1: Creating a Topic
- 2.2: Examining Topics
- 2.3: Listing Topics and Partitions
- 2.4: Deleting a Topic
- 3: Advanced Commands for Working with the Kafka CLI
- 3.1: Replication Factor and Leader Election
- 3.2: Working with Consumers and Consumer Groups
- 3.3: Configuring Log Retention Policies and Compaction
- 4: Commands for Administering the Kafka Cluster
- 4.1: Adding and Removing Brokers
- 5: Frequently Asked Questions (FAQs) About the Kafka CLI
- Wrapping Up
Apache Kafka is a popular distributed streaming platform used for real-time data processing and streaming applications. Although the Kafka CLI is user-friendly, having a complete cheat sheet of commands can be helpful for managing topics, consumer groups, partitions, and administering the cluster. This blog post will give you a set of essential commands for using the Kafka CLI. You'll learn how to create topics, configure retention policies, and more. It also includes frequently asked questions about the Kafka CLI as well as an overview of the different configuration options available. With this reference guide in hand, you'll be able to quickly and easily manage your Apache Kafka environment like a pro!
Note: This article covers commands for Apache Kafka 3.x. The location of the scripts may differ depending on the installation method and operating system.
1: Overview of Kafka and its Key Terminology
Apache Kafka is an open-source distributed streaming platform that provides real-time data processing and streaming applications. It is used by many organizations for their data ingestion and integration needs, as well as for powering microservices architectures.
Kafka's core components include topics, partitions, brokers, producers, and consumers. Topics are the named channels through which messages are sent and received in Kafka. Partitions are the ordered, append-only segments a topic is split into; each partition is stored as a log on disk. Brokers are the servers that host partitions and manage topics, producers are the clients that publish messages to topics, and consumers are the clients that read them.
Kafka's architecture involves one or more brokers running on machines connected by a network, communicating with each other over Kafka's internal protocol and APIs. Producers can send messages to topics through any broker, while consumers subscribe to topics to receive messages from the brokers they connect to. Consumers track their position in each partition using offsets, so they can resume exactly where they left off without missing updates.
When working with Kafka’s CLI, you may encounter different kinds of messages, including key/value pairs, collections of records or events (JSON), binary payloads (Avro), and more. Each message has its own structure, which includes fields such as a timestamp and metadata about where it came from. Understanding this structure will help you follow how data moves between services when using Kafka’s command line interface (CLI).
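You can surface some of this structure directly from the console consumer. Below is a minimal sketch, assuming a topic named my-topic exists on a local broker; print.timestamp, print.key, and key.separator are standard formatter properties of kafka-console-consumer.sh.
# Print each record's timestamp and key alongside its value
> bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--topic my-topic \
--from-beginning \
--property print.timestamp=true \
--property print.key=true \
--property key.separator=" | "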
2: Managing Topics
Managing topics with Kafka's CLI is an essential part of leveraging the power of the platform. A topic is a named channel or feed that stores messages within Kafka. Topics are the fundamental unit of data in Kafka, and they are unique within a given cluster.
The kafka-topics.sh command is used to manage topics from the command line. It can create, list, describe, and delete topics. To create a topic, you provide three parameters: a name that is unique within the cluster, the number of partitions, and the replication factor that determines how many brokers will hold a copy of each partition.
Once a topic is created, it will appear in the output of kafka-topics.sh --list, which prints the names of all topics in your cluster. Adding --describe shows each topic's partition count, replication factor, and configuration overrides such as retention and cleanup policies.
You can delete topics using kafka-topics.sh --delete, but be careful, as there is no undo option. Once you delete a topic, Kafka permanently deletes all messages written to it and you cannot recover them.
Understanding how to use these commands correctly allows you to efficiently manage your topics and take full advantage of all that Apache Kafka has to offer!
2.1: Creating a Topic
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--create \
--topic my-topic \
--partitions 2 \
--replication-factor 2 \
--config retention.ms=86400000 \
--config retention.bytes=1073741824 \
--if-not-exists
# In this example, retention.ms is set to 86400000 milliseconds, which is equivalent to 1 day,
# and retention.bytes is set to 1073741824 bytes, which is equivalent to 1 gigabyte.
2.2: Examining Topics
# To get details of a specific topic
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--describe \
--topic my-topic
# ..
# To produce message to a specific topic
> bin/kafka-console-producer.sh \
--bootstrap-server localhost:9092 \
--topic my-topic
Hello
World
..
..
# Press Ctrl-C (or Command-C on macOS) to exit the producer
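# ..
# If your topic carries keyed records, the console producer can parse keys from
# your input as well. A sketch using standard producer properties; the ":" separator
# is an arbitrary choice, so input lines would look like "user1:Hello"
> bin/kafka-console-producer.sh \
--bootstrap-server localhost:9092 \
--topic my-topic \
--property parse.key=true \
--property key.separator=: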
2.3: Listing Topics and Partitions
The CLI of Apache Kafka makes it easy for users to gain insight into the topics and partitions stored within their clusters.
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--list
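# ..
# Beyond a plain listing, kafka-topics.sh can filter for partitions that need
# attention; --under-replicated-partitions is a standard flag and handy for a
# quick health check
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--describe \
--under-replicated-partitions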
2.4: Deleting a Topic
Deleting topics is a powerful tool for managing data streams in Apache Kafka. It allows users to free up resources and reclaim disk space, as well as improve the performance of the cluster. However, it’s important to understand the implications of deleting topics before acting.
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--delete \
--topic my-topic
3: Advanced Commands for Working with the Kafka CLI
The Apache Kafka CLI is a powerful tool for managing topics and consumer groups, as well as administering the Kafka cluster. This section takes an in-depth look at advanced commands for working with the Kafka CLI: managing topics, consumer groups, and partitions, and using replication factor and leader election to ensure data durability and availability. It also explores the flags that give you granular control over each command and the configuration options available to customize its behavior.
3.1: Replication Factor and Leader Election
Setting an appropriate replication factor is one of the most important aspects of working with Apache Kafka's CLI. The replication factor determines how many copies of each partition exist across the cluster. Increasing it improves data durability, so your messages won't be lost if one node goes down or experiences issues. Each partition has a single leader that serves reads and writes; if that leader fails, leader election promotes one of the in-sync replicas to take over, which prevents a single broker failure from disrupting the service.
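As a concrete illustration, the following creates a topic that can tolerate a broker failure without losing acknowledged writes. This is a minimal sketch assuming a cluster of at least three brokers: durable-topic is a placeholder name, and pairing min.insync.replicas=2 with acks=all on the producer side is a common (but not mandatory) choice.
# Create a durable topic: 3 replicas, at least 2 of which must acknowledge each write
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--create \
--topic durable-topic \
--partitions 3 \
--replication-factor 3 \
--config min.insync.replicas=2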
3.2: Working with Consumers and Consumer Groups
Consumer groups and consumers are essential components of Apache Kafka. Consumer groups allow a group of related or similar consumers to be managed collectively, while consumers read and process messages from the Kafka topics. Working with consumer groups and consumers can be complex, but understanding the basics is key for successful management of your streaming data.
# To consume messages from a topic
> bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--topic my-topic \
--from-beginning
# ..
# To consume messages from a topic with limit
> bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--topic my-topic \
--from-beginning \
--max-messages 5
# ..
# To consume messages produced in the last 5 minutes, look up the offset for that
# point in time with GetOffsetShell, then start the consumer from it.
# Note: --offset requires --partition, so this reads a single partition (0 here),
# and the timestamp arithmetic assumes GNU date (date +%s%3N gives epoch milliseconds)
> bin/kafka-console-consumer.sh \
--bootstrap-server localhost:9092 \
--topic my-topic \
--partition 0 \
--offset $(bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic my-topic --partitions 0 --time $(($(date +%s%3N) - 300000)) | awk -F ":" '{print $3}')
# To list all consumer groups
> bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--list
# ..
# To get details of a specific consumer group
> bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--describe \
--group my-group
# ..
# To reset a consumer group's offsets to a specific timestamp
# (the group must have no active members while offsets are being reset)
> bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--reset-offsets \
--to-datetime "2023-04-16T00:00:00.000" \
--group my-group \
--topic my-topic \
--execute
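# ..
# Preview a reset without applying it: --dry-run prints the target offsets and
# --to-earliest rewinds to the oldest retained offsets (both are standard
# kafka-consumer-groups.sh options; my-group and my-topic are placeholders)
> bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--reset-offsets \
--to-earliest \
--group my-group \
--topic my-topic \
--dry-run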
# ..
# To delete a consumer group
> bin/kafka-consumer-groups.sh \
--bootstrap-server localhost:9092 \
--delete \
--group my-group
# ..
3.3: Configuring Log Retention Policies and Compaction
Log retention policies and compaction are two essential pieces of Apache Kafka's storage model. Retention policies dictate how long messages stay in a topic before they expire. Log compaction reduces log size by keeping only the most recent record for each key. Configuring both properly ensures that data is stored efficiently.
To begin configuring log retention, set how long each message should remain in a topic using the log.retention.ms configuration parameter, which defines a period (in milliseconds) after which log segments become eligible for deletion. Note that Kafka also accepts log.retention.minutes and log.retention.hours; if more than one is set, the smallest unit takes precedence, so log.retention.ms overrides the other two.
# You can define a global log.retention.ms value for all topics in Kafka by adding
# the following line to your server.properties file:
log.retention.ms=600000
# This sets the retention time for all topics to 10 minutes (600,000 milliseconds)
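Broker-wide defaults can also be overridden per topic without a restart. Here is a minimal sketch using kafka-configs.sh, assuming a topic named my-topic; note that the per-topic property is retention.ms, without the log. prefix used in server.properties.
# Override retention for a single topic to 1 hour (3,600,000 ms)
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--alter \
--entity-type topics \
--entity-name my-topic \
--add-config retention.ms=3600000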
The compaction feature reduces log size by discarding older records when a newer record with the same key exists on a compacted topic. This ensures that only the latest value for each key is retained, saving the storage costs of redundant data. To enable compaction as the cluster-wide default, set the log.cleanup.policy configuration parameter to compact. You can also tune the log cleaner using the log.cleaner.* configuration parameters, which control how often compaction runs and let you adjust its behavior to your needs.
# You can set log.cleanup.policy in your server.properties file to choose between
# log deletion and log compaction:
log.cleanup.policy=compact
# delete: removes old log segments once the retention time or size limit is reached.
# compact: enables log compaction, which removes older records that share a key;
#          only the latest record for each key is retained.
log.cleaner.delete.retention.ms=600000
# How long delete markers (tombstones) for compacted records are retained, in milliseconds.
log.cleaner.io.buffer.size=524288  # 512*1024
# The size of the I/O buffers used by the log cleaner, in bytes.
log.cleaner.threads=8
# The number of threads to use for log cleaning.
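In practice, compaction is more often enabled per topic than cluster-wide. A minimal sketch creating a compacted topic follows; user-profiles is a placeholder name, and the per-topic property is cleanup.policy, again without the log. prefix.
# Create a compacted topic, e.g. for changelog-style data keyed by user ID
> bin/kafka-topics.sh \
--bootstrap-server localhost:9092 \
--create \
--topic user-profiles \
--partitions 3 \
--replication-factor 2 \
--config cleanup.policy=compact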
4: Commands for Administering the Kafka Cluster
Apache Kafka provides a comprehensive command line interface (CLI) for administering its clusters. This section provides an overview of the most essential commands for managing a Kafka cluster.
The first command to be aware of is kafka-configs.sh, which lets you describe and alter configuration for brokers, topics, clients, and users. With this command, administrators can inspect current settings and apply dynamic configuration changes without restarting brokers, giving fine-grained control over how the cluster behaves.
# Get topic configuration
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--describe \
--entity-type topics \
--entity-name my-topic
# ..
# Update topic configuration
# This command will update the configuration of the topic my-topic by adding
# the configuration max.message.bytes with a value of 1048576.
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--alter \
--entity-type topics \
--entity-name my-topic \
--add-config max.message.bytes=1048576
# ..
# Delete a topic configuration
# This command will delete the configuration max.message.bytes from the topic my-topic.
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--alter \
--entity-type topics \
--entity-name my-topic \
--delete-config max.message.bytes
# List all configurations for all brokers:
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--entity-type brokers \
--describe
# ..
# List specific configurations for a single broker
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--entity-type brokers \
--entity-name 0 \
--describe \
--all
# ..
# Update a specific configuration for all brokers:
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--entity-type brokers \
--alter \
--add-config 'max.connections=500'
# ..
# Update a specific configuration for a single broker:
> bin/kafka-configs.sh \
--bootstrap-server localhost:9092 \
--entity-type brokers \
--entity-name 0 \
--alter \
--add-config 'log.cleanup.policy=delete'
Secondly, kafka-acls.sh is the command used to manage access control lists (ACLs) in Apache Kafka's CLI. It allows you to grant permissions on resources such as topics and consumer groups and to revoke those permissions when needed. Using these commands correctly gives you full control over access to your Kafka clusters while keeping data secure at all times.
# List all ACLs for a specific topic
> bin/kafka-acls.sh \
--bootstrap-server localhost:9092 \
--list \
--topic my-topic
# ..
# Add a new ACL for a specific user to a specific topic
> bin/kafka-acls.sh \
--bootstrap-server localhost:9092 \
--add \
--allow-principal User:myuser \
--operation Read \
--topic my-topic
# ..
# Remove an ACL for a specific user from a specific topic
> bin/kafka-acls.sh \
--bootstrap-server localhost:9092 \
--remove \
--allow-principal User:myuser \
--operation Read \
--topic my-topic
# ..
# List all ACLs for a specific user
> bin/kafka-acls.sh \
--bootstrap-server localhost:9092 \
--list \
--principal User:myuser
# ..
# Add a new ACL for a specific user to a specific consumer group
> bin/kafka-acls.sh \
--bootstrap-server localhost:9092 \
--add \
--allow-principal User:myuser \
--operation Read \
--group my-consumer-group
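# ..
# Producers need Write permission; a sketch granting the same user the ability
# to produce to a topic (User:myuser and my-topic are placeholders)
> bin/kafka-acls.sh \
--bootstrap-server localhost:9092 \
--add \
--allow-principal User:myuser \
--operation Write \
--topic my-topic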
4.1: Adding and Removing Brokers
Adding and removing brokers is an important process that must be performed carefully in order to ensure the health and performance of a Kafka cluster.
When setting up a Kafka cluster, it’s important to first determine whether you will have one node or multiple nodes. Single-node clusters are simpler and require fewer configuration steps than multi-node clusters, but they are less resilient and lack scalability. Multi-node clusters provide greater redundancy, scalability, and fault tolerance but require more complex configuration steps when adding or removing nodes.
In either case, before adding or removing brokers from a Kafka cluster, it’s essential to ensure that all necessary configurations are set up correctly for both the broker(s) being added/removed and for the Kafka cluster itself. This involves ensuring that each broker is assigned its own unique ID (which should not match any other existing broker), setting up properties such as advertised hostname(s) and port numbers for incoming connections, and configuring any desired replication factors for topics across multiple nodes if applicable.
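For reference, here is a minimal sketch of the broker-side settings that typically need attention when adding a node. The values are placeholders for a hypothetical third broker, and depending on your deployment mode you will also need settings such as zookeeper.connect (ZooKeeper) or node.id and process.roles (KRaft).
# server.properties for the new broker (illustrative values)
broker.id=2                                   # must be unique across the cluster
listeners=PLAINTEXT://0.0.0.0:9092            # where the broker accepts connections
advertised.listeners=PLAINTEXT://broker3.example.com:9092   # address clients should use
log.dirs=/var/lib/kafka/data                  # where partition data is stored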
Once these preparations are complete, administrators can add or remove brokers according to their needs, either by updating each broker's server.properties and restarting it, or by passing a JSON reassignment plan to the kafka-reassign-partitions.sh CLI command to move partition replicas between brokers.
Keep in mind that changes take time to propagate throughout the system, so monitor the cluster during this process to confirm when the changes have taken effect and react promptly if something stalls.
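Below is a sketch of the typical three-step reassignment flow, assuming you want to spread my-topic across brokers 0, 1, and 2; topics-to-move.json and reassignment.json are file names you choose, and the flags shown are standard kafka-reassign-partitions.sh options.
# 1. List the topics to move in a JSON file (topics-to-move.json):
#    {"topics": [{"topic": "my-topic"}], "version": 1}
# 2. Generate a candidate plan for brokers 0, 1, and 2
> bin/kafka-reassign-partitions.sh \
--bootstrap-server localhost:9092 \
--generate \
--topics-to-move-json-file topics-to-move.json \
--broker-list "0,1,2"
# 3. Save the proposed plan as reassignment.json, then execute and verify it
> bin/kafka-reassign-partitions.sh \
--bootstrap-server localhost:9092 \
--execute \
--reassignment-json-file reassignment.json
> bin/kafka-reassign-partitions.sh \
--bootstrap-server localhost:9092 \
--verify \
--reassignment-json-file reassignment.json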
5: Frequently Asked Questions (FAQs) About the Kafka CLI
Kafka's CLI offers a convenient and powerful way to interact with Apache Kafka clusters, but it can be intimidating to those unfamiliar with its various commands. This section provides answers to some common questions about working with the CLI, so users can get up and running quickly.
Q: What is the difference between producer and consumer topics?
A: Kafka itself does not distinguish between producer and consumer topics; a topic is a single named stream that producers write to and consumers read from. The distinction lies in the clients: producers publish messages to a topic with tools like kafka-console-producer.sh, while consumers read them with kafka-console-consumer.sh. What matters in the CLI is choosing the right tool for the direction of data flow.
Q: How do I determine which version of Kafka I am using?
A: The easiest way to determine which version of Apache Kafka you are using is to run bin/kafka-topics.sh --version in your terminal (most of the scripts in bin/ accept a --version flag). You can also infer the version from the Kafka jar file names in the libs directory of your installation; for example, libs/kafka_2.13-3.4.0.jar indicates Kafka 3.4.0 built for Scala 2.13.
Q: How do I view detailed information about my topics and consumer groups?
A: To view detailed information about topics or consumer groups, use the kafka-topics.sh --describe
or kafka-consumer-groups.sh --describe
commands, respectively. These commands will provide you with detailed information such as topic name, current leader broker ID, number of partitions, replication factor, etc., allowing you to gain insights into how your cluster is configured and functioning.
Q: Are there any alternatives for managing a Kafka cluster that don't involve using a command line interface?
A: Yes! There are several alternatives for managing Apache Kafka clusters without relying on a command-line interface, such as web interfaces like Confluent Control Center, monitoring integrations like the Datadog Kafka dashboard, or custom user interfaces built specifically for your environment. Additionally, Yahoo's Kafka Manager (now CMAK) provides an intuitive GUI for inspecting and managing clusters, and LinkedIn's Burrow monitors consumer lag without manual offset checking.
Wrapping Up
Wrapping up, this article provides a comprehensive cheat sheet of essential commands for working with Kafka’s command line interface (CLI). It discussed topics such as managing topics, consumer groups, partitions, and more, as well as advanced commands for administering the Kafka cluster. Additionally, it included frequently asked questions about the Kafka CLI and provided an overview of the different configuration options available.
Using these commands allows users to gain greater control over their streams of data and take full advantage of Apache Kafka's capabilities. Some of them are destructive, though: modify or delete topics only after careful consideration, and make sure you understand the message structure your applications rely on before changing anything.
The goal of this post was to provide a comprehensive cheat sheet that would make it easier for users to quickly find and use the right commands, while providing insight into some of the common questions asked in Kafka interviews. We hope this article has been informative and helpful, so please share it with your colleagues if you find it useful. If you have any questions or need help understanding any of the commands discussed here, please feel free to ask in the comments below.