Introduction

I recently took the Confluent Certified Developer for Apache Kafka (CCDAK) exam. Having mainly worked with managed messaging services such as Google Cloud’s Pub/Sub and Amazon’s SNS, I found this quite a challenging but rewarding certificate to take.

Below is a description of the certificate, why you should consider taking it and my top study tips for passing the exam.

What is the Kafka Developer Certificate?

The Confluent Certified Developer for Apache Kafka exam assesses the most critical job activities that an Apache Kafka Developer performs.

“Passing the examination demonstrates the candidate is proficient as a Kafka Developer, is able to discuss core Kafka architectural concepts and understands Kafka’s role in the modern data distribution pipeline.” (Confluent website)

The exam consists of 60 multiple choice questions in 90 minutes, costs $150 and can be taken on demand under remote proctored conditions. Once you pass, the certification is valid for two years.

Confluent provides more information on the exam in their ‘Certification Bootcamp’ course, where they explain the benefits of getting certified, the types of exam questions and some exam tips in more detail.

Why get certified?

Personally, I did not have much experience with Kafka itself before taking the exam. I had only used managed cloud messaging services such as Google Cloud’s Pub/Sub and Amazon’s SNS.

However, clients I work with have increasingly been asking about Kafka for their streaming pipelines. In particular, I get a lot of questions about when you should manage and deploy your own Kafka cluster instead of using managed cloud messaging services.

Managed cloud services abstract away a lot of complexity, which is convenient, but I found it can also make you lazy: you can use the services without fully understanding what is going on under the hood. A managed service is not always the best option, so it pays to understand Kafka in more detail in order to recommend the most appropriate solution to clients.

Completing the certification helped me understand the different use cases for Kafka and solidify core concepts of streaming architectures in my mind.

In my opinion, certification holds three main benefits:

  1. Broadens your understanding of streaming challenges, architectures and concepts
  2. Demonstrates competence
  3. Forces you to understand many aspects of the Kafka ecosystem

Streaming is becoming an increasingly important part of the modern machine learning and data engineer’s skill set. As more companies adopt machine learning, many of the most impactful use cases involve real-time data pipelines: for example, fraud detection, predictive maintenance monitoring from IoT devices and serving real-time content recommendations to users on websites. Therefore, it is important to understand the key concepts and challenges of designing streaming systems before you try to implement them yourself.

Consequently, being able to work with streaming frameworks like Kafka is in high demand. Demonstrating that you understand the key components of streaming architectures and that you can develop applications with Kafka will really boost your career prospects in the current market.

Finally, when developing with Kafka or managed services like Pub/Sub for the first time, it is easy to focus on just the small area of the documentation needed for your specific task. Studying for the certification forces you to read the documentation in detail and gain a broad understanding of the entire ecosystem. This helps you pick the best component for each task and, in turn, build better solutions.

Exam Content

The exam is broken into three main ‘knowledge domains’ (defined on Confluent’s website) which comprehensively cover Kafka’s core concepts and the documentation.

Application Design - 40% of the exam

  • Kafka’s command-line tools
  • Pub/Sub and streaming concepts
  • Kafka architecture and design decisions
  • Kafka APIs, configuration and metrics
  • Message structure, key selection (choices and factors) and metadata
  • System metrics
  • Schema management
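
To make a couple of the topics above concrete, here is a minimal sketch of a Java producer that sends keyed records. It is illustrative only: the topic name, key and broker address are placeholders I have made up, but it shows why key selection matters, since the default partitioner hashes the key to pick a partition and therefore fixes per-key ordering.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; adjust for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("customer-42") is hashed by the default partitioner,
            // so all events for this customer land on the same partition
            // and are therefore ordered relative to each other.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order created"));
            producer.send(new ProducerRecord<>("orders", "customer-42", "order shipped"));
        }
    }
}
```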

Domain Development - 30% of the exam

  • Kafka Clients: Producer and Consumer key concepts and functions
  • Troubleshooting/debugging
  • Performance, throughput, latency, scaling considerations
  • Message ordering and delivery guarantees
  • Serialisation/deserialisation
  • Producer partition selection
  • Consumer offset management
  • Consumer groups, partition assignments, partition rebalances
  • Data retention strategies and implications
  • Topic co-partitioning
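
Similarly, a minimal Java consumer sketch touches several of the listed concepts at once: consumer groups, deserialisation and manual offset management. Again, the group id, topic and broker address are placeholder values, not anything prescribed by the exam.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "order-processors");        // consumers sharing this id divide the partitions
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");          // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
                // Committing after processing gives at-least-once delivery:
                // a crash before the commit means the records are re-read.
                consumer.commitSync();
            }
        }
    }
}
```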

Deployment/Testing - 30% of the exam

  • Application deployment choices
  • Security
  • Kafka Streams features and use cases
  • KSQL features and use cases
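
For the Kafka Streams part, it helps to have seen at least one topology. Below is a minimal, hypothetical sketch that filters one topic into another; the application id and topic names are made up, and the same logic could equally be expressed as a KSQL/ksqlDB query.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-filter");      // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        // Keep only "shipped" events and write them to a new topic,
        // expressed as a declarative topology rather than a poll loop.
        orders.filter((key, value) -> value.contains("shipped"))
              .to("shipped-orders");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```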

If I’m honest, I did not find this breakdown of topics that useful for working out what I actually needed to study, as it does not map clearly onto sections of the documentation.

I would redefine the key topics to know for the exam as follows:

Studying for the Exam

So now that you have been convinced to take the exam 😉, how should you prepare?

I used a mixture of free and paid resources for the exam.

Online Courses

I would recommend working through an instructor-led course (either paid or free on YouTube). I don’t think reading the documentation alone is enough to feel confident about the exam: it was not obvious to me which parts of the documentation matter most. Instructor-led courses do a much better job of focussing on the most relevant topics for the exam.

My employer is a Confluent partner, so luckily I was able to access the training materials on Confluent’s Partner Portal. I took the Fundamentals for Apache Kafka and Developer Skills for Building Apache Kafka courses, led by the brilliant Tim Berglund.

If you do not have access to the partner portal, most of the content from these courses is available for free on Confluent’s YouTube channel anyway.

I have also heard good reviews about Stephane Maarek’s Udemy courses, although I did not complete these myself.

Practice Exams

Even after completing the online learning courses, I did not feel 100% confident about the exam and therefore looked for some practice exam questions to validate my knowledge.

Unfortunately, there is a lack of good free practice tests online, so I resorted to purchasing the following exam packs on Udemy:

These provided six practice exams with 350 exam questions.

Even though I felt confident in the core concepts after studying the online resources, this did not translate into good performance on these tests, as I had missed key parts of the documentation that were relevant for the exam (e.g. default configurations, port numbers, etc.).

These practice tests were extremely useful and I would highly recommend completing them as part of your preparation (provided you can get them on sale for a reasonable price).

I found the practice tests harder than the actual exam. If you are consistently scoring 70-80%+ in these tests, you will be ready to take the exam, as the real questions are very similar.

Active Recall - Flashcards

Finally, to consolidate my knowledge from all these sources, I created some flashcards (using Ankiapp) for all the key exam topics.

Active recall has been shown to be a highly effective revision technique. Using flashcards forces you to actively engage in the cognitive effort of retrieving information to answer questions. This strengthens the connections between pieces of information in your mind, which improves your ability to recall knowledge in an exam scenario.

I have made my flashcards available to anyone who is interested. Click the button below to access them for free.

Get Revision Cards Free

After taking the exam, I went back through my revision cards to improve them and to ensure they matched the type and content of the real questions as closely as possible.

After a couple of days revising with the flashcards on the Anki mobile app, I felt ready for the exam.

Other Good Resources

In addition to those described above, I have curated a number of other good resources:

Conclusion and Parting Thoughts

The Confluent Certified Developer for Apache Kafka was a challenging but rewarding certification, and I would highly recommend completing it.

Event-driven and streaming architectures are becoming increasingly important for machine learning use cases. Upskilling in Kafka will help you differentiate yourself from other engineers and set you up well for delivering advanced ML use cases.

Confluent’s recent IPO, which saw a 25% increase in share price on the first day of trading, shows just how much interest there is in Apache Kafka and the growing importance of streaming architectures.

If you use Ankiapp for your revision (which I highly recommend; it’s free and easy to use), you can get access to 116 of my revision cards for free from the link below (tips appreciated ;)).

Good luck in taking the exam!

Further Reading