Kafka Streams In Action part 3

Part 3 Administering Kafka Streams

7. Monitoring and performance

1. Basic Kafka monitoring

Measuring consumer and producer performance
Checking for consumer lag

Intercepting the producer and consumer

Although interceptors aren’t typically your first line for debugging, they can prove useful in observing the behavior of your Kafka streaming application, and they’re a valuable addition to your toolbox.
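As a rough illustration, a producer interceptor can log every record on its way out and every acknowledgement that comes back. The sketch below is a minimal example; the class name, topic, and log messages are made up, but the ProducerInterceptor callbacks are the real interface.

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class LoggingProducerInterceptor implements ProducerInterceptor<String, String> {

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        // Called before the record is serialized and sent; return the (possibly modified) record.
        System.out.println("Sending to " + record.topic() + ": " + record.value());
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // Called when the broker acknowledges the record, or when the send fails.
        if (exception != null) {
            System.out.println("Send failed: " + exception.getMessage());
        } else {
            System.out.println("Acked " + metadata.topic() + "-" + metadata.partition() + "@" + metadata.offset());
        }
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}

It is enabled through the interceptor.classes producer config (in a Kafka Streams application, prefixed via StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG)). Consumer interceptors work the same way, with onConsume and onCommit callbacks instead.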

2. Application metrics

3. More Kafka Streams debugging techniques

Using the StateListener
State restore listener
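Both hooks are plain callbacks registered on the KafkaStreams instance. A minimal sketch, assuming a built topology and properties are already in scope (store and message text are illustrative):

KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), props);

// StateListener: fires whenever the application transitions between states
// (CREATED, REBALANCING, RUNNING, ERROR, ...).
kafkaStreams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.REBALANCING) {
        System.out.println("Application is rebalancing");
    }
});

// StateRestoreListener: reports progress while changelog topics are replayed
// into local state stores after a restart or failover.
kafkaStreams.setGlobalStateRestoreListener(new StateRestoreListener() {
    @Override
    public void onRestoreStart(TopicPartition partition, String storeName,
                               long startingOffset, long endingOffset) {
        System.out.printf("Restoring %s from offset %d to %d%n", storeName, startingOffset, endingOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition partition, String storeName,
                                long batchEndOffset, long numRestored) {
        System.out.printf("Restored a batch of %d records for %s%n", numRestored, storeName);
    }

    @Override
    public void onRestoreEnd(TopicPartition partition, String storeName, long totalRestored) {
        System.out.printf("Finished restoring %s (%d records)%n", storeName, totalRestored);
    }
});

kafkaStreams.start();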

8. Testing a Kafka Streams application

1. Testing a topology

ProcessorTopologyTestDriver

The critical point to keep in mind with this test is that you now have a repeatable test running a record through your entire topology, without the overhead of running Kafka.
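ProcessorTopologyTestDriver was an internal class; current Kafka releases expose the same idea publicly as TopologyTestDriver. A minimal sketch of the pattern, assuming the topology under test is built elsewhere (topic names and record contents are made up):

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");  // never actually contacted

try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
    TestInputTopic<String, String> input =
        driver.createInputTopic("purchases", new StringSerializer(), new StringSerializer());
    TestOutputTopic<String, String> output =
        driver.createOutputTopic("patterns", new StringDeserializer(), new StringDeserializer());

    // Drive a single record through the entire topology, synchronously, without a broker.
    input.pipeInput("customer-1", "purchase:coffee");

    KeyValue<String, String> result = output.readKeyValue();
    // assert against result.key / result.value in your test framework of choice
}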

2. Integration testing

Integration tests with the EmbeddedKafkaCluster should be used sparingly, and only when you have interactive behavior that can only be verified with a live, running Kafka broker.




Kafka Streams In Action part 2

Part 2 Kafka Streams development

3. Kafka Streams development

1. Hello World for Kafka Streams

2. Working with customer data

3. New requirements

Filtering purchases
Splitting/branching the stream
Generating a key
Foreach actions
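A rough sketch of those four operations on a single stream. Topic names and predicates are made up, and plain strings stand in for the book's Purchase objects:

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> purchases = builder.stream("transactions");

// Filtering purchases: drop records that don't match a predicate.
KStream<String, String> nonEmpty = purchases.filter((key, value) -> value != null && !value.isEmpty());

// Generating a key: derive a new key from the value (needed before joins or aggregations by that key).
KStream<String, String> keyed = nonEmpty.selectKey((key, value) -> value.split(":")[0]);

// Splitting/branching the stream: each predicate routes matching records to its own KStream.
KStream<String, String>[] branches = keyed.branch(
    (key, value) -> key.equals("coffee"),
    (key, value) -> true);                    // everything else

// Foreach actions: a terminal operation, e.g. printing or calling an external service.
branches[0].foreach((key, value) -> System.out.println(key + " -> " + value));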

4. Streams and state

1. Applying stateful operations to Kafka Streams

The transformValues processor
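transformValues lets you attach a state store and compute a new value per record while keeping the original key. A sketch of the pattern, assuming the usual imports and a StreamsBuilder in scope; the store name, topic, and point-accumulation logic are illustrative, not the book's exact classes:

// Register a state store with the builder so the transformer can use it.
builder.addStateStore(Stores.keyValueStoreBuilder(
    Stores.inMemoryKeyValueStore("rewards-store"),
    Serdes.String(), Serdes.Integer()));

KStream<String, Integer> purchasePoints = builder.stream("purchase-points");

KStream<String, Integer> runningTotals = purchasePoints.transformValues(
    () -> new ValueTransformerWithKey<String, Integer, Integer>() {
        private KeyValueStore<String, Integer> store;

        @Override
        public void init(ProcessorContext context) {
            store = (KeyValueStore<String, Integer>) context.getStateStore("rewards-store");
        }

        @Override
        public Integer transform(String customerId, Integer points) {
            // Add this purchase's points to the running total kept in the local store.
            Integer previous = store.get(customerId);
            int total = (previous == null ? 0 : previous) + points;
            store.put(customerId, total);
            return total;
        }

        @Override
        public void close() { }
    },
    "rewards-store");   // connects the named store to this processor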

2. Repartitioning the data

Repartitioning in Kafka Streams
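The book's edition does this with through() and an optional custom StreamPartitioner; current Kafka Streams versions use the explicit repartition() operator instead. A minimal sketch (topic names and key derivation are made up):

KStream<String, String> byCustomer = builder.<String, String>stream("transactions")
    .selectKey((key, value) -> value.split(":")[0])               // new key, so the data has to move
    .repartition(Repartitioned.as("transactions-by-customer"));   // explicit internal repartition topic

// Downstream stateful operations (aggregations, joins, stores) now see every record
// for a given customer on the same partition, and therefore on the same task.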

3. Updating the rewards processor

4. Data locality

5. Failure recovery and fault tolerance

The state stores provided by Kafka Streams meet both the locality and fault-tolerance requirements. They’re local to the defined processors and don’t share access across processes or threads. State stores also use topics for backup and quick recovery.
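The backup is an ordinary compacted changelog topic per store, and change-logging is on by default. A sketch of configuring it explicitly when you register a store yourself (store name and settings are illustrative):

Map<String, String> changelogConfig = new HashMap<>();
changelogConfig.put("cleanup.policy", "compact");   // keep only the latest value per key

StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
    Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("purchase-counts"),
            Serdes.String(), Serdes.Long())
        .withLoggingEnabled(changelogConfig);       // back the local store with a changelog topic

builder.addStateStore(storeBuilder);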

6. Joining streams for added insight

Generating keys containing customer IDs to perform joins

Constructing the join

Outer joins

With an inner join, if either record isn't present, the join doesn't occur. Outer joins always output a record: when both sides arrive within the window the joined value is emitted, and when only one side arrives, that side alone drives the result (the missing side is passed to the joiner as null).

Left-outer join
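A sketch of the three flavors, assuming two already co-partitioned streams keyed by customer ID and the usual imports. Topic names and serdes are made up; the book's edition joins Purchase objects and passes serdes with Joined.with rather than the newer StreamJoined:

ValueJoiner<String, String, String> joiner = (coffee, electronics) -> coffee + "/" + electronics;

// Inner join: a result is produced only when both sides arrive within the 30-minute window.
KStream<String, String> innerJoined = coffeePurchases.join(electronicsPurchases, joiner,
    JoinWindows.of(Duration.ofMinutes(30)),
    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));

// Outer join: a result is always produced; a missing side shows up as null in the joiner.
KStream<String, String> outerJoined = coffeePurchases.outerJoin(electronicsPurchases, joiner,
    JoinWindows.of(Duration.ofMinutes(30)),
    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));

// Left-outer join: the calling (left) side always produces a result; the right side may be null.
KStream<String, String> leftJoined = coffeePurchases.leftJoin(electronicsPurchases, joiner,
    JoinWindows.of(Duration.ofMinutes(30)),
    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));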

7. Timestamps in Kafka Streams
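Kafka Streams decides window membership and join semantics from a timestamp chosen by a TimestampExtractor. A minimal custom extractor; the class name is made up, and the book's version pulls the event time out of the deserialized Purchase object rather than falling back to the record timestamp:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class TransactionTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        // A real extractor would read an embedded event time from record.value();
        // here we fall back to the record's own timestamp, then to the stream's partition time.
        long ts = record.timestamp();
        return ts >= 0 ? ts : partitionTime;
    }
}

It is registered with props.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, TransactionTimestampExtractor.class).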

5. The KTable API

1. The relationship between streams and tables

The record stream

Updates to records or the changelog

2. Record updates and KTable configuration

3. Aggregations and windowing operations

Aggregating share volume by industry

Windowing operations


Session windows

There are a couple of key points to remember from this section:
- Sessions are not a fixed-size window. Rather, the size of a session is driven by the amount of activity within a given time frame.
- Timestamps in the data determine whether an event fits into an existing session or falls into an inactivity gap (see the sketch after this list).
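A sketch of a session-windowed count with a five-minute inactivity gap, assuming the usual imports and default serdes (topic name is made up; SessionWindows.with is the API of the book's era, newer releases prefer ofInactivityGapWithNoGrace):

KTable<Windowed<String>, Long> sessionCounts = builder.<String, String>stream("clicks")
    .groupByKey()
    .windowedBy(SessionWindows.with(Duration.ofMinutes(5)))   // events within 5 minutes of each other merge into one session
    .count();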

Tumbling windows

If you don't specify how long the window should be retained, you'll get the default retention of 24 hours.

Sliding or hopping windows

- Session windows aren't fixed by time but are driven by user activity.
- Tumbling windows give you a set picture of events within the specified time frame.
- Hopping windows are of fixed length, but they're frequently updated and can contain overlapping records in each window (see the sketch after this list).
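A sketch contrasting the two fixed-size window types on the same grouped stream, and setting the retention explicitly. Topic, serdes, and durations are illustrative; TimeWindows.of is the API of the book's era:

KGroupedStream<String, String> grouped = builder.<String, String>stream("stock-transactions").groupByKey();

// Tumbling window: fixed 20-second buckets that never overlap.
KTable<Windowed<String>, Long> tumbling = grouped
    .windowedBy(TimeWindows.of(Duration.ofSeconds(20)))
    .count();

// Hopping window: a 1-minute window that advances every 10 seconds, so windows overlap
// and a single record can appear in several windows.
KTable<Windowed<String>, Long> hopping = grouped
    .windowedBy(TimeWindows.of(Duration.ofMinutes(1)).advanceBy(Duration.ofSeconds(10)))
    .count();

// The 24-hour default retention of windowed state can be overridden on the store:
KTable<Windowed<String>, Long> retained = grouped
    .windowedBy(TimeWindows.of(Duration.ofSeconds(20)))
    .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("counts-store")
        .withRetention(Duration.ofHours(48)));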

In conclusion, the key thing to remember is that you can combine event streams (KStream) and update streams (KTable), using local state. Additionally, when the lookup data is of a manageable size, you can use a GlobalKTable. GlobalKTables replicate all partitions to each node in the Kafka Streams application, making all data available, regardless of which partition the key maps to.
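A sketch of a stream-GlobalKTable join: because every instance holds all of the table's partitions, the stream does not need to be co-partitioned or repartitioned, and the lookup key is derived per record. Topic names and the key-derivation logic are made up:

GlobalKTable<String, String> customers = builder.globalTable("customers");

KStream<String, String> enriched = builder.<String, String>stream("transactions")
    .join(customers,
        (transactionKey, transactionValue) -> transactionValue.split(":")[0],   // derive the lookup key
        (transactionValue, customerValue) -> customerValue + " => " + transactionValue);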

6. The Processor API

1. The trade-offs of higher-level abstractions vs. more control

The Kafka Streams DSL allows developers to create robust applications with minimal code. The ability to quickly put together processing topologies is one of its most important features.

It allows you to iterate quickly to flesh out ideas for working on your data without getting bogged down in the intricate setup details that some other frameworks may need.

What the Processor API lacks in ease of development, it makes up for in power. You can write custom processors to do almost anything you want.

2. Working with sources, processors, and sinks to create a topology
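A minimal wiring of source, processor, and sink with the Processor API. The node names, topics, and the uppercase transformation are made up; the book's edition uses the older Processor interface with separate key/value arguments, while this sketch uses the current processor.api package:

import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class UppercaseProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, String> record) {
        // Transform the value and forward the record to all downstream nodes (the sink here).
        context.forward(record.withValue(record.value().toUpperCase()));
    }
}

// ...and in the application's setup code, every node is named and connected to its parents explicitly:
Topology topology = new Topology();
topology.addSource("Transactions-Source", "transactions");
topology.addProcessor("Uppercase-Processor", UppercaseProcessor::new, "Transactions-Source");
topology.addSink("Results-Sink", "uppercased-transactions", "Uppercase-Processor");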

3. Digging deeper into the Processor API with a stock analysis processor

4. The co-group processor

Adding the sink node

The Processor API gives you more flexibility at the cost of more code.

Kafka Streams In Action part 1

Part 1 Getting started with Kafka Streams

1. Welcome to Kafka Streams

1. The PageRank algorithm

2. Introducing stream processing

3. When to use stream processing, and when not to use it

If you need to report on or take action immediately as data arrives, stream processing is a good approach. If you need to perform in-depth analysis or are compiling a large repository of data for later analysis, a stream-processing approach may not be a good fit.

4. Deconstructing the requirements into a graph

5. Applying Kafka Streams to the purchase transaction flow

2. Kafka quickly

1. Using Kafka to handle data


ZMart’s original data platform


A Kafka sales transaction data hub

2. Kafka architecture

Kafka is a message broker


Kafka and partitions

The distributed log


Replication


Controller responsibilities


Deleting logs

Compacting logs

Sending messages with producers
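A minimal producer sketch; the topic, key, and value are made up:

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.ACKS_CONFIG, "all");   // wait for all in-sync replicas to acknowledge

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    ProducerRecord<String, String> record =
        new ProducerRecord<>("transactions", "customer-1", "purchase:coffee:4.50");
    // send() is asynchronous; the callback reports the assigned partition/offset or the failure.
    producer.send(record, (metadata, exception) -> {
        if (exception != null) {
            exception.printStackTrace();
        } else {
            System.out.println("Wrote to " + metadata.topic() + "-" + metadata.partition() + "@" + metadata.offset());
        }
    });
}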

Reading messages with consumers
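And the matching consumer side, subscribing as part of a consumer group and polling in a loop; group and topic names are made up:

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "transaction-readers");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("transactions"));
    while (true) {
        // Each poll returns the records fetched from the partitions assigned to this group member.
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("%s-%d@%d: %s -> %s%n",
                record.topic(), record.partition(), record.offset(), record.key(), record.value());
        }
    }
}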