Kafka Streams in Action, Part 2

Part 2: Kafka Streams development

3. Kafka Streams development

1. Hello World for Kafka Streams

2. Working with customer data

3. New requirements

Filtering Purchases
Splitting/Branching The Stream
Generating A Key
Foreach Actions
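
A combined sketch of the four operations above, assuming a recent Kafka Streams version (2.8+, where the split/Branched API replaced the older branch method) and a hypothetical Purchase POJO, inside a topology-building method with the usual org.apache.kafka.streams.kstream imports:

    KStream<String, Purchase> purchases = builder.stream("transactions");

    // Filtering purchases: keep only those above a minimum price
    KStream<String, Purchase> significant =
            purchases.filter((key, purchase) -> purchase.getPrice() > 5.00);

    // Splitting/branching the stream by department
    Map<String, KStream<String, Purchase>> branches = purchases
            .split(Named.as("dept-"))
            .branch((key, p) -> "coffee".equals(p.getDepartment()), Branched.as("coffee"))
            .branch((key, p) -> "electronics".equals(p.getDepartment()), Branched.as("electronics"))
            .noDefaultBranch();

    // Generating a key: purchase records often arrive with a null key
    KStream<String, Purchase> keyed =
            purchases.selectKey((key, purchase) -> purchase.getCustomerId());

    // Foreach: a terminal operation that runs a side effect per record
    keyed.foreach((key, purchase) -> System.out.println(key + " -> " + purchase));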

4. Streams and state

1. Applying stateful operations to Kafka Streams

The transformValues processor
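
A hedged sketch of what a stateful transformValues processor looks like; the class, store name, and value types are hypothetical stand-ins for the book's rewards accumulator. Newer Kafka Streams versions deprecate transformValues in favor of processValues, but the shape carries over:

    public class RunningTotalTransformer
            implements ValueTransformer<Purchase, PurchaseWithTotal> {

        private KeyValueStore<String, Double> store;

        @Override
        public void init(ProcessorContext context) {
            // Look up the state store registered under this name with the topology
            store = context.getStateStore("totals-store");
        }

        @Override
        public PurchaseWithTotal transform(Purchase purchase) {
            Double soFar = store.get(purchase.getCustomerId());
            double updated = (soFar == null ? 0.0 : soFar) + purchase.getPrice();
            store.put(purchase.getCustomerId(), updated);
            return new PurchaseWithTotal(purchase, updated);
        }

        @Override
        public void close() { }
    }

    // Wiring: name the store so Kafka Streams connects it to this processor
    KStream<String, PurchaseWithTotal> withTotals =
            purchases.transformValues(RunningTotalTransformer::new, "totals-store");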

2. Repartitioning the data

Repartitioning In Kafka Streams
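
After an operation like selectKey changes the key, records still sit on their original partitions; writing them back through a repartition topic fixes that. A minimal sketch using the repartition() operator from Kafka Streams 2.6+ (the book's edition used through() with a manually created topic instead):

    KStream<String, Purchase> keyed =
            purchases.selectKey((key, purchase) -> purchase.getCustomerId());

    // Route records through an internal repartition topic so that records
    // with the same customer ID land on the same partition
    KStream<String, Purchase> repartitioned = keyed.repartition(
            Repartitioned.<String, Purchase>as("purchases-by-customer")
                    .withKeySerde(Serdes.String()));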

3. Updating the rewards processor

4. Data locality

5. Failure recovery and fault tolerance

The state stores provided by Kafka Streams meet both the locality and fault-tolerance requirements. They’re local to the defined processors and don’t share access across processes or threads. State stores also use changelog topics for backup and quick recovery.
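
As a sketch of the backup mechanism: a state store built by hand is persistent and changelog-backed by default, and withLoggingEnabled lets you tune the changelog topic’s configuration (names here are illustrative):

    Map<String, String> changelogConfig = new HashMap<>();
    changelogConfig.put("min.insync.replicas", "2");   // tune the backing changelog topic

    StoreBuilder<KeyValueStore<String, Double>> storeBuilder =
            Stores.keyValueStoreBuilder(
                    Stores.persistentKeyValueStore("totals-store"),
                    Serdes.String(),
                    Serdes.Double())
            .withLoggingEnabled(changelogConfig);

    builder.addStateStore(storeBuilder);   // register with the StreamsBuilder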

6. Joining streams for added insight

Generating keys containing customer IDs to perform joins

Constructing the join

Outer joins

With an inner join, if either record isn’t present, the join doesn’t occur. An outer join always produces an output record: the joined result when both sides arrive within the window, or the record from whichever side is present otherwise.
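
A sketch of the two joins side by side, assuming two streams already keyed by customer ID and a hypothetical CorrelatedPurchase result type; records join only when their timestamps fall within the window of each other:

    JoinWindows window = JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(20));

    // Inner join: emits only when both sides produce a record for the key
    KStream<String, CorrelatedPurchase> inner = coffeePurchases.join(
            electronicsPurchases,
            (coffee, electronics) -> new CorrelatedPurchase(coffee, electronics),
            window);

    // Outer join: always emits; the absent side shows up as null in the joiner
    KStream<String, CorrelatedPurchase> outer = coffeePurchases.outerJoin(
            electronicsPurchases,
            (coffee, electronics) -> new CorrelatedPurchase(coffee, electronics),
            window);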

Left-outer join

7. Timestamps in Kafka Streams

5. The KTable API

1. The relationship between streams and tables

The record stream

Updates to records or the changelog

2. Record updates and KTable configuration

3. Aggregations and windowing operations

Aggregating share volume by industry

Windowing operations


Session windows

There are a couple of key points to remember from this section (a code sketch follows the list):

- Sessions are not a fixed-size window. Rather, the size of a session is driven by the amount of activity within a given time frame.
- Timestamps in the data determine whether an event fits into an existing session or falls into an inactivity gap.
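
A minimal sketch of a session-windowed count, assuming a recent Kafka Streams API (3.0+) and an illustrative topic name; a new record arriving within 20 seconds of an existing session extends it, otherwise a new session starts:

    KTable<Windowed<String>, Long> sessionCounts = builder
            .<String, String>stream("user-events")
            .groupByKey()
            .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofSeconds(20)))
            .count();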

Tumbling windows

If you don’t specify how long to retain data in the window, you get the default retention period of 24 hours.
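
A sketch of setting the retention explicitly via Materialized, hedged for a recent Kafka Streams version and assuming a hypothetical KStream named trades:

    KTable<Windowed<String>, Long> counts = trades
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofSeconds(20)))
            .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("trade-counts")
                    .withRetention(Duration.ofHours(48)));   // left unset, this defaults to 24 hours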

Sliding or hopping windows

- Session windows aren’t fixed by time but are driven by user activity.
- Tumbling windows give you a set picture of events within the specified time frame.
- Hopping windows are of fixed length, but they’re frequently updated and can contain overlapping records in each window (see the sketch below).
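
For the hopping case, a sketch: a one-minute window advancing every ten seconds, so each record can land in up to six overlapping windows (same assumed trades stream as above):

    KTable<Windowed<String>, Long> hoppingCounts = trades
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1))
                    .advanceBy(Duration.ofSeconds(10)))
            .count();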

In conclusion, the key thing to remember is that you can combine event streams (KStream) and update streams (KTable) using local state. Additionally, when the lookup data is of a manageable size, you can use a GlobalKTable. GlobalKTables replicate all partitions of the underlying topic to each instance of the Kafka Streams application, making all data available regardless of which partition a key maps to.
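
A sketch of a KStream-GlobalKTable join; because the table is fully replicated, no repartitioning is needed, and the join key is extracted from the stream record itself (names and value types are illustrative):

    GlobalKTable<String, Customer> customers = builder.globalTable("customers");

    KStream<String, EnrichedPurchase> enriched = purchases.join(
            customers,
            (purchaseKey, purchase) -> purchase.getCustomerId(),  // maps into the table's key
            (purchase, customer) -> new EnrichedPurchase(purchase, customer));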

6. The Processor API

1. The trade-offs of higher-level abstractions vs. more control

The Kafka Streams DSL allows developers to create robust applications with minimal code. The ability to quickly put together processing topologies is one of its most important features.

It allows you to iterate quickly to flesh out ideas for working on your data without getting bogged down in the intricate setup details that some other frameworks may need.
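
The chapter 3 “Hello World” application is essentially this shape; the whole topology, reading, transforming, and writing, fits in a few lines (topic names are illustrative):

    StreamsBuilder builder = new StreamsBuilder();
    builder.<String, String>stream("src-topic")
           .mapValues(value -> value.toUpperCase())
           .to("out-topic");
    Topology topology = builder.build();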

What the Processor API lacks in ease of development, it makes up for in power. You can write custom processors to do almost anything you want.
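
As a sketch of that power: a custom processor that decides per record whether to forward at all, written against the newer typed Processor API (the book’s edition used the older AbstractProcessor, but the idea is the same); the class and threshold are hypothetical:

    public class PriceFilterProcessor
            implements Processor<String, Purchase, String, Purchase> {

        private ProcessorContext<String, Purchase> context;

        @Override
        public void init(ProcessorContext<String, Purchase> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, Purchase> record) {
            // Forward only purchases above the threshold; silently drop the rest
            if (record.value().getPrice() > 5.00) {
                context.forward(record);
            }
        }
    }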

2. Working with sources, processors, and sinks to create a topology

3. Digging deeper into the Processor API with a stock analysis processor

4. The co-group processor

Adding the sink node

The Processor API gives you more flexibility at the cost of more code.
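
What “more code” looks like in practice: with the Processor API you name and wire every source, processor, and sink yourself. A sketch reusing the hypothetical PriceFilterProcessor from above (the Purchase serializer and deserializer are assumed to exist):

    Topology topology = new Topology();
    topology.addSource("purchase-source",
                       Serdes.String().deserializer(),
                       purchaseDeserializer,            // hypothetical Purchase deserializer
                       "transactions")
            .addProcessor("price-filter",
                          PriceFilterProcessor::new,
                          "purchase-source")
            .addSink("purchase-sink",
                     "filtered-transactions",
                     Serdes.String().serializer(),
                     purchaseSerializer,                // hypothetical Purchase serializer
                     "price-filter");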