Although interceptors aren’t typically your first choice for debugging, they can be useful for observing the behavior of your Kafka Streams application, and they’re a valuable addition to your toolbox.
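A minimal sketch of wiring interceptors into a Streams application through its embedded clients; the interceptor class names below are hypothetical placeholders, but the prefix helpers and config keys are standard Kafka APIs:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

Properties props = new Properties();
// Interceptors are plain consumer/producer configs; consumerPrefix()/producerPrefix()
// route them to the clients embedded inside the Kafka Streams application.
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG),
          "com.example.LoggingConsumerInterceptor");   // hypothetical class name
props.put(StreamsConfig.producerPrefix(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG),
          "com.example.LoggingProducerInterceptor");   // hypothetical class name
```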
2. Application metrics
3. More Kafka Streams debugging techniques
8. Testing a Kafka Streams application
1. Testing a topology
ProcessorTopologyTestDriver
The critical point to keep in mind with this test is that you now have a repeatable test running a record through your entire topology, without the overhead of running Kafka.
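As a rough sketch of what such a test looks like with the publicly released test driver (TopologyTestDriver from kafka-streams-test-utils, the public successor to the internal ProcessorTopologyTestDriver; the topology and topic names here are invented):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

import static org.junit.Assert.assertEquals;

public class UppercaseTopologyTest {

    @org.junit.Test
    public void shouldUppercaseValues() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted

        // The driver pushes records through the topology synchronously -- no broker needed.
        try (TopologyTestDriver driver = new TopologyTestDriver(builder.build(), props)) {
            TestInputTopic<String, String> input =
                driver.createInputTopic("input-topic", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> output =
                driver.createOutputTopic("output-topic", new StringDeserializer(), new StringDeserializer());

            input.pipeInput("key", "hello");
            assertEquals("HELLO", output.readValue());
        }
    }
}
```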
2. Integration testing
Integration tests with the EmbeddedKafkaCluster should be used sparingly, and only when you have interactive behavior that can be verified only with a live, running Kafka broker.
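When you do need one, the setup looks roughly like this (a sketch assuming the EmbeddedKafkaCluster class from Kafka's own integration-test sources; it isn't a published artifact and its API has shifted across versions, so treat the details as illustrative):

```java
import org.apache.kafka.streams.integration.utils.EmbeddedKafkaCluster;
import org.junit.BeforeClass;
import org.junit.ClassRule;

public class PurchaseFlowIntegrationTest {

    private static final int NUM_BROKERS = 1;

    // Starts real, in-process brokers before the test class runs and tears them down after.
    @ClassRule
    public static final EmbeddedKafkaCluster EMBEDDED_KAFKA = new EmbeddedKafkaCluster(NUM_BROKERS);

    @BeforeClass
    public static void setUpTopics() throws Exception {
        EMBEDDED_KAFKA.createTopic("purchases");
        EMBEDDED_KAFKA.createTopic("patterns");
    }

    // Tests then point StreamsConfig.BOOTSTRAP_SERVERS_CONFIG at
    // EMBEDDED_KAFKA.bootstrapServers() and produce/consume real records.
}
```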
The state stores provided by Kafka Streams meet both the locality and fault-tolerance requirements. They’re local to the defined processors and don’t share access across processes or threads. State stores also use topics for backup and quick recovery.
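A sketch of declaring such a store through the public Stores factory (the store name and serdes are made up; change-logging to a backing topic is on by default):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

import java.util.Collections;

// Persistent key-value store, local to the processor(s) it is attached to.
StoreBuilder<KeyValueStore<String, Long>> storeBuilder =
    Stores.keyValueStoreBuilder(
              Stores.persistentKeyValueStore("share-volume-store"),
              Serdes.String(),
              Serdes.Long())
          // Enabling logging explicitly documents that the store is backed by a
          // compacted changelog topic, which is what allows quick recovery on failure.
          .withLoggingEnabled(Collections.singletonMap("cleanup.policy", "compact"));
```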
6. Joining streams for added insight
Generating keys containing customer IDs to perform joins
Constructing the join
With an inner join, if either record isn’t present, the join doesn’t occur. An outer join always outputs a record: if one side is missing, the joiner receives null for that side.
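A sketch of the difference in code, assuming two keyed purchase streams and a recent Kafka Streams version (the topic names, string value types, and the 20-minute window are invented for illustration):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

import java.time.Duration;

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> coffeePurchases = builder.stream("coffee-purchases");
KStream<String, String> electronicsPurchases = builder.stream("electronics-purchases");

ValueJoiner<String, String, String> joiner =
    (coffee, electronics) -> coffee + "/" + electronics;

// Inner join: emits a result only when both sides have a record with the
// same key whose timestamps fall within the 20-minute window.
KStream<String, String> inner = coffeePurchases.join(
    electronicsPurchases, joiner,
    JoinWindows.of(Duration.ofMinutes(20)),
    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));

// Outer join: emits a result for every record on either side; the missing
// side arrives at the joiner as null.
KStream<String, String> outer = coffeePurchases.outerJoin(
    electronicsPurchases, joiner,
    JoinWindows.of(Duration.ofMinutes(20)),
    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()));
```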
7. Timestamps in Kafka Streams
5. The KTable API
1. The relationship between streams and tables
The record stream
Updates to records or the changelog
2. Record updates and KTable configuration
3. Aggregations and windowing operations
Aggregating share volume by industry
Windowing operations
There are a couple of key points to remember from this section: sessions are not fixed-size windows; rather, the size of a session is driven by the amount of activity within a given time frame. Timestamps in the data determine whether an event fits into an existing session or falls into an inactivity gap.
If you don’t specify how long to retain the window, you get the default retention period of 24 hours.
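A sketch of counting events per session, assuming a keyed transaction stream and a 20-second inactivity gap (both invented for illustration):

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

import java.time.Duration;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> transactions = builder.stream("stock-transactions");

// Records for the same key whose timestamps fall within the 20-second inactivity
// gap are merged into one session; a larger gap starts a new session.
KTable<Windowed<String>, Long> transactionsPerSession = transactions
    .groupByKey()
    .windowedBy(SessionWindows.with(Duration.ofSeconds(20)))
    .count();
    // Window retention defaults to 24 hours unless overridden,
    // e.g. via Materialized#withRetention.
```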
Session windows aren’t fixed by time but are driven by user activity. Tumbling windows give you a fixed picture of events within the specified time frame. Hopping windows are also of fixed length, but they advance by a smaller interval, so successive windows can contain overlapping records.
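For comparison, a sketch of tumbling and hopping windows over the same hypothetical stream; only the advance interval differs:

```java
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;

// Tumbling window: fixed size, advance == size, so windows never overlap.
TimeWindows tumbling = TimeWindows.of(Duration.ofMinutes(1));

// Hopping window: same fixed size, but it advances by a smaller interval,
// so a single record can land in several overlapping windows.
TimeWindows hopping = TimeWindows.of(Duration.ofMinutes(1))
                                 .advanceBy(Duration.ofSeconds(10));

// Used the same way as the session window above:
// transactions.groupByKey().windowedBy(hopping).count();
```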
In conclusion, the key thing to remember is that you can combine event streams (KStream) and update streams (KTable), using local state. Additionally, when the lookup data is of a manageable size, you can use a GlobalKTable. GlobalKTables replicate all partitions to each node in the Kafka Streams application, making all data available, regardless of which partition the key maps to.
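A sketch of a KStream-GlobalKTable join under those assumptions (topic names and string value types are made up); because every instance holds the full table, no co-partitioning or repartitioning is required:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();

// Lookup data small enough to replicate in full to every application instance.
GlobalKTable<String, String> customers = builder.globalTable("customers");

KStream<String, String> purchases = builder.stream("purchases");

KStream<String, String> enriched = purchases.join(
    customers,
    (purchaseKey, purchaseValue) -> purchaseKey,               // map each stream record to the table key
    (purchaseValue, customerValue) -> customerValue + ":" + purchaseValue); // combine the two values
```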
6. The Processor API
1. The trade-offs of higher-level abstractions vs. more control
The Kafka Streams DSL allows developers to create robust applications with minimal code. The ability to quickly put together processing topologies is an important feature of the DSL.
It allows you to iterate quickly to flesh out ideas for working on your data without getting bogged down in the intricate setup details that some other frameworks may need.
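As a rough sense of scale, a complete (if trivial) topology takes only a few DSL calls; the topic names and the transformation here are invented:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "quick-dsl-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

StreamsBuilder builder = new StreamsBuilder();
builder.stream("purchases", Consumed.with(Serdes.String(), Serdes.String()))
       .mapValues(value -> value.toUpperCase())               // the entire "business logic"
       .to("patterns", Produced.with(Serdes.String(), Serdes.String()));

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
```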
What the Processor API lacks in ease of development, it makes up for in power. You can write custom processors to do almost anything you want.
2. Working with sources, processors, and sinks to create a topology
3. Digging deeper into the Processor API with a stock analysis processor
4. The co-group processor
The Processor API gives you more flexibility at the cost of more code.
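A sketch of what that extra code buys you, using the newer processor interface (Kafka 2.7+); the threshold logic, node names, and topics are invented:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class ThresholdTopology {

    // Every detail -- naming, wiring, forwarding -- is spelled out by hand.
    static class ThresholdProcessor implements Processor<String, Double, String, Double> {
        private ProcessorContext<String, Double> context;

        @Override
        public void init(ProcessorContext<String, Double> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, Double> record) {
            // Forward only transactions above an arbitrary threshold.
            if (record.value() != null && record.value() > 100.0) {
                context.forward(record);
            }
        }
    }

    public static Topology build() {
        Topology topology = new Topology();
        topology.addSource("Source",
                           Serdes.String().deserializer(), Serdes.Double().deserializer(),
                           "stock-transactions")
                .addProcessor("Threshold", ThresholdProcessor::new, "Source")
                .addSink("Sink", "high-value-transactions",
                         Serdes.String().serializer(), Serdes.Double().serializer(),
                         "Threshold");
        return topology;
    }
}
```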
3. When to use stream processing, and when not to use it
If you need to report on or take action immediately as data arrives, stream processing is a good approach. If you need to perform in-depth analysis or are compiling a large repository of data for later analysis, a stream-processing approach may not be a good fit.
4. Deconstructing the requirements into a graph
5. Applying Kafka Streams to the purchase transaction flow