The new integration between Flume and Kafka offers sub-second-latency event processing without the need for dedicated infrastructure.

In this previous post you learned some Apache Kafka basics and explored a scenario for using Kafka in an online application. This post takes you a step further and highlights the integration of Kafka with Apache Hadoop, demonstrating both a basic ingestion capability and how different open-source components can be easily combined to create a near-real-time stream processing workflow using Kafka, Apache Flume, and Hadoop.

One key feature of Kafka is its functional simplicity. While there is a lot of sophisticated engineering under the covers, Kafka's general functionality is relatively straightforward. Part of this simplicity comes from its independence from any other applications (excepting Apache ZooKeeper). As a consequence, however, the responsibility is on the developer to write code to either produce or consume messages from Kafka. While there are a number of Kafka clients that support this process, for the most part custom coding is required.

Cloudera engineers and other open source community members have recently committed code for Kafka-Flume integration, informally called "Flafka," to the Flume project. Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of data from many different sources to a centralized data store. Flume provides a tested, production-hardened framework for implementing ingest and real-time processing pipelines.

Using the new Flafka source and sink, now available in CDH 5.2, Flume can both read and write messages with Kafka; it can act as both a consumer of and a producer for Kafka. Flume-Kafka integration offers the following functionality that Kafka, absent custom coding, does not:

- Producers – use Flume sources to write to Kafka
- Consumers – write to Flume sinks that read from Kafka
- In-flight transformations and processing

This functionality expands your ability to utilize all the features of Flume, such as bucketing and event modification/routing, Kite SDK Morphline integration, and NRT indexing with Cloudera Search.

Next, we'll walk you through an example application using the ingestion of credit-card data as the use case. All example code and configuration info involved are available here; a detailed walkthrough of the setup and example code is in the readme.
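To give a concrete sense of what that configuration looks like, here is a minimal sketch of a Flume agent that uses the Kafka source to consume from a topic and land the events in HDFS. It is illustrative only: the agent name, ZooKeeper quorum, topic, and HDFS path are placeholders, and the property names follow the Flafka source as shipped with CDH 5.2, so check the Flume documentation for your release before reusing it.

# Flume as a Kafka consumer: Kafka source -> memory channel -> HDFS sink
tier1.sources  = source1
tier1.channels = channel1
tier1.sinks    = sink1

# Kafka source: reads messages from the named topic via ZooKeeper
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.zookeeperConnect = zkhost:2181
tier1.sources.source1.topic = credit-card-txns
tier1.sources.source1.groupId = flume
tier1.sources.source1.batchSize = 100
tier1.sources.source1.channels = channel1
# Timestamp interceptor so the HDFS sink can resolve the dated path below
tier1.sources.source1.interceptors = i1
tier1.sources.source1.interceptors.i1.type = timestamp

# Memory channel: buffers events between the source and the sink
tier1.channels.channel1.type = memory
tier1.channels.channel1.capacity = 10000
tier1.channels.channel1.transactionCapacity = 1000

# HDFS sink: writes events as plain text files under a per-day directory
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.hdfs.path = /tmp/kafka/credit-card-txns/%y-%m-%d
tier1.sinks.sink1.hdfs.fileType = DataStream
tier1.sinks.sink1.hdfs.rollInterval = 30
tier1.sinks.sink1.channel = channel1

Started with something like flume-ng agent --conf <conf-dir> --conf-file <path-to-this-file> --name tier1, an agent along these lines drains the topic continuously without any custom consumer code.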
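In the other direction, the Kafka sink lets any existing Flume source act as a producer for Kafka. The sketch below is likewise only illustrative: it tails a hypothetical application log with an exec source and publishes each line to a topic; the broker list, topic name, and log path are assumptions, and the sink properties follow the CDH 5.2-era Flafka sink (property names changed in later Flume releases).

# Flume as a Kafka producer: exec source -> memory channel -> Kafka sink
tier2.sources  = source1
tier2.channels = channel1
tier2.sinks    = sink1

# Exec source: tails an application log (path is a placeholder)
tier2.sources.source1.type = exec
tier2.sources.source1.command = tail -F /var/log/app/transactions.log
tier2.sources.source1.channels = channel1

# Memory channel: buffers events between the source and the sink
tier2.channels.channel1.type = memory
tier2.channels.channel1.capacity = 10000
tier2.channels.channel1.transactionCapacity = 1000

# Kafka sink: publishes each Flume event to the named topic
tier2.sinks.sink1.type = org.apache.flume.sink.kafka.KafkaSink
tier2.sinks.sink1.brokerList = kafkahost:9092
tier2.sinks.sink1.topic = credit-card-txns
tier2.sinks.sink1.batchSize = 20
tier2.sinks.sink1.requiredAcks = 1
tier2.sinks.sink1.channel = channel1

Because the bucketing, event modification/routing, and in-flight processing mentioned above are ordinary Flume features (interceptors, channel selectors, and sink configuration), they slot into either agent without touching Kafka itself.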