What is Flume?

A flume is a channel that directs water from a source to some other location where water is needed. As its clever name implies, Apache® Flume™ was created to allow you to flow data from a source into your Hadoop® environment.

In Flume, the entities you work with are called sources, decorators, and sinks. A source can be any source of data, and Flume ships with many predefined source adapters. A sink is the target of a specific operation; in Flume, as in other paradigms that use this term, the sink of one operation can be the source for the next downstream operation. A decorator is an operation applied to the stream that transforms it in some way, for example by compressing or decompressing data or by adding or removing pieces of information.
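
To see how these pieces fit together, here is a minimal sketch of an agent definition in the properties-file format used by more recent Flume releases, where a source feeds a sink through a channel and interceptors play a role similar to decorators. The agent, component, and path names (agent1, src1, /var/log/app.log, and so on) are placeholders, not part of any particular installation.

# Name the source, channel, and sink that make up this agent
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

# Source: follow an application log (the path is a placeholder)
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

# Interceptor: stamps each event with a timestamp, roughly the decorator idea
agent1.sources.src1.interceptors = ts
agent1.sources.src1.interceptors.ts.type = timestamp

# Channel: buffers events in memory between the source and the sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events to the console log for quick testing
agent1.sinks.sink1.type = logger
agent1.sinks.sink1.channel = ch1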

Three types of sinks in Flume

Collector Tier Event

This is where you would land a flow (or possibly multiple flows joined together) into an HDFS-formatted file system.
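
In current Flume releases, this kind of landing zone is typically expressed as an HDFS sink. The sketch below reuses the agent1 and ch1 names from the earlier example; the NameNode address and path are placeholders. The date escapes in the path rely on a timestamp header, which the timestamp interceptor shown earlier supplies.

# HDFS sink: land the flow in HDFS, partitioned by day
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.channel = ch1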

Agent Tier Event

This is used when you want the sink to be the input source for another operation. When you use these sinks, Flume will also ensure the integrity of the flow by sending back acknowledgments that data has actually arrived at the sink.
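
The comparable multi-hop setup in current Flume releases chains an Avro sink on the upstream agent to an Avro source on the downstream agent, with delivery protected by Flume's transactional handoff between channel and sink. The hostnames, ports, and agent names below are placeholders.

# Upstream agent: forward the flow to another agent
agent1.sinks.fwd.type = avro
agent1.sinks.fwd.hostname = collector.example.com
agent1.sinks.fwd.port = 4545
agent1.sinks.fwd.channel = ch1

# Downstream agent: receive the flow as its own source
agent2.sources.in.type = avro
agent2.sources.in.bind = 0.0.0.0
agent2.sources.in.port = 4545
agent2.sources.in.channels = ch2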

Basic

This sink can be a text file, the console display, a simple HDFS path, or a null bucket where the data is simply deleted.
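
As a sketch, these basic targets look like the following sink definitions in a current Flume configuration (the console display corresponds to the logger sink shown earlier; directory names are placeholders).

# Roll events into plain text files in a local directory
agent1.sinks.toFile.type = file_roll
agent1.sinks.toFile.sink.directory = /var/flume/out
agent1.sinks.toFile.channel = ch1

# Discard events entirely (the null bucket)
agent1.sinks.toNull.type = null
agent1.sinks.toNull.channel = ch1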

Related products or solutions


IBM Big SQL

A hybrid SQL engine for Apache Hadoop that concurrently exploits Hive, HBase and Spark using a single database connection or a single query.


Resources


The Data Warehouse Evolved: A Foundation for Analytical Excellence

Explore a best-in-class approach to data management and how companies are prioritizing data technologies to drive growth and efficiency.


Understanding Big Data Beyond the Hype

Read this practical introduction to the next generation of data architectures. It covers the role of the cloud and NoSQL technologies and discusses the practicalities of security, privacy and governance.


Starting Flume agents by using the Ambari web interface

Apache Flume can be used to efficiently collect, aggregate, and move large amounts of log data from many different sources to a centralized data store.


How to use Flume in IOP with Message Hub

This blog shows you how to use Flume in IBM Open Platform (IOP) 4.2 with Message Hub, a service on Bluemix that is based upon Apache Kafka.