Introduction
Building a cluster of Apache NiFi instances provides several advantages over a single instance:
- Increased throughput. Depending on the amount of data that you want to ingest, you might find that a single NiFi instance, installed on a single server, does not provide sufficient throughput. You can increase the number of threads available to your processors so that they process multiple documents in parallel, but you are still limited by the resources of a single server. Distributing the work across several servers lets you increase throughput beyond that limit.
- Failover. If one of your servers becomes unavailable, the remaining nodes in the cluster can continue ingesting data.
Apache NiFi uses Apache ZooKeeper to coordinate the nodes in a cluster, for example to elect the Cluster Coordinator and the Primary Node. NiFi can automatically start an embedded ZooKeeper server, which is a convenient option if you don't already run ZooKeeper. This guide explains how to configure a ZooKeeper cluster using the embedded ZooKeeper server.
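The Apache NiFi documentation recommends that you run ZooKeeper on either three or five of your NiFi nodes. ZooKeeper remains available only while a majority (a quorum) of its servers are running, so three servers tolerate the loss of one and five tolerate the loss of two. Running ZooKeeper on only one node means that a single machine failure makes ZooKeeper, and therefore cluster coordination, unavailable. This guide assumes that you will run ZooKeeper on three of your NiFi nodes.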
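The three-or-five recommendation follows from ZooKeeper's majority quorum rule: an ensemble of n servers needs a strict majority, floor(n/2) + 1, of them running to stay available. The short Python sketch below simply does that arithmetic for small ensemble sizes; the helper names `quorum_size` and `tolerated_failures` are illustrative only and are not part of NiFi or ZooKeeper.

```python
def quorum_size(servers: int) -> int:
    """Smallest strict majority of a ZooKeeper ensemble with `servers` members."""
    if servers < 1:
        raise ValueError("an ensemble needs at least one server")
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """Number of servers that can fail while a quorum is still available."""
    return servers - quorum_size(servers)


if __name__ == "__main__":
    # Print quorum size and failure tolerance for ensembles of 1 to 5 servers.
    for n in range(1, 6):
        print(f"{n} server(s): quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

An even-sized ensemble tolerates no more failures than the odd size just below it (four servers tolerate one failure, the same as three), which is why odd sizes such as three or five are the usual choice.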
After setting up three ZooKeeper nodes, you can add more NiFi instances to the cluster; these additional instances do not need to run ZooKeeper. For information about how to configure the initial (ZooKeeper) nodes, see Set up a NiFi Cluster. Then, to add more nodes to your cluster, see Set up Additional Nodes.
TIP: You can find more detailed information about clustering in the Apache NiFi documentation. This document describes only the minimum configuration required.