The goal of this project is to provide a stable implementation of the PutHive3Streaming processor, which is currently subject to 3 memory leaks. The fixed processor is named PutHive3StreamingFixed.
PutHive3Streaming uses the new HiveStreaming API to write to Hive 3.0+ tables, so it is needed to ingest data from NiFi into Hive on HDP (Hortonworks Data Platform) 3.0+ clusters. It has been available since NiFi 1.7.0.
You might not notice it during testing phases, but there are:
- 2 major memory leaks (HIVE-20979) in the HiveStreaming API that will crash NiFi after a certain number of flowfiles have been ingested into Hive (how many depends on the heap size you allocated to NiFi)
- 1 memory leak in the NiFi processor itself (NIFI-5841). It was fixed in NiFi 1.9.0, so it still affects NiFi 1.7.0/1.8.0 instances.
The fixes for these 3 memory leaks are already merged into the Hive and NiFi master branches, but you will need to apply them manually until they are integrated into official HDP/HDF releases.
Fortunately this processor is here to help!
I only modified the official source code to fix the memory leaks in NiFi and Hive, based on the official pull requests (NIFI-5841, HIVE-20979), and bundled the required Hive classes with the processor.
For now, the fixed processor is available for the following versions. Each version is tagged.
NiFi version | HDF version | Git tag |
---|---|---|
1.7.0 | 3.2.x | v1.7.0 |
1.9.0 | 3.4.x | v1.9.0 |
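
If you script the build, the table above can be encoded as a small lookup. This is only a sketch: `nifi_tag_for_version` is a hypothetical helper name, not something shipped in this repo, and it deliberately knows only the two versions listed in the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): map a NiFi version to the
# Git tag listed in the table above. Versions that are not in the table
# are rejected instead of guessed.
nifi_tag_for_version() {
    case "$1" in
        1.7.0) echo "v1.7.0" ;;
        1.9.0) echo "v1.9.0" ;;
        *)
            echo "no tag is listed for NiFi version '$1'" >&2
            return 1
            ;;
    esac
}
```

For example, `git checkout "$(nifi_tag_for_version 1.7.0)"` would check out `v1.7.0`.
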
- Clone this repo:

      git clone https://github.com/Nuttymoon/nifi-hive3streaming-fixed.git
- Move to the tag corresponding to your NiFi version (e.g. for NiFi 1.7.0 on HDF 3.2.x):

      cd nifi-hive3streaming-fixed
      git checkout v1.7.0
- Edit `bundle/pom.xml` to match your cluster versions (e.g. for HDP 3.x):

      <properties>
        <hive3.version>3.1.0</hive3.version>
        <hive3.hadoop.version>3.1.0</hive3.hadoop.version>
      </properties>
- Build the package with Maven:

      cd bundle
      mvn clean install
- Copy the generated `nar` file to the `$NIFI_PATH/lib/` directory of the NiFi nodes. For example, for an HDF (Hortonworks DataFlow) setup:

      scp nifi-hive3streaming-fixed-nar/target/nifi-hive3streaming-fixed-nar-1.0-SNAPSHOT.nar user@nifihost:/usr/hdf/current/nifi/lib
- Don't forget to make `nifi` the owner of the file:

      ssh user@nifihost "
        chown nifi /usr/hdf/current/nifi/lib/nifi-hive3streaming-fixed-nar-1.0-SNAPSHOT.nar &&
        chgrp nifi /usr/hdf/current/nifi/lib/nifi-hive3streaming-fixed-nar-1.0-SNAPSHOT.nar &&
        chmod 640 /usr/hdf/current/nifi/lib/nifi-hive3streaming-fixed-nar-1.0-SNAPSHOT.nar"
- Restart NiFi.
- You should be able to see the processor in the NiFi Web UI.
- Use this processor instead of PutHive3Streaming; it should keep NiFi from crashing.
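
The `bundle/pom.xml` step above is a manual edit. If you would rather script it, something along these lines might work. It is a minimal sketch, not part of this repo: `set_pom_property` is a hypothetical helper, and it assumes each property sits on a single line as `<name>value</name>`, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): set one Maven property in a
# pom file. Assumes the property is written on one line as <name>value</name>
# and that the new value contains no '|' or '&' characters (plain version
# strings such as 3.1.0 are fine).
# Usage: set_pom_property FILE PROPERTY VALUE
set_pom_property() {
    pom=$1
    prop=$2
    value=$3
    # Escape the dots so "hive3.version" only matches literally.
    prop_re=$(printf '%s' "$prop" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    if sed "s|<${prop_re}>[^<]*</${prop_re}>|<${prop}>${value}</${prop}>|" "$pom" > "$tmp"; then
        # Write back through the original file so its permissions are kept.
        cat "$tmp" > "$pom"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

From the repo root, `set_pom_property bundle/pom.xml hive3.version 3.1.0` followed by `set_pom_property bundle/pom.xml hive3.hadoop.version 3.1.0` would reproduce the HDP 3.x example. Check the result with `git diff` before building, since the function does not report whether the property was actually found.
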
Don't hesitate to create an issue if you run into any problems!