Hadoop Chart

This chart is modified from stable/hadoop and mgit-at/helm-hadoop-3 and has been updated to:

run use multi-architecture Docker image and
use the currently latest version of Hadoop.

This chart is primarily intended to be used for YARN and MapReduce job execution where HDFS is just used as a means to transport small artifacts within the framework and not for a distributed filesystem. Data should be read from cloud based datastores such as Google Cloud Storage, S3 or Swift.

Chart Details

Installing the Chart

To install the chart with the release name hadoop:

helm helm repo add pfisterer-hadoop https://pfisterer.github.io/apache-hadoop-helm/
helm install --name hadoop pfisterer-hadoop/hadoop

Configuration

The following table lists the configurable parameters of the Hadoop chart and their default values.

Parameter	Description	Default
`image.repository`	Hadoop image	`farberg/apache-hadoop`
`image.tag`	Hadoop image tag	`3.3.2`
`imagee.pullPolicy`	Pull policy for the images	`IfNotPresent`
`hadoopVersion`	Version of hadoop libraries being used	`3.3.2`
`antiAffinity`	Pod antiaffinity, `hard` or `soft`	`hard`
`hdfs.nameNode.pdbMinAvailable`	PDB for HDFS NameNode	`1`
`hdfs.nameNode.resources`	resources for the HDFS NameNode	`requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m`
`hdfs.dataNode.replicas`	Number of HDFS DataNode replicas	`1`
`hdfs.dataNode.pdbMinAvailable`	PDB for HDFS DataNode	`1`
`hdfs.dataNode.resources`	resources for the HDFS DataNode	`requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m`
`hdfs.webhdfs.enabled`	Enable WebHDFS REST API	`true`
`yarn.resourceManager.pdbMinAvailable`	PDB for the YARN ResourceManager	`1`
`yarn.resourceManager.resources`	resources for the YARN ResourceManager	`requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m`
`yarn.nodeManager.pdbMinAvailable`	PDB for the YARN NodeManager	`1`
`yarn.nodeManager.replicas`	Number of YARN NodeManager replicas	`1`
`yarn.nodeManager.parallelCreate`	Create all nodeManager statefulset pods in parallel (K8S 1.7+)	`false`
`yarn.nodeManager.resources`	Resource limits and requests for YARN NodeManager pods	`requests:memory=2048Mi,cpu=1000m,limits:memory=2048Mi,cpu=1000m`
`persistence.nameNode.enabled`	Enable/disable persistent volume	`false`
`persistence.nameNode.storageClass`	Name of the StorageClass to use per your volume provider	`-`
`persistence.nameNode.accessMode`	Access mode for the volume	`ReadWriteOnce`
`persistence.nameNode.size`	Size of the volume	`50Gi`
`persistence.dataNode.enabled`	Enable/disable persistent volume	`false`
`persistence.dataNode.storageClass`	Name of the StorageClass to use per your volume provider	`-`
`persistence.dataNode.accessMode`	Access mode for the volume	`ReadWriteOnce`
`persistence.dataNode.size`	Size of the volume	`200Gi`

Customized Hadoop Base Docker Image

This image is modified from comcast/kube-yarn and mgit-at/helm-hadoop-3. Currently, native libraries are not been included.

Build and Push the Docker Image

# Set version
HADOOP_VERSION=3.3.2

# Build
docker buildx build --push --platform "linux/arm64,linux/amd64" -t farberg/apache-hadoop:latest -t farberg/apache-hadoop:$HADOOP_VERSION .

Testing with minikube

If you are running locally with minikube and want to try your images without pushing them to a registry, build the images on the minikube VM first:

eval $(minikube docker-env)
# use the build command from above

Development

Help is always appreciated. Please create pull requests.

Open Issues

Include native libraries
List of ports needs to be updated (cf. https://www.oreilly.com/library/view/big-data-analytics/9781788628846/5c5821cc-4a3d-498a-a3eb-23256cd79c8b.xhtml)

Upload a new version of the chart

helm lint
helm package .
mv hadoop*.tgz docs/
helm repo index docs/ --url https://pfisterer.github.io/apache-hadoop-helm/
git add docs/
git commit -a -m "Updated helm repository"
git push origin master

Changes

Version 1.2.0

Initial release of this chart
Use multi-architecture base image
Apache Hadoop 3.3.2

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
docs		docs
image		image
templates		templates
.helmignore		.helmignore
Chart.yaml		Chart.yaml
LICENSE		LICENSE
README.md		README.md
values.yaml		values.yaml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Hadoop Chart

Chart Details

Installing the Chart

Configuration

Customized Hadoop Base Docker Image

Build and Push the Docker Image

Testing with minikube

Development

Open Issues

Upload a new version of the chart

Changes

About

Releases

Packages

Languages

License

pfisterer/apache-hadoop-helm

Folders and files

Latest commit

History

Repository files navigation

Hadoop Chart

Chart Details

Installing the Chart

Configuration

Customized Hadoop Base Docker Image

Build and Push the Docker Image

Testing with minikube

Development

Open Issues

Upload a new version of the chart

Changes

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages