Build hadoop from scratch #7

Merged
MrCreosote merged 1 commit into main from dev-hadoop_from_scratch on Jul 29, 2024

Conversation

MrCreosote
Member

The apache/hadoop:3.3.6 container is built on a 5+ year old version of CentOS and doesn't run on MacOS. Hopefully this will fix things.

I have no name!@5c6eec42ebf6:/opt/bitnami/spark$ ./bin/spark-submit --master yarn --conf spark.hadoop.yarn.resourcemanager.hostname=yarn-resourcemanager --conf spark.hadoop.yarn.resourcemanager.address=yarn-resourcemanager:8032 --conf spark.hadoop.fs.s3a.endpoint=http://minio:9002 --conf spark.hadoop.fs.s3a.access.key=minio --conf spark.hadoop.fs.s3a.secret.key=minio123 --conf spark.hadoop.fs.s3a.path.style.access=true --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem --conf spark.yarn.stagingDir=s3a://yarn --deploy-mode client examples/src/main/python/pi.py 10 2>/dev/null
Pi is roughly 3.138560

MrCreosote requested a review from Tianhao-Gu on July 26, 2024 23:25
Comment on lines -36 to -40
## OS notes:

* The Hadoop containers don't seem to start correctly on Mac machines. Ubuntu linux works
normally.

Member Author

Counting my chickens before they're hatched here

@bio-boris

Not sure if this is useful, but here is a dockerfile that shows how they build it https://github.com/apache/hadoop/blob/docker-hadoop-3/Dockerfile

@MrCreosote
Member Author

> Not sure if this is useful, but here is a dockerfile that shows how they build it https://github.com/apache/hadoop/blob/docker-hadoop-3/Dockerfile

Looks like the top of the image stack from `docker history --no-trunc apache/hadoop:3.3.6`

Collaborator

Tianhao-Gu left a comment

Looks good.

But on my Mac:

yarn-resourcemanager   | ERROR: JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/jre/ does not exist.
yarn-nodemanager       | ERROR: JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64/jre/ does not exist.
yarn-resourcemanager exited with code 1

@MrCreosote
Member Author

Can you run the images and check if the directory exists?

@MrCreosote
Member Author

$ docker run -it --entrypoint bash cdm-prototype-yarn-yarn-resourcemanager
hadoop@668dd1d87bd3:~$ ls /usr/lib/jvm/java-8-openjdk-amd64/jre/
ASSEMBLY_EXCEPTION  THIRD_PARTY_README	bin  lib  man

@MrCreosote
Member Author

MrCreosote commented Jul 29, 2024

I just double checked and the images start normally for me on this commit

@Tianhao-Gu
Collaborator

> I just double checked and the images start normally for me on this commit

It does exist. Trying to figure out what's going on here. I cannot `ls` or `cd` into it directly, but I can after I `cd /usr/lib/jvm`. Because it's a link?

(dev) tgu@cdm-prototype-yarn (dev-hadoop_from_scratch)$docker run -it --entrypoint bash cdm-prototype-yarn-yarn-resourcemanager
hadoop@cb790f07f5a7:~$ ls /usr/lib/jvm/java-8-openjdk-amd64/jre/
ls: cannot access '/usr/lib/jvm/java-8-openjdk-amd64/jre/': No such file or directory
hadoop@cb790f07f5a7:~$ cd /usr/lib/jvm/java-8-openjdk-amd64/jre/
bash: cd: /usr/lib/jvm/java-8-openjdk-amd64/jre/: No such file or directory
hadoop@cb790f07f5a7:~$ cd /usr/lib/jvm/java-8-openjdk-amd64
bash: cd: /usr/lib/jvm/java-8-openjdk-amd64: No such file or directory
hadoop@cb790f07f5a7:~$ cd  /usr/lib/jvm
hadoop@cb790f07f5a7:/usr/lib/jvm$ cd java-8-openjdk-arm64/jre/
hadoop@cb790f07f5a7:/usr/lib/jvm/java-8-openjdk-arm64/jre$ ls
ASSEMBLY_EXCEPTION  THIRD_PARTY_README  bin  lib  man
hadoop@cb790f07f5a7:/usr/lib/jvm/java-8-openjdk-arm64/jre$ 

@MrCreosote
Member Author

Well that's bizarre

@MrCreosote
Member Author

What link are you talking about? There's no link in the path that I see

@Tianhao-Gu
Collaborator

> What link are you talking about? There's no link in the path that I see

I mean, it seems like java-8-openjdk-arm64 is a link.

hadoop@cb790f07f5a7:/usr/lib/jvm/java-8-openjdk-arm64/jre$ ls -l /usr/lib/jvm
total 4
lrwxrwxrwx 1 root root   20 May 30 05:57 java-1.8.0-openjdk-arm64 -> java-8-openjdk-arm64
drwxr-xr-x 5 root root 4096 Jul 29 15:25 java-8-openjdk-arm64
hadoop@cb790f07f5a7:/usr/lib/jvm/java-8-openjdk-arm64/jre$
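
A possible reading of the transcripts above: the directory name carries the CPU architecture (`amd64` in the error message, `arm64` in this listing), so a `JAVA_HOME` hard-coded to the `amd64` path would not resolve inside an arm64 container. The sketch below shows one way the path could be derived rather than hard-coded. It is not what this PR does. The helper names `jvm_arch_suffix` and `java_home_for` are hypothetical, and the mapping assumes the Debian/Ubuntu-style directory naming seen in the listings.

```shell
# Hypothetical helper: map a machine type (as printed by `uname -m`) to the
# architecture suffix used in the JVM directory names shown above.
jvm_arch_suffix() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: build the Java 8 JRE path for a given machine type.
# Assumes the /usr/lib/jvm/java-8-openjdk-<arch>/jre layout from the listings.
java_home_for() {
  local suffix
  suffix="$(jvm_arch_suffix "$1")" || return 1
  echo "/usr/lib/jvm/java-8-openjdk-${suffix}/jre"
}
```

Inside the container this could be used as `export JAVA_HOME="$(java_home_for "$(uname -m)")"`, so the same image definition would resolve to whichever directory exists for the build architecture.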

Collaborator

Tianhao-Gu left a comment

👍

@MrCreosote
Member Author

java-1.8.0 links to java-8, but the path goes through java-8, so there are no links in $JAVA_HOME AFAICT

@MrCreosote
Member Author

Merging this since it's better than before, and no worse on MacOS. If we can figure out what's going on with MacOS we can make another PR.

MrCreosote merged commit 3c0adbe into main on Jul 29, 2024
6 checks passed
MrCreosote deleted the dev-hadoop_from_scratch branch on July 30, 2024 22:19
3 participants