Running a mini ETL in PySpark
For some reason I couldn't export the data to CSV, so I used the persist() method while creating the DataFrame; there are similar issues reported on Stack Overflow.
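Below is a minimal sketch of that workaround. The input path, schema options, and column handling are placeholders, not the actual ones used by main.py; only the persist-before-write pattern and the output/test_transformed directory come from this README.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mini-etl").getOrCreate()

# Placeholder input path; the real pipeline reads its own source data.
df = spark.read.csv("input/source.csv", header=True, inferSchema=True)

# Persisting the DataFrame when it is created was the workaround that made
# the CSV export succeed in this setup.
df = df.persist()

# output/test_transformed is the directory that docker cp later copies
# out of the container.
df.write.mode("overwrite").csv("output/test_transformed", header=True)

spark.stop()
```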
- Install Docker
- Run the commands below (a sketch of the Dockerfile the build step assumes is shown after this list):
docker build . -t sparkhome
- If you only want to run the tests:
docker run --name spark_container sparkhome /bin/bash -c "cd /opt/spark/ && pytest etl_test.py"
- If you want to run the main script:
docker run --name spark_container sparkhome /bin/bash -c "cd /opt/spark/ && python main.py"
- This command copies the exported file from the container to the local directory:
docker cp spark_container:opt/spark/output/test_transformed test_transformed
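For reference, `docker build . -t sparkhome` expects a Dockerfile in the repository root. Its contents are not shown in this README; a minimal sketch of what it might look like, assuming the official `apache/spark-py` base image, is:

```dockerfile
# Hypothetical sketch only; the repository's real Dockerfile may differ.
FROM apache/spark-py:latest

USER root
# pytest is needed for the etl_test.py run shown above.
RUN python3 -m pip install --no-cache-dir pytest

# Place the ETL script and its test where the docker run commands expect them.
WORKDIR /opt/spark
COPY main.py etl_test.py /opt/spark/
```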