# Workshop: data cleaning
Work in progress!
To add: Information on data structure (float/meters)
# Background
Kolumbus is working on a project to predict delays in traffic... (add more)
# Task
You have been given one day of bus data. Your task is to clean this data and then train a machine learning model with it.
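At a high level, the task could look something like the sketch below. This is only an illustration, assuming a pandas dataframe and a scikit-learn model; the file name and column names here are made up, and the actual entry points are `read_json.py` and `DataCleaner.py`, described further down.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load one day of bus data (the file name is an assumption)
frame = pd.read_json('Data/bus_data.json')

# ... apply your cleaning rules here (see DataCleaner.py) ...

# Train a simple model on the cleaned data
# (the feature and target column names are assumptions)
X = frame[['distance_to_stop']]
y = frame['delay']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LinearRegression().fit(X_train, y_train)
print(model.score(X_test, y_test))
```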
## Setup
Create the conda environment:

```bash
conda env create -f datavask_workshop.yml
# To activate this environment
source activate datavask_workshop
# To deactivate the environment, use
source deactivate
```

Start Jupyter Lab from the root directory of this repo:

```bash
jupyter lab
```

You will get a token link to the Jupyter server. Open this in your browser.
Open Welcome.ipynb and follow the instructions from there.
## Folder structure
- Workshop root
  - Data\
  - read_json.py - The entry point to your application
  - DataCleaner.py - Where you implement all the data cleaning rules
  - test_dataCleaner.py - The script that tests your rules, no sneak peeking!
## Hints, a.k.a. "Where do I start?!"

To print the first 10 records in the dataframe:

```python
print(frame.head(10))
```

To get the maximum value of a column:

```python
print(frame['column'].max())
```

The same can be done for the minimum, mean, and standard deviation.
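For example, using the same placeholder column name as above:

```python
print(frame['column'].min())   # smallest value in the column
print(frame['column'].mean())  # average value
print(frame['column'].std())   # standard deviation

# Or summarise all numeric columns at once:
print(frame.describe())
```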
There are hints placed in the code, specifically in the rules, to help you on your way.
The test script is not super strict; it simply checks the minimum requirements for the cleaning, such as which columns need to stay and which should be removed. Therefore, you do not have to do it exactly the same way as we did.
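As a purely illustrative sketch of the kind of rule a cleaner might apply (the column names below are made up and are not the actual rules in DataCleaner.py):

```python
# Drop a column that the model does not need (column name is an assumption)
frame = frame.drop(columns=['vehicle_colour'])

# Remove rows with obviously invalid values, e.g. negative distances
frame = frame[frame['distance_to_stop'] >= 0]
```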
Your first step should be to look at the data...
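A quick way to get an overview, assuming `frame` is your dataframe as in the hints above:

```python
print(frame.shape)          # number of rows and columns
print(frame.dtypes)         # the data type of each column
print(frame.isna().sum())   # missing values per column
```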
# Extra task / If you have time (?)