darenasc/eda

Exploratory Data Analysis

  • 12.10.2024. Slides CorrelCon 2024 available here.
  • 11.12.2023. Slides DataKindUK SDS available here.
  • 11.11.2023. Slides CorrelCon 2023 available here.

How to use this repository

You can clone this repository and use it locally as follows.

# Clone the repository and enter it
git clone https://github.com/darenasc/eda.git
cd eda
# Install pipenv, then install the dependencies listed in the Pipfile
pip install pipenv
pipenv install

Or you can go to the eda.ipynb notebook and open it in Colab.

EDA Mindmap

Mindmap created with Freeplane.

Linux commands

Some useful commands for the terminal.

# Explore directories
ls
# Explore content of files
cat
more
less
head
tail 
# Count number of lines
wc -l
# Search in files
grep
# Get documentation of commands
man
# Download data or files
wget
curl
# Monitor resources
htop
btop
# Modify files
vim
sed
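
When working inside a notebook (for example in Colab) rather than a terminal, a few of the commands above have simple Python counterparts. The sketch below is only an illustration: the function names `head`, `count_lines`, and `grep` are hypothetical helpers, not part of this repository, and they assume plain text files that fit the given encoding.

```python
from itertools import islice


def head(path, n=10, encoding="utf-8"):
    """Return the first n lines of a text file, similar to `head -n`."""
    with open(path, encoding=encoding, errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def count_lines(path):
    """Count newline characters, similar to `wc -l`."""
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded at once.
        return sum(chunk.count(b"\n") for chunk in iter(lambda: f.read(1 << 20), b""))


def grep(path, needle, encoding="utf-8"):
    """Return (line number, line) pairs whose line contains a plain substring."""
    with open(path, encoding=encoding, errors="replace") as f:
        return [
            (number, line.rstrip("\n"))
            for number, line in enumerate(f, start=1)
            if needle in line
        ]
```

Unlike the real `grep`, this helper matches a literal substring only; regular expressions would need the `re` module.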

Python libraries

Some Python libraries to explore data.

EDA Checklist

  • What are the formats?
  • Are there files with problems (for example, files that cannot be opened)?
  • How many files, tables, databases?
  • Per item: How many columns and rows?
  • Are there any encoding issues?
  • Verify the data type of each column: discrete, continuous, dates, GIS, network, other.
  • Univariate analysis
    • Histogram
    • Bar plot
    • Boxplot
  • Multivariate analysis
    • Correlations
    • Use target variable to visualize other features
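
Several items on the checklist can be sketched with the Python standard library alone. The example below is a minimal illustration, not the code from this repository's notebook: every function name (`inventory`, `file_problems`, `read_columns`, `as_numbers`, `five_number_summary`, `pearson`, `profile`) is hypothetical, and it assumes UTF-8 CSV files with a header row that are small enough to load into memory. In practice a library such as pandas covers most of these steps more concisely.

```python
import csv
import math
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count files per extension: which formats exist, and how many files."""
    return Counter(
        p.suffix.lower() or "<none>" for p in Path(root).rglob("*") if p.is_file()
    )


def file_problems(path, encoding="utf-8"):
    """Return a short problem description, or None if the file reads cleanly."""
    try:
        Path(path).read_bytes().decode(encoding)
    except OSError as exc:
        return f"cannot be opened: {exc}"
    except UnicodeDecodeError as exc:
        return f"encoding issue at byte {exc.start}"
    return None


def read_columns(path, encoding="utf-8"):
    """Read a CSV with a header row into {column name: list of raw strings}."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        names = reader.fieldnames or []
    return {name: [row[name] for row in rows] for name in names}


def as_numbers(values):
    """Return the values as floats, or None if any value is not numeric.

    Assumption: an empty or missing cell makes the whole column non-numeric.
    """
    try:
        return [float(v) for v in values]
    except (TypeError, ValueError):
        return None


def five_number_summary(numbers):
    """Min, quartiles, and max (linear interpolation): what a boxplot draws."""
    ordered = sorted(numbers)

    def quantile(q):
        pos = q * (len(ordered) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return [quantile(q) for q in (0, 0.25, 0.5, 0.75, 1)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (fails on constant input)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def profile(path):
    """Print the shape of one CSV file and a per-column univariate summary."""
    columns = read_columns(path)
    n_rows = len(next(iter(columns.values()), []))
    print(f"{path}: {n_rows} rows x {len(columns)} columns")
    for name, values in columns.items():
        numbers = as_numbers(values)
        if numbers:
            print(f"  {name} (numeric): {five_number_summary(numbers)}")
        else:
            # Value counts are the numbers behind a bar plot.
            print(f"  {name} (discrete): {Counter(values).most_common(5)}")
```

A typical pass would call `inventory` on the data directory first, run `file_problems` on each file, and then `profile` the files that read cleanly; `pearson` can be applied to any pair of numeric columns returned by `as_numbers`.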
