This repository contains a collection of statistical exercises designed for Data Science learning and practice.
It demonstrates core statistical concepts through interactive Jupyter Notebooks, using Python libraries for data manipulation, visualization, and statistical modeling.
The CTL.ipynb notebook demonstrates the Central Limit Theorem through a simple animated visualization.
It draws repeated samples from a uniform distribution and shows how the histogram of their sample means takes on an approximately normal, bell-shaped form as more means accumulate.
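For reference, this is the standard result the animation approximates. For independent draws with mean μ and finite variance σ², the mean of n draws is approximately normal for large n. For Uniform(0, 1), μ = 1/2 and σ² = 1/12. The sample size n = 30 below is an illustrative assumption, not necessarily the value used in the notebook.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\approx\; \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right),
\qquad X_i \sim \mathcal{U}(0,1):\ \mu = \tfrac{1}{2},\ \sigma^2 = \tfrac{1}{12},
\qquad n = 30 \;\Rightarrow\; \operatorname{sd}(\bar{X}_n) = \sqrt{\tfrac{1}{12 \cdot 30}} \approx 0.053.
```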
Key features:
- Animated histogram of sample means evolving over time
- Saved as an animated GIF for easy viewing
- Educational visualization of the Central Limit Theorem process
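A minimal sketch of how such a GIF could be produced with Matplotlib's FuncAnimation and PillowWriter. It is not the notebook's code: the sample size (SAMPLE_SIZE = 30), the frame count, the bin range, and the output name clt_sample_means.gif are assumptions chosen for illustration.

```python
# Sketch only: parameters and file name are illustrative assumptions.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

SAMPLE_SIZE = 30  # draws per sample (hypothetical choice)
N_FRAMES = 60     # number of sample means accumulated in the animation
rng = np.random.default_rng(seed=42)

# Each row is one sample of SAMPLE_SIZE draws from Uniform(0, 1); keep its mean.
sample_means = rng.uniform(0.0, 1.0, size=(N_FRAMES, SAMPLE_SIZE)).mean(axis=1)

fig, ax = plt.subplots(figsize=(6, 4))
bins = np.linspace(0.3, 0.7, 31)  # fixed bins so every frame uses the same axes


def update(frame: int):
    """Redraw the histogram from the first frame + 1 sample means."""
    ax.clear()
    ax.hist(sample_means[: frame + 1], bins=bins, color="steelblue", edgecolor="white")
    ax.set_xlim(bins[0], bins[-1])
    ax.set_title(f"Sample means after {frame + 1} samples (n = {SAMPLE_SIZE})")
    ax.set_xlabel("Sample mean")
    ax.set_ylabel("Count")
    return ax.patches


anim = FuncAnimation(fig, update, frames=N_FRAMES, interval=100, blit=False)
anim.save("clt_sample_means.gif", writer=PillowWriter(fps=10))
plt.close(fig)
```

Keeping the bins fixed across frames makes the histograms comparable, so the bell shape visibly fills in instead of the axes rescaling on every frame.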
The CTL_for_web_.ipynb notebook generates an interactive animated visualization of the Central Limit Theorem.
We repeatedly draw samples from a uniform distribution, compute each sample's mean, and show how the histogram of these means becomes approximately normal as more samples are collected.
Key features:
- Three synchronized plots:
  - Distribution of sample means (top left)
  - Current sample distribution (top right)
  - Original population distribution (bottom)
- Animated frames updating dynamically with each new sample
- Interactive controls (start button) for running the animation
- Saved as an HTML file for easy sharing and embedding in web projects
The workshop/bimodal_dist.ipynb exercise visualizes a bimodal distribution with several types of statistical plots:
- Boxplot: shows the spread of the data, with the mean highlighted.
- Violin plot: shows a kernel density estimate of the distribution's shape.
- Histogram: shows how often values fall into each bin.
Key features:
- Combined mosaic layout for side-by-side comparison
- Customized styles like dashed mean lines and dotted median bars
- Clear axis labels for better interpretation
- Useful for exploratory data analysis (EDA)
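A sketch of the mosaic layout using plain Matplotlib (plt.subplot_mosaic), assuming a hypothetical 50/50 mixture of two normal distributions as the bimodal data; the notebook itself may use seaborn and a different dataset. Mean lines are dashed and median lines dotted, mirroring the styling listed above.

```python
# Sketch only: the bimodal data below is a hypothetical normal mixture.
import matplotlib

matplotlib.use("Agg")  # render without a display

import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=7)
data = np.concatenate([rng.normal(-2.0, 0.7, 500), rng.normal(2.0, 0.7, 500)])
mean, median = data.mean(), np.median(data)

# Boxplot and violin side by side on top, histogram across the bottom.
fig, axes = plt.subplot_mosaic([["box", "violin"], ["hist", "hist"]], figsize=(9, 6))

# Boxplot: quartiles and whiskers; mean as a dashed line, median dotted.
axes["box"].boxplot(
    data,
    showmeans=True,
    meanline=True,
    meanprops={"linestyle": "--", "color": "red"},
    medianprops={"linestyle": ":", "color": "black"},
)
axes["box"].set_title("Boxplot")
axes["box"].set_ylabel("Value")

# Violin plot: kernel density estimate mirrored around the centre line.
axes["violin"].violinplot(data, showmeans=True, showmedians=True)
axes["violin"].set_title("Violin plot")
axes["violin"].set_ylabel("Value")

# Histogram: frequency counts with the same mean/median line styles.
axes["hist"].hist(data, bins=40, color="steelblue", edgecolor="white")
axes["hist"].axvline(mean, linestyle="--", color="red", label=f"mean = {mean:.2f}")
axes["hist"].axvline(median, linestyle=":", color="black", label=f"median = {median:.2f}")
axes["hist"].set_xlabel("Value")
axes["hist"].set_ylabel("Frequency")
axes["hist"].legend()

fig.tight_layout()
fig.savefig("bimodal_mosaic.png")
```

For symmetric bimodal data like this, the mean and median both land in the trough between the two peaks, where few observations lie. A boxplot on its own does not reveal that, which is why the violin plot and histogram are shown next to it.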
Dependencies:
- numpy
- pandas
- scipy
- seaborn
- matplotlib
- statsmodels
- plotly
- bokeh
- ipympl
- ipython
- jupyterlab
Project structure:
Data_Science_Statistics/
├── assets/                          ← Images and GIFs
├── datasets/                        ← CSV datasets
├── html/                            ← HTML documents
├── workshop/                        ← Workshop practice notebooks
│   └── bimodal_dist.ipynb           ← Bimodal distribution visualization
├── CTL.ipynb                        ← Central Limit Theorem sampling animation
├── CTL_for_web_.ipynb               ← Central Limit Theorem interactive animation
├── descriptive_visualisation.ipynb  ← Descriptive statistics visualizations
├── intro.ipynb                      ← Introduction notebook
├── pebble_world.ipynb               ← Toy example for sampling exercises
├── pyproject.toml                   ← Project dependencies
├── README.md                        ← Project documentation
└── uv.lock                          ← Lock file for uv
Getting started:
- Clone the repository:
git clone git@github.com:Kinetics20/Data_Science_Statistics.git
cd Data_Science_Statistics
- Install dependencies using uv:
uv sync
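- Launch JupyterLab inside the project environment (activate the .venv created by uv sync, or prefix the command with uv run):
jupyter lab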
- Open and explore the .ipynb notebooks.
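After uv sync, one quick way to confirm the environment is to query installed versions with the standard library's importlib.metadata. This helper is a hypothetical addition, not part of the repository; its package list mirrors the dependencies named in this README.

```python
"""Hypothetical post-install check: report versions of the README's dependencies."""
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

PACKAGES = [
    "numpy", "pandas", "scipy", "seaborn", "matplotlib", "statsmodels",
    "plotly", "bokeh", "ipympl", "ipython", "jupyterlab",
]


def installed_versions(names: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if missing."""
    report: dict[str, str | None] = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:<12} {ver or 'NOT INSTALLED'}")
```

Saved under any name (for example check_env.py, a hypothetical file name) and run with uv run python check_env.py, it executes inside the project environment rather than against a global Python.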
Topics covered:
- Central Limit Theorem
- Data Visualization
- Descriptive Statistics
- Distribution Analysis
- Sampling Techniques
- Exploratory Data Analysis (EDA)
Contributions and suggestions are welcome!
👤 Author: Piotr Lipiński
🗓 Date: May 2025