A collection of Jupyter Notebook exercises covering key statistical concepts for Data Science, featuring interactive visualizations and real-world datasets.

Kinetics20/Data_Science_Statistics

📈 Data Science Statistics

📄 Project Overview

This repository contains a collection of statistical exercises designed for Data Science learning and practice.
It demonstrates core statistical concepts through interactive Jupyter Notebooks, using Python libraries for data manipulation, visualization, and statistical modeling.

📚 Examples

📊 Central Limit Theorem - Sampling Animation (CTL.ipynb)

This example demonstrates the Central Limit Theorem through a simple animated visualization.
It draws repeated samples from a uniform distribution and shows how the histogram of their means settles into an approximately normal shape as more samples are drawn.

Key features:

  • Animated histogram of sample means evolving over time
  • Saved as an animated GIF for easy viewing
  • Educational visualization of the Central Limit Theorem process
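
The sampling process behind the animation can be sketched without any plotting. This is a minimal illustration, not the notebook's actual code: it uses only the Python standard library, and the function name `sample_means`, the sample size of 30, the number of samples, and the seed are all assumptions chosen for the example. It compares the spread of the sample means with the value the Central Limit Theorem predicts, σ/√n, where σ = √(1/12) for a uniform distribution on [0, 1).

```python
import math
import random
import statistics


def sample_means(num_samples, sample_size, rng):
    """Return the mean of each of `num_samples` uniform(0, 1) samples."""
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(num_samples)
    ]


rng = random.Random(42)  # fixed seed (assumed) so the run is repeatable
sample_size = 30         # observations per sample (assumed value)
means = sample_means(5_000, sample_size, rng)

# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12).
expected_mean = 0.5
expected_sd = math.sqrt(1 / 12) / math.sqrt(sample_size)

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)

print(f"mean of sample means: {observed_mean:.4f} (theory {expected_mean:.4f})")
print(f"sd of sample means:   {observed_sd:.4f} (theory {expected_sd:.4f})")
```

The observed values should land close to the theoretical ones; raising `sample_size` narrows the spread of the means, which is the effect the animated histogram makes visible.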

Generated output preview:
CLT Sampling Animation


📈 Central Limit Theorem - Interactive Animation (CTL_for_web_.ipynb)

This example generates an interactive animated visualization illustrating the Central Limit Theorem.
We repeatedly draw samples from a uniform distribution, calculate their means, and show how the distribution of these means becomes approximately normal as more samples are collected.

Key features:

  • Three synchronized plots:
    • Distribution of sample means (top left)
    • Current sample distribution (top right)
    • Original population distribution (bottom)
  • Animated frames updating dynamically with each new sample
  • Interactive controls (start button) for running the animation
  • Saved as an HTML file for easy sharing and embedding in web projects
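
The per-frame update described above can be sketched as a generator that produces the data each panel would need. This is a hypothetical outline using only the standard library, not the notebook's implementation: the names `clt_frames` and `bin_counts`, the frame count, the sample size, and the bin count are assumptions. Each frame yields the current sample (for the sample panel) and all means collected so far (for the sample-means panel).

```python
import random
import statistics


def clt_frames(num_frames, sample_size, rng):
    """Yield (current_sample, means_so_far) for each animation frame."""
    means = []
    for _ in range(num_frames):
        current_sample = [rng.random() for _ in range(sample_size)]
        means.append(statistics.fmean(current_sample))
        # Yield a copy so earlier frames are not mutated by later ones.
        yield current_sample, list(means)


def bin_counts(values, num_bins):
    """Count values falling in equal-width bins on [0, 1)."""
    counts = [0] * num_bins
    for value in values:
        index = min(int(value * num_bins), num_bins - 1)
        counts[index] += 1
    return counts


rng = random.Random(0)  # fixed seed (assumed) for repeatability
frames = list(clt_frames(num_frames=200, sample_size=25, rng=rng))

last_sample, all_means = frames[-1]
histogram = bin_counts(all_means, num_bins=10)
print(histogram)  # counts cluster in the bins around 0.5
```

Keeping the data generation separate from the drawing code means the same frames could feed any plotting library, which is one reasonable way to structure an animation like this.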

Generated output preview:
Interactive CLT Animation


🎻 Bimodal Distribution Visualization (workshop/bimodal_dist.ipynb)

This exercise focuses on visualizing a bimodal distribution using multiple types of statistical plots:

  • Boxplot: Showing distribution spread with the mean highlighted.
  • Violin plot: Displaying a smoothed estimate of the probability density.
  • Histogram: Representing frequency of occurrences.

Key features:

  • Combined mosaic layout for side-by-side comparison
  • Customized styles like dashed mean lines and dotted median bars
  • Clear axis labels for better interpretation
  • Useful for exploratory data analysis (EDA)
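
To see why several plot types are worth comparing, it helps to look at the numbers a boxplot summarizes. The sketch below is an illustration only, not taken from the notebook: it builds a bimodal sample as an equal mixture of two normal distributions, and the function name `bimodal_sample`, the component means of -2 and 2, the standard deviation of 0.5, and the seed are all assumed values. It uses only the standard library.

```python
import random
import statistics


def bimodal_sample(size, rng):
    """Draw from an equal mixture of N(-2, 0.5) and N(2, 0.5) (assumed)."""
    return [
        rng.gauss(-2.0, 0.5) if rng.random() < 0.5 else rng.gauss(2.0, 0.5)
        for _ in range(size)
    ]


rng = random.Random(7)  # fixed seed (assumed) for repeatability
data = bimodal_sample(10_000, rng)

# Summary statistics that a boxplot draws.
q1, median, q3 = statistics.quantiles(data, n=4)
mean = statistics.fmean(data)

# Share of observations near the mean, where a unimodal sample would peak.
near_center = sum(1 for x in data if abs(x - mean) < 0.5) / len(data)

print(f"mean={mean:.2f} median={median:.2f} Q1={q1:.2f} Q3={q3:.2f}")
print(f"share within 0.5 of the mean: {near_center:.3f}")
```

The mean sits between the two modes, in a region that contains almost no observations. A boxplot alone would not reveal this, while a violin plot or histogram shows the two peaks directly.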

Generated output preview:
Bimodal Distribution Visualization


🛠️ Libraries Used

  • numpy
  • pandas
  • scipy
  • seaborn
  • matplotlib
  • statsmodels
  • plotly
  • bokeh
  • ipympl
  • ipython
  • jupyterlab

📂 Project Structure

Data_Science_Statistics/
├── assets/                        ← Images and GIFs
├── datasets/                      ← CSV datasets
├── html/                          ← HTML documents
├── workshop/                      ← Workshop practice notebooks
│   └── bimodal_dist.ipynb          ← Bimodal distribution visualization
├── CTL.ipynb                      ← Central Limit Theorem sampling animation
├── CTL_for_web_.ipynb              ← Central Limit Theorem interactive animation
├── descriptive_visualisation.ipynb ← Descriptive statistics visualizations
├── intro.ipynb                    ← Introduction notebook
├── pebble_world.ipynb              ← Toy example for sampling exercises
├── pyproject.toml                  ← Project dependencies
├── README.md                       ← Project documentation
└── uv.lock                         ← Lock file for uv

🚀 How to Run

  1. Clone the repository:

    git clone git@github.com:Kinetics20/Data_Science_Statistics.git
    cd Data_Science_Statistics
  2. Install dependencies using uv:

    uv sync
  3. Launch JupyterLab:

    jupyter lab
  4. Open and explore the .ipynb notebooks.

📊 Topics Covered

  • Central Limit Theorem
  • Data Visualization
  • Descriptive Statistics
  • Distribution Analysis
  • Sampling Techniques
  • Exploratory Data Analysis (EDA)

💬 Feedback

Contributions and suggestions are welcome!

👤 Author: Piotr Lipiński
🗓 Date: May 2025
