A comprehensive Python toolkit for intuitive data exploration, analysis, and visualization
DataViz Toolkit is a Python library for Exploratory Data Analysis (EDA), designed with EDA portfolios in mind. It streamlines the analysis of datasets in common formats (CSV, Excel, JSON), making EDA tasks easier, cleaner, and more presentable.
The toolkit is aimed at data scientists, analysts, and students building EDA portfolios. It automates the repetitive parts of data exploration and produces publication-quality visualizations and insights, and its standardized workflow lets you focus on interpreting results rather than writing boilerplate code.
The API favors sensible defaults while still offering deep customization for advanced users. It integrates with Pandas and adds functionality for common data science tasks, which makes it suitable for academic projects, job applications, and professional data analysis work that needs to be presented clearly.
- Effortless loading from CSV, Excel, and JSON files
- Automatic metadata extraction and type inference
- Robust error handling for real-world messy data
- Smart detection and handling of missing values
- Outlier identification and treatment
- Column name standardization and data type conversion
- Duplicate detection and removal
- Feature normalization and scaling
- Categorical variable encoding
- Feature engineering tools including date feature extraction
- Dimensionality reduction techniques
- One-line creation of common plots (histograms, scatter plots, etc.)
- Support for both static (Matplotlib/Seaborn) and interactive (Plotly) outputs
- Multi-plot dashboards and summary visualizations
- Consistent styling with themes and customizable configurations
- Comprehensive summary statistics
- Correlation analysis and feature importance
- Distribution and time series analysis
- Group comparison and clustering
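To make the feature list above more concrete, here is a minimal sketch of the same kinds of steps (loading, cleaning, transformation, summary analysis) written in plain Pandas. It does not use DataViz Toolkit's own API, which this README does not show; the sample data, column names, and the helper functions `standardize_columns` and `flag_outliers_iqr` are hypothetical and exist only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice this would come from a CSV, Excel, or JSON file.
RAW_CSV = """Order ID,Order Date,Region,Unit Price
1,2024-01-05,North,10.0
2,2024-01-06,South,12.5
2,2024-01-06,South,12.5
3,2024-02-10,North,
4,2024-02-11,East,11.0
5,2024-03-01,East,400.0
"""


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case column names and replace spaces with underscores."""
    out = df.copy()
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    return out


def flag_outliers_iqr(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask for values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Loading: pd.read_excel and pd.read_json cover the other two formats.
df = pd.read_csv(io.StringIO(RAW_CSV), parse_dates=["Order Date"])

# Cleaning: standardize names, drop exact duplicates, fill missing prices with the median.
df = standardize_columns(df).drop_duplicates().reset_index(drop=True)
df["unit_price"] = df["unit_price"].fillna(df["unit_price"].median())
df["is_outlier"] = flag_outliers_iqr(df["unit_price"])

# Transformation: date feature extraction, min-max scaling, one-hot encoding.
df["order_month"] = df["order_date"].dt.month
price = df["unit_price"]
df["unit_price_scaled"] = (price - price.min()) / (price.max() - price.min())
encoded = pd.get_dummies(df, columns=["region"])

# Analysis: summary statistics and a simple group comparison.
print(df["unit_price"].describe())
print(df.groupby("region")["unit_price"].mean())
```

Filling with the median rather than the mean is a deliberate choice in this sketch: the single extreme price would otherwise pull the imputed value far from the typical range.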
Full documentation with API reference
- loaders: Functions for loading data from various sources
- cleaner: Tools for data cleaning and validation
- transformer: Methods for feature transformation and engineering
- visualizer: Functions for creating visualizations and dashboards
- analyzer: Statistical analysis and insight generation utilities
- themes: Visualization styling and color schemes
- utils: Helper functions and utilities
This section will be expanded with demonstrations and use cases that showcase the toolkit's capabilities.
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request