ConsolidateXcel is a basic ETL project that automates the extraction, consolidation, and loading of multiple Excel files into a single dataset, simplifying data handling and reporting.

raquelcreis/ConsolidateXcel

ConsolidateXcel

Description

ConsolidateXcel is a basic ETL (Extract, Transform, Load) project that reads data from multiple Excel files, consolidates it into a single dataset, and saves the consolidated file to an output directory. Combining the separate Excel datasets into one makes data analysis and reporting more efficient.

Features

  • Extraction: Automated extraction of data from multiple Excel files.
  • Transformation: Consolidation of data into a single dataset.
  • Loading: Saving the consolidated dataset into a new Excel file in a specified directory.
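The three stages above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: it assumes pandas (with an Excel engine such as openpyxl installed) does the reading and writing, and the function names extract, transform, and load are hypothetical.

```python
from pathlib import Path

import pandas as pd


def extract(input_dir: Path) -> list[pd.DataFrame]:
    """Read every .xlsx file in input_dir into its own DataFrame."""
    # Sorting the paths keeps the row order of the result stable across runs.
    return [pd.read_excel(path) for path in sorted(input_dir.glob("*.xlsx"))]


def transform(frames: list[pd.DataFrame]) -> pd.DataFrame:
    """Stack the frames vertically into one dataset with a fresh index."""
    if not frames:
        raise ValueError("no Excel files found to consolidate")
    return pd.concat(frames, ignore_index=True)


def load(frame: pd.DataFrame, output_dir: Path, filename: str) -> Path:
    """Write the consolidated frame to output_dir/filename and return the path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / filename
    frame.to_excel(target, index=False)
    return target
```

A full run would then look like load(transform(extract(Path("input"))), Path("output"), "consolidated_output.xlsx"). Stacking rows with pd.concat assumes the input files share the same columns; files with differing columns would produce empty cells in the combined dataset.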

Usage

  1. Clone the repository
git clone https://github.com/raquelcreis/ConsolidateXcel.git
cd ConsolidateXcel
  2. Configure the correct Python version with pyenv
pyenv install 3.11.7
pyenv local 3.11.7
  3. Configure Poetry to use Python version 3.11.7 and activate the virtual environment
poetry env use 3.11.7
poetry shell
  4. Install the project dependencies
poetry install
  5. Run the tests to ensure everything is working as expected
task test
  6. Prepare your input files

Place all Excel files to be consolidated in the input directory.

  7. Run the ETL process
task run
  8. Output

The consolidated Excel file will be saved in the output directory with the name consolidated_output.xlsx.

Configuration

Edit the config.json file to set the input directory, the output directory, and the output filename:

{
  "input_directory": "path/to/input_files",
  "output_directory": "path/to/output_files",
  "output_filename": "consolidated_output.xlsx"
}
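As a rough sketch of how a script might read these settings, the snippet below parses the file with the standard library and fails early if a key is missing. The Config class and the parse_config and load_config helpers are hypothetical names used for illustration; the project may read its configuration differently.

```python
import json
from dataclasses import dataclass
from pathlib import Path

# The three keys shown in the config.json example above.
REQUIRED_KEYS = ("input_directory", "output_directory", "output_filename")


@dataclass(frozen=True)
class Config:
    input_directory: Path
    output_directory: Path
    output_filename: str


def parse_config(text: str) -> Config:
    """Parse the JSON text and check that every required key is present."""
    raw = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"config is missing keys: {', '.join(missing)}")
    return Config(
        input_directory=Path(raw["input_directory"]),
        output_directory=Path(raw["output_directory"]),
        output_filename=raw["output_filename"],
    )


def load_config(path: str = "config.json") -> Config:
    """Read the config file from disk and parse it."""
    return parse_config(Path(path).read_text(encoding="utf-8"))
```

Checking the keys up front reports a typo in config.json before any Excel file is opened.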

Example

An example structure of the input directory:

input/
    ├── data1.xlsx
    ├── data2.xlsx
    └── data3.xlsx
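For a directory laid out like this, collecting the files to consolidate could look like the sketch below. The find_excel_files helper is a hypothetical name, and skipping files that start with "~$" (the lock files Excel creates while a workbook is open) is an added precaution rather than something the project documents.

```python
from pathlib import Path


def find_excel_files(input_dir: Path) -> list[Path]:
    """Return the .xlsx files directly inside input_dir, sorted by path."""
    files = [
        path
        for path in input_dir.glob("*.xlsx")
        # Skip Excel's temporary lock files, which are not real workbooks.
        if path.is_file() and not path.name.startswith("~$")
    ]
    return sorted(files)
```

Sorting the result means the files are always processed in the same order, so repeated runs over the same inputs give the same row order.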

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss what you would like to change.

Credits

This project is part of Professor Luciano Filho's course (GitHub). His repository in Portuguese has been incredibly helpful and deserves a million stars ⭐
