Getl: An Efficient Data Synchronization and ETL Manager
Getl is a flexible ETL manager written in Go that streamlines the extraction, transformation, and loading (ETL) of data and supports continuous synchronization between heterogeneous sources and destinations. Whether you're working with traditional relational databases, file-based formats, or real-time messaging systems, Getl bridges them through dynamic ETL workflows and flexible configuration, offering a unified way to build data pipelines.
Why Getl?
- Ease of Use: Configure and manage complex data flows with simple JSONC/YAML configuration files.
- Flexibility: Supports multiple databases and file formats with customizable field mappings and transformation rules.
- Continuous Synchronization: Schedule periodic syncs for near real-time updates.
- Extensibility: Integrate messaging systems such as Kafka or Redis for real-time data pipelines.
Features:
- Data Extraction: Connect to and extract data from a variety of sources (Oracle, PostgreSQL, MySQL, SQLite, MongoDB, etc.) using customizable SQL queries.
- Custom Transformations: Define mapping and transformation rules to convert data between formats and structures.
- Data Loading and Synchronization: Automatically create destination tables (with validations and constraints) and load data efficiently; supports both batch and continuous sync modes.
- Multiple Output Formats: Export data to CSV, JSON, XML, YAML, PDF, and more.
- Dynamic Configuration: Manage your ETL processes with configuration files that support comments (JSONC) for clarity.
- Real-Time Integration: Leverage messaging integrations for live data flow using Kafka, Redis, and others.
Requirements:
- Go (version 1.19 or above)
- Appropriate permissions to access the configured data sources and destinations
```shell
# Clone this repository
git clone https://github.com/faelmori/getl.git

# Navigate to the project directory
cd getl

# Build the binary using the Makefile
make build

# Install the binary (optional)
make install

# (Optional) Add the binary to your PATH
export PATH=$PATH:$(pwd)
```
Below are some examples of how to use Getl’s CLI:
```shell
# Basic synchronization: extracts data from a source and loads it into a destination
getl sync -f examples/configFiles/exp_config_a.json

# Extract data with a custom SQL query
getl extract --source "oracle_db" --query "SELECT * FROM products"

# Transform and load data using a custom configuration file
getl transform -f examples/configFiles/exp_config_b.json
```
Example configuration: extract from an Oracle source, load into SQLite, and export the result to CSV:

```json
{
  "sourceType": "godror",
  "sourceConnectionString": "username/password@host:1521/orcl",
  "destinationType": "sqlite3",
  "destinationConnectionString": "/home/user/.kubex/web/gorm.db",
  "destinationTable": "erp_products",
  "destinationTablePrimaryKey": "CODPARC",
  "sqlQuery": "SELECT P.CODPARC, P.NOMEPARC FROM TABLE P",
  "outputFormat": "csv",
  "outputPath": "/home/user/Documents/erp_products.csv",
  "needCheck": true,
  "checkMethod": "SELECT * FROM erp_products WHERE CODPARC = ? AND NOMEPARC = ?",
  "kafkaURL": "",
  "kafkaTopic": "",
  "kafkaGroupID": ""
}
```
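Since Getl also accepts YAML configuration files, the same settings could be written as follows (a sketch assuming the keys map one-to-one between the JSON and YAML formats; the connection strings are placeholders):

```yaml
sourceType: godror
sourceConnectionString: "username/password@host:1521/orcl"
destinationType: sqlite3
destinationConnectionString: /home/user/.kubex/web/gorm.db
destinationTable: erp_products
destinationTablePrimaryKey: CODPARC
sqlQuery: "SELECT P.CODPARC, P.NOMEPARC FROM TABLE P"
outputFormat: csv
outputPath: /home/user/Documents/erp_products.csv
needCheck: true
checkMethod: "SELECT * FROM erp_products WHERE CODPARC = ? AND NOMEPARC = ?"
kafkaURL: ""
kafkaTopic: ""
kafkaGroupID: ""
```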
Example configuration for continuous synchronization, scheduled via the cron-style syncInterval field:

```json
{
  "sourceType": "godror",
  "sourceConnectionString": "username/password@host:1521/orcl",
  "destinationType": "sqlite3",
  "destinationConnectionString": "/home/user/.kubex/web/gorm.db",
  "destinationTable": "erp_products",
  "destinationTablePrimaryKey": "CODPARC",
  "sqlQuery": "SELECT P.CODPARC, P.NOMEPARC FROM TABLE P",
  "syncInterval": "30 * * * * *",
  "kafkaURL": "",
  "kafkaTopic": "",
  "kafkaGroupID": ""
}
```
Example configuration with field transformations, mapping columns from a SQLite source table to a SQL Server destination:

```jsonc
{
  "sourceType": "sqlite3",
  "sourceConnectionString": "/home/user/.kubex/web/gorm.db",
  "sourceTable": "erp_products",
  "sourceTablePrimaryKey": "CODPROD",
  "sqlQuery": "",
  "destinationType": "sqlServer",
  "destinationConnectionString": "sqlserver://username:password@localhost:1433?database=my_db_test&encrypt=disable&trustservercertificate=true",
  "destinationTable": "erp_products_test",
  "destinationTablePrimaryKey": "id_v",
  "kafkaURL": "",
  "kafkaTopic": "",
  "kafkaGroupID": "",
  "transformations": [
    {
      "sourceField": "CODPROD",
      "destinationField": "id_v",
      "operation": "none",
      "sPath": "erp_products",
      "dPath": "erp_products_test"
    },
    {
      "sourceField": "PRODDESCR",
      "destinationField": "name_v",
      "operation": "none",
      "sPath": "erp_products",
      "dPath": "erp_products_test"
    }
    // Additional transformations can be specified here.
  ]
}
```
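For reference, the transformations section above would look like this in YAML (again a sketch assuming a one-to-one key mapping; YAML uses plain `#` comments where JSONC uses `//`):

```yaml
transformations:
  - sourceField: CODPROD
    destinationField: id_v
    operation: none
    sPath: erp_products
    dPath: erp_products_test
  - sourceField: PRODDESCR
    destinationField: name_v
    operation: none
    sPath: erp_products
    dPath: erp_products_test
  # Additional transformations can be specified here.
```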
Getl uses JSON or YAML configuration files (supporting JSONC for comments) to set up data source and destination connections, transformation rules, and synchronization intervals. These files are central to configuring the ETL process, and detailed documentation is available in the Configuration Documentation.
🔜 Planned Features:
- Support for additional data sources and destinations (e.g., MongoDB, Redis)
- Enhanced transformation operations and custom processing functions
- Expanded real-time data integration via Kafka and Redis
- A dashboard for monitoring the status and performance of ETL jobs
Contributions are welcome!
Please feel free to open issues or submit pull requests. For more details, see the Contributing Guide.
This project is licensed under the MIT License.
- Developer: Rafael Mori
- GitHub: faelmori
If you find this project interesting or would like to collaborate, please reach out!