This repository was archived by the owner on Mar 12, 2022. It is now read-only.
Alexsey Konstantinov edited this page Mar 5, 2020 · 2 revisions

Project description

This project contains examples of developing ETL processes that work with data sources, built on the open source product Getl.

Install

Use git to clone this repository to your machine. Use the IntelliJ IDEA or Eclipse development environment to work with the source code and run the examples.

Initializing the examples

From your development environment, run the class getl.examples.init.InitRun.

In the OS temporary files directory, it creates the "getl_examples" directory, deploys an H2 database with its structure there, and fills the directories with data.
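
The location of that working directory can be pictured with a short Python sketch. This is not Getl code: it assumes that "OS temp files" means the platform's standard temporary directory, and the helper name examples_dir is hypothetical.

```python
import os
import tempfile


def examples_dir(base=None):
    """Return the path of the "getl_examples" working directory, creating it if needed.

    Assumption: the directory sits directly under the OS temporary directory.
    `base` is a hypothetical override, useful for trying the sketch elsewhere.
    """
    root = base if base is not None else tempfile.gettempdir()
    path = os.path.join(root, "getl_examples")
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path


if __name__ == "__main__":
    print(examples_dir())
```

Running the sketch twice gives the same path, which matches the idea that the initialization step prepares one fixed place for the database and the data directories.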

Repository demo

The package getl.examples.repository contains scripts that describe database tables and JSON and XML files.

These scripts register the object descriptions in the repository and are called from the data processing scripts.

Creating and populating database tables

The package getl.examples.init contains the scripts that are used to initialize the examples.

The script CreateDBObjects creates the schema and the database objects.

The script GenerateDate fills the created database tables with data. The "price" table is filled by writing rows with explicit values directly. Customer data and customer phone numbers are loaded into the tables "customers" and "customers_phones" from the XML file customers.xml, which is located in the project resources.

Sales data generation example

The package getl.examples.sales contains the script Sales, which generates sales from random values chosen according to the given rules. The "sales" sequence is used to fill the "id" field.

Use the class SalesRun to start generating sales.
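
The idea of combining random values with a sequence-driven "id" can be sketched in Python. This is an illustration only, not the Sales script: the field names other than "id", the value ranges, and the function name generate_sales are assumptions, and an in-memory counter stands in for the "sales" database sequence.

```python
import itertools
import random


def generate_sales(count, customer_ids, price_ids, seed=None):
    """Build `count` sale rows with random values and sequential ids."""
    rng = random.Random(seed)          # seeded generator for repeatable runs
    sequence = itertools.count(1)      # stand-in for the "sales" sequence
    rows = []
    for _ in range(count):
        rows.append({
            "id": next(sequence),                       # next sequence value
            "customer_id": rng.choice(customer_ids),    # hypothetical field
            "price_id": rng.choice(price_ids),          # hypothetical field
            "count": rng.randint(1, 10),                # hypothetical rule: 1..10 items
        })
    return rows


if __name__ == "__main__":
    for row in generate_sales(3, customer_ids=[1, 2, 3], price_ids=[10, 20]):
        print(row)
```

Taking the identifier from a sequence rather than from the random generator keeps the ids unique regardless of how the other values are drawn.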

Multi-threaded file processing

The package getl.examples.events contains scripts that generate JSON files with event data and load the files into the database.

The script GenerateFiles generates events in multithreaded mode and writes them to JSON files. Files are generated for the specified number of days and stored in directories grouped by day. The "historypoint('db:points')" object saves the latest generated file date in a database table. On the next run, that date is used to continue generation from the following day.
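
The interplay between per-day directories, parallel file writing and the saved date can be sketched in Python. This is not the GenerateFiles script: a plain dictionary plays the role of the history point table, and the file naming, the event contents, the start date and all function names are assumptions made for the illustration.

```python
import datetime as dt
import json
import os
from concurrent.futures import ThreadPoolExecutor


def write_event_file(day_dir, file_number, events_per_file):
    """Write one JSON file with placeholder events and return its path."""
    path = os.path.join(day_dir, f"events.{file_number:04d}.json")
    events = [{"id": i, "file": file_number} for i in range(events_per_file)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(events, f)
    return path


def generate_files(root, history, days, files_per_day=3, events_per_file=5,
                   first_day=dt.date(2020, 1, 1)):
    """Generate `days` more days of files, continuing after the saved date."""
    last = history.get("last_day")     # plays the role of the history point
    day = first_day if last is None else last + dt.timedelta(days=1)
    for _ in range(days):
        day_dir = os.path.join(root, day.isoformat())   # one directory per day
        os.makedirs(day_dir, exist_ok=True)
        with ThreadPoolExecutor(max_workers=4) as pool:
            # list() waits for all workers and re-raises any worker error
            list(pool.map(
                lambda n: write_event_file(day_dir, n, events_per_file),
                range(1, files_per_day + 1)))
        history["last_day"] = day      # save the latest generated date
        day += dt.timedelta(days=1)
    return history["last_day"]
```

Calling generate_files twice with the same history object continues from the day after the saved date instead of regenerating earlier days, which is the behavior the history point provides between runs.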

The script ProcessingFiles collects a list of files from a file source, parses them in multithreaded mode, and loads them into the database table "events". Loading proceeds one date at a time: the files in the directory for a date are sorted by number and loaded in multiple threads. This keeps the load in an orderly, date-by-date sequence. Each loaded file is recorded in the history table in the database and deleted from the source.

To run the examples in the package "getl.examples.events", use the class EventsRun.
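
The loading order described above can be sketched in Python. This is an illustration under stated assumptions, not the ProcessingFiles script: two in-memory lists stand in for the "events" table and the history table, the directory names are assumed to sort chronologically, the file names are assumed to carry zero-padded numbers, and the function names are hypothetical.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def parse_file(path):
    """Parse one JSON file and return its list of event rows."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def process_files(root, events, history, workers=4):
    """Load every file under `root`, one date directory at a time."""
    for day in sorted(os.listdir(root)):           # dates in turn, in order
        day_dir = os.path.join(root, day)
        if not os.path.isdir(day_dir):
            continue
        # Zero-padded file numbers make name order equal to number order.
        paths = [os.path.join(day_dir, name)
                 for name in sorted(os.listdir(day_dir))]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parsed = list(pool.map(parse_file, paths))  # results keep input order
        for path, rows in zip(paths, parsed):
            events.extend(rows)        # stand-in for writing to "events"
            history.append(path)       # stand-in for the history table
            os.remove(path)            # delete the loaded file from the source
    return len(history)
```

Parsing runs in parallel within a date, while the outer loop does not move to the next date until the current one is finished, so rows still arrive grouped by date.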

Work with resource files

The script getl.examples.repository.Db loads the database logins from the configuration stored in the resource file logins.db.conf. On the connection, "loginsConfigStore" names the configuration section in which the logins are stored. Processes that use such a connection can then select the desired login with "useLogin".
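
The pattern of keeping logins in a named configuration section and switching between them by name can be sketched in Python. This is not the Getl API: the Connection class, its use_login method, the section name "h2_logins" and the credentials are hypothetical and only mirror the roles of "loginsConfigStore" and "useLogin".

```python
class Connection:
    """Toy connection that resolves credentials from a configuration section."""

    def __init__(self, config, logins_config_store):
        # Remember the section that holds the login -> password mapping.
        self._logins = config[logins_config_store]
        self.login = None
        self.password = None

    def use_login(self, name):
        """Switch to a stored login; fail loudly if it is not configured."""
        if name not in self._logins:
            raise KeyError(f"unknown login: {name}")
        self.login = name
        self.password = self._logins[name]


if __name__ == "__main__":
    # Hypothetical configuration content; real logins live in logins.db.conf.
    config = {"h2_logins": {"dba": "secret1", "user": "secret2"}}
    con = Connection(config, "h2_logins")
    con.use_login("user")
    print(con.login)
```

With this arrangement, processing code refers only to a login name, and the passwords stay in the configuration file rather than in the scripts.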

To create the schema and the database users and to grant rights, the script getl.examples.init.CreateDBObjects runs the SQL script from the resource file create_db_objects.sql. This demonstrates how to work with the Getl procedural extension for SQL.