DataIO is an internal software platform for data ingestion, transformation, and distribution, developed and used by our company, DBC.
- Java SE 17
- Jakarta EE 10 / Payara Micro 6
- Maven multi-module reactor; build with `mvn`
- Docker; images are built as part of the Maven `package` phase
DataIO is organized around job processing:
- Input data is stored as data files in the file-store-service component
- These data files, together with other primary job configuration parameters, are referenced in job specifications
- Jobs are created by submitting job specifications to the job-store-service component
- The job-store-service partitions jobs into chunks of up to 10 items containing the actual records to be processed
- The job-store-service uses the flow-store-service component to determine the actual processing flow and destination to be used for each job
- Each chunk is processed by the job-processor component, which uses JavaScript business logic external to the DataIO system to transform the data
- Sinks deliver results from the processing to internal and external systems

All component paths below are relative to the root of the project.

| Component | Path | Notes |
|---|---|---|
| file-store-service | file-store-service/ | service |
| file-store-service-connector | commons/utils/file-store-service-connector/ | client lib |
| flow-store-service | flow-store-service/ | service |
| flow-store-service-connector | commons/utils/flow-store-service-connector/ | client lib |
| harvester | harvester/ | see subdirectories of harvester/ for the full list of harvester implementations. |
| job-processor | job-processor2/app/ | service |
| job-store-service | job-store-service/war/ | service |
| job-store-service-connector | commons/utils/job-store-service-connector/ | client lib |
| sink | sink/ | see subdirectories of sink/ for the full list of sink implementations. |
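
The chunking step described in the job-processing overview (jobs split into chunks of up to 10 items) can be illustrated with a minimal sketch. It is not taken from the job-store-service code: the class name `ChunkPartitionSketch`, the method `partition`, and the assumption that items are grouped sequentially in input order are all hypothetical. Only the limit of 10 items per chunk comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual job-store-service implementation.
class ChunkPartitionSketch {
    // Upper bound on items per chunk, as stated in the overview.
    static final int MAX_CHUNK_SIZE = 10;

    /** Splits items into consecutive chunks of at most MAX_CHUNK_SIZE items each. */
    static <T> List<List<T>> partition(List<T> items) {
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += MAX_CHUNK_SIZE) {
            int end = Math.min(start + MAX_CHUNK_SIZE, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(List.copyOf(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Placeholder records standing in for the items of a job.
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 23; i++) {
            records.add("record-" + i);
        }
        List<List<String>> chunks = partition(records);
        System.out.println(chunks.size());        // 3
        System.out.println(chunks.get(2).size()); // 3
    }
}
```

With 23 placeholder records, the sketch yields two full chunks of 10 items and a final chunk of 3; an empty input yields no chunks.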