11_Tutorial 1 ‐ Using CSV input
This tutorial explains how data exported from the Camunda process engine as a CSV file can be preprocessed with the bpmn.ai-core pipeline.
Install the following in advance:
- Java 8 (JRE, JDK)
- Development environment (Eclipse or IntelliJ)
- Maven
- Hadoop (only required for Windows, follow this tutorial)
Afterwards, clone the repository and import it into your preferred IDE, making sure the Maven dependencies are resolved. Select the bpmnai-core folder, not bpmn.ai, as the root of the project.
If you do not have a Camunda database available yet, you can skip the export step described next and use the example file camundaExport as the data source.
To export the data from Camunda, query the tables ACT_HI_PROCINST and ACT_HI_VARINST in the Camunda H2 database. A database client such as RazorSQL can be used for this.
The SQL statement for exporting the data from Camunda is as follows (XYZ stands for the key of your process definition):
SELECT * FROM ACT_HI_PROCINST a JOIN ACT_HI_VARINST v ON a.PROC_INST_ID_ = v.PROC_INST_ID_ AND a.proc_def_key_ = 'XYZ'
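To get a feel for what this join returns, the following minimal sketch runs the same kind of query against an in-memory SQLite database. SQLite is only a stand-in for the Camunda H2 database here, the two tables are cut down to a few columns, and all rows are made-up toy data. The point it illustrates is that the result contains one row per process variable, restricted to instances of the selected process definition.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Camunda H2 database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced versions of the history tables; the real tables have more columns.
    CREATE TABLE ACT_HI_PROCINST (PROC_INST_ID_ TEXT, PROC_DEF_KEY_ TEXT);
    CREATE TABLE ACT_HI_VARINST (PROC_INST_ID_ TEXT, NAME_ TEXT, TEXT_ TEXT);

    -- Toy data: two process instances, three variables.
    INSERT INTO ACT_HI_PROCINST VALUES ('p1', 'XYZ'), ('p2', 'OTHER');
    INSERT INTO ACT_HI_VARINST VALUES
        ('p1', 'amount', '100'),
        ('p1', 'approved', 'true'),
        ('p2', 'amount', '5');
""")

# Same join as in the tutorial, with explicit columns so the output stays readable.
rows = conn.execute("""
    SELECT a.PROC_INST_ID_, a.PROC_DEF_KEY_, v.NAME_, v.TEXT_
    FROM ACT_HI_PROCINST a
    JOIN ACT_HI_VARINST v
      ON a.PROC_INST_ID_ = v.PROC_INST_ID_
     AND a.PROC_DEF_KEY_ = 'XYZ'
    ORDER BY v.NAME_
""").fetchall()

for row in rows:
    print(row)
```

Only the two variables of instance p1 are returned, because p2 belongs to a different process definition.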
After running the query, save the result in CSV format. The pipeline currently expects all column names for the CSVImportAndProcessingApplication to be lowercase. If the exported CSV uses a different case, convert the column names before continuing (the example file is already in the correct format).
In the next step, the exported file is preprocessed with bpmnai-core. Since the input is a CSV file, this is done with the CSVImportAndProcessingApplication. Execute the application via the Run Configuration of your IDE and pass the following arguments:
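If your export tool writes uppercase column names, the header can be converted with a few lines of Python. This is a minimal sketch and not part of bpmn.ai: the helper name `lowercase_header` is hypothetical, the semicolon delimiter is assumed from the `-d ";"` argument used in this tutorial, and the sample rows are toy data.

```python
import csv
import io


def lowercase_header(src, dst, delimiter=";"):
    """Copy CSV rows from src to dst, lowercasing only the header row."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow([name.lower() for name in header])
    writer.writerows(reader)  # data rows are copied unchanged


# Toy input standing in for an export with uppercase column names.
sample = io.StringIO("PROC_INST_ID_;NAME_;TEXT_\n42;Approved;true\n")
converted = io.StringIO()
lowercase_header(sample, converted)
print(converted.getvalue())
```

For real files, pass file objects opened with `newline=""` instead of the `io.StringIO` buffers, reading from the export and writing to a new file.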
Program Arguments:
-fs "./tutorials/bpmnai-core/camundaExport.csv" -fd "./tutorials/bpmnai-core" -d ";" -of "csv"
If you extracted data from your own Camunda database, adjust the file path accordingly.
VM Arguments:
-Dspark.master="local[*]"
Your working directory should now contain a result folder with the preprocessed data in csv and parquet format. To compare your results, a reference result file is available in the tutorial folder. In addition to the result, a default configuration file is generated that lists all steps performed during the last run. To modify or add steps, adjust this file accordingly and run the application again.
bpmn.ai is built to harvest low-hanging fruit with ML. Getting started is easy: take a look at the tutorials in the wiki to get your Camunda event history into an ML table.