02_How to run
The easiest way to run bpmn.ai is via the provided Docker images from Docker Hub. This way you neither need to build the application, nor do you need to have Apache Spark set up locally.
The latest tag is synchronised with the master branch, so it is best to use either the latest tag or a specific version tag.
When specifying paths as parameters, it is important to use paths from inside the Docker container; local folders have to be mapped into the container so that it can access them.
The Docker container will automatically use all CPUs available to Docker.
docker run -it --rm \
-v <local_folder_to_map_into_docker_container>:/data \
viadee/bpmn.ai:latest \
de.viadee.bpmnai.core.CSVImportAndProcessingApplication \
-fs /data/<path_to_input_csv_using_path_inside_docker_container> \
-fd /data/<path_to_target_folder_for_results_using_path_inside_docker_container> \
-d <field_delimiter>
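For illustration, a filled-in invocation could look like the following; the local folder bpmnai-data, the file eventlog.csv, the result folder, and the semicolon delimiter are placeholder assumptions, not values prescribed by bpmn.ai:
docker run -it --rm \
-v $(pwd)/bpmnai-data:/data \
viadee/bpmn.ai:latest \
de.viadee.bpmnai.core.CSVImportAndProcessingApplication \
-fs /data/eventlog.csv \
-fd /data/result \
-d ";"
Because /data is mapped to the local folder, the results written to /data/result end up in ./bpmnai-data/result on the host.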
All parameters available in each bpmn.ai application can be found in this document.
Make sure to run the application twice if no configuration file existed during the first run, as in that case the first run only creates the configuration file and does not do the complete processing.
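As a quick way to check which of the two cases applies, you can look for the configuration file before starting. Note that the file name pipeline_configuration.json and its location in the mapped folder are assumptions here and may differ in your bpmn.ai version:
# check whether the pipeline configuration already exists
# (file name and location are assumptions, see above)
test -f ./bpmnai-data/pipeline_configuration.json \
&& echo "configuration found: the next run performs the full processing" \
|| echo "no configuration yet: the next run only generates it, so run the command again afterwards"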
In order to run the application with spark-submit, you first need to package the application into a JAR file with Maven.
mvn clean package
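Maven writes the packaged artifact into the module's target folder. The exact file name depends on module and version, so the path below is only an assumption derived from the jar-with-dependencies naming used in the spark-submit call further down:
# verify that the shaded JAR was created (path and name pattern are assumptions)
ls bpmnai-core/target/*-jar-with-dependencies.jar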
Then you can run the spark-submit command from the bin folder of your Apache Spark installation, referencing the created JAR file and the Spark application class you would like to run, including its parameters.
bin/spark-submit \
--class de.viadee.bpmnai.core.CSVImportAndProcessingApplication \
--master "local[*]" \
--deploy-mode client \
--name ViadeeBpmnai \
<path_to_packaged_jar_with_dependencies> \
-fs <path_to_input_csv> \
-fd <path_to_target_folder_for_results> \
-d <field_delimiter>
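Again for illustration, a filled-in call could look like this; the JAR path (including the version number), the input file, the result folder, and the semicolon delimiter are placeholders, not required values:
bin/spark-submit \
--class de.viadee.bpmnai.core.CSVImportAndProcessingApplication \
--master "local[*]" \
--deploy-mode client \
--name ViadeeBpmnai \
bpmnai-core/target/bpmnai-core-1.0.0-jar-with-dependencies.jar \
-fs ./eventlog.csv \
-fd ./result \
-d ";"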
All parameters available in each bpmn.ai application can be found in this document.
Make sure to run the application twice if no configuration file existed during the first run, as in that case the first run only creates the configuration file and does not do the complete processing.
In order for the bpmn.ai-core applications to run in IntelliJ, the run configuration needs to be amended. Try to run the application once as a Java Application and then add the following parameters in the run configuration:
The following VM options define that Spark should run in local standalone mode and utilise all CPUs:
-Dspark.master=local[*]
-Dspark.executor.memoryOverhead=1g
-Dspark.driver.memory=2g
As program arguments, you define the parameters of the respective Spark application. You need to define the parameters as listed above, e.g.:
-fs <path_to_input_csv> -fd <path_to_target_folder_for_results> -d <field_delimiter>
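Put together, a working run configuration might contain the following values; the paths and the delimiter are placeholders:
VM options:        -Dspark.master=local[*] -Dspark.executor.memoryOverhead=1g -Dspark.driver.memory=2g
Program arguments: -fs ./eventlog.csv -fd ./result -d ";"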
Now you can run the application via the run configuration.
All parameters available in each bpmn.ai application can be found in this document.
Make sure to run the application twice if no configuration file existed during the first run, as in that case the first run only creates the configuration file and does not do the complete processing.
bpmn.ai is built to harvest low-hanging fruit with ML. Getting started is easy: take a look at the tutorials in the wiki to get your Camunda event history into an ML table.