This setup activates RStudio on the BDA gateways and uses the R libraries listed below.
To set up, run the scripts in rstudio_sparklyr in this order:
- $ bash -x install_rstudio.sh
- $ bash -x install_additional_packages.sh
- $ bash -x install_sparklyr_2.1.sh
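Because the scripts must run in a fixed order, a failed step should stop the ones after it. The following is a minimal shell sketch of that, assuming all three scripts sit in one directory. The function name `run_rstudio_sparklyr_setup` is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rstudio_sparklyr): runs the setup scripts
# in the documented order and stops at the first failure.
# Assumption: all three scripts sit in one directory, passed as $1.

run_rstudio_sparklyr_setup() {
  local setup_dir="${1:-.}"
  local scripts=(
    install_rstudio.sh
    install_additional_packages.sh
    install_sparklyr_2.1.sh
  )
  local script

  # Check that every script exists before running any of them.
  for script in "${scripts[@]}"; do
    if [ ! -f "${setup_dir}/${script}" ]; then
      echo "missing: ${setup_dir}/${script}" >&2
      return 1
    fi
  done

  # Run them in order; bash -x traces each command, as in the manual steps.
  for script in "${scripts[@]}"; do
    echo "==> ${script}"
    if ! bash -x "${setup_dir}/${script}"; then
      echo "failed: ${script}; later scripts were not run" >&2
      return 1
    fi
  done
  echo "rstudio_sparklyr setup complete"
}
```

Source the file, then call e.g. `run_rstudio_sparklyr_setup rstudio_sparklyr`. Checking all three paths up front means a missing script aborts before anything is installed, rather than leaving a half-finished setup.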
R libraries used: sparklyr (Apache Spark), DBI (Apache Hive SQL), H2O (machine learning).
Using H2O with sparklyr: https://spark.rstudio.com/h2o.html
- Productionizing an H2O model built with sparklyr: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/productionizing.html
- Using H2O models (POJO/MOJO) as Hive UDFs: https://github.com/h2oai/h2o-tutorials/tree/master/tutorials/hive_udf_template/hive_udf_pojo_template
Using sparklyr with a zipped conda R environment, shipped from the driver to the Spark containers:
- Option 2 of the Cloudera Data Science Workbench post: https://blog.cloudera.com/blog/2017/09/how-to-distribute-your-r-code-with-sparklyr-and-cdsw/
Enjoy :)