A decision tree builder using Hadoop 2.3.0 and MapReduce.
It produces a binary decision tree, written to an XML file.
The program assumes that categorical values cannot be parsed as doubles.
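As a rough illustration of the assumption above, the following sketch classifies a raw field as numerical or categorical by attempting to parse it as a double. The class name AttributeTypeSketch and the method isNumerical are hypothetical and are not taken from the project source; the actual check in treeBuilder may differ.

```java
// Hypothetical helper for illustration only; not taken from the project source.
final class AttributeTypeSketch {

    // Returns true when the raw field parses as a double, i.e. when it would
    // count as a numerical value under the assumption stated above.
    static boolean isNumerical(String raw) {
        if (raw == null) {
            return false;
        }
        try {
            Double.parseDouble(raw.trim());
            return true;
        } catch (NumberFormatException e) {
            // Empty strings and non-numeric text end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Sample fields chosen for illustration; they are not read from any dataset file.
        String[] samples = {"17.99", "1e5", "NaN", "M", "won"};
        for (String s : samples) {
            System.out.println(s + " -> " + (isNumerical(s) ? "numerical" : "categorical"));
        }
    }
}
```

Note that Double.parseDouble also accepts strings such as "NaN", "Infinity", "1e5" and "1d", so under this kind of check a categorical attribute using such labels, or one encoded as digits (for example 0 and 1), would be read as numerical.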
treeBuilder contains the project for building the tree.
treeTester contains the project for testing a built tree.
The output class can be a numerical value; however, if the output class takes many different numerical values, the algorithm will not produce good results.
See the instructions folder for detailed instructions and a bashrc file.
Change directory to either treeBuilder or treeTester.
Edit the command line arguments in pom.xml.
The input files and the location of the output class are specified as command line arguments.
To build the jar file: mvn clean compile package
To run the built jar file: mvn exec:exec
For treeBuilder:
Create /tmp/inputs and /tmp/outputs/ in HDFS.
Add the input files to /tmp/inputs.
After running mvn exec:exec, once the job completes, the output tree file will be in /tmp/outputs/tree.
The output tree file will also be in the local project directory.
Chess dataset obtained from: https://archive.ics.uci.edu/ml/datasets/Chess+(King-Rook+vs.+King-Pawn)
WDBC dataset obtained from: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
SUSY dataset (demonstrating scalability) obtained from: http://archive.ics.uci.edu/ml/datasets/SUSY