Skip to content

Parallel recursive and streaming decision tree implemented in CilkPlus.

Notifications You must be signed in to change notification settings

XXYXie/ParallelDecisionTree

Repository files navigation

Parallel Decision Tree Building

Sirui Li, Xueyan Xie

CSE 549 Final Project

We studied how to parallelize the Decision Tree Building procedure in a multi-processors setting. We first used Cilk spawn, sync, and parallel for to parallelize the algorithm. We tested its correctness as well as the time and speedup. Then we looked at an approximation algorithm, Streaming Parallel Decision Tree, which is designed for classifying large data sets and streaming data. It assumes a setting where the size of data set is so large that it has to be stored distributively in different processors. We implemented SPDT in shared memory machine, measured its accuracy and performance and compared it with our recursive decision tree. At last, we applied the decision tree algorithm as a subroutine to build Bagging Forest. We saw some very interesting behaviors in terms of accurracy, time and speedup.

Recursive Decision Tree Performance

SPDT Performance

About

Parallel recursive and streaming decision tree implemented in CilkPlus.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published