devals94/spark-proof-of-concept

spark-proof-of-concept

This repository contains proof-of-concept projects for Apache Spark technologies.

It covers Spark-Core, Spark SQL, and Spark Streaming.

Spark-Core is demonstrated with Python and Scala.

Spark SQL is demonstrated with Scala.

Spark-Core with Python:

It is demonstrated on the Uttar Pradesh Assembly Elections 2017 dataset.

The objectives of Spark-Core with Python are:

  1. To determine the candidate with the maximum votes.
  2. To determine the candidate with the minimum votes.
  3. To determine the total number of candidates fielded by each party.
  4. To determine the number of Congress (INC) candidates fielded in each district.
  5. To determine the number of Congress (INC) candidates fielded in each Assembly Constituency.
  6. To determine the total number of candidates fielded by each party at the Saharanpur district level.
  7. To determine the total number of candidates fielded in each phase.
  8. To determine who got the maximum votes in BJP+.
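
Most of the objectives above follow the same key/value pattern: filter the rows, emit a key per row, and aggregate per key. The sketch below illustrates that pattern in plain Python so it runs without a Spark installation. In PySpark the same steps would typically be expressed with RDD operations such as `filter`, `map`, and `reduceByKey`. The rows, the tuple layout, and the candidate names are hypothetical; the real dataset's columns are not shown in this README.

```python
from collections import Counter

# Hypothetical sample rows: (candidate, party, district, votes).
# These values are made up for illustration only.
rows = [
    ("Candidate A", "INC", "Saharanpur", 1200),
    ("Candidate B", "BJP+", "Saharanpur", 4500),
    ("Candidate C", "INC", "Agra", 300),
    ("Candidate D", "BSP", "Agra", 2100),
]

# Objectives 1 and 2: keep the row with the most / fewest votes.
max_row = max(rows, key=lambda r: r[3])
min_row = min(rows, key=lambda r: r[3])

# Objective 3: one count per party (the map-to-(party, 1) then sum-per-key pattern).
candidates_per_party = Counter(party for _, party, _, _ in rows)

# Objective 4: filter to INC first, then count per district.
inc_per_district = Counter(
    district for _, party, district, _ in rows if party == "INC"
)

print(max_row[0], min_row[0])
print(dict(candidates_per_party))
print(dict(inc_per_district))
```

Objectives 5 to 8 would be variations of the same idea, changing only the filter condition and the key used for grouping.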

Spark-Core with Scala:

It is demonstrated on the IPL dataset.

The objectives of Spark-Core with Scala are:

  1. To determine the total number of matches played in every season.
  2. To determine the number of matches played in a particular stadium.
  3. To determine the toss decision, i.e. how many times batting and fielding were chosen after winning the toss, from season 1 to season 9.
  4. To determine the total number of matches played by every team.
  5. To determine the total number of matches won by every team.
  6. To determine the total number of matches won by winType (i.e. by runs, by wickets, tie, no result) at different stadiums.
  7. To determine the total number of matches won by batting first at different stadiums.
  8. To determine the total number of matches won by bowling first at different stadiums.
  9. To determine the winning percentage by batting first at different stadiums.
  10. To determine the winning percentage by bowling first at different stadiums.
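
As a rough illustration of objectives 7 to 10, the sketch below computes per-stadium winning percentages. It is written in plain Python rather than the repository's Scala, and it relies on two assumptions that are not stated in this README: a win "by runs" is treated as a win for the side batting first and a win "by wickets" as a win for the side bowling first, and the percentage is taken over all matches at the stadium, including ties and no results. The records and stadium names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical match records: (season, venue, win_type).
matches = [
    (1, "Stadium X", "runs"),
    (1, "Stadium X", "wickets"),
    (2, "Stadium X", "runs"),
    (2, "Stadium Y", "wickets"),
    (2, "Stadium Y", "tie"),
]

def win_percentage_by_venue(records, win_type):
    """Return {venue: percentage of matches at that venue won by win_type}."""
    totals = defaultdict(int)  # all matches per venue
    wins = defaultdict(int)    # matches per venue with the given win type
    for _season, venue, result in records:
        totals[venue] += 1
        if result == win_type:
            wins[venue] += 1
    return {venue: 100.0 * wins[venue] / totals[venue] for venue in totals}

# Objective 1: matches per season.
matches_per_season = Counter(season for season, _, _ in matches)

# Objectives 9 and 10 under the assumptions stated above.
batting_first = win_percentage_by_venue(matches, "runs")
bowling_first = win_percentage_by_venue(matches, "wickets")

print(dict(matches_per_season))
print(batting_first)
print(bowling_first)
```

If ties and no results should be excluded from the denominator, only the `totals` update would need to move inside a check on the result type.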

Spark SQL with Scala:

It is demonstrated on the IMDB dataset.

The objectives of Spark SQL with Scala are:

  1. To determine the movies with the maximum budget.
  2. To determine the movies with the maximum Facebook likes.
  3. To determine the top 5 movies by IMDB rating.
  4. To determine the total number of movies released in each year.
  5. To determine the popular movies with respect to actor-1.
  6. To determine the popular movies with respect to actor-2.
  7. To determine the popular movies with respect to actor-3.
  8. To determine the popular movies with respect to the director.
  9. To determine the net profit of movies.
  10. To determine the worst movies according to critic reviews.
  11. To determine the best movies according to critic reviews.
  12. To determine the movies with the longest runtime (duration).
  13. To determine the movies with the shortest runtime (duration).
  14. To determine the best movies according to user reviews.
  15. To determine the worst movies according to user reviews.
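
Several of these objectives reduce to short SQL queries. The sketch below runs three such queries against an in-memory SQLite database as a stand-in, so it can be tried without Spark. The queries use only standard clauses (`ORDER BY`, `LIMIT`, `GROUP BY`, `COUNT(*)`) and should carry over to Spark SQL largely unchanged, though that has not been verified against this repository. The table name, column names, and rows are hypothetical, and net profit is assumed to mean gross minus budget.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Spark SQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies ("
    "movie_title TEXT, title_year INTEGER, budget REAL, gross REAL, imdb_score REAL)"
)
# Hypothetical rows, made up for illustration.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?, ?)",
    [
        ("Movie A", 2010, 100.0, 250.0, 7.5),
        ("Movie B", 2010, 50.0, 40.0, 8.9),
        ("Movie C", 2012, 200.0, 300.0, 6.1),
    ],
)

# Objective 3: top 5 movies by IMDB rating.
top_rated = conn.execute(
    "SELECT movie_title, imdb_score FROM movies "
    "ORDER BY imdb_score DESC LIMIT 5"
).fetchall()

# Objective 4: number of movies released per year.
per_year = conn.execute(
    "SELECT title_year, COUNT(*) AS n FROM movies "
    "GROUP BY title_year ORDER BY title_year"
).fetchall()

# Objective 9: net profit, assumed here to be gross minus budget.
net_profit = conn.execute(
    "SELECT movie_title, gross - budget AS net_profit FROM movies "
    "ORDER BY net_profit DESC"
).fetchall()

print(top_rated)
print(per_year)
print(net_profit)
conn.close()
```

The best/worst and longest/shortest objectives would follow the same shape as the first query, changing the sort column and the sort direction.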

Spark Streaming:

It is demonstrated with Twitter analysis.

The objective of Spark Streaming with the Twitter app is to determine:

  1. The popular topics in the last 60 seconds.
  2. The popular topics in the last 10 seconds.
  3. The username and their tweets.
  4. The time at which the user tweeted.
  5. The FriendsCount of the user.
  6. The number of tweets and the score of the user.
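
Objectives 1 and 2 are windowed counts: the same hashtag tally computed over two different look-back periods. The sketch below shows the idea in plain Python over a fixed list of events. In Spark Streaming this is the kind of computation usually handled by a windowed operation such as `reduceByKeyAndWindow` on a DStream. The timestamps, the tweet texts, and the choice to treat hashtags as "topics" are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical (timestamp_in_seconds, tweet_text) pairs.
tweets = [
    (5, "learning #spark today"),
    (45, "#spark and #scala"),
    (55, "more #spark #streaming"),
    (58, "#scala tips"),
]

def popular_topics(events, now, window_seconds):
    """Count hashtags in events with now - window_seconds < timestamp <= now."""
    counts = Counter()
    for ts, text in events:
        if now - window_seconds < ts <= now:
            counts.update(
                word.lower() for word in text.split() if word.startswith("#")
            )
    # most_common() sorts by count, highest first.
    return counts.most_common()

last_60 = popular_topics(tweets, now=60, window_seconds=60)
last_10 = popular_topics(tweets, now=60, window_seconds=10)

print(last_60)
print(last_10)
```

A real streaming job would recompute these counts on every slide interval instead of scanning a stored list; the filter-by-time-then-count logic stays the same.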
