This repository shows the Case Study of Yellow Taxi Cabs of NYC, using the Hadoop-Hive ecosystem with HiveQL.

PriyankaJhaTheAnalyst/YellowTaxiNYC_HiveCaseStudy

Project Overview

Yellow taxicabs are the only vehicles that have the right to pick up street-hailing and prearranged passengers anywhere in New York City. My objective is to upload the collected dataset to the Hadoop ecosystem, then analyse and explore it while answering some important questions.


HiveQL is the language used to query the dataset.

You can see my HiveQL queries here.


Data and Exploration

Dataset:

The dataset used in this Hadoop-Hive case study is the 2015 yellow taxi trip data, collected from the official website of the NYC Taxi and Limousine Commission (TLC). The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
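
As a rough illustration of how trip records like these could be exposed to Hive, here is a minimal sketch of an external table over CSV files already copied to HDFS. The table name `yellow_taxi_2015` and the HDFS path are hypothetical. The column names and their order follow my understanding of the 2015 TLC yellow taxi file layout and should be checked against the header row of the actual files before use.

```sql
-- Sketch only: table name and LOCATION are assumptions, not taken from this repository.
CREATE EXTERNAL TABLE IF NOT EXISTS yellow_taxi_2015 (
  vendorid              INT,
  tpep_pickup_datetime  TIMESTAMP,  -- expects 'yyyy-MM-dd HH:mm:ss' text
  tpep_dropoff_datetime TIMESTAMP,
  passenger_count       INT,
  trip_distance         DOUBLE,
  pickup_longitude      DOUBLE,
  pickup_latitude       DOUBLE,
  ratecodeid            INT,
  store_and_fwd_flag    STRING,
  dropoff_longitude     DOUBLE,
  dropoff_latitude      DOUBLE,
  payment_type          INT,
  fare_amount           DOUBLE,
  extra                 DOUBLE,
  mta_tax               DOUBLE,
  tip_amount            DOUBLE,
  tolls_amount          DOUBLE,
  improvement_surcharge DOUBLE,
  total_amount          DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/yellow_taxi_2015'  -- hypothetical HDFS directory
TBLPROPERTIES ('skip.header.line.count' = '1');   -- skip the CSV header row
```

An external table leaves the raw files in place on HDFS, so dropping the table does not delete the data.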


Exploration Questions:

In this case study I explore the following questions:

  1. What is the total number of trips (equal to the number of rows)?
  2. What is the total revenue generated by all the trips?
  3. What fraction of the total is paid for tolls?
  4. What fraction of it is driver tips?
  5. What is the average trip amount?
  6. What is the average distance of the trips?
  7. How many different payment types are used?
  8. For each payment type, display the following details:
     • Average fare generated
     • Average tip
     • Average tax
  9. On average, which hour of the day generates the highest revenue?
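
The questions above map fairly directly onto HiveQL aggregates. The queries below are a sketch, not the queries used in this repository. They assume a Hive table hypothetically named `yellow_taxi_2015` whose columns use the TLC field names (`total_amount`, `tolls_amount`, `tip_amount`, `fare_amount`, `mta_tax`, `trip_distance`, `payment_type`, `tpep_pickup_datetime`). They also read "revenue" as `total_amount` and "tax" as `mta_tax`, which is an interpretation.

```sql
-- Questions 1-7: overall totals, fractions, and averages in a single pass.
SELECT
  COUNT(*)                               AS total_trips,
  SUM(total_amount)                      AS total_revenue,
  SUM(tolls_amount) / SUM(total_amount)  AS tolls_fraction,
  SUM(tip_amount)   / SUM(total_amount)  AS tips_fraction,
  AVG(total_amount)                      AS avg_trip_amount,
  AVG(trip_distance)                     AS avg_trip_distance,
  COUNT(DISTINCT payment_type)           AS payment_type_count
FROM yellow_taxi_2015;

-- Question 8: average fare, tip, and tax for each payment type.
SELECT
  payment_type,
  AVG(fare_amount) AS avg_fare,
  AVG(tip_amount)  AS avg_tip,
  AVG(mta_tax)     AS avg_tax
FROM yellow_taxi_2015
GROUP BY payment_type;

-- Question 9: pickup hour with the highest average revenue per trip.
SELECT
  hour(tpep_pickup_datetime) AS pickup_hour,
  AVG(total_amount)          AS avg_revenue
FROM yellow_taxi_2015
GROUP BY hour(tpep_pickup_datetime)
ORDER BY avg_revenue DESC
LIMIT 1;
```

Question 9 could also be read as total revenue per hour averaged over days. In that case the inner query would sum `total_amount` per date and hour, and the outer query would average those sums by hour.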
