Project Overview

Yellow taxicabs are the only vehicles that have the right to pick up street-hailing and prearranged passengers anywhere in New York City. My objective is to upload the collected dataset to the Hadoop ecosystem, then analyse and explore it while answering some important questions.

HiveQL is the language used to query the dataset.

You can see my HiveQL queries here.

Data and Exploration

Dataset:

The dataset used in this Hadoop-Hive case study covers the year 2015 and was collected from the official website of the NYC Taxi and Limousine Commission (TLC). The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
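
As an illustration of how such a CSV could be exposed to Hive once it is in HDFS, here is a minimal sketch. The table name `yellow_taxi_2015` and the HDFS path are hypothetical, and the column list assumes the 2015 TLC yellow-taxi layout; check it against the header row of the actual file before use.

```sql
-- Sketch only: table name and LOCATION are hypothetical placeholders.
-- Column names and order are assumed from the 2015 TLC yellow-taxi CSV header.
CREATE EXTERNAL TABLE IF NOT EXISTS yellow_taxi_2015 (
  vendorid               INT,
  tpep_pickup_datetime   TIMESTAMP,
  tpep_dropoff_datetime  TIMESTAMP,
  passenger_count        INT,
  trip_distance          DOUBLE,
  pickup_longitude       DOUBLE,
  pickup_latitude        DOUBLE,
  ratecodeid             INT,
  store_and_fwd_flag     STRING,
  dropoff_longitude      DOUBLE,
  dropoff_latitude       DOUBLE,
  payment_type           INT,
  fare_amount            DOUBLE,
  extra                  DOUBLE,
  mta_tax                DOUBLE,
  tip_amount             DOUBLE,
  tolls_amount           DOUBLE,
  improvement_surcharge  DOUBLE,
  total_amount           DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/nyc_taxi/'                 -- hypothetical HDFS directory holding the CSV
TBLPROPERTIES ('skip.header.line.count'='1');   -- skip the CSV header row
```

An external table leaves the uploaded file in place in HDFS, so dropping the table does not delete the data.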

Exploration Questions:

In this case study I explore the following questions:

What is the total number of trips (equal to the number of rows)?
What is the total revenue generated by all the trips?
What fraction of the total is paid for tolls?
What fraction of it is driver tips?
What is the average trip amount?
What is the average distance of the trips?
How many different payment types are used?
For each payment type, display the following details:

Average fare generated
Average tip
Average tax

On average which hour of the day generates the highest revenue?
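
The questions above map fairly directly onto HiveQL aggregates. The following is a sketch rather than the queries linked from this README: it assumes a table hypothetically named `yellow_taxi_2015` with the TLC column names (`total_amount`, `tolls_amount`, `tip_amount`, `fare_amount`, `mta_tax`, `trip_distance`, `payment_type`, `tpep_pickup_datetime`), treats `total_amount` as revenue and `mta_tax` as the "tax", and reads the last question as the hour with the highest average amount per trip.

```sql
-- Trip count, total revenue, toll and tip fractions, average amount and distance.
SELECT COUNT(*)                               AS total_trips,
       SUM(total_amount)                      AS total_revenue,
       SUM(tolls_amount) / SUM(total_amount)  AS tolls_fraction,
       SUM(tip_amount)   / SUM(total_amount)  AS tips_fraction,
       AVG(total_amount)                      AS avg_trip_amount,
       AVG(trip_distance)                     AS avg_trip_distance
FROM yellow_taxi_2015;

-- Number of distinct payment types.
SELECT COUNT(DISTINCT payment_type) AS payment_type_count
FROM yellow_taxi_2015;

-- Average fare, tip, and tax per payment type.
SELECT payment_type,
       AVG(fare_amount) AS avg_fare,
       AVG(tip_amount)  AS avg_tip,
       AVG(mta_tax)     AS avg_tax
FROM yellow_taxi_2015
GROUP BY payment_type;

-- Pickup hour (0-23) with the highest average amount per trip.
SELECT HOUR(tpep_pickup_datetime) AS pickup_hour,
       AVG(total_amount)          AS avg_revenue
FROM yellow_taxi_2015
GROUP BY HOUR(tpep_pickup_datetime)
ORDER BY avg_revenue DESC
LIMIT 1;
```

If "highest revenue" is meant as total takings rather than per-trip average, replacing `AVG(total_amount)` with `SUM(total_amount)` in the last query gives that reading instead.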
