A collection of diverse datasets designed for practicing data analysis and visualization skills in Microsoft Excel, suitable for beginners and advanced users. Includes structured datasets with accompanying tutorials for mastering Excel functions, charts, and pivot tables.

rohanmistry231/Practice-Datasets-for-Excel

Excel Datasets for Data Analytics Beginners

Welcome to the Excel Datasets for Data Analytics Beginners repository! This repository contains 15 different datasets stored in Excel format, each designed to help beginners practice and improve their data analytics skills. These datasets cover various topics, including financial analysis, market analysis, and time series analysis.

Table of Contents

  1. What are Excel Datasets?
  2. List of Excel Datasets
  3. Dataset Descriptions
    1. Superstore Sales
    2. Iris
    3. Titanic
    4. Wine Quality
    5. Adult Census Income
    6. Boston Housing
    7. Breast Cancer Wisconsin Dataset
    8. Online Shoppers Purchasing Intention
    9. Bank Marketing
    10. Avocado Prices
    11. Amazon Top 50 Bestselling Books 2009 – 2019
    12. FIFA World Cup
    13. New York City Airbnb Open Data
    14. World Happiness Report
    15. Stock Price
  4. Common Practice Questions
  5. Final Thoughts

What are Excel Datasets?

Excel datasets are collections of data stored in spreadsheets that can be opened in Microsoft Excel, a widely used application for creating, manipulating, and analyzing structured data. These datasets typically come in two formats: Excel workbook (.xlsx) and comma-separated values (.csv).

  • Excel (.xlsx): Provides advanced features for organizing and analyzing complex data, including formulas and visualizations.
  • CSV: A plain-text format that most software can read, which makes data sharing easier. It stores values only, not formulas, formatting, or charts.
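
As a minimal sketch of why CSV is easy to share, the snippet below parses CSV text using only Python's standard library. The sample rows and the `read_rows` helper are made up for illustration and do not come from any file in this repository.

```python
import csv
import io

# Hypothetical CSV content for illustration only.
# A real file would be opened with open(path, newline="") instead.
SAMPLE_CSV = """Region,Sales
East,100.50
West,200.25
East,50.00
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = read_rows(SAMPLE_CSV)

# CSV stores every value as text, so numeric columns need explicit conversion.
total_sales = sum(float(row["Sales"]) for row in rows)
print(len(rows), total_sales)
```

Because a CSV file carries no type information, the conversion step (`float(...)` here) is the part that an .xlsx workbook would otherwise handle through cell types.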

List of Excel Datasets

  1. Superstore Sales
  2. Iris
  3. Titanic
  4. Wine Quality
  5. Adult Census Income
  6. Boston Housing
  7. Breast Cancer Wisconsin Dataset
  8. Online Shoppers Purchasing Intention
  9. Bank Marketing
  10. Avocado Prices
  11. Amazon Top 50 Bestselling Books 2009 – 2019
  12. FIFA World Cup
  13. New York City Airbnb Open Data
  14. World Happiness Report
  15. Stock Price

Dataset Descriptions

1. Superstore Sales

Provides sales data for a fictional retail company, including information on products, orders, and customers. Useful for practicing data analytics techniques.

Variables:

  • Order ID, Customer ID, Order Date, Ship Date, Ship Mode, Segment, Region, Category, Sub-Category, Product Name, Sales, Quantity, Discount, Profit

2. Iris

Contains measurements of iris flowers from three species: setosa, versicolor, and virginica.

Variables:

  • Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species

3. Titanic

Offers information on passengers onboard the Titanic, useful for data cleaning, preprocessing, descriptive statistics, data visualization, and predictive modeling.

Variables:

  • PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked

4. Wine Quality

Contains information on red and white wine samples, aiming to classify wine quality based on chemical properties.

Variables:

  • Fixed Acidity, Volatile Acidity, Citric Acid, Residual Sugar, Chlorides, Free Sulfur Dioxide, Total Sulfur Dioxide, Density, pH, Sulphates, Alcohol, Quality

5. Adult Census Income

A collection of demographic, social, and economic attributes of individuals, drawn from the 1994 U.S. Census database.

Variables:

  • age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country, income

6. Boston Housing

Information on housing in Boston, Massachusetts, used to analyze the relationship between house prices and other features.

Variables:

  • CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, B, LSTAT, MEDV

7. Breast Cancer Wisconsin Dataset

Information about breast cancer tumors, used for classifying tumors as malignant or benign.

Variables:

  • ID number, Diagnosis, Radius, Texture, Perimeter, Area, Smoothness, Compactness, Concavity, Concave points, Symmetry, Fractal dimension

8. Online Shoppers Purchasing Intention

Data related to purchase patterns and consumer behavior in online shopping.

Variables:

  • Administrative, Administrative_Duration, Informational, Informational_Duration, ProductRelated, ProductRelated_Duration, BounceRates, ExitRates, PageValues, SpecialDay

9. Bank Marketing

Information about marketing campaigns for a Portuguese banking institution.

Variables:

  • Age, Job, Marital, Education, Default, Balance, Housing, Loan, Contact, Day, Month, Duration, Campaign, Pdays, Previous, Poutcome, y

10. Avocado Prices

Data related to avocado prices in the United States, collected from various sources.

Variables:

  • Date, AveragePrice, Total Volume, PLU, Type, Region, Year

11. Amazon Top 50 Bestselling Books 2009 – 2019

Data related to the top 50 best-selling books on Amazon for each year from 2009 to 2019.

Variables:

  • Name, Author, User Rating, Reviews, Price, Year, Genre

12. FIFA World Cup

Data related to the FIFA World Cup from 1930 to 2014.

Variables:

  • Year, Country, Winner, Runners-Up, Third, Fourth, GoalsScored, QualifiedTeams, MatchesPlayed, Attendance

13. New York City Airbnb Open Data

Public information about Airbnb listings in New York City.

Variables:

  • Id, Name, Host_id, Host_name, Neighbourhood_group, Neighbourhood, Latitude, Longitude, Room_type, Price, Minimum_nights, Number_of_reviews, Last_review, Reviews_per_month, Calculated_host_listings_count, Availability_365

14. World Happiness Report

Information on happiness levels of over 150 countries, including economic, social, and health factors.

Variables:

  • Country name, Year, Life Ladder, Log GDP per capita, Social support, Healthy life expectancy, Freedom to make life choices, Generosity, Perceptions of corruption, Positive affect, Negative affect

15. Stock Price

Daily stock prices of various companies.

Variables:

  • Date, Open, High, Low, Close, Adj Close, Volume

Common Practice Questions

Here are some common practice questions for each dataset to help you get started:

Superstore Sales

  • What is the total revenue generated by the store?
  • Which category of products contributes the most to sales?
  • How has the sales trend been for the past year?
  • Which region has the highest sales and which one has the lowest?
  • What is the average profit margin of the store?
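
The questions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are invented (they reuse the Category, Sales, and Profit column names listed for this dataset), the helper names are hypothetical, and "profit margin" is taken to mean total profit divided by total sales.

```python
from collections import defaultdict

# Made-up rows for illustration; not taken from the real Superstore file.
orders = [
    {"Category": "Furniture", "Sales": 200.0, "Profit": 20.0},
    {"Category": "Technology", "Sales": 500.0, "Profit": 100.0},
    {"Category": "Furniture", "Sales": 100.0, "Profit": -10.0},
    {"Category": "Office Supplies", "Sales": 200.0, "Profit": 40.0},
]


def sales_by_category(rows):
    """Sum the Sales column for each Category value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Category"]] += row["Sales"]
    return dict(totals)


def overall_profit_margin(rows):
    """Total profit divided by total sales (an assumed definition of margin)."""
    total_sales = sum(row["Sales"] for row in rows)
    total_profit = sum(row["Profit"] for row in rows)
    return total_profit / total_sales


total_revenue = sum(row["Sales"] for row in orders)
by_category = sales_by_category(orders)
top_category = max(by_category, key=by_category.get)
margin = overall_profit_margin(orders)
print(total_revenue, top_category, margin)
```

In Excel, the same grouping is what a pivot table with Category in rows and Sum of Sales in values produces. Averaging each order's own Profit / Sales ratio is a different calculation and generally gives a different number, so it is worth stating which definition an answer uses.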

Iris

  • What is the distribution of each species of iris in the dataset?
  • What is the correlation between petal length and petal width?
  • What is the average sepal length for each species of iris?
  • Which species of iris has the largest average petal area (approximated here as Petal.Length × Petal.Width)?
  • How many observations are there for each species of iris?
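
For the correlation and per-species average questions, a minimal sketch follows. The measurements are invented rather than copied from the Iris data, and `pearson` and `mean_by_group` are hypothetical helper names; the correlation is the standard Pearson coefficient.

```python
import math
from collections import defaultdict

# Made-up measurements for illustration; column names follow the list above.
flowers = [
    {"Species": "setosa", "Sepal.Length": 5.0, "Petal.Length": 1.0, "Petal.Width": 0.1},
    {"Species": "setosa", "Sepal.Length": 5.2, "Petal.Length": 2.0, "Petal.Width": 0.3},
    {"Species": "virginica", "Sepal.Length": 6.0, "Petal.Length": 3.0, "Petal.Width": 0.2},
    {"Species": "virginica", "Sepal.Length": 7.0, "Petal.Length": 4.0, "Petal.Width": 0.4},
]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_by_group(rows, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


r = pearson(
    [f["Petal.Length"] for f in flowers],
    [f["Petal.Width"] for f in flowers],
)
avg_sepal = mean_by_group(flowers, "Species", "Sepal.Length")
print(round(r, 3), avg_sepal)
```

In Excel itself, the CORREL and AVERAGEIF functions compute the same two quantities.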

Titanic

  • What is the survival rate of the passengers?
  • What is the average age of the passengers?
  • What is the proportion of male and female passengers?
  • Which class of passengers had the highest survival rate?
  • What is the distribution of the fare paid by the passengers?
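
A short sketch of the survival-rate and average-age questions is given below. The passengers are invented, the helper names are hypothetical, and the code assumes that Survived is coded as 0 or 1 and that some Age values may be missing (represented here as `None`).

```python
from collections import defaultdict

# Made-up passengers for illustration; None marks a missing Age value.
passengers = [
    {"Survived": 1, "Pclass": 1, "Age": 30.0},
    {"Survived": 0, "Pclass": 3, "Age": None},
    {"Survived": 1, "Pclass": 3, "Age": 20.0},
    {"Survived": 0, "Pclass": 3, "Age": 40.0},
    {"Survived": 1, "Pclass": 1, "Age": None},
]


def survival_rate(rows):
    """Share of rows with Survived == 1 (the mean of a 0/1 column)."""
    return sum(row["Survived"] for row in rows) / len(rows)


def survival_rate_by_class(rows):
    """Survival rate computed separately for each Pclass value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Pclass"]].append(row)
    return {pclass: survival_rate(group) for pclass, group in groups.items()}


def average_age(rows):
    """Mean age over rows where Age is present; missing values are skipped."""
    ages = [row["Age"] for row in rows if row["Age"] is not None]
    return sum(ages) / len(ages)


overall = survival_rate(passengers)
by_class = survival_rate_by_class(passengers)
mean_age = average_age(passengers)
print(overall, by_class, mean_age)
```

Skipping missing ages matches how Excel's AVERAGE treats blank cells; filling them in instead (for example with the median) is a separate cleaning decision that changes the result.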

Wine Quality

  • What is the correlation between pH and alcohol content?
  • Which type of wine (red or white) has a higher median quality rating?
  • What is the median volatile acidity for each type of wine?
  • What is the proportion of each wine type in the dataset?
  • What is the distribution of citric acid for each wine type?

Adult Census Income

  • What is the proportion of people who earn more than $50K?
  • What is the average age of people who earn more than $50K?
  • What is the correlation between age and education level?
  • What is the proportion of men and women who earn more than $50K?
  • What is the median hours worked per week for people who earn more than $50K?

Boston Housing

  • What is the correlation between the number of rooms and the median value of owner-occupied homes?
  • Which variable has the highest correlation with the median value of owner-occupied homes?
  • What is the average age of the homes?
  • What is the distribution of the pupil-teacher ratio by town?
  • Which town has the highest median value of owner-occupied homes?

Breast Cancer Wisconsin Dataset

  • What is the proportion of benign and malignant tumors?
  • What is the correlation between tumor radius and perimeter?
  • What is the average smoothness of the tumors?
  • What is the distribution of the concavity of the tumors?
  • What is the median area of the tumors?

Online Shoppers Purchasing Intention

  • What is the proportion of visitors who make a purchase?
  • What is the average duration of visits?
  • What is the correlation between bounce rate and exit rate?
  • What is the proportion of visits on special days?
  • What is the distribution of page values?

Bank Marketing

  • What is the proportion of people who subscribed to a term deposit?
  • What is the average age of people who subscribed to a term deposit?
  • What is the relationship between age and job (for example, the average age for each job category)?
  • What is the proportion of people who have a housing loan?
  • What is the distribution of education levels?

Avocado Prices

  • What is the average price of avocados?
  • What is the correlation between total volume and average price?
  • What is the distribution of avocado types?
  • What is the trend of avocado prices over the years?
  • Which region has the highest average price of avocados?

Amazon Top 50 Bestselling Books 2009 – 2019

  • What is the average user rating of the books?
  • What is the correlation between price and user rating?
  • What is the proportion of books by genre?
  • Which author has the most bestsellers?
  • What is the distribution of reviews for the books?

FIFA World Cup

  • What is the average number of goals scored per World Cup?
  • Which country has won the most World Cups?
  • What is the distribution of attendance per World Cup?
  • What is the correlation between the number of goals scored and the number of matches played?
  • What is the trend of qualified teams over the years?

New York City Airbnb Open Data

  • What is the average price of listings?
  • What is the correlation between number of reviews and price?
  • What is the proportion of room types?
  • Which neighborhood group has the highest number of listings?
  • What is the distribution of availability for the listings?

World Happiness Report

  • What is the average life ladder score for the countries?
  • What is the correlation between GDP per capita and life ladder score?
  • What is the distribution of social support scores?
  • Which country has the highest healthy life expectancy?
  • What is the trend of positive affect scores over the years?

Stock Price

  • What is the average closing price of the stocks?
  • What is the correlation between opening and closing prices?
  • What is the distribution of trading volume?
  • What is the trend of stock prices over the years?
  • Which company has the highest average closing price?

Final Thoughts

We hope you find these datasets helpful for practicing your data analytics skills. Feel free to reach out if you have any questions or suggestions for additional datasets. Happy analyzing!
