Skip to content

"Apple Store App Analytics: A Practical SQL Data Analysis" provides real-world insights into app categories, pricing, and user ratings. Exploratory Data Analysis, Insights of paid vs. free app impacts, language support, and app description length in correlation with user rating.

Notifications You must be signed in to change notification settings

UdaykiranEstari/Practical-SQL-DataAnalysis-AppleStore-App-Analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 

Repository files navigation

Unlocking App Success: A Data-Driven Journey with SQL on Apple Store Apps

Introduction

In this project, I leveraged SQL to uncover valuable insights offering essential guidance to app developers. From the advantages of paid apps and language support optimization to pinpointing lower-rated app categories, I navigated the intricacies of the Apple Store ecosystem. I also examined the role of app descriptions in user ratings and provided a benchmark for new apps looking to stand out in a competitive market. Particularly, I addressed the competitive games and entertainment genres, revealing a path to success through unique value and quality.

This repository is a culmination of my extensive practice and learning in SQL, and I'm excited to showcase my skills in the real world. The main purpose of this project is to serve as a future reference for myself, where I can review and reinforce what I have learned. Additionally, I hope this repository will be a valuable resource for anyone who is learning SQL and looking for real-world examples.

SQL Expertise

Throughout my learning, I've honed my skills in SQL and its applications in data analysis. Some key areas where I excel include:

  • Crafting complex SQL queries to efficiently extract and manipulate data for analysis.
  • Performing exploratory data analysis (EDA) using SQL to gain insights into data characteristics.
  • Utilizing SQL for data cleaning, integration, and preparation tasks to ensure data quality.
  • Optimizing SQL queries for performance and efficiency, even with large datasets and databases.

Showcase Project: Practical Data Analysis of Apple Store Apps

In this project, we worked with two datasets: applestore.csv and AppleStore_description. These datasets contain information about various apps available on the Apple Store, such as app names, ratings, app types, and descriptions. The goal was to provide data-driven insights to an aspiring app developer who needed to make decisions about app categories, pricing, and strategies to maximize user ratings.

Project Overview

Dataset and Tools

For this project, I utilized the powerful online resource sqliteonline.com, which enables SQL analysis without any installations. Due to the platform's 4 MB size limit, I divided the large CSV file into four smaller files and combined them using the UNION ALL feature to create a unified dataset for analysis.

However, it's essential to note that this project is highly adaptable to different flavors of SQL, such as PostgreSQL or MSSQL. While there might be minor differences in syntax, the overall idea and methodology of the project remain consistent across various SQL environments.

Feel free to use your preferred SQL flavor and tools to replicate and extend the analysis presented in this project. Whether you choose PostgreSQL, MSSQL, MySQL, or any other SQL variant, the fundamental principles of data analytics and querying remain applicable.

Happy exploring and analyzing with your flavor of SQL!

Project Steps

Step 1: Stakeholder Identification

The first step of the project involved identifying the stakeholder, who was an aspiring app developer looking for data-driven insights to build successful apps. Our goal was to answer essential questions such as:

  • Identifying popular app categories: We aimed to determine which app categories were the most popular among users.
  • Determining pricing strategies: We sought to understand how app pricing affected user engagement and ratings.
  • Understanding factors influencing user ratings: We aimed to analyze various factors that influenced user ratings for apps on the Apple Store.

Step 2: Exploratory Data Analysis (EDA)

EDA played a crucial role in understanding the dataset's structure and quality. We conducted several exploratory data analysis tasks, including:

  • Checking for missing values: We ensured that the data was clean and reliable by checking for any missing values in the dataset.
  • Exploring app categories' distribution: Through data visualisation and summary statistics, we identified the dominant genres among the apps in the dataset.

In this phase, we gained valuable insights into the dataset, which provided a foundation for further analysis and decision-making.

for more info check the sql file AppStore_Analysis.sql

Step 3 : Data Analysis and Insights

After completing the exploratory data analysis (EDA), we proceeded with the data analysis phase, where we derived valuable insights to guide our stakeholder's app development decisions. The following are the key insights we obtained:

1. Paid vs. Free Apps

-- Finding Whether avg rating of paid or free app is higher 

SELECT CASE
        WHEN price > 0 THEn "Paid"
        ELSE "Free" 
       END AS App_Type,
       AVG(user_rating) as AvgRating
  FROM AppleStore
 GROUP BY App_Type

-- Number of paid and free apps with the top user rating in each genre

WITH cte_topranked_apps AS
(
    SELECT
        prime_genre,
        track_name,
        price,
        user_rating,
        RANK() OVER (ORDER BY user_rating DESC) AS rank,
        CASE
            WHEN price > 0 THEN "Paid"
            ELSE "Free" 
        END AS App_Type
    FROM AppleStore
)

SELECT
    prime_genre,
    App_Type,
    COUNT(*) AS no_of_apps
 FROM cte_topranked_apps
WHERE rank = 1
GROUP BY prime_genre, App_Type
ORDER BY prime_genre;

On conducting the analysis, we found that, on average, paid apps received slightly higher ratings compared to free apps. This suggests that users might perceive paid apps as having higher value and engagement, leading to better ratings. As a recommendation, our stakeholder could consider offering a paid app to potentially enhance user engagement and increase the perceived value of their app.

In conlucsion Paid and free apps have 3.72, 3.38 avg_userrating respectively. Paid apps have 10% more avgrating compared to free apps.

2. Languages and Ratings

-- Dividing languages into groups <10 , 10-30 , 31-50 , 51-100

SELECT  CASE
  	    WHEN lang_num < 10 THEN "< 10 Languages"
            WHEN lang_num BETWEEN 10 AND 30 THEN "10-30 Languages" 
            WHEN lang_num BETWEEN 31 AND 50 THEN "31-50 Languages"
            WHEN lang_num BETWEEN 50 AND 100 THEN "50-100 Languages"
            END as LangGroup,
        AVG(user_rating) as AvgRating
FROM AppleStore
GROUP BY LangGroup
ORDER BY AvgRating DESC

-- Here 10-30 Language group has more ratings .. if we want more info let's go into little depth

SELECT CASE
 	    WHEN lang_num BETWEEN 10 AND 15 THEN "10-15 Languages"
            WHEN lang_num BETWEEN 16 AND 20 THEN "16-20 Languages"
            WHEN lang_num BETWEEN 21 AND 25 THEN "21-25 Languages"
            WHEN lang_num BETWEEN 26 AND 30 THEN "26-30 Languages"
       END AS LangGroup,
       AVG(user_rating) as AvgRating
FROM AppleStore
GROUP BY LangGroup
ORDER BY AvgRating DESC

-- 10-15 Language support is enough is good.

Through our data analysis, we observed a positive correlation between the number of languages supported by an app and its user ratings. Apps that supported a moderate number of languages tended to have better average user ratings. This insight indicates that focusing on supporting the most relevant languages for the target audience can significantly improve user satisfaction and overall ratings.

But as per conclusion 10-15 Language support is enough is good and allot time for other improvement instead of adding more language support

3. Low-Rated Genres

-- Check Genre with low rating

SELECT prime_genre,
Avg(user_rating) as AvgRating
FROM AppleStore
GROUP BY prime_genre
ORDER BY AvgRating
LIMIT 5

-- Catalogs, Finance and Book genre has very less reviews. Creating a better product in this genre gives high chance of success

Our analysis also identified genres with low ratings in the dataset. This finding presents an opportunity for our stakeholder to explore these genres further and potentially address user needs in those categories. By understanding the pain points and shortcomings of low-rated apps in these genres, our stakeholder can make strategic decisions to create high-quality apps that cater to user preferences, leading to the possibility of achieving higher ratings.

4. App Description Length

-- Check if there is realation between app_description and user_rating

SELECT  CASE
	    WHEN Length(b.app_desc) < 500 THEN "Short"
            WHEN Length(b.app_desc) BETWEEN 500 AND 1000 THEN "Medium"
            ELSE "Long"
        END AS LangLenGroup,
        Avg(user_rating) as Avg_Rating
FROM AppleStore a
JOIN appleStore_description_combined b
ON a.id = b.id
GROUP BY LangLenGroup
ORDER BY Avg_Rating DESC

A notable insight from the analysis was the positive correlation between app description length and user ratings. Apps with longer and more comprehensive descriptions tended to receive better user ratings. This suggests that clear and detailed app descriptions can positively influence user perceptions and satisfaction. As a recommendation, we advised our stakeholder to focus on crafting clear and engaging app descriptions to enhance user ratings.

5. App Description Keywords

-- Having keywords and their explain like Features, Usage, Hints, Tips will increase your rating 

PRAGMA case_sensitive_like = true;

SELECT avg(a.user_rating) as Avg_Rating
FROM AppleStore a
JOIN appleStore_description_combined b
ON a.id = b.id
WHERE  	b.app_desc LIKE '%Feature%' 
		OR b.app_desc LIKE '%Usage%'
        OR b.app_desc LIKE '%Hint%'
        OR b.app_desc LIKE '%Tips%'

keywords could enhance app features and usage, leading to better user satisfaction and higher ratings.

6.Top Apps in Each Category

-- Top Apps in each Category by user ratings if there is tie between 2 apps it's broken by total number of ratings 

WITH cte_topranked_apps AS
(
    SELECT
        prime_genre,
        track_name,
        user_rating,
	rating_count_tot,
        RANK() OVER (PARTITION BY prime_genre ORDER BY user_rating DESC, rating_count_tot DESC) AS rank
    FROM AppleStore
)

SELECT  prime_genre,
        track_name,
        user_rating,
	rating_count_tot
FROM cte_topranked_apps
WHERE rank = 1
ORDER BY prime_genre, rating_count_tot DESC

If you are interested in all the top apps with substantial number of users. here I have taken this substantial number of users as 10000

WITH cte_topranked_apps AS
(
    SELECT
        prime_genre,
        track_name,
        user_rating,
	rating_count_tot,
        RANK() OVER (ORDER BY user_rating DESC) AS rank
    FROM AppleStore
    WHERE rating_count_tot > 10000
)

SELECT *
FROM cte_topranked_apps
WHERE rank = 1
ORDER BY prime_genre, rating_count_tot DESC

Considering the rating count filter, we can focus on apps that have been rated by a substantial number of users, ensuring that the top-ranked apps are genuinely popular and influential within their respective genres. This analysis provides valuable insights for app developers seeking to understand the most highly-rated apps in different genres and make informed decisions for app development strategies.

Final Recommendations

  1. Paid Vs Free: In general, paid apps tend to have higher user ratings compared to free apps. To optimize ratings and pricing, consider analyzing different price ranges to find the sweet spot that resonates best with users.

  2. App Supporting more languages: Supporting 10-15 languages is sufficient, and focusing on the most relevant languages is key. Adding too many languages may not significantly impact user ratings.

  3. Categories like Catalogs, Finance & Books: These categories have lower average user ratings, presenting a valuable opportunity to create apps that address user needs and provide better satisfaction.

  4. Length of App Description: Apps with comprehensive descriptions, highlighting features, usage, hints, and tips, tend to receive higher ratings. Ensure app descriptions are informative and user-friendly.

  5. New Apps: Aim for an average rating above 3.5 to stand out and gain popularity in the competitive app market.

  6. Games and Entertainment: Entry into these genres is highly competitive due to the vast library of apps with great user ratings. However, if your app offers unique value and exceptional quality, it can succeed and achieve significant incentives due to high demand.

Implementing these recommendations can enhance app development strategies and lead to greater user engagement, higher user satisfaction, and ultimately, increased success in the app market.

I hope this project serves as a practical reference for both myself and others in the data community who are learning SQL and seeking real-world examples. If you have any questions or wish to collaborate on SQL-related projects, please feel free to reach out.

Thank you for joining me on this data analytics journey!

Special thanks to Lore So What for his insights and inspiration.

About

"Apple Store App Analytics: A Practical SQL Data Analysis" provides real-world insights into app categories, pricing, and user ratings. Exploratory Data Analysis, Insights of paid vs. free app impacts, language support, and app description length in correlation with user rating.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published