Exploratory Data Analysis (EDA) on Hostels in Japan using Python

An analysis of hostels, an affordable lodging option for travelers, using the Pandas, Matplotlib, Seaborn, and SciPy modules.

Data Source

This dataset was web-scraped from Hostelworld by Koki Ando and is publicly available on Kaggle (https://www.kaggle.com/koki25ando/hostel-world-dataset).

Reading CSV File and Data Clean-Up

# Dependencies
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import seaborn as sns
from wordcloud import WordCloud

hostels_df = pd.read_csv("../raw_data/Hostel.csv", index_col=0)

# Rename column names
hostels_df.columns = ["Hostel Name", "City", "Min. Price for One Night (yen)", 
                      "Distance from City Center", "Summary Score", "Rating", "Atmosphere", 
                      "Cleaniness", "Facilities", "Location", "Security", "Staff", "Value for Money", 
                      "Longitude", "Latitude"]
hostels_df.head()

	Hostel Name	City	Min. Price for One Night (yen)	Distance from City Center	Summary Score	Rating	Atmosphere	Cleaniness	Facilities	Location	Security	Staff	Value for Money	Longitude	Latitude
1	"Bike & Bed" CharinCo Hostel	Osaka	3300	2.9km from city centre	9.2	Superb	8.9	9.4	9.3	8.9	9.0	9.4	9.4	135.513767	34.682678
2	& And Hostel	Fukuoka-City	2600	0.7km from city centre	9.5	Superb	9.4	9.7	9.5	9.7	9.2	9.7	9.5	NaN	NaN
3	&And Hostel Akihabara	Tokyo	3600	7.8km from city centre	8.7	Fabulous	8.0	7.0	9.0	8.0	10.0	10.0	9.0	139.777472	35.697447
4	&And Hostel Ueno	Tokyo	2600	8.7km from city centre	7.4	Very Good	8.0	7.5	7.5	7.5	7.0	8.0	6.5	139.783667	35.712716
5	&And Hostel-Asakusa North-	Tokyo	1500	10.5km from city centre	9.4	Superb	9.5	9.5	9.0	9.0	9.5	10.0	9.5	139.798371	35.727898

# Get hostel count for each city
city_group = hostels_df.groupby("City").count()
city_group

	Hostel Name	Min. Price for One Night (yen)	Distance from City Center	Summary Score	Rating	Atmosphere	Cleaniness	Facilities	Location	Security	Staff	Value for Money	Longitude	Latitude
City
Fukuoka-City	19	19	19	17	17	17	17	17	17	17	17	17	14	14
Hiroshima	14	14	14	14	14	14	14	14	14	14	14	14	13	13
Kyoto	78	78	78	73	73	73	73	73	73	73	73	73	66	66
Osaka	104	104	104	101	101	101	101	101	101	101	101	101	89	89
Tokyo	127	127	127	122	122	122	122	122	122	122	122	122	116	116

# Reset index
city_group = city_group.reset_index()
city_group

	City	Hostel Name	Min. Price for One Night (yen)	Distance from City Center	Summary Score	Rating	Atmosphere	Cleaniness	Facilities	Location	Security	Staff	Value for Money	Longitude	Latitude
0	Fukuoka-City	19	19	19	17	17	17	17	17	17	17	17	17	14	14
1	Hiroshima	14	14	14	14	14	14	14	14	14	14	14	14	13	13
2	Kyoto	78	78	78	73	73	73	73	73	73	73	73	73	66	66
3	Osaka	104	104	104	101	101	101	101	101	101	101	101	101	89	89
4	Tokyo	127	127	127	122	122	122	122	122	122	122	122	122	116	116

# Create new data frame for city name and hostel count
city_group = city_group[["City", "Hostel Name"]]
city_group

	City	Hostel Name
0	Fukuoka-City	19
1	Hiroshima	14
2	Kyoto	78
3	Osaka	104
4	Tokyo	127

# Rename columns (assign the result instead of renaming a slice in place)
city_group = city_group.rename(columns={"Hostel Name": "Hostel Count"})
city_group

	City	Hostel Count
0	Fukuoka-City	19
1	Hiroshima	14
2	Kyoto	78
3	Osaka	104
4	Tokyo	127
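
The count, reset, select, and rename steps above can also be collapsed into a single chain. The following is a minimal sketch of that alternative, not the approach used in this notebook; `sample_df` and its values are made up for illustration.

```python
import pandas as pd

# Illustrative sample only; the real data frame is loaded from Hostel.csv
sample_df = pd.DataFrame({
    "Hostel Name": ["A", "B", "C", "D", "E", "F"],
    "City": ["Tokyo", "Osaka", "Tokyo", "Kyoto", "Tokyo", "Osaka"],
})

# size() counts rows per group, regardless of missing values in other columns
city_counts = (
    sample_df.groupby("City")
    .size()
    .reset_index(name="Hostel Count")
)
print(city_counts)
```

Because `size()` counts rows rather than non-null values, the result does not depend on which column happens to be complete.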

Basic Visualizations using Matplotlib Charts

Tokyo, Osaka, and Kyoto have the most hostels, which comes as no surprise since those cities are the top tourist destinations in Japan.

# Create bar chart for hostel count by city
city_bar = city_group.plot.bar(x="City", y="Hostel Count", rot=45, legend=None, color="teal", zorder=3)
plt.grid(which="major", axis="y", zorder=0)
plt.title("Hostel Count by City in Japan")
plt.ylabel("# of Hostels")
plt.savefig("../reports/figures/hostel_bar.png")
plt.show()

As seen in the pie chart below, more than 1/3 of the hostels are located in Tokyo. Around 2/3 of the hostels are located in Tokyo and Osaka.

# Create pie chart for hostel count by city
hostel_count = city_group["Hostel Count"]
colors = ["aqua", "lightblue", "gold", "olive", "turquoise"]
city_labels = city_group["City"]
plt.figure(figsize=(8,6))
plt.pie(hostel_count, labels=city_labels, colors=colors, startangle=115, autopct="%1.1f%%")

plt.title("Hostel Count Percentage by City in Japan")
plt.savefig("../reports/figures/hostel_pie.png")
plt.show()
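
The fractions quoted for the pie chart can be checked directly from the hostel counts. This is a small sketch that rebuilds the count table by hand from the output shown earlier; the `Share (%)` column and the variable names are introduced here for illustration.

```python
import pandas as pd

# Hostel counts copied from the city_group output above
city_group = pd.DataFrame({
    "City": ["Fukuoka-City", "Hiroshima", "Kyoto", "Osaka", "Tokyo"],
    "Hostel Count": [19, 14, 78, 104, 127],
})

total = city_group["Hostel Count"].sum()
city_group["Share (%)"] = (city_group["Hostel Count"] / total * 100).round(1)
print(city_group)

# Combined share of the two largest cities
counts_by_city = city_group.set_index("City")["Hostel Count"]
tokyo_share = counts_by_city["Tokyo"] / total
tokyo_osaka_share = (counts_by_city["Tokyo"] + counts_by_city["Osaka"]) / total
print(f"Tokyo: {tokyo_share:.1%}, Tokyo + Osaka: {tokyo_osaka_share:.1%}")
```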

# Use split to obtain numeric value from distance column
hostels_df["Distance from City Center (km)"] = hostels_df["Distance from City Center"].str.split("km").str[0]

# Drop unneeded column
hostels_df = hostels_df.drop(["Distance from City Center"], axis=1)
hostels_df.head()

	Hostel Name	City	Min. Price for One Night (yen)	Summary Score	Rating	Atmosphere	Cleaniness	Facilities	Location	Security	Staff	Value for Money	Longitude	Latitude	Distance from City Center (km)
1	"Bike & Bed" CharinCo Hostel	Osaka	3300	9.2	Superb	8.9	9.4	9.3	8.9	9.0	9.4	9.4	135.513767	34.682678	2.9
2	& And Hostel	Fukuoka-City	2600	9.5	Superb	9.4	9.7	9.5	9.7	9.2	9.7	9.5	NaN	NaN	0.7
3	&And Hostel Akihabara	Tokyo	3600	8.7	Fabulous	8.0	7.0	9.0	8.0	10.0	10.0	9.0	139.777472	35.697447	7.8
4	&And Hostel Ueno	Tokyo	2600	7.4	Very Good	8.0	7.5	7.5	7.5	7.0	8.0	6.5	139.783667	35.712716	8.7
5	&And Hostel-Asakusa North-	Tokyo	1500	9.4	Superb	9.5	9.5	9.0	9.0	9.5	10.0	9.5	139.798371	35.727898	10.5
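
Splitting on "km" works for the values shown here. As a hedged alternative, a regular expression can capture the number and convert it to a float in one step. The sketch below uses three sample strings in the same format as the dataset; the pattern is an assumption about that format, not something taken from the original notebook.

```python
import pandas as pd

distance_text = pd.Series([
    "2.9km from city centre",
    "0.7km from city centre",
    "10.5km from city centre",
])

# Capture the leading number (integer or decimal) that precedes "km"
distance_km = (
    distance_text.str.extract(r"^\s*(\d+(?:\.\d+)?)\s*km", expand=False)
    .astype(float)
)
print(distance_km.tolist())
```

Values that do not match the pattern become NaN instead of being carried along as text.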

# Check data types; after the split, the distance column is still stored as strings (object)
hostels_df.dtypes

Hostel Name                        object
City                               object
Min. Price for One Night (yen)      int64
Summary Score                     float64
Rating                             object
Atmosphere                        float64
Cleaniness                        float64
Facilities                        float64
Location                          float64
Security                          float64
Staff                             float64
Value for Money                   float64
Longitude                         float64
Latitude                          float64
Distance from City Center (km)     object
dtype: object

# Convert the distance column to a numeric data type
hostels_df["Distance from City Center (km)"] = pd.to_numeric(hostels_df["Distance from City Center (km)"])
hostels_df.dtypes

Hostel Name                        object
City                               object
Min. Price for One Night (yen)      int64
Summary Score                     float64
Rating                             object
Atmosphere                        float64
Cleaniness                        float64
Facilities                        float64
Location                          float64
Security                          float64
Staff                             float64
Value for Money                   float64
Longitude                         float64
Latitude                          float64
Distance from City Center (km)    float64
dtype: object

# Check min and max for minimum price column
hostels_df.groupby("City")["Min. Price for One Night (yen)"].max()

City
Fukuoka-City       4300
Hiroshima          3400
Kyoto              4000
Osaka           1003200
Tokyo           1003200
Name: Min. Price for One Night (yen), dtype: int64

hostels_df.groupby("City")["Min. Price for One Night (yen)"].min()

City
Fukuoka-City    2300
Hiroshima       2000
Kyoto           1000
Osaka           1200
Tokyo           1300
Name: Min. Price for One Night (yen), dtype: int64

Outliers

There are two outliers that skew the hostel price distribution. Since both have the same, implausibly high value (1,003,200 yen), it is reasonable to assume that some other number was unintentionally scraped as a price. It is appropriate to remove these outliers.

# Check value count for each unique value for minimum price column
hostels_df.groupby(["Min. Price for One Night (yen)"]).count()["City"]

Min. Price for One Night (yen)
1000        1
1200        5
1300        5
1400        3
1500       21
1600        9
1700        2
1800       13
1900       11
2000       39
2100        8
2200       15
2300       18
2400       10
2500       45
2600       14
2700       14
2800       12
2900       13
3000       23
3100        2
3200        6
3300       10
3400        4
3500        6
3600        4
3700        2
3800        6
3900        2
4000        5
4100        2
4200        1
4300        1
5200        1
5400        1
5500        1
6000        1
6200        1
6300        1
6500        1
7600        1
1003200     2
Name: City, dtype: int64

# Remove outlier
hostels_reduced = hostels_df[hostels_df["Min. Price for One Night (yen)"] < 8000]
hostels_reduced.groupby(["Min. Price for One Night (yen)"]).count()["City"]

Min. Price for One Night (yen)
1000     1
1200     5
1300     5
1400     3
1500    21
1600     9
1700     2
1800    13
1900    11
2000    39
2100     8
2200    15
2300    18
2400    10
2500    45
2600    14
2700    14
2800    12
2900    13
3000    23
3100     2
3200     6
3300    10
3400     4
3500     6
3600     4
3700     2
3800     6
3900     2
4000     5
4100     2
4200     1
4300     1
5200     1
5400     1
5500     1
6000     1
6200     1
6300     1
6500     1
7600     1
Name: City, dtype: int64
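
The fixed cutoff of 8000 yen was chosen by eye. A common alternative is the interquartile range (IQR) rule, sketched below on a handful of prices taken from the table above. This is a substitute technique shown for comparison, not the method used in this analysis, and on the full dataset a 1.5 × IQR fence may also flag hostels that are simply expensive.

```python
import pandas as pd

# A few prices from the value counts above, including the two suspect values
prices = pd.Series([1500, 2000, 2500, 2500, 3000, 3300, 7600, 1003200, 1003200])

q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Anything above Q3 + 1.5 * IQR is treated as an outlier
upper_fence = q3 + 1.5 * iqr
filtered = prices[prices <= upper_fence]

print(f"Upper fence: {upper_fence}")
print(filtered.tolist())
```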

Price vs. Distance

It is a common assumption that hotel and Airbnb listings are pricier the closer they are to a city center, especially in a popular tourist destination. Let's see if this also applies to hostels, since they are known to be a more affordable lodging option for travelers from any part of the world.

# Create scatter plot
hostels_reduced.plot.scatter("Distance from City Center (km)", "Min. Price for One Night (yen)", alpha=0.6)
plt.tight_layout()
plt.title("Minimum Price for One Night vs. Distance from City Center")
plt.grid()
plt.savefig("../reports/figures/distance_scatter.png")
plt.show()

It was surprising to see that plenty of hostels at the lower end of the price range are close to city centers. Distance from the city center does not appear to be a major factor in hostel prices.
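
The visual impression from the scatter plot could be backed up with a correlation coefficient. The sketch below computes Pearson's r with pandas on the five rows shown in the preview table earlier, so the number it prints describes only that small sample, not the full dataset.

```python
import pandas as pd

# The five preview rows shown earlier; far too few rows for a real conclusion
sample = pd.DataFrame({
    "Distance from City Center (km)": [2.9, 0.7, 7.8, 8.7, 10.5],
    "Min. Price for One Night (yen)": [3300, 2600, 3600, 2600, 1500],
})

# Series.corr uses the Pearson method by default
r = sample["Distance from City Center (km)"].corr(
    sample["Min. Price for One Night (yen)"]
)
print(f"Pearson r = {r:.2f}")
```

Applied to `hostels_reduced`, a value of r near zero would support the reading that distance has little linear relationship with price.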

# Obtain the mean and standard deviation (STD) for minimum price column
price_mean = hostels_reduced.groupby("City")["Min. Price for One Night (yen)"].mean()
price_std = hostels_reduced.groupby("City")["Min. Price for One Night (yen)"].std()
price_mean

City
Fukuoka-City    2736.842105
Hiroshima       2578.571429
Kyoto           2293.589744
Osaka           2391.262136
Tokyo           2769.841270
Name: Min. Price for One Night (yen), dtype: float64

price_std

City
Fukuoka-City     538.733624
Hiroshima        428.195806
Kyoto            718.103517
Osaka            745.457722
Tokyo           1043.246459
Name: Min. Price for One Night (yen), dtype: float64

# Create new data frame for mean values
mean_df = pd.DataFrame({"City":price_mean.index, "Price Mean (yen)":price_mean.values})
mean_df = mean_df.set_index("City")
mean_df

	Price Mean (yen)
City
Fukuoka-City	2736.842105
Hiroshima	2578.571429
Kyoto	2293.589744
Osaka	2391.262136
Tokyo	2769.841270
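
The mean and standard deviation can also be computed in one pass and kept together in a single data frame. This is a minimal sketch on made-up prices; the column name `Price STD (yen)` is introduced here for illustration.

```python
import pandas as pd

# Made-up prices for illustration
sample = pd.DataFrame({
    "City": ["Tokyo", "Tokyo", "Osaka", "Osaka", "Kyoto", "Kyoto"],
    "Min. Price for One Night (yen)": [3600, 2600, 3300, 2300, 2000, 3000],
})

# Select the price column first, then aggregate both statistics at once
price_stats = (
    sample.groupby("City")["Min. Price for One Night (yen)"]
    .agg(["mean", "std"])
    .rename(columns={"mean": "Price Mean (yen)", "std": "Price STD (yen)"})
)
print(price_stats)
```

Keeping both statistics in one frame guarantees that the bar heights and the error bars stay aligned by city.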

Average Hostel Cost and Standard Deviation (STD)

This visualization gives a basic overview of the average minimum price for one night at a hostel in each of the listed Japanese cities, with error bars showing one standard deviation.

# Create bar chart with y error bar 
mean_df.plot(kind="bar", yerr=price_std.values, color="teal", legend=None)
plt.xticks(rotation=45)
plt.grid()
plt.title("Average Minimum Hostel Cost for One Night")
plt.ylabel("Minimum Hostel Cost (yen)")
plt.savefig("../reports/figures/hostel_avg.png")
plt.show()

# Create new data frame for hostel rating analysis and view # of missing data
sns.set(style="white")
score_df = hostels_df.loc[:, "Summary Score":"Value for Money"]
score_df = score_df.drop(["Rating"], axis=1)
print(score_df.isnull().sum())

Summary Score      15
Atmosphere         15
Cleaniness         15
Facilities         15
Location           15
Security           15
Staff              15
Value for Money    15
dtype: int64

Missing Data

To help decide how to handle missing data, the pandas describe function was used to get a snapshot of the data frame's basic statistics.
Since the scores are not highly varied and only a small fraction of the dataset is missing (15 of 342 rows, roughly 4%), dropping the rows with missing data should not affect the following analysis much.

# Generate descriptive statistics for data frame to help determine what to do with missing data
score_df.describe()

	Summary Score	Atmosphere	Cleaniness	Facilities	Location	Security	Staff	Value for Money
count	327.000000	327.000000	327.000000	327.000000	327.000000	327.000000	327.000000	327.000000
mean	8.782569	8.238838	9.011927	8.597554	8.694801	8.947401	9.133333	8.848318
std	0.960909	1.382002	1.215775	1.285356	1.102703	1.114345	1.086513	1.047809
min	3.100000	2.000000	2.000000	2.000000	2.000000	2.000000	2.000000	4.000000
25%	8.600000	7.800000	8.800000	8.000000	8.000000	8.700000	9.000000	8.600000
50%	9.000000	8.600000	9.300000	9.000000	9.000000	9.200000	9.400000	9.000000
75%	9.400000	9.000000	9.800000	9.300000	9.400000	9.600000	9.800000	9.500000
max	10.000000	10.000000	10.000000	10.000000	10.000000	10.000000	10.000000	10.000000

# Drop missing values
score_df = score_df.dropna().reset_index(drop=True)
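
Before dropping rows, it can help to quantify how much data is missing. The sketch below shows one way to do that on a tiny made-up frame; `sample_scores` is hypothetical and stands in for `score_df`.

```python
import numpy as np
import pandas as pd

# Made-up scores; one row has no ratings at all
sample_scores = pd.DataFrame({
    "Summary Score": [9.2, 9.5, np.nan, 7.4],
    "Atmosphere": [8.9, 9.4, np.nan, 8.0],
})

# Fraction of missing values per column
missing_share = sample_scores.isnull().mean()

# Number of rows that dropna() would remove
rows_with_missing = int(sample_scores.isnull().any(axis=1).sum())

print(missing_share)
print(f"Rows with missing data: {rows_with_missing}")
```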

Hostel Rating Analysis

The Seaborn and SciPy packages were used for the following visualizations and analysis.
To examine the relationships between the rating categories, Pearson's r was computed with scipy.stats.pearsonr for each pair of categories and annotated on a Seaborn pairplot. Atmosphere, Cleanliness, Facilities, and Value for Money have the highest positive correlations with the summary score, so they track the summary score most closely.

# Use seaborn package to find pairwise relationships in dataset
def corrfunc(x, y, **kws):
    r, _ = stats.pearsonr(x, y)
    ax = plt.gca()
    ax.annotate("R = {:.2f} ".format(r), 
                xy=(0.1, 0.9), xycoords=ax.transAxes)
    
g = sns.pairplot(score_df)
g.map_lower(corrfunc)
g.map_upper(corrfunc)
plt.savefig("../reports/figures/ratings_pairplots.png", bbox_inches="tight")
plt.show()
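
Reading the R values off a pairplot is easy to get wrong, so a ranked table can be a useful cross-check. The sketch below uses `DataFrame.corr` on the five preview rows shown earlier with a subset of the rating columns; the ranking it prints applies to that sample only.

```python
import pandas as pd

# Five preview rows and a subset of rating columns, for illustration only
sample_scores = pd.DataFrame({
    "Summary Score": [9.2, 9.5, 8.7, 7.4, 9.4],
    "Atmosphere": [8.9, 9.4, 8.0, 8.0, 9.5],
    "Cleaniness": [9.4, 9.7, 7.0, 7.5, 9.5],
    "Security": [9.0, 9.2, 10.0, 7.0, 9.5],
})

# Pearson correlation of every category with the summary score, highest first
corr_with_summary = (
    sample_scores.corr()["Summary Score"]
    .drop("Summary Score")
    .sort_values(ascending=False)
)
print(corr_with_summary)
```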

# Create a different dataset for further analysis
city_analysis = hostels_df.dropna().reset_index(drop=True)
city_analysis = city_analysis.drop(["Rating"], axis=1)

Box Plot Analyzing Hostel Ratings by City

The ratings for hostels in Hiroshima are tightly distributed around the average, meaning that these hostels have the most consistent scores. This suggests that hostels in Hiroshima provide the most consistent service and experience for travelers, though Hiroshima also has the fewest hostels in the dataset (14).

# Create Seaborn boxplots
fig = plt.figure(figsize=(12,18))
fig.subplots_adjust(hspace=1.4, wspace=0.3)
for idx, col in enumerate(city_analysis.columns[3:11]):
    fig.add_subplot(3, 3, idx+1)
    sns.boxplot(x="City", y=col, data=city_analysis)
    plt.xticks(rotation=90)
plt.savefig("../reports/figures/ratings_boxplots.png", bbox_inches="tight")
plt.show()
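
The consistency seen in the box plots can be expressed as a number by comparing the standard deviation of scores per city. This is a minimal sketch with invented scores (they are not taken from the dataset); including the count alongside the spread makes small samples visible.

```python
import pandas as pd

# Invented scores for illustration; not values from the dataset
sample = pd.DataFrame({
    "City": ["Hiroshima", "Hiroshima", "Hiroshima", "Tokyo", "Tokyo", "Tokyo"],
    "Summary Score": [9.0, 9.1, 9.2, 7.4, 8.7, 9.4],
})

# Smaller std means more consistent scores; count shows how many hostels back it up
spread = (
    sample.groupby("City")["Summary Score"]
    .agg(["count", "std"])
    .sort_values("std")
)
print(spread)
```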

Word Cloud of Approval

A word cloud was generated from the text ratings (such as Superb, Fabulous, and Very Good) to visualize how often each one appears.

# Drop all NaN values
hostels_df = hostels_df.dropna().reset_index(drop=True)
print(hostels_df.isnull().sum())

Hostel Name                       0
City                              0
Min. Price for One Night (yen)    0
Summary Score                     0
Rating                            0
Atmosphere                        0
Cleaniness                        0
Facilities                        0
Location                          0
Security                          0
Staff                             0
Value for Money                   0
Longitude                         0
Latitude                          0
Distance from City Center (km)    0
dtype: int64

# Create list from column values
rating_list = hostels_df["Rating"].tolist()
rating_list

['Superb',
 'Fabulous',
 'Very Good',
 'Superb',
 'Very Good',
 'Superb',
 'Very Good',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Very Good',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Rating',
 'Very Good',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Very Good',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Very Good',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Very Good',
 'Superb',
 'Superb',
 'Very Good',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Rating',
 'Superb',
 'Superb',
 'Very Good',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Good',
 'Fabulous',
 'Fabulous',
 'Good',
 'Fabulous',
 'Good',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Very Good',
 'Very Good',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Very Good',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Rating',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Very Good',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Very Good',
 'Superb',
 'Fabulous',
 'Good',
 'Very Good',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Rating',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Rating',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Fabulous',
 'Rating',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Good',
 'Superb',
 'Superb',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Good',
 'Very Good',
 'Very Good',
 'Fabulous',
 'Fabulous',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Good',
 'Fabulous',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Superb',
 'Fabulous',
 'Fabulous',
 'Superb',
 'Fabulous',
 'Fabulous']

# Convert list to one big string
rating_list = " ".join(rating_list)
rating_list

'Superb Fabulous Very Good Superb Very Good Superb Very Good Superb Fabulous Superb Superb Superb Superb Superb Fabulous Very Good Fabulous Fabulous Superb Fabulous Fabulous Superb Superb Superb Fabulous Fabulous Superb Superb Superb Superb Superb Superb Superb Fabulous Fabulous Superb Superb Fabulous Superb Fabulous Fabulous Superb Fabulous Superb Superb Fabulous Superb Superb Fabulous Fabulous Superb Fabulous Fabulous Superb Superb Rating Very Good Superb Fabulous Superb Superb Superb Fabulous Superb Superb Superb Superb Fabulous Superb Fabulous Superb Fabulous Fabulous Superb Superb Superb Fabulous Fabulous Very Good Superb Fabulous Superb Fabulous Superb Superb Fabulous Fabulous Fabulous Superb Superb Very Good Superb Superb Fabulous Superb Fabulous Superb Superb Superb Superb Fabulous Superb Fabulous Very Good Superb Superb Very Good Superb Superb Superb Superb Superb Fabulous Superb Fabulous Superb Superb Superb Superb Superb Rating Superb Superb Very Good Superb Superb Superb Superb Superb Superb Fabulous Superb Fabulous Fabulous Superb Superb Fabulous Superb Fabulous Fabulous Superb Good Fabulous Fabulous Good Fabulous Good Superb Fabulous Superb Fabulous Superb Fabulous Superb Superb Fabulous Fabulous Superb Superb Superb Superb Superb Superb Fabulous Very Good Very Good Fabulous Superb Superb Fabulous Superb Superb Superb Superb Superb Superb Fabulous Fabulous Superb Superb Superb Superb Superb Fabulous Fabulous Superb Fabulous Very Good Superb Fabulous Superb Fabulous Superb Superb Superb Fabulous Fabulous Superb Rating Superb Superb Superb Superb Superb Very Good Superb Superb Superb Superb Superb Superb Superb Superb Fabulous Superb Fabulous Fabulous Fabulous Fabulous Superb Superb Superb Superb Fabulous Very Good Superb Fabulous Good Very Good Fabulous Superb Superb Superb Superb Superb Rating Superb Superb Superb Fabulous Rating Fabulous Fabulous Superb Superb Fabulous Fabulous Superb Superb Fabulous Rating Superb Fabulous Fabulous Superb Good Superb 
Superb Fabulous Superb Fabulous Fabulous Superb Superb Superb Superb Superb Superb Superb Superb Superb Superb Superb Superb Fabulous Fabulous Superb Good Very Good Very Good Fabulous Fabulous Fabulous Fabulous Superb Good Fabulous Superb Superb Superb Superb Superb Superb Fabulous Fabulous Superb Fabulous Fabulous'

# Create word cloud
wordcloud = WordCloud(background_color="white", max_words=len(rating_list), max_font_size=100, relative_scaling=0.5).generate(rating_list)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("../reports/figures/ratings_wordcloud.png")
plt.show()

This word cloud simply depicts travelers' approval and satisfaction with hostels in Japan.
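
A word cloud conveys proportions only loosely, so counting the labels gives a more precise picture. The sketch below counts a short made-up list with `collections.Counter`. The list printed earlier also contains a few literal 'Rating' entries, which look like placeholder values; filtering them out with `known_labels` is an assumption made for this example, not a step from the original analysis.

```python
from collections import Counter

# Short made-up list in the same format as the rating list above
rating_labels = ["Superb", "Fabulous", "Very Good", "Superb", "Rating", "Good"]

# Labels assumed to be genuine ratings (assumption for this sketch)
known_labels = {"Superb", "Fabulous", "Very Good", "Good"}

# Count whole labels so that "Very Good" is not split into two words
label_counts = Counter(label for label in rating_labels if label in known_labels)
print(label_counts.most_common())
```

A mapping like `label_counts` can also be passed to `WordCloud.generate_from_frequencies`, which keeps multi-word labels intact.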

Conclusion

It is clear that hostels in Japan are a lodging option that shouldn't be overlooked. Their proximity to city centers, low cost, value for money, and high customer satisfaction are more than enough reason for any traveler to consider them, whether they're on a budget or not. Happy traveling.
