This is a small project to extract insights from the Airbnb dataset.
http://data.insideairbnb.com/united-kingdom/england/london/2019-07-10/data/listings.csv.gz
The dataset has 83850 rows (one per listing) and 106 features. Listings from 3 countries appear in the data: UK, Spain and France.
The first steps were getting a feel for the data, exploring the features, finding the NaN values, and working out which variables are independent and which are dependent. I analysed the number of customers accommodated, the reviews, and the room, property and bed types, and used visualization techniques to explore the data.
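The exploration steps above can be sketched with pandas. This is only a minimal illustration on a tiny hand-made frame, not the notebook's actual code: the column names (`room_type`, `accommodates`, `price`) and the currency-formatted price strings are assumptions about how listings.csv is laid out.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for listings.csv; column names and values are illustrative assumptions
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", None, "Private room"],
    "accommodates": [4, 2, 1, np.nan],
    "price": ["$120.00", "$45.00", "$30.00", "$1,050.00"],
})

# Number and share of missing values per column, worst columns first
nan_counts = listings.isna().sum().sort_values(ascending=False)
nan_share = listings.isna().mean()

# Assumption: price is stored as text with "$" and thousands separators,
# so it is converted to float before it can serve as the dependent variable
listings["price"] = (
    listings["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Frequency of each room type (missing values are ignored by default)
room_counts = listings["room_type"].value_counts()

print(nan_counts)
print(room_counts)
```

On the real file the same calls would run on the frame returned by `pd.read_csv`, which can read the `.csv.gz` archive directly.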
In this notebook, I used features such as the neighbourhood and the target variable 'Price', and created a word cloud from the summaries of the listings.
Here I applied text mining to the summary feature, using TfidfTransformer and K nearest neighbours. A better analysis could have been done if review summaries had been available at customer level.
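One way the TF-IDF and nearest-neighbour step might look with scikit-learn is sketched below. The four summaries are invented examples, and the choices of English stop words, cosine distance and two neighbours are assumptions, not settings taken from the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import NearestNeighbors

# Invented listing summaries standing in for the 'summary' column
summaries = [
    "Bright double room near the tube station",
    "Cosy double room close to tube and shops",
    "Spacious family house with large garden",
    "Modern studio flat with river view",
]

# TfidfTransformer works on raw term counts, so count the words first
counts = CountVectorizer(stop_words="english").fit_transform(summaries)
tfidf = TfidfTransformer().fit_transform(counts)

# Nearest neighbours by cosine distance between TF-IDF vectors
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(tfidf)
distances, indices = nn.kneighbors(tfidf[0])

# The first neighbour of a listing is the listing itself (distance 0);
# the second is the most similar other summary
print(indices[0], distances[0])
```

Here the query is the first summary, so the neighbours are the listing itself and the other double room near the tube, which shares the most terms with it.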
Only selected features were extracted from the original listings to form a transformed dataframe, which was then split into training and test sets in an 80-20 ratio. Several machine learning algorithms were then applied: Linear Regression, Logistic Regression, Ridge Regression, SVM, Decision Tree and Random Forest.
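The split-and-fit workflow can be sketched as follows. The data is synthetic, the regressor variants and the mean absolute error metric are assumptions made for a continuous price target, and Logistic Regression is left out because it needs discrete class labels, which this sketch does not have.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the transformed dataframe: 3 features, 1 price-like target
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 20 + 15 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 5, size=200)

# 80-20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Regressor counterparts of the listed algorithms (hyperparameters are defaults
# or arbitrary choices, not the notebook's)
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svm": SVR(kernel="rbf"),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model on the training set and score it on the held-out 20%
mae = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))

for name, err in sorted(mae.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {err:.2f}")
```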
A confusion matrix was then built and the error calculated for each model; SVM showed the highest accuracy.
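A confusion matrix needs discrete labels, so the sketch below assumes the price was grouped into bands before classification; the README does not say how this was done, and the three tercile bands, the synthetic data and the RBF kernel are all illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic features and a price-like value
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
price = 30 + 12 * X[:, 0] + 4 * X[:, 1] + rng.normal(0, 8, size=300)

# Assumption: price is binned into three bands (0 = low, 1 = mid, 2 = high)
edges = np.quantile(price, [1 / 3, 2 / 3])
band = np.digitize(price, edges)

X_train, X_test, y_train, y_test = train_test_split(
    X, band, test_size=0.2, random_state=42, stratify=band
)

clf = SVC(kernel="rbf").fit(X_train, y_train)
pred = clf.predict(X_test)

# Rows are true bands, columns are predicted bands
cm = confusion_matrix(y_test, pred, labels=[0, 1, 2])
acc = accuracy_score(y_test, pred)

print(cm)
print(f"accuracy = {acc:.3f}")
```

Accuracy is the sum of the diagonal of the matrix divided by the number of test samples, so the two outputs can be checked against each other.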