Data Collection:
1) Fetch 100 movie IDs using the API and download them as a CSV.
2) Fetch 100 data points for the given 22 columns using the API, with the condition budget > 0, and download them as a CSV.
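The two collection steps above can be sketched as follows. This is a minimal sketch, assuming the TMDB v3 `discover/movie` endpoint and a placeholder API key; the exact endpoint, parameters, and helper names are assumptions, not part of the original notes.

```python
import csv
import json
from urllib.request import urlopen

API_KEY = "YOUR_TMDB_API_KEY"  # placeholder, not a real key
BASE = "https://api.themoviedb.org/3"

def fetch_page(page):
    """Step 1: fetch one page (20 movies) of movie records from TMDB."""
    url = f"{BASE}/discover/movie?api_key={API_KEY}&page={page}"
    with urlopen(url) as resp:
        return json.load(resp)["results"]

def keep_budgeted(records):
    """Step 2: apply the budget > 0 condition before saving."""
    return [rec for rec in records if rec.get("budget", 0) > 0]

def save_csv(records, path, columns):
    """Download the filtered records as a CSV with the chosen 22 columns."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
```

Fetching five pages yields roughly 100 records; after filtering on budget, `save_csv` writes only the requested columns.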
Data Pre-processing - DATA CLEANING, TRANSFORMATION, VISUALISATION
Data Cleaning::
1) Rename columns: homepage -> movie_url, original_title -> movie_title.
2) Find and remove duplicate rows (dataframes).
3) Check for any null or empty values in the dataframes and try to fetch and fill the missing data using the movie ID via the TMDB APIs.
4) Check for any rows with revenue equal to 0; replace it with the average revenue of the remaining rows.
5) Finally, convert the processed output (dataframe) to CSV format.
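The cleaning steps above can be sketched in pandas as below. Step 3 (re-fetching missing values from the TMDB API) is omitted since it needs network access; column names follow this document, and the sample dataframe is illustrative.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # 1) rename columns
    df = df.rename(columns={"homepage": "movie_url",
                            "original_title": "movie_title"})
    # 2) drop duplicate rows
    df = df.drop_duplicates()
    # 4) replace revenue == 0 with the mean revenue of the non-zero rows
    nonzero_mean = df.loc[df["revenue"] > 0, "revenue"].mean()
    df.loc[df["revenue"] == 0, "revenue"] = nonzero_mean
    return df

# tiny illustrative input with one duplicate and one zero-revenue row
raw = pd.DataFrame({"homepage": ["a", "b", "b"],
                    "original_title": ["X", "Y", "Y"],
                    "revenue": [0.0, 100.0, 100.0]})
cleaned = clean(raw)
# 5) write the processed dataframe back out as CSV
cleaned.to_csv("tmdb_cleaned.csv", index=False)
```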
Data Visualisation::
- Create six different graphs using matplotlib and seaborn, each using a different type of visualization.
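A minimal sketch of the six-graph requirement is below. It uses matplotlib only so the snippet is self-contained (the seaborn versions, e.g. `sns.histplot` or `sns.boxplot`, are drop-in alternatives); the specific chart types and the random sample data are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# illustrative stand-in for the cleaned TMDB dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({"budget": rng.uniform(1e6, 1e8, 50),
                   "revenue": rng.uniform(1e6, 3e8, 50),
                   "runtime": rng.uniform(80, 180, 50)})

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
axes[0, 0].scatter(df["budget"], df["revenue"]); axes[0, 0].set_title("Scatter")
axes[0, 1].hist(df["revenue"], bins=10);         axes[0, 1].set_title("Histogram")
axes[0, 2].boxplot(df["runtime"]);               axes[0, 2].set_title("Box plot")
axes[1, 0].plot(sorted(df["revenue"]));          axes[1, 0].set_title("Line")
axes[1, 1].bar(["min", "max"],
               [df["revenue"].min(), df["revenue"].max()]); axes[1, 1].set_title("Bar")
axes[1, 2].hexbin(df["budget"], df["revenue"], gridsize=10); axes[1, 2].set_title("Hexbin")
fig.tight_layout()
fig.savefig("tmdb_plots.png")
```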
Data Transformation::
- Convert release_date (string) into separate integer columns: release_day, release_month, release_year.
- Use one-hot encoding for the applicable columns (genres, spoken_languages, production_countries).
- Convert the applicable columns (status, original_language) to the category datatype and then to numerical form using the pandas library.
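The transformation steps above can be sketched with pandas as follows. The example assumes each row already holds a single label per categorical column (in the raw TMDB export, genres and languages are list-like and would need flattening first).

```python
import pandas as pd

df = pd.DataFrame({
    "release_date": ["2009-12-18", "2015-06-12"],
    "genres": ["Action", "Adventure"],
    "status": ["Released", "Released"],
    "original_language": ["en", "fr"],
})

# split release_date into separate integer day/month/year columns
dates = pd.to_datetime(df["release_date"])
df["release_day"] = dates.dt.day
df["release_month"] = dates.dt.month
df["release_year"] = dates.dt.year
df = df.drop(columns=["release_date"])

# one-hot encode the categorical column(s)
df = pd.get_dummies(df, columns=["genres"])

# category dtype -> numeric codes for status / original_language
for col in ("status", "original_language"):
    df[col] = df[col].astype("category").cat.codes
```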
Code file upload to GitHub as - tmdb(2nd milestone week 4).ipynb
Output CSV file upload to Google Drive as - tmdb_output(2nd milestone week 4).csv
**TEST DATA CLEANING** **TEST DATA TRANSFORMATION** **TRAINING DATA PREDICTION** **TEST DATA PREDICTION**
Q1* Take the movie_id, revenue, and prediction_label columns from the prediction you made for the training data and convert them into a CSV file (if you split the training data, remove the split first).
Q2* Read the test data CSV that is shared by the Mentor.
Q3* Perform data normalization on the test data.
Q4* Check for any missing columns in the test data (compare with the training data) and add the missing columns with null values.
Q5* Take the movie_id and prediction_label columns from the prediction you made for the test data and convert them into a CSV file.
Q6* Upload these two CSV files to Google Drive, and the ipynb file to GitHub.
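Steps Q2–Q5 can be sketched as below. Min-max scaling is one illustrative choice of normalization, and names such as `model`, `X_train`, and the CSV paths are assumptions standing in for the notebook's own variables.

```python
import pandas as pd

def prepare_test(test_df, train_columns):
    """Q4: add any training columns missing from the test data as nulls,
    then order the test columns to match the training data."""
    out = test_df.copy()
    for col in train_columns:
        if col not in out.columns:
            out[col] = pd.NA
    return out[list(train_columns)]

def min_max_normalize(df):
    """Q3: simple column-wise min-max normalization (illustrative)."""
    return (df - df.min()) / (df.max() - df.min())

# usage sketch:
# test_df = pd.read_csv("test_data.csv")                      # Q2
# features = prepare_test(test_df, X_train.columns)           # Q4
# features = min_max_normalize(features)                      # Q3
# preds = model.predict(features)
# pd.DataFrame({"movie_id": test_df["movie_id"],
#               "prediction_label": preds}
#              ).to_csv("test_predictions.csv", index=False)  # Q5
```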
UI CREATION
USER ENTERS TEST DATA; MODEL PREDICTION SHOWN ON WEBPAGE
Q1* Create a dict using the test data; in VS Code, run the prediction using the pkl file (the saved model).
Q2* UI CREATION - build the UI using Flask or Streamlit with the saved pkl model, in VS Code. Finally, run the prediction on the data entered in the UI.
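A minimal Flask sketch of the UI prediction flow is below. The route, form field names, and the `model.pkl` path are assumptions; the two-feature input is illustrative and should match whatever features the trained model actually expects.

```python
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

def load_model(path="model.pkl"):
    """Load the model saved during training (path is an assumption)."""
    with open(path, "rb") as f:
        return pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    model = load_model()
    # build a single-row feature matrix from the form the user filled in;
    # field names here are illustrative placeholders
    features = [[float(request.form["budget"]),
                 float(request.form["runtime"])]]
    return jsonify({"prediction_label": float(model.predict(features)[0])})

# run with:  flask --app <this_file> run
```

A Streamlit version would replace the route with `st.number_input` widgets and call `model.predict` directly on the entered values.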
Documentation creation
Project presentation - group task
MI Internship - Infosys Springboard - SY