A collection of 15 detailed data science projects which were completed in a span of 4 months.

sl2902/practicum


practicum

Project description

Project Description
Sprint1 Prepare a report for a bank’s loan division. You’ll need to find out whether a customer’s marital status and number of children affect the likelihood that they will default on a loan. The bank already has some data on customers’ creditworthiness. Your report will be considered when building a credit score for a potential customer. A credit score is used to evaluate the ability of a potential borrower to repay their loan.
Sprint2 You're an analyst at Crankshaft List. Hundreds of free advertisements for vehicles are published on your site every day. You need to study data collected over the last few years and determine which factors influence the price of a vehicle.
Sprint3 You work as an analyst for the telecom operator Megaline. The company offers its clients two prepaid plans, Surf and Ultimate. The commercial department wants to know which of the plans brings in more revenue in order to adjust the advertising budget. You are going to carry out a preliminary analysis of the plans based on a relatively small client selection. You'll have the data on 500 Megaline clients: who the clients are, where they're from, which plan they use, and the number of calls they made and text messages they sent in 2018. Your job is to analyze clients' behavior and determine which prepaid plan brings in more revenue.
Sprint4 You work for the online store Ice, which sells video games all over the world. User and expert reviews, genres, platforms (e.g. Xbox or PlayStation), and historical data on game sales are available from open sources. You need to identify patterns that determine whether a game succeeds or not. This will allow you to spot potential big winners and plan advertising campaigns. In front of you is data going back to 2016. Let’s imagine that it’s December 2016 and you’re planning a campaign for 2017. (The important thing is to get experience working with data. It doesn't really matter whether you're forecasting 2017 sales based on data from 2016 or 2027 sales based on data from 2026.) The dataset contains the abbreviation ESRB. The Entertainment Software Rating Board evaluates a game's content and assigns an age rating such as Teen or Mature.
Sprint5 You're working as an analyst for Zuber, a new ride-sharing company that's launching in Chicago. Your task is to find patterns in the available information. You want to understand passenger preferences and the impact of external factors on rides. Working with a database, you'll analyze data from competitors and test a hypothesis about the impact of weather on ride frequency.
Sprint6 Mobile carrier Megaline has found out that many of their subscribers use legacy plans. They want to develop a model that would analyze subscribers' behavior and recommend one of Megaline's newer plans: Smart or Ultra. You have access to behavior data about subscribers who have already switched to the new plans (from the project for the Statistical Data Analysis course). For this classification task, you need to develop a model that will pick the right plan. Since you’ve already performed the data preprocessing step, you can move straight to creating the model. Develop a model with the highest possible accuracy. In this project, the threshold for accuracy is 0.75. Check the accuracy using the test dataset.
Sprint7 Beta Bank customers are leaving: little by little, chipping away every month. The bankers figured out it’s cheaper to save the existing customers rather than to attract new ones. We need to predict whether a customer will leave the bank soon. You have the data on clients’ past behavior and termination of contracts with the bank. Build a model with the maximum possible F1 score. To pass the project, you need an F1 score of at least 0.59. Check the F1 for the test set. Additionally, measure the AUC-ROC metric and compare it with the F1.
Sprint8 You work for the OilyGiant mining company. Your task is to find the best place for a new well. Steps to choose the location: 1) Collect the oil well parameters in the selected region: oil quality and volume of reserves. 2) Build a model for predicting the volume of reserves in the new wells. 3) Pick the oil wells with the highest estimated values. 4) Pick the region with the highest total profit for the selected oil wells. You have data on oil samples from three regions. Parameters of each oil well in the region are already known. Build a model that will help to pick the region with the highest profit margin. Analyze potential profit and risks using the bootstrapping technique.
Sprint9 Prepare a prototype of a machine learning model for Zyfra. The company develops efficiency solutions for heavy industry. The model should predict the amount of gold recovered from gold ore. You have the data on extraction and purification. The model will help to optimize the production and eliminate unprofitable parameters. You need to: 1) Prepare the data; 2) Perform data analysis; 3) Develop and train a model.
Sprint10 The Sure Tomorrow insurance company wants to solve several tasks with the help of Machine Learning, and you are asked to evaluate that possibility. 1) Task 1: Find customers who are similar to a given customer. This will help the company's agents with marketing. 2) Task 2: Predict whether a new customer is likely to receive an insurance benefit. Can a prediction model do better than a dummy model? 3) Task 3: Predict the number of insurance benefits a new customer is likely to receive using a linear regression model. 4) Task 4: Protect clients' personal data without breaking the model from the previous task.
Sprint11 Rusty Bargain used car sales service is developing an app to attract new customers. In that app, you can quickly find out the market value of your car. You have access to historical data: technical specifications, trim versions, and prices. You need to build a model to determine the value. Rusty Bargain is interested in: 1) the quality of the prediction; 2) the speed of the prediction; 3) the time required for training.
Sprint12 Sweet Lift Taxi company has collected historical data on taxi orders at airports. To attract more drivers during peak hours, we need to predict the number of taxi orders for the next hour. Build a model for such a prediction. The RMSE metric on the test set should not be more than 48.
Sprint13 The Film Junky Union, a new edgy community for classic movie enthusiasts, is developing a system for filtering and categorizing movie reviews. The goal is to train a model to automatically detect negative reviews. You'll be using a dataset of IMDB movie reviews with polarity labelling to build a model for classifying positive and negative reviews. It will need to reach an F1 score of at least 0.85.
Sprint14 The supermarket chain Good Seed would like to explore whether Data Science can help them adhere to alcohol laws by making sure they do not sell alcohol to underage people. You are asked to conduct that evaluation, so as you set to work, keep the following in mind: 1) The shops are equipped with cameras in the checkout area which are triggered when a person is buying alcohol. 2) Computer vision methods can be used to determine the age of a person from a photo. 3) The task then is to build and evaluate a model for verifying people's age.
Sprint15 The telecom operator Interconnect would like to be able to forecast client churn. If it's discovered that a user is planning to leave, they will be offered promotional codes and special plan options. Interconnect's marketing team has collected some of their clientele's personal data, including information about their plans and contracts. Interconnect mainly provides two types of services: 1) Internet. The network can be set up via a telephone line (DSL, digital subscriber line) or through a fiber optic cable. 2) Landline communication. The telephone can be connected to several lines simultaneously. Some other services the company provides include: i) Internet security: antivirus software (DeviceProtection) and a malicious website blocker (OnlineSecurity). ii) A dedicated technical support line (TechSupport). iii) Cloud file storage and data backup (OnlineBackup). iv) TV streaming (StreamingTV) and a movie directory (StreamingMovies). The clients can choose either a monthly payment or sign a 1- or 2-year contract. They can use various payment methods and receive an electronic invoice after a transaction.
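
The bootstrapping step mentioned for Sprint8 can be sketched roughly as follows. This is a minimal standard-library illustration, not the code from the repository: the function name `bootstrap_profit`, its parameters, and every number in the demo data (reserve values, `revenue_per_unit`, `budget`, sample sizes) are hypothetical placeholders. The idea is to resample wells with replacement, pick the wells with the highest predicted reserves in each resample, compute profit from their actual reserves, and summarize the resulting distribution.

```python
import random


def bootstrap_profit(predicted, actual, n_sample, n_best,
                     revenue_per_unit, budget, n_rounds=1000, seed=0):
    """Estimate mean profit, a 95% interval and loss risk by bootstrapping.

    All parameter names are illustrative assumptions, not taken from the project.
    """
    rng = random.Random(seed)
    n_wells = len(predicted)
    profits = []
    for _ in range(n_rounds):
        # Draw a sample of well indices with replacement.
        sample = [rng.randrange(n_wells) for _ in range(n_sample)]
        # Choose wells by the model's prediction ...
        best = sorted(sample, key=lambda i: predicted[i], reverse=True)[:n_best]
        # ... but compute profit from the actual reserves of those wells.
        profit = sum(actual[i] for i in best) * revenue_per_unit - budget
        profits.append(profit)
    profits.sort()
    return {
        "mean_profit": sum(profits) / n_rounds,
        "ci_lower": profits[int(0.025 * n_rounds)],      # about the 2.5th percentile
        "ci_upper": profits[int(0.975 * n_rounds) - 1],  # about the 97.5th percentile
        "loss_risk": sum(p < 0 for p in profits) / n_rounds,
    }


# Hypothetical demo data: predicted vs. actual reserves for eight wells.
predicted = [80.0, 95.0, 60.0, 120.0, 70.0, 110.0, 50.0, 90.0]
actual = [75.0, 100.0, 65.0, 115.0, 60.0, 105.0, 55.0, 85.0]
result = bootstrap_profit(predicted, actual, n_sample=6, n_best=3,
                          revenue_per_unit=2.0, budget=500.0)
```

Running the same function on each of the three regions would give comparable profit and risk summaries, from which a region could be chosen.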

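
Several sprints set a pass threshold on a metric: accuracy of 0.75 (Sprint6), F1 of at least 0.59 (Sprint7) and 0.85 (Sprint13), AUC-ROC as a comparison metric (Sprint7), and RMSE of at most 48 (Sprint12). As a reminder of what these metrics measure, here is a small standard-library sketch. The function names are illustrative and the implementations are simplified for binary labels; the projects themselves may well rely on a library instead.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg).
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))
```

Unlike F1, which depends on a fixed classification threshold, the AUC-ROC value is computed from the raw scores, which is one reason the two can disagree when compared on the same model.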