
The aim is to import data from multiple sources, clean and wrangle data, perform exploratory data analysis (EDA), and create meaningful data visualizations. I then predict future trends from data by developing simple linear, multiple linear, and polynomial regression models and pipelines, and learn how to evaluate them.


PramodRawat157/Data-Analysis-with-Python---IBM-Data-Science


Data Analysis with Python

📄 Summary

This course uses Python to explore many different types of data. It covers how to prepare data for analysis, perform simple statistical analysis, create meaningful data visualizations, predict future trends from data, and more. It concludes with a final assignment on predicting the market prices of houses from a detailed dataset. Each notebook here is detailed, and together they show the full process of predictive analysis. Some topics, such as data wrangling, have additional associated notebooks because of the breadth of content covered in the course.

📑 Main Topics

  • Importing datasets

    • Understanding the data
    • Importing and exporting data in Python
  • Working with different file formats

    • Datasets come in various formats, such as .csv, .json, and .xlsx, and can be stored on your local machine or online.
    • A data scientist (or data engineer) therefore needs to know the different file formats, the common challenges in handling them, and the most efficient ways to work with such data in practice.
  • Data wrangling

    • Identifying and handling missing values
    • Data formatting
    • Data normalization
    • Binning
    • Indicator variables
  • Exploratory Data Analysis

    • Summarizing main characteristics of the data
    • Gaining a better understanding of the dataset
    • Uncovering relationships between the variables
    • Extracting important variables
  • Model Development

    • Simple and Multiple Linear Regression
    • Model Evaluation Using Visualization
    • Polynomial Regression and Pipelines
    • R-squared and MSE for In-Sample Evaluation
    • Prediction and Decision Making
  • Model Evaluation and Refinement

    • Over-fitting, under-fitting and model selection
    • Ridge regression
    • GridSearch
    • Model refinement
  • Auto_EDA_Dataprep

    • DataPrep is an open-source Python library that lets you prepare your data with only a few lines of code.
    • DataPrep addresses several data preparation tasks, including automated EDA, from a single package.
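
The data wrangling topics listed above (missing values, normalization, binning, indicator variables) can be illustrated with a minimal pandas sketch. The toy DataFrame and its column names are hypothetical and are not taken from the course notebooks; mean imputation and min-max scaling are just one reasonable choice for each step.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "horsepower": [111.0, np.nan, 154.0, 102.0, 200.0],
    "fuel": ["gas", "diesel", "gas", "gas", "diesel"],
    "price": [13495, 16500, 16500, 13950, 30000],
})

# Missing values: replace NaN with the column mean.
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mean())

# Normalization: min-max scale horsepower into the range [0, 1].
hp = df["horsepower"]
df["horsepower_scaled"] = (hp - hp.min()) / (hp.max() - hp.min())

# Binning: split horsepower into three equal-width bins.
df["horsepower_bin"] = pd.cut(hp, bins=3, labels=["low", "medium", "high"])

# Indicator variables: one 0/1 (or boolean) column per fuel category.
df = pd.concat([df, pd.get_dummies(df["fuel"], prefix="fuel")], axis=1)

# A first EDA step: summary statistics and a pairwise correlation.
print(df.describe())
print(df[["horsepower", "price"]].corr())
```

Equal-width binning keeps the bin edges easy to interpret; quantile-based bins (`pd.qcut`) would be an alternative when the values are heavily skewed.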

🔑 Key Skills Learned

  • Using the Pandas, NumPy, and SciPy libraries for data manipulation
  • Using scikit-learn to build models and make predictions
  • Building machine learning regression models
  • Building data pipelines
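
As a rough illustration of how these skills fit together, the sketch below chains polynomial features, scaling, and ridge regression in a scikit-learn pipeline, tunes it with a grid search, and reports R-squared and MSE on held-out data. The synthetic data, the parameter grid, and the step names are assumptions made for this example, not the course's house-price dataset or settings.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic quadratic data with noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(scale=0.3, size=200)

# Hold out a test set so evaluation is not done on the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pipeline: polynomial expansion -> standardization -> ridge regression.
pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Grid search over polynomial degree and ridge penalty with 5-fold CV.
param_grid = {
    "poly__degree": [1, 2, 3],
    "model__alpha": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

# Out-of-sample evaluation of the best pipeline found by the search.
y_pred = search.predict(X_test)
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("best params:", search.best_params_)
print("test R^2:", round(r2, 3), "test MSE:", round(mse, 3))
```

Keeping the preprocessing inside the pipeline means each cross-validation fold fits the scaler and polynomial expansion on its own training split, which avoids leaking information from the validation fold.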

🛠️ Tools

The following tools were used to complete this certification:

Python, Jupyter, GitHub, IBM Watson Studio, IBM Cloud Pak

📖 Libraries

The following Python libraries were used throughout the certification:



🏆 Certificates

To verify the certificates, click the images to follow the links.

Data Analysis with Python. Issued by Coursera, authorized by IBM. This badge earner has the core skills in data analysis using Python. They can readily clean, visualize, and summarize data using Pandas. Using scikit-learn, the earner can develop data pipelines, construct machine learning models for regression, and evaluate these models.
