This is a Python-based web scraping project for my machine learning portfolio. It targets https://books.toscrape.com/, a website designed specifically for practicing web scraping.
In this project, I built a scraper that collects information about every book on the website by following its pagination from one page to the next.
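A minimal sketch of such a pagination loop is shown here. It assumes the catalogue markup at the time of writing: each book is an `article.product_pod` whose `h3 a` link carries the full title in its `title` attribute, the price sits in `p.price_color`, the rating is a class on `p.star-rating` (e.g. `star-rating Three`), and every page except the last has a `li.next a` link. The names `START_URL`, `fetch_html`, `parse_page`, and `scrape_all_books` are hypothetical and not taken from the project's notebook. The page fetcher is passed in as a parameter so the parsing can be tried on saved HTML without touching the network.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical entry point: the first page of the catalogue listing.
START_URL = "https://books.toscrape.com/catalogue/page-1.html"


def fetch_html(url):
    """Download one page (requests is imported here so parsing can be tested offline)."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_page(html, page_url):
    """Return (books on this page, absolute URL of the next page or None).

    The CSS selectors are assumptions about the site's markup; adjust them if it changes.
    """
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for pod in soup.select("article.product_pod"):
        link = pod.select_one("h3 a")
        rating = pod.select_one("p.star-rating")
        books.append({
            # The visible link text may be truncated; the title attribute holds the full title.
            "title": link["title"],
            "price": pod.select_one("p.price_color").get_text(strip=True),
            # class="star-rating Three" is parsed as ["star-rating", "Three"]; keep the word.
            "rating": [c for c in rating["class"] if c != "star-rating"][0],
            # Links are relative, so resolve them against the current page URL.
            "url": urljoin(page_url, link["href"]),
        })
    next_link = soup.select_one("li.next a")
    next_url = urljoin(page_url, next_link["href"]) if next_link else None
    return books, next_url


def scrape_all_books(start_url=START_URL, fetch=fetch_html):
    """Follow the 'next' links until the last page, collecting every book."""
    all_books = []
    url = start_url
    while url is not None:
        books, url = parse_page(fetch(url), url)
        all_books.extend(books)
    return all_books
```

Calling `scrape_all_books()` keeps going until a page has no "next" link, so the loop does not depend on hard-coding the number of pages.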
Finally, I drew some conclusions from the collected data and presented them with charts and graphs.
I used two main libraries for this project:
# this will install the 'requests' library
!pip install requests
# this command will install BeautifulSoup4
!pip install beautifulsoup4
Topics covered in this project:
- Using the Requests library to fetch the HTML content of a webpage.
- Using the BeautifulSoup4 library to filter out the data we need from HTML/XML content.
- Preparing a DataFrame from the data extracted from the webpages.
- Deriving some conclusions from the scraped data.
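The last two steps can be sketched with pandas as follows. This assumes the scraped records are dictionaries with a text `price` such as `"£51.77"` and a rating word such as `"Three"`; the names `RATING_WORDS`, `books_to_dataframe`, and `price_by_rating` are hypothetical, and the per-rating price summary is only one example of a conclusion, not necessarily the analysis done in the notebook.

```python
import pandas as pd

# Hypothetical mapping from the rating word in the markup to a number of stars.
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def books_to_dataframe(books):
    """Build a DataFrame from scraped records and convert text fields to numbers."""
    df = pd.DataFrame(books)
    # Keep only digits and the decimal point, e.g. "£51.77" -> 51.77.
    # This also tolerates stray characters if the pound sign was decoded incorrectly.
    df["price"] = df["price"].str.replace(r"[^0-9.]", "", regex=True).astype(float)
    # Rating words not in the mapping become NaN.
    df["rating"] = df["rating"].map(RATING_WORDS)
    return df


def price_by_rating(df):
    """Example conclusion: number of books and mean price for each star rating."""
    return df.groupby("rating")["price"].agg(["count", "mean"]).sort_index()
```

With matplotlib installed, a summary like this can be charted directly, for example with `price_by_rating(df)["mean"].plot(kind="bar")`.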