This project uses the BeautifulSoup library to scrape phones, tablets, and other electronic devices from the GSMArena website, vendor by vendor. It lets users see which devices have been released most recently. The extracted information includes the vendor, the product, its specifications, and its release date. The results are written to output.txt so that users can process them further.
Basically, this project taught me how BeautifulSoup accesses and reads information from HTML: learning by doing. In addition, the "visiting too much" problem that appeared while scraping was solved by adding headers to the HTTP requests.
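A minimal sketch of the header fix, assuming the site was rejecting requests that carried a default, non-browser User-Agent. The header values and the helper names `build_request` and `fetch_html` are illustrative and not taken from the project. The standard-library `urllib.request` is used here; the same dictionary could also be passed as `requests.get(url, headers=HEADERS)`.

```python
import urllib.request

# Hypothetical header values: any realistic browser User-Agent string should do.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Return a Request object that carries the browser-like headers."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with the headers attached and return its HTML as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Pausing between requests (for example with `time.sleep`) is another common way to stay under a site's rate limit, although the project itself only mentions headers.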
The idea behind it:
- Visit the main page
- Find the vendor list
- For each vendor
- Find its product list
- For each product, collect all its specifications
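The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the CSS selectors (`div.vendors a`, `div.products a`), the assumption that specifications sit in two-cell table rows, and all function names are hypothetical and would need to be adapted to the real page markup. `fetch` is any callable that returns the HTML for a URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.gsmarena.com/"


def extract_links(html, css_selector, base_url=BASE_URL):
    """Return (link text, absolute URL) pairs for every link matched by css_selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(" ", strip=True), urljoin(base_url, a["href"]))
        for a in soup.select(css_selector)
        if a.get("href")
    ]


def extract_specs(html):
    """Collect label/value pairs from every two-cell table row on a product page."""
    soup = BeautifulSoup(html, "html.parser")
    specs = {}
    for row in soup.select("table tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            specs[cells[0].get_text(strip=True)] = cells[1].get_text(" ", strip=True)
    return specs


def crawl(fetch, vendor_selector="div.vendors a", product_selector="div.products a"):
    """Walk main page -> vendors -> products and yield one record per product."""
    # Selectors are placeholders (assumptions), not GSMArena's confirmed markup.
    for vendor, vendor_url in extract_links(fetch(BASE_URL), vendor_selector):
        for product, product_url in extract_links(fetch(vendor_url), product_selector):
            yield {
                "vendor": vendor,
                "product": product,
                "specs": extract_specs(fetch(product_url)),
            }
```

Passing `fetch` in as a parameter keeps the parsing logic testable against saved HTML, so the traversal can be checked without hitting the live site.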
Table of Contents
- Getting Started
- Prerequisites
- Installing
- Libraries
- Authors
- License
Requires:
- Python 3.8.8
- BeautifulSoup (bs4)
- json (standard library)
- pandas
- requests
- urllib (standard library)
Tools Required:
Visual Studio, PyCharm, or any other IDE that can run Python
A few libraries must be installed before the code can run:
pip install bs4
pip install pandas
pip install requests
- Clone the repository
git clone https://github.com/JamesLi197412/web-scrapying.git
- Run the code
python3 main.py
Git and GitHub are used for version control and sharing.
Authors:
- James Li
This project is licensed under the MIT License - see the LICENSE.md file for details