Gsmarena Scrapy

In this project, the BeautifulSoup library is used to extract phones, tablets, and other electronic devices from the GSMArena website, vendor by vendor, so that users can keep track of the latest devices. The extracted information includes the vendor, the product, its specifications, and its release date. The results are written to output.txt for further use.
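
A minimal sketch of the BeautifulSoup step for a single product page. It assumes GSMArena-style markup: the model name in an `<h1>`, and spec tables whose `<th>` names a category (such as Launch), with `td.ttl` and `td.nfo` cells holding each label and value. These selectors and the `parse_specs` name are assumptions, not taken from this project's code. The release date is read from the Launch > Status row.

```python
from typing import Dict, Optional

from bs4 import BeautifulSoup


def parse_specs(html: str) -> Dict[str, object]:
    """Turn one product page into {'product', 'specs', 'release_date'}.

    Assumed markup: model name in an <h1>; each spec table has a <th>
    naming the category, and rows with <td class="ttl"> (label) and
    <td class="nfo"> (value) cells.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    specs: Dict[str, Dict[str, str]] = {}

    for table in soup.find_all("table"):
        header = table.find("th")
        category = header.get_text(strip=True) if header else "Other"
        for row in table.find_all("tr"):
            label = row.find("td", class_="ttl")
            value = row.find("td", class_="nfo")
            if label and value:
                key = label.get_text(strip=True)
                specs.setdefault(category, {})[key] = value.get_text(" ", strip=True)

    # Assumed location of the release date: Launch > Status,
    # e.g. "Available. Released 2021, January 29".
    status = specs.get("Launch", {}).get("Status", "")
    release_date: Optional[str] = None
    if "Released" in status:
        release_date = status.split("Released", 1)[1].strip()

    return {
        "product": title.get_text(strip=True) if title else None,
        "specs": specs,
        "release_date": release_date,
    }
```

Grouping the specifications by category keeps rows with the same label (for example two different "Type" rows) from overwriting each other.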

Basically, this project taught me how BeautifulSoup accesses and reads information from HTML: learning by doing. In addition, the 'Visiting too much' problem that came up while scraping was solved by adding headers to the requests.
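
A minimal sketch of the header fix, assuming a browser-like User-Agent is what the site checks for. It uses the standard-library urllib (listed in the requirements); the header values, the `build_request`/`fetch` names, and the one-second pause are assumptions rather than the project's own settings. The same `HEADERS` dict can also be passed to `requests.get(url, headers=HEADERS)`.

```python
import time
import urllib.request

# Assumed header values: a browser-like User-Agent instead of the
# default Python client string.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Attach the browser-like headers to a GET request for `url`."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch(url: str, delay: float = 1.0) -> str:
    """Download `url` as text, pausing first so requests are spaced out."""
    time.sleep(delay)  # hypothetical politeness delay between requests
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Note that urllib normalizes header names, so the stored key for "User-Agent" is "User-agent" when read back with `Request.get_header`.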

The idea behind it:

1. Visit the main page.
2. Find the vendor list.
3. For each vendor, find its product list.
4. For each product, collect all of its specifications.
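
The steps above can be sketched as a pair of nested loops. This is a minimal sketch, not the project's implementation: fetching and parsing are passed in as functions (the names `crawl`, `fetch`, `find_vendors`, `find_products`, `parse_specs` and `save_records` are hypothetical), so the control flow can be followed and tested without touching the network. Writing one JSON object per line to output.txt is likewise an assumed format.

```python
import json
from typing import Callable, List, Tuple
from urllib.parse import urljoin

# Hypothetical callable types; plug in real download/parsing functions.
Fetch = Callable[[str], str]                        # url -> HTML text
FindLinks = Callable[[str], List[Tuple[str, str]]]  # HTML -> [(name, href)]
ParseSpecs = Callable[[str], dict]                  # HTML -> spec dict


def crawl(main_url: str, fetch: Fetch, find_vendors: FindLinks,
          find_products: FindLinks, parse_specs: ParseSpecs) -> List[dict]:
    records: List[dict] = []
    main_html = fetch(main_url)  # visit the main page
    # find the vendor list, then walk each vendor
    for vendor, vendor_href in find_vendors(main_html):
        vendor_url = urljoin(main_url, vendor_href)
        # find this vendor's product list
        for product, product_href in find_products(fetch(vendor_url)):
            product_url = urljoin(vendor_url, product_href)
            # collect all specifications for this product
            records.append({
                "vendor": vendor,
                "product": product,
                "url": product_url,
                "specs": parse_specs(fetch(product_url)),
            })
    return records


def save_records(records: List[dict], path: str = "output.txt") -> None:
    """Write one JSON object per line (assumed format for output.txt)."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

In practice, `fetch` would download pages with browser-like headers and the finder functions would wrap BeautifulSoup lookups on the vendor and product pages.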

Table of Contents

- Getting Started
  - Prerequisites
  - Installing
    - Libraries
- Authors
- License

Getting Started

Requires:

- Python 3.8.8
- BeautifulSoup (bs4)
- json (Python standard library)
- pandas
- requests
- urllib (Python standard library)

Prerequisites

Tools required: Visual Studio, PyCharm, or any other IDE that can run Python.

Installing

A few libraries need to be installed before the code can run. Install them with pip:

    pip install beautifulsoup4
    pip install pandas
    pip install requests

Clone the repository

    git clone https://github.com/JamesLi197412/web-scrapying.git

Run the code

    python3 main.py

Crawler Object Structure

Versioning

Git and GitHub are used for versioning and sharing.

Authors

James Li

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

