Welcome to ECProduct's documentation!

ECProduct is a spider (web crawler) system for e-commerce websites.

It can extract information about a specific product, shop (market), or category, starting from a site's index page, search page, product page, shop page, or category page.

For more information, see the index file in the 'docs' directory.

Configuration of ECProduct

  1. Set up the database

    1. Install the MariaDB server:

      # apt install mariadb-server
      
    2. Create a database and a user:

      # mysql -u root
      MariaDB [(none)]> CREATE DATABASE ecproduct CHARACTER SET utf8;
      MariaDB [(none)]> CREATE USER 'ecproduct'@'localhost' IDENTIFIED BY 'ecproduct@pwd';
      MariaDB [(none)]> GRANT ALL PRIVILEGES ON ecproduct.* TO 'ecproduct'@'localhost';
      
    3. Import the database data:

      $ mysql -u ecproduct -pecproduct@pwd ecproduct < database/platform.sql
      $ mysql -u ecproduct -pecproduct@pwd ecproduct < database/ecproduct.sql
      
    4. Update the MySQL settings in settings.py:

      MYSQL_HOST = 'localhost'
      MYSQL_USERNAME = 'ecproduct'
      MYSQL_PASSWORD = 'ecproduct@pwd'
      MYSQL_DATABASE = 'ecproduct'
      MYSQL_CHARSET = 'utf8'
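This README does not say which MySQL client library ECProduct uses. Assuming it is PyMySQL, a minimal sketch of turning the MYSQL_* settings above into connection arguments could look like this; the helper name mysql_connect_kwargs is hypothetical and not part of the project.

```python
from typing import Any, Dict


def mysql_connect_kwargs(settings: Any) -> Dict[str, str]:
    """Map the MYSQL_* settings to keyword arguments for pymysql.connect().

    Hypothetical helper: assumes the project talks to MariaDB through PyMySQL.
    `settings` can be the settings module or any object with these attributes.
    """
    return {
        "host": getattr(settings, "MYSQL_HOST"),
        "user": getattr(settings, "MYSQL_USERNAME"),
        "password": getattr(settings, "MYSQL_PASSWORD"),
        "database": getattr(settings, "MYSQL_DATABASE"),
        "charset": getattr(settings, "MYSQL_CHARSET"),
    }
```

With PyMySQL installed and the database from the steps above running, the result would be passed as pymysql.connect(**mysql_connect_kwargs(settings)).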
      
  2. Create a spider for the e-commerce website:

    • Spider directory: <project_home_dir>/ecproduct/spiders/
    • Spider file name: <site's domain name>.py
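The examples in this README use a short site name such as jd for https://www.jd.com. The sketch below shows one way to derive that name from a URL and build the spider file path from it. It assumes the host ends in a single-label suffix such as .com (so it would not handle suffixes like .com.cn), and the helpers site_name and spider_file are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit


def site_name(url: str) -> str:
    """Return the label just before the top-level domain, e.g. 'jd' for www.jd.com.

    Naive assumption: the public suffix is a single label such as 'com'.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    if len(labels) < 2:
        raise ValueError(f"cannot derive a site name from {url!r}")
    return labels[-2]


def spider_file(project_home: str, url: str) -> Path:
    """Build <project_home_dir>/ecproduct/spiders/<site name>.py for a site URL."""
    return Path(project_home) / "ecproduct" / "spiders" / f"{site_name(url)}.py"
```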
  3. Create the input data for the spider:

    • For the test environment, the input file is: <project_home_dir>/input/<site's domain name>_url_test.txt
    • For the production environment, the input file is: <project_home_dir>/input/<site's domain name>_url.txt

    In the file, write one URL per line; a line can be commented out with a hash ('#'). Each URL should be a page on the spider's site of the kind the spider starts from.
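The input file rules above can be sketched as follows. This is an illustration, not ECProduct's actual loader: the function names are hypothetical, the file is assumed to be UTF-8, and only lines that start with '#' are treated as comments, because a '#' inside a URL marks a fragment.

```python
from pathlib import Path
from typing import List


def input_file(project_home: str, site: str, env: str) -> Path:
    """Return the input file for a site in the 'test' or 'product' environment."""
    if env == "test":
        name = f"{site}_url_test.txt"
    elif env == "product":
        name = f"{site}_url.txt"
    else:
        raise ValueError(f"unknown environment: {env!r}")
    return Path(project_home) / "input" / name


def parse_urls(text: str) -> List[str]:
    """Return one URL per non-blank line, skipping whole-line '#' comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


def read_urls(path: Path) -> List[str]:
    """Read an input file (assumed UTF-8) and return its URLs."""
    return parse_urls(path.read_text(encoding="utf-8"))
```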

  4. Run the spider:

    • For the test environment:

      $ python main.py vvic product product -f test
      
    • For the production environment:

      $ python main.py vvic product product -f product
      
    • To run a specific spider directly with Scrapy:

      $ scrapy crawl jd -a url=https://www.jd.com/allSort.aspx -a entrance_page=category -a data_type=category -o output/jd.jl
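The positional arguments of main.py are not documented here. Judging only from the scrapy crawl example, where entrance_page and data_type are passed explicitly, one plausible reading of "python main.py vvic product product -f test" is: spider name, entrance page, data type, with -f choosing the test or production input file. The argparse sketch below encodes that guess; the argument names are assumptions, and this is not ECProduct's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical main.py interface inferred from the examples in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("spider", help="spider name, e.g. vvic or jd")
    parser.add_argument("entrance_page", help="page type to start from, e.g. product or category")
    parser.add_argument("data_type", help="kind of data to extract, e.g. product or category")
    # Assumed to select <site>_url_test.txt (test) or <site>_url.txt (product).
    parser.add_argument("-f", dest="env", choices=["test", "product"], required=True,
                        help="input environment: test or product")
    return parser
```

For example, build_parser().parse_args(["vvic", "product", "product", "-f", "test"]) yields spider="vvic", entrance_page="product", data_type="product", env="test"; an unknown -f value makes argparse exit with a usage error.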