News_Scraper

Scrapes news from the Inshorts website using Python and SeleniumBase.

Set up the project environment with pipenv:

  • Install pipenv using pip install --user pipenv
  • Activate the virtual environment using pipenv shell
  • Install the project dependencies: pipenv install -r requirements.txt
  • Install any additional pip dependency inside the virtual environment: pipenv install dep==

Install SeleniumBase: pipenv install seleniumbase

Features:

  • Collect news from Inshorts

    [Image: news_sample]

  • Collect the headline, content, and author of news in different categories, such as World, Sports, Science, and Politics

    [Image: category]

Run the News Scraper:

  • Open config.yml and set:

    • file_name: name of the CSV file that stores the news data, e.g. news_with_category.csv
    • url_file_name: name of the CSV file that stores the news URLs, e.g. url_file.csv
    • inshort_url: the Inshorts URL to scrape, e.g. https://inshorts.com/en/read
  • Run pytest test_news_scrapper.py -s --headless

  • The CSV file has the following columns:

    • [title, content, author, url, category]
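As a rough illustration of how the config.yml settings above could be read in Python, here is a minimal sketch. It assumes config.yml is a flat YAML mapping containing the three documented keys and that PyYAML is installed; `load_config` and `validate_config` are hypothetical helper names, not functions from this project.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any

# Hypothetical helpers (not part of this project). The keys are the ones
# documented for config.yml, for example:
#   file_name: news_with_category.csv
#   url_file_name: url_file.csv
#   inshort_url: https://inshorts.com/en/read
REQUIRED_KEYS = ("file_name", "url_file_name", "inshort_url")


def validate_config(config: dict[str, Any]) -> dict[str, Any]:
    """Return config unchanged, or raise ValueError if a documented key is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.yml is missing: {', '.join(missing)}")
    return config


def load_config(path: str | Path = "config.yml") -> dict[str, Any]:
    """Parse config.yml with PyYAML (an assumed dependency) and validate it."""
    # Assumption: PyYAML is installed. It is imported here so that
    # validate_config() can still be used without it.
    import yaml

    with open(path, encoding="utf-8") as fh:
        config = yaml.safe_load(fh) or {}
    return validate_config(config)
```

With the sample values shown in the comment, `load_config()["file_name"]` would return `news_with_category.csv`. Failing early on a missing key gives a clearer error than a failure partway through a browser session.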

[Image: news]

  • CSV files are stored in the dataset folder.
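Here is a minimal sketch of writing scraped records to the dataset folder in the column order listed above. It uses only the standard library `csv` module. `NewsItem` and `save_news` are hypothetical names for illustration, and the project's own test file may store rows differently.

```python
from __future__ import annotations

import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class NewsItem:
    """One scraped article (hypothetical); field order matches the CSV columns."""
    title: str
    content: str
    author: str
    url: str
    category: str


def save_news(items: list[NewsItem], file_name: str,
              dataset_dir: str | Path = "dataset") -> Path:
    """Write items to dataset_dir/file_name with a header row; return the file path."""
    out_dir = Path(dataset_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the dataset folder if needed
    out_path = out_dir / file_name
    columns = [f.name for f in fields(NewsItem)]  # title, content, author, url, category
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        for item in items:
            writer.writerow(asdict(item))
    return out_path
```

For example, `save_news([NewsItem("Headline", "Body text", "Author", "https://example.com/story", "world")], "news_with_category.csv")` would create `dataset/news_with_category.csv` containing a header row and one data row. The values in that call are placeholders. Deriving the header from the dataclass fields keeps the column order in one place.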