This project integrates a Scrapy spider with a Flask API to scrape articles, store them in Google BigQuery, and expose search functionality through a REST API.
- Python 3.x
- Google Cloud SDK (with BigQuery API enabled)
- Create a project/dataset/table in BigQuery
- Google Cloud credentials JSON file
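The dataset and table can be created from Python instead of the console. The sketch below is a minimal, assumed setup: the project/dataset/table names and the article schema (`title`, `url`, `content`, `published_at`) are placeholders, not values taken from this repository.

```python
# Sketch: create the BigQuery dataset and table this project writes to.
# All names and the schema are assumptions -- adjust to your own project.

PROJECT_ID = "my-gcp-project"    # assumed placeholder
DATASET_ID = "articles_dataset"  # assumed placeholder
TABLE_ID = "articles"            # assumed placeholder

# Plain (name, type) pairs so the schema can be inspected without a client.
ARTICLE_SCHEMA = [
    ("title", "STRING"),
    ("url", "STRING"),
    ("content", "STRING"),
    ("published_at", "TIMESTAMP"),
]


def create_table():
    """Create the dataset and table if they do not already exist."""
    # Imported here so the schema above is usable without GCP libraries.
    from google.cloud import bigquery

    client = bigquery.Client(project=PROJECT_ID)
    client.create_dataset(f"{PROJECT_ID}.{DATASET_ID}", exists_ok=True)
    schema = [bigquery.SchemaField(name, type_) for name, type_ in ARTICLE_SCHEMA]
    table = bigquery.Table(f"{PROJECT_ID}.{DATASET_ID}.{TABLE_ID}", schema=schema)
    return client.create_table(table, exists_ok=True)


if __name__ == "__main__":
    create_table()
```

Both `create_dataset` and `create_table` accept `exists_ok=True`, so the script is safe to re-run.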
- Clone the repository:
git clone https://github.com/Frankson18/scrapy_articles
cd scrapy_articles
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install the dependencies:
pip install -r requirements.txt
- Configure environment variables:
Update the following variables in `settings.py` and `app.py` with your actual Google Cloud project details:
- BIGQUERY_PROJECT_ID
- BIGQUERY_DATASET_ID
- BIGQUERY_TABLE_ID
- GOOGLE_APPLICATION_CREDENTIALS (path to your JSON key file)
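A minimal sketch of what that configuration might look like, assuming the variable names listed above; the concrete values and the key-file path are placeholders you must replace:

```python
# settings.py (excerpt) -- values below are assumed placeholders
import os

BIGQUERY_PROJECT_ID = "my-gcp-project"
BIGQUERY_DATASET_ID = "articles_dataset"
BIGQUERY_TABLE_ID = "articles"

# Point the Google client libraries at your service-account key file.
# setdefault keeps a value already exported in the shell, if any.
os.environ.setdefault(
    "GOOGLE_APPLICATION_CREDENTIALS", "/path/to/service-account.json"
)
```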
- Navigate to the project directory:
cd articlescraper
- Run the Scrapy spider:
scrapy crawl newsscrapper
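The spider's items reach BigQuery through a Scrapy item pipeline. The sketch below shows one plausible shape for such a pipeline, streaming each item with `insert_rows_json`; the class name and the way settings are read are assumptions, not the repository's actual code:

```python
# Sketch of a Scrapy item pipeline that streams articles into BigQuery.
# Class name and settings lookup are assumptions about this project.

class BigQueryPipeline:
    def __init__(self, project_id, dataset_id, table_id):
        self.table_ref = f"{project_id}.{dataset_id}.{table_id}"
        self.client = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook with the running crawler's settings.
        s = crawler.settings
        return cls(
            s.get("BIGQUERY_PROJECT_ID"),
            s.get("BIGQUERY_DATASET_ID"),
            s.get("BIGQUERY_TABLE_ID"),
        )

    def open_spider(self, spider):
        # Imported lazily so the module loads without GCP libraries installed.
        from google.cloud import bigquery
        self.client = bigquery.Client()

    def process_item(self, item, spider):
        # insert_rows_json appends rows via the streaming API and returns
        # a list of per-row errors; an empty list means success.
        errors = self.client.insert_rows_json(self.table_ref, [dict(item)])
        if errors:
            spider.logger.error("BigQuery insert failed: %s", errors)
        return item
```

To activate a pipeline like this, it would be registered under `ITEM_PIPELINES` in `settings.py`.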
- Set environment variables and run the Flask app:
export FLASK_APP=app.py
export FLASK_ENV=development
flask run
- On Windows (Command Prompt):
set FLASK_APP=app.py
set FLASK_ENV=development
flask run
- Access the API:
http://127.0.0.1:5000/search?q=example
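The endpoint above could be implemented along these lines. This is a hedged sketch, not the repository's actual `app.py`: the table name and the `content` column are assumptions, and the query is parameterized so the user-supplied `q` value cannot inject SQL:

```python
# Sketch of a /search endpoint backed by a parameterized BigQuery query.
# TABLE and the `content`/`title`/`url` columns are assumed placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)
TABLE = "my-gcp-project.articles_dataset.articles"  # assumed placeholder


def build_search_query(table: str) -> str:
    """Case-insensitive substring search; @q is bound at execution time."""
    return (
        f"SELECT title, url FROM `{table}` "
        "WHERE LOWER(content) LIKE CONCAT('%', LOWER(@q), '%')"
    )


@app.route("/search")
def search():
    q = request.args.get("q", "")
    # Imported lazily so the module loads without GCP libraries installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query(
        build_search_query(TABLE),
        job_config=bigquery.QueryJobConfig(
            query_parameters=[bigquery.ScalarQueryParameter("q", "STRING", q)]
        ),
    )
    return jsonify([dict(row) for row in job.result()])
```

Binding `q` through `ScalarQueryParameter` rather than string formatting is what keeps the endpoint safe against SQL injection.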