Walmart Product Scraper

This Python script scrapes product information from Walmart's website using a list of search queries. It extracts details such as price, review count, rating, and product descriptions, saving the data in a JSONL file.

Features

Scrapes product information from Walmart's search results
Uses proxy rotation to avoid IP blocks
Implements error handling and retries for robust scraping
Saves product data in JSONL format
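
The retry and proxy rotation features listed above could be combined roughly as follows. This is a minimal sketch, not the script's actual code: the function name fetch_with_retries, its parameters, and the exponential backoff policy are all assumptions made for illustration.

```python
import itertools
import time


def fetch_with_retries(fetch, url, proxies, max_attempts=3, backoff_seconds=1.0):
    """Call fetch(url, proxy), switching to the next proxy after each failure.

    `fetch` is any callable taking (url, proxy); it is a hypothetical hook so
    the retry logic can be exercised without touching the network.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    proxy_pool = itertools.cycle(proxies)  # endless round-robin over the list
    last_error = None
    for attempt in range(max_attempts):
        proxy = next(proxy_pool)
        try:
            return fetch(url, proxy)
        except Exception as exc:  # a real script would catch narrower errors
            last_error = exc
            # Wait longer after each failed attempt: 1x, 2x, 4x, ...
            if attempt < max_attempts - 1:
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"all {max_attempts} attempts failed for {url}") from last_error
```

With requests, the fetch argument might be something like lambda url, proxy: requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10). Keeping the network call behind a callable is only a design choice for this sketch, since it makes the retry loop easy to test in isolation.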

Requirements

Python 3.6+
Required Python packages:
- requests
- beautifulsoup4
- python-dotenv

Setup

Clone this repository or download the script.
Install the required packages:

pip install requests beautifulsoup4 python-dotenv

Create a .env file in the same directory as the script with your Bright Data credentials:

BRD_USERNAME=your_username
BRD_PASSWORD=your_password
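
As a rough illustration of how these two variables might be turned into a proxy configuration for requests, consider the sketch below. The helper name build_proxies and the host value proxy.example.com:8080 are placeholders invented for this example; the real proxy host and port come from your Bright Data account.

```python
import os
from urllib.parse import quote

try:
    # python-dotenv copies KEY=value pairs from .env into os.environ.
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    # Fallback so this sketch still runs when python-dotenv is not installed.
    pass


def build_proxies(env=os.environ, host="proxy.example.com:8080"):
    """Return a requests-style proxies dict from BRD_USERNAME / BRD_PASSWORD.

    `host` is a placeholder, not a real Bright Data endpoint.
    """
    username = env.get("BRD_USERNAME")
    password = env.get("BRD_PASSWORD")
    if not username or not password:
        raise KeyError("BRD_USERNAME and BRD_PASSWORD must be set in .env")
    # Percent-encode credentials so characters like '@' or ':' stay valid in a URL.
    proxy_url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}"
    return {"http": proxy_url, "https": proxy_url}
```

The returned dictionary has the shape that requests accepts through its proxies argument, for example requests.get(url, proxies=build_proxies()).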

Usage

Run the script with:

python walmart_scraper.py

The script will start scraping product information based on the predefined search queries. The results will be saved in product_info.jsonl in the same directory.

Configuration

Modify the search_queries list to change or add search terms.
Adjust the BASE_HEADERS dictionary if you need to update the user agent or other headers.
The script is set to scrape up to 99 pages per search query. You can modify this limit in the main() function.
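
To make the page limit concrete, here is a small sketch of how one URL per results page could be generated for a query. The SEARCH_URL value, the q and page parameter names, and the search_page_urls helper are assumptions for illustration and may not match what main() actually does.

```python
from urllib.parse import urlencode

# Assumed search endpoint and parameter names; verify against the script.
SEARCH_URL = "https://www.walmart.com/search"


def search_page_urls(query, max_pages=99):
    """Yield one search URL per results page, from page 1 up to max_pages."""
    for page in range(1, max_pages + 1):
        yield f"{SEARCH_URL}?{urlencode({'q': query, 'page': page})}"
```

For example, list(search_page_urls("gaming laptop", max_pages=2)) produces two URLs whose query strings end in page=1 and page=2. Lowering max_pages is the equivalent of lowering the 99-page limit described above.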

Output

The script generates a JSONL file named product_info.jsonl. Each line in this file is a JSON object containing information about a single product, including:

Price
Review count
Item ID
Average rating
Product name
Brand
Availability
Image URL
Short description
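
Because each line is an independent JSON object, the file can be written and read one product at a time. The sketch below shows the general pattern; the helper names and the field names in the usage example (such as product_name) are hypothetical, so check an actual line of product_info.jsonl for the real keys.

```python
import json


def append_product(path, product):
    """Append one product dict to a JSONL file as a single line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(product, ensure_ascii=False) + "\n")


def read_products(path):
    """Load every non-empty line of a JSONL file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

A call such as append_product("product_info.jsonl", {"product_name": "Example", "price": 9.99}) adds one line, and read_products("product_info.jsonl") returns the full list. Appending line by line means a crash partway through a run still leaves the earlier products on disk.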

Note

This script uses proxy servers from Bright Data. Ensure you have an active subscription and correct credentials in your .env file.

Disclaimer

Web scraping may be against the terms of service of some websites. Use this script responsibly and ensure you have permission to scrape the target website.

Advanced Web Scraping Tutorial From the Creator

Code corresponding to my recent video on web scraping with Python and BeautifulSoup.

About/Navigation

At certain parts of the video, I reference accessing the code at different stages of completeness.

Here are the files that you may be looking for:

Initial Run - 24:50 in video
Improvements - 27:19 in video
Final Code w/ Proxies

Make sure to add a .env file when you start using the Bright Data proxies!

Shout out to Bright Data for sponsoring this video. Get started using this link!
