This Python script scrapes product information from Walmart's website using a list of search queries. It extracts details such as price, review count, rating, and product descriptions, saving the data in a JSONL file.
- Scrapes product information from Walmart's search results
- Uses proxy rotation to avoid IP blocks
- Implements error handling and retries for robust scraping (a sketch follows this list)
- Saves product data in JSONL format
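As a rough illustration of the retry pattern, here is a minimal sketch. The helper name `fetch_with_retries`, the retry count, and the backoff schedule are illustrative assumptions, not code taken from the script:

```python
import time

import requests

def fetch_with_retries(url, headers, proxies=None, max_retries=3):
    """Fetch a URL, retrying on transient failures with a simple backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.get(url, headers=headers, proxies=proxies, timeout=30)
            resp.raise_for_status()  # treat HTTP errors (e.g. 429) as failures too
            return resp
        except requests.RequestException as exc:
            print(f"Attempt {attempt}/{max_retries} failed: {exc}")
            time.sleep(2 * attempt)  # back off a little longer each time
    return None  # caller decides how to handle a permanent failure
```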
- Python 3.6+
- Required Python packages:
  - `requests`
  - `beautifulsoup4`
  - `python-dotenv`
- Clone this repository or download the script.
- Install the required packages:

```bash
pip install requests beautifulsoup4 python-dotenv
```
- Create a `.env` file in the same directory as the script with your Bright Data credentials:

```
BRD_USERNAME=your_username
BRD_PASSWORD=your_password
```
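For context, the script presumably loads these credentials with `python-dotenv` along the following lines. This is a sketch under that assumption; the endpoint `brd.superproxy.io:22225` is a commonly documented Bright Data default and may differ for your zone:

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads BRD_USERNAME and BRD_PASSWORD from the .env file

username = os.environ["BRD_USERNAME"]
password = os.environ["BRD_PASSWORD"]

# Bright Data proxies are addressed as user:password@host:port.
proxy_url = f"http://{username}:{password}@brd.superproxy.io:22225"
proxies = {"http": proxy_url, "https": proxy_url}
```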
Run the script with:

```bash
python walmart_scraper.py
```

The script will start scraping product information based on the predefined search queries. The results will be saved in `product_info.jsonl` in the same directory.
- Modify the `search_queries` list to change or add search terms.
- Adjust the `BASE_HEADERS` dictionary if you need to update the user agent or other headers.
- The script is set to scrape up to 99 pages per search query. You can modify this limit in the `main()` function. A sketch of these configuration points follows this list.
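Taken together, these configuration points might look roughly like the following inside the script. This is a hedged sketch: `search_queries`, `BASE_HEADERS`, and `main()` come from the description above, while the example search terms, header values, and the `max_pages` variable name are illustrative assumptions:

```python
# Search terms to scrape; add or remove entries as needed.
search_queries = ["laptop", "coffee maker", "headphones"]

# Headers sent with every request; update the user agent here if needed.
BASE_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

def main():
    max_pages = 99  # per-query page limit; lower this for quicker runs
    for query in search_queries:
        for page in range(1, max_pages + 1):
            ...  # fetch and parse one results page for this query
```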
The script generates a JSONL file named `product_info.jsonl`. Each line in this file is a JSON object containing information about a single product, including:
- Price
- Review count
- Item ID
- Average rating
- Product name
- Brand
- Availability
- Image URL
- Short description
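Because each line is a standalone JSON object, the output can be inspected with a few lines of Python. This reader is a usage sketch independent of the scraper and makes no assumptions about the exact field names:

```python
import json

with open("product_info.jsonl", encoding="utf-8") as f:
    for line in f:
        product = json.loads(line)            # one JSON object per line
        print(json.dumps(product, indent=2))  # pretty-print for inspection
```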
This script uses proxy servers from Bright Data. Ensure you have an active subscription and correct credentials in your `.env` file.
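Before a long run, it can be worth confirming that the subscription and credentials work by routing a single request through the proxy. This sketch assumes the common Bright Data endpoint `brd.superproxy.io:22225` (check your zone settings); httpbin.org simply echoes the IP a request arrives from:

```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()
proxy_url = (
    f"http://{os.environ['BRD_USERNAME']}:{os.environ['BRD_PASSWORD']}"
    "@brd.superproxy.io:22225"  # default endpoint; check your zone settings
)

# If the credentials are valid, this prints the proxy's exit IP, not yours.
resp = requests.get(
    "https://httpbin.org/ip",
    proxies={"http": proxy_url, "https": proxy_url},
    timeout=30,
)
print(resp.json())
```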
Web scraping may be against the terms of service of some websites. Use this script responsibly and ensure you have permission to scrape the target website.
Code corresponding to my recent video on web scraping with Python and BeautifulSoup.
At certain points in the video, I reference accessing the code at different stages of completeness.
Here are the files that you may be looking for:
Make sure to add a `.env` file when you start using the Bright Data proxies!
Shout out to Bright Data for sponsoring this video; get started using this link!