
This Python script scrapes product information from Walmart's website using a list of search queries. It extracts details such as price, review count, rating, and product descriptions, saving the data in a JSONL file.


memocappa/walmart-product-scraper-advanced

 
 


Walmart Product Scraper

This Python script scrapes product information from Walmart's website using a list of search queries. It extracts details such as price, review count, rating, and product descriptions, saving the data in a JSONL file.

Features

  • Scrapes product information from Walmart's search results
  • Uses proxy rotation to avoid IP blocks
  • Implements error handling and retries for robust scraping
  • Saves product data in JSONL format
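The retry behaviour listed above could look roughly like the following. This is a minimal sketch, not the script's actual code: the helper name `fetch_with_retries`, its parameters, and the exponential backoff schedule are all assumptions made for illustration. The download itself is passed in as a callable so the retry logic can be read (and tried out) without touching the network.

```python
import time
from typing import Callable, Optional


def fetch_with_retries(
    fetch: Callable[[str], str],
    url: str,
    max_retries: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) up to max_retries times; return None if every attempt fails.

    Hypothetical helper: the name, signature, and backoff policy are assumptions.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real script would likely catch narrower errors
            print(f"Attempt {attempt}/{max_retries} failed for {url}: {exc}")
            if attempt < max_retries:
                # Exponential backoff: 1x, 2x, 4x, ... the base delay
                sleep(backoff_seconds * 2 ** (attempt - 1))
    return None
```

With requests, `fetch` could be something like `lambda u: requests.get(u, headers=BASE_HEADERS, proxies=proxies, timeout=30).text`, where `proxies` is a dictionary you build from your proxy credentials.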

Requirements

  • Python 3.6+
  • Required Python packages:
    • requests
    • beautifulsoup4
    • python-dotenv

Setup

  1. Clone this repository or download the script.
  2. Install the required packages:
pip install requests beautifulsoup4 python-dotenv
  3. Create a .env file in the same directory as the script with your Bright Data credentials:
BRD_USERNAME=your_username
BRD_PASSWORD=your_password
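
As a rough illustration of how these two variables might be turned into a proxy configuration for requests, here is a minimal sketch. The function name `build_proxies` is hypothetical, and `PROXY_HOST` and `PROXY_PORT` are placeholders, not real Bright Data values; take the actual host and port from your Bright Data dashboard. The sketch reads from `os.environ`, so in the real script `load_dotenv()` from python-dotenv would need to run first to load the .env file.

```python
import os
from urllib.parse import quote

# Placeholder values (assumptions): replace with the host and port
# shown in your Bright Data dashboard.
PROXY_HOST = "proxy.example.com"
PROXY_PORT = 8080


def build_proxies(host: str = PROXY_HOST, port: int = PROXY_PORT) -> dict:
    """Build a requests-style proxies dict from BRD_USERNAME / BRD_PASSWORD."""
    username = os.environ["BRD_USERNAME"]
    password = os.environ["BRD_PASSWORD"]
    # Percent-encode the credentials so characters such as '@' or ':'
    # cannot break the proxy URL.
    proxy_url = f"http://{quote(username, safe='')}:{quote(password, safe='')}@{host}:{port}"
    return {"http": proxy_url, "https": proxy_url}
```

The returned dictionary has the shape that `requests.get(url, proxies=...)` accepts. A missing variable raises `KeyError`, which surfaces a forgotten .env file early.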

Usage

Run the script with:

python walmart_scraper.py

The script will start scraping product information based on the predefined search queries. The results will be saved in product_info.jsonl in the same directory.

Configuration

  • Modify the search_queries list to change or add search terms.
  • Adjust the BASE_HEADERS dictionary if you need to update the user agent or other headers.
  • The script is set to scrape up to 99 pages per search query. You can modify this limit in the main() function.
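
To make the relationship between `search_queries` and the page limit concrete, the sketch below enumerates one URL per query and page. It is an illustration under assumptions: the helper `iter_search_urls`, the `MAX_PAGES` constant, the example queries, and the `q`/`page` URL layout are not taken from the script and may differ from what it actually does.

```python
from typing import Iterable, Iterator
from urllib.parse import urlencode

search_queries = ["laptops", "coffee maker"]  # example terms, not the script's list
MAX_PAGES = 99  # mirrors the 99-page limit described above


def iter_search_urls(queries: Iterable[str], max_pages: int = MAX_PAGES) -> Iterator[str]:
    """Yield one search URL per (query, page) pair, pages starting at 1."""
    for query in queries:
        for page in range(1, max_pages + 1):
            # Assumed URL layout; urlencode escapes spaces and special characters.
            yield "https://www.walmart.com/search?" + urlencode({"q": query, "page": page})
```

For example, `list(iter_search_urls(["coffee maker"], max_pages=2))` yields two URLs, one for each page. A real scraper would normally also stop early once a results page comes back empty.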

Output

The script generates a JSONL file named product_info.jsonl. Each line in this file is a JSON object containing information about a single product, including:

  • Price
  • Review count
  • Item ID
  • Average rating
  • Product name
  • Brand
  • Availability
  • Image URL
  • Short description

Note

This script uses proxy servers from Bright Data. Ensure you have an active subscription and correct credentials in your .env file.

Disclaimer

Web scraping may be against the terms of service of some websites. Use this script responsibly and ensure you have permission to scrape the target website.

Advanced Web Scraping Tutorial From the Creator


Code corresponding to my recent video on web scraping with Python BeautifulSoup

About/Navigation

At certain parts of the video, I reference accessing the code at different stages of completeness.

Here are the files that you may be looking for:

Make sure to add a .env file when you start using the Bright Data proxies!

Shout out to Bright Data for sponsoring this video. Get started using this link!
