This project is a web scraping and data analysis tool designed to extract product information from Jumia Ghana. It includes functionality for scraping Black Friday deals and searching for specific products, with features to visualize and analyze the collected data.
- Black Friday Scraper: Automatically navigates through all pages in the Black Friday catalog and collects product details.
- Product Search: Allows users to search for specific products (e.g., "washing machine") and scrape all available results.
- Data Visualization:
  - Bar charts showing the top 10 cheapest products.
  - Histograms of product price distribution.
- Data Export: Saves the collected data into a CSV file for further analysis.
- Python: The primary programming language.
- Selenium: For web scraping dynamic content.
- Pandas: For data manipulation and analysis.
- Matplotlib: For data visualization.
- Python (Version 3.7 or later)
- Microsoft Edge WebDriver: Ensure it's installed and matches your Edge browser version.
  - Download Microsoft Edge WebDriver
  - Add the WebDriver to your system's PATH.
- Required Python Libraries: install the necessary dependencies by running:
  pip install pandas matplotlib selenium
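Before running the scraper, it can help to confirm that the prerequisites are in place. The following is a minimal sketch using only the standard library; the helper names `find_edge_driver` and `python_is_supported` are hypothetical and not part of this project, and the check assumes the WebDriver executable keeps its usual name, `msedgedriver`.

```python
import shutil
import sys
from typing import Optional, Tuple


def find_edge_driver(executable: str = "msedgedriver") -> Optional[str]:
    """Return the full path of the WebDriver executable if it is on PATH, else None."""
    return shutil.which(executable)


def python_is_supported(minimum: Tuple[int, int] = (3, 7)) -> bool:
    """Check the running interpreter against the minimum version listed above."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.7+:", python_is_supported())
    print("Edge WebDriver:", find_edge_driver() or "not found on PATH")
```

This only verifies that the driver can be found; it does not check that the driver version matches the installed Edge browser.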
- Clone this repository:
  git clone https://github.com/Hdiaktoros/jumia-product-scraper.git
  cd jumia-product-scraper
- Ensure you have the correct version of Microsoft Edge WebDriver installed.
- Run the project in Jupyter Notebook or directly from a Python script.
To scrape all Black Friday deals:
scrape_black_friday_products()
The script will:
- Scrape product details across all pages.
- Sort the data by price (ascending).
- Save the results to black_friday_sorted_products.csv.
- Display data visualizations (e.g., bar charts for the cheapest products).
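The sort-and-save steps can be sketched as follows. This is an illustration, not the project's code: the project uses Pandas for this stage, whereas the sketch relies on the standard-library `csv` module to stay self-contained. The function names `sort_by_price` and `save_sorted`, and the assumption that each product is a dict with a numeric `"Current Price"` key, are hypothetical.

```python
import csv
from typing import Dict, List


def sort_by_price(rows: List[Dict]) -> List[Dict]:
    """Return a new list ordered by ascending current price."""
    return sorted(rows, key=lambda row: row["Current Price"])


def save_sorted(rows: List[Dict], path: str) -> List[Dict]:
    """Sort rows by price and write them to a CSV file at path."""
    ordered = sort_by_price(rows)
    if not ordered:
        return ordered  # nothing scraped, so no file is written
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # Column order follows the keys of the first (cheapest) row.
        writer = csv.DictWriter(handle, fieldnames=list(ordered[0].keys()))
        writer.writeheader()
        writer.writerows(ordered)
    return ordered
```

Calling `save_sorted(products, "black_friday_sorted_products.csv")` would produce a file with one header row followed by the products from cheapest to most expensive.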
To search for specific products (e.g., "washing machine"):
search_products("washing machine")
The script will:
- Scrape product details for the search query.
- Sort the data by price (ascending).
- Save the results to washing_machine_sorted_products.csv.
- Display data visualizations.
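The output file name appears to be derived from the search query. How the project builds it is not shown here; the only confirmed case is that "washing machine" maps to washing_machine_sorted_products.csv. One plausible rule, sketched with a hypothetical helper `output_filename`, is to lowercase the query and replace every run of non-alphanumeric characters with an underscore:

```python
import re


def output_filename(query: str) -> str:
    """Build a CSV file name from a search query such as 'washing machine'."""
    # Collapse anything that is not a letter or digit into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", query.strip().lower()).strip("_")
    return f"{slug}_sorted_products.csv"


if __name__ == "__main__":
    print(output_filename("washing machine"))
```

Normalizing the query this way also keeps characters such as quotes or slashes out of the file name.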
- Name: The product's name.
- Current Price: The current price of the product (GH₵).
- Initial Price: The original price before any discounts (GH₵).
- Discount: The percentage discount applied.
- Reviews: The number of reviews for the product.
- Stars: The product's star rating.
- URL: The link to the product page.
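The fields above could be represented as a small record type, with helpers that turn scraped text into numbers. This is a sketch under stated assumptions: the `Product` class and the `parse_price` and `parse_discount` helpers are hypothetical, and the input formats ("GH₵ 1,299.00" for prices, "35%" for discounts) are assumed rather than taken from the project's code.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    name: str
    current_price: float             # GH₵
    initial_price: Optional[float]   # GH₵, None when there is no discount
    discount: Optional[int]          # percent
    reviews: int
    stars: Optional[float]
    url: str


def parse_price(text: str) -> Optional[float]:
    """Extract the first number from a price string; return None if there is none."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def parse_discount(text: str) -> Optional[int]:
    """Extract the percentage from a discount string such as '35%'."""
    match = re.search(r"(\d+)\s*%", text)
    return int(match.group(1)) if match else None
```

Storing prices as floats rather than strings is what makes the ascending sort and the price histogram meaningful.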
- Top 10 Cheapest Products:
  - A horizontal bar chart showing the cheapest products.
- Price Distribution:
  - A histogram showing the distribution of product prices.
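The two charts can be sketched with Matplotlib as shown below. The function names `cheapest` and `plot_products`, the bin count, and the dict keys `"Name"` and `"Current Price"` are assumptions made for illustration; the project's own plotting code may differ.

```python
from typing import Dict, List, Tuple


def cheapest(rows: List[Dict], n: int = 10) -> Tuple[List[str], List[float]]:
    """Return the names and prices of the n cheapest rows, cheapest first."""
    ordered = sorted(rows, key=lambda row: row["Current Price"])[:n]
    return [r["Name"] for r in ordered], [r["Current Price"] for r in ordered]


def plot_products(rows: List[Dict], bins: int = 20) -> None:
    """Draw the cheapest-products bar chart and the price histogram side by side."""
    # Imported here so cheapest() stays usable without Matplotlib installed.
    import matplotlib.pyplot as plt

    names, prices = cheapest(rows)
    fig, (bar_ax, hist_ax) = plt.subplots(1, 2, figsize=(14, 6))

    # Numeric positions keep products with identical names on separate bars.
    positions = list(range(len(names)))
    bar_ax.barh(positions, prices)
    bar_ax.set_yticks(positions)
    bar_ax.set_yticklabels(names)
    bar_ax.invert_yaxis()  # cheapest product at the top
    bar_ax.set_xlabel("Current Price (GH₵)")
    bar_ax.set_title("Top 10 Cheapest Products")

    hist_ax.hist([row["Current Price"] for row in rows], bins=bins)
    hist_ax.set_xlabel("Current Price (GH₵)")
    hist_ax.set_ylabel("Number of products")
    hist_ax.set_title("Price Distribution")

    fig.tight_layout()
    plt.show()
```

A horizontal layout suits the bar chart because product names tend to be long and are easier to read along the vertical axis.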
- Handles missing data gracefully by assigning default values.
- Automatically stops scraping if no more products are found.
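Both behaviors can be illustrated with a short pagination loop. In this sketch, `fetch_page` is a hypothetical callable standing in for the Selenium code that loads one catalog page and returns its products as dicts. The values in `DEFAULTS` are placeholders, since the defaults the project actually assigns are not documented here.

```python
from typing import Callable, Dict, List

# Placeholder defaults (assumed, not taken from the project).
DEFAULTS: Dict[str, object] = {
    "Initial Price": None,
    "Discount": "0%",
    "Reviews": 0,
    "Stars": 0.0,
}


def with_defaults(raw: Dict) -> Dict:
    """Fill in any missing or empty field with its default value."""
    filled = dict(DEFAULTS)
    filled.update({key: value for key, value in raw.items() if value not in (None, "")})
    return filled


def scrape_all_pages(fetch_page: Callable[[int], List[Dict]], max_pages: int = 1000) -> List[Dict]:
    """Collect products page by page until a page comes back empty."""
    products: List[Dict] = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break  # no more products: stop paginating
        products.extend(with_defaults(item) for item in items)
    return products
```

The `max_pages` cap is a safety net in case the site keeps returning results indefinitely; the normal exit is the first empty page.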
Contributions are welcome! If you'd like to enhance the project, feel free to submit a pull request or open an issue.
- Fork the repository.
- Create your feature branch:
git checkout -b feature/your-feature
- Commit your changes:
git commit -m "Add your feature"
- Push to the branch:
git push origin feature/your-feature
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or feedback, feel free to contact:
- Emmanuel Frimpong Asante
- Email: [email protected]
- GitHub: Hdiaktoros

