Web scraper for Wildberries + simple vectorization/multimodal embedding workflow

ivanovsdesign/information_retrieval

Information Retrieval Project 📚🔍

Welcome to the Information Retrieval repository! This project scrapes data from Wildberries and implements a simple workflow for content vectorization and multimodal embeddings.

🌟 Features

  • Wildberries Scraper: Extracts data from Wildberries; see wb_scraper.ipynb.
  • Content Vectorization: Converts textual content into numerical vectors suitable for machine learning.
  • Multimodal Embeddings: Combines different types of data (text, images, etc.) into a single embedding for richer representations.
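
To make the last two features concrete, here is a minimal sketch of text vectorization followed by a simple multimodal fusion. It is not taken from the notebooks: the TF-IDF weighting, the concatenation-based fusion, the helper names (`tfidf_vectors`, `fuse`, `cosine`), the product titles, and the two-dimensional "image features" are all illustrative assumptions, chosen so the example runs on the Python standard library alone.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so a token present in every document keeps a positive weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(text_vec, image_vec):
    """Concatenate L2-normalized vectors so neither modality dominates by scale."""
    return l2_normalize(text_vec) + l2_normalize(image_vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


# Hypothetical inputs: made-up titles and made-up 2-D image features.
titles = ["red cotton dress", "blue cotton shirt", "red leather bag"]
image_features = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]

vocab, text_vecs = tfidf_vectors(titles)
items = [fuse(t, i) for t, i in zip(text_vecs, image_features)]

# Compare the third item against the first two in the fused space.
print(cosine(items[0], items[2]), cosine(items[1], items[2]))
```

In practice the image features would come from a pretrained vision model and the text vectors from a library vectorizer or a language model; concatenation after normalization is only one of several possible fusion strategies.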

🛠️ Getting Started

To get started with the Information Retrieval project, follow these steps:

  1. Clone the Repository:
    git clone https://github.com/ivanovsdesign/information_retrieval.git

  2. Navigate to the Project Directory:
    cd information_retrieval

  3. Explore the Notebooks:

    • Open wb_scraper.ipynb to learn how to scrape data from Wildberries.

    • Open wb_content_vect_colab.ipynb to understand the workflow for content vectorization and creating multimodal embeddings.

📜 Disclaimer

This project is intended for educational and research purposes. The author and contributors do not condone or support the misuse of this scraper to violate the terms of service of Wildberries. Users are solely responsible for ensuring their use of this tool complies with all applicable laws and terms of service.

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details on how to contribute to this project.

📄 License

This project is licensed under the MIT License.

📬 Contact

For questions or feedback, please open an issue on GitHub.

🌈 Thank you for visiting the repository! If you find this project helpful, please consider starring it to show your support. Happy coding! 🚀
