[Code Addition Request]: Automate Workflows through Web Scraping #736
Comments
🙌 Thank you for bringing this issue to our attention! We appreciate your input and will investigate it as soon as possible. Feel free to join our community on Discord to discuss more!
Hi @UTSAVS26, thank you for assigning the issue! I'd like to kindly request an update to a level 2 label: the project involves web scraping, which requires handling both static and dynamic content. Thank you for your understanding!
Based on the work done, I will change the level.
Hi again @UTSAVS26, I have finished the work on this issue and opened a pull request for it. Please review it whenever convenient, and if you have any questions about anything in the PR, feel free to ask me. Based on the work, I would also like to request at least a level 2 label. Regards,
Fixes #736

## Pull Request for PyVerse 💡

### Requesting to submit a pull request to the PyVerse repository.

---

#### Issue Title
*Add Web Scraping Workflow Automation*
- [YES] I have provided the issue title.

---

#### Name
*Sanchit Chauhan*
- [YES] I have provided my name.

---

#### GitHub ID
*sanchitc05*
- [YES] I have provided my GitHub ID.

---

#### Email ID
*[email protected]*
- [YES] I have provided my email ID.

---

#### Identify Yourself
**Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).**
*GSSOC, HACKTOBERFEST*
- [YES] I have mentioned my participant role.

---

#### Closes
*Closes: #736*
- [YES] I have provided the issue number.

---

#### Describe the Add-ons or Changes You've Made

### **Description**
This PR introduces an automated web scraping workflow to extract data from static and dynamic web pages. The solution uses `requests` and `BeautifulSoup` for static pages, and `Selenium` for dynamic content. The scraped data is logged for easy tracking and error management. This feature streamlines repetitive data collection tasks and enables automated scheduling for regular scraping.

### **Technical Implementation**
- **Libraries used**:
  - `requests`: fetches web pages for static content.
  - `BeautifulSoup`: parses and extracts relevant data from HTML.
  - `Selenium`: automates browser interaction for dynamic content.
- **Logging module**: tracks activities and errors in `scraper.log`.
- **Project structure**:
  - `scraper.py`: main script containing the scraping logic.
  - `requirements.txt`: dependency list for easy setup.

### **Usage**
1. Clone the repository and install dependencies:
   ```bash
   git clone https://github.com/yourusername/web_scraper.git
   cd web_scraper
   pip install -r requirements.txt
   ```
2. Update the `static_url` and `dynamic_url` variables in `scraper.py`.
3. Run the scraper:
   ```bash
   python scraper.py
   ```
4. Check `scraper.log` for activity status.

### **Benefits**
- **Automates data collection**, saving time and effort.
- **Handles dynamic content**, making it adaptable to complex websites.
- **Error tracking** ensures smooth, continuous scraping.

### **Testing**
- Successfully tested scraping of both static and dynamic pages.
- Verified proper logging of activities and error handling.

- [YES] I have described my changes.

---

#### Type of Change
**Select the type of change:**
- [YES] Bug fix (non-breaking change which fixes an issue)
- [YES] New feature (non-breaking change which adds functionality)
- [YES] Code style update (formatting, local variables)
- [YES] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [YES] This change requires a documentation update

---

#### How Has This Been Tested?
**Describe how your changes have been tested.**
*Describe your testing process here.*
- [YES] I have described my testing process.

---

#### Checklist
**Please confirm the following:**
- [YES] My code follows the guidelines of this project.
- [YES] I have performed a self-review of my code.
- [YES] I have commented on my code, particularly wherever it was hard to understand.
- [YES] I have made corresponding changes to the documentation.
- [YES] My changes generate no new warnings.
- [YES] I have added tests that prove my fix is effective or that my feature works.
- [NO] Any dependent changes have been merged and published in downstream modules.
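The static-scraping-plus-logging flow this PR describes can be sketched roughly as follows. To keep the sketch self-contained and runnable without the third-party dependencies (`requests`, `BeautifulSoup`, `Selenium`), it substitutes the standard library's `html.parser` for BeautifulSoup and parses an inline sample page instead of fetching one over the network; the tag choice (`<h2>`) and the sample data are illustrative placeholders, not part of the actual `scraper.py`.

```python
import logging
from html.parser import HTMLParser

# Log to scraper.log, mirroring the logging setup described in the PR.
logging.basicConfig(filename="scraper.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

class TitleExtractor(HTMLParser):
    """Collects the text of every <h2> tag (stand-in for BeautifulSoup)."""
    def __init__(self):
        super().__init__()
        self._in_h2 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2 and data.strip():
            self.titles.append(data.strip())

def scrape_static(html):
    """Parse already-fetched HTML; in the real script the page would be
    fetched first (e.g. requests.get(static_url).text)."""
    try:
        parser = TitleExtractor()
        parser.feed(html)
        logging.info("Extracted %d titles", len(parser.titles))
        return parser.titles
    except Exception:
        # Error tracking: failures are logged rather than crashing the run.
        logging.exception("Static scrape failed")
        return []

if __name__ == "__main__":
    sample = "<html><body><h2>Price: 10</h2><h2>Price: 12</h2></body></html>"
    print(scrape_static(sample))
```

Dynamic pages would follow the same extract-and-log pattern, with Selenium rendering the page before the HTML is handed to the parser.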
✅ This issue has been closed. Thank you for your contribution! If you have any further questions or issues, feel free to join our community on Discord to discuss more!
#### Have you completed your first issue?

#### Guidelines

#### Latest Merged PR Link
UppuluriKalyani/ML-Nexus#324

#### Project Description

**Description:**
I propose developing a web scraping tool to automate workflows within PyVerse. This tool will extract data (e.g., prices, news, or stock data) from static and dynamic websites, store it for analysis, and run periodically using schedulers like cron or Task Scheduler.
**Tech Stack:**
- `requests`, `BeautifulSoup`, `Selenium`
- `cron` (Linux) / Task Scheduler (Windows)
- `logging` module

**Approach:**

**How This Helps Users:**
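The periodic runs proposed above would normally be driven by cron or Task Scheduler, but the idea can also be prototyped in-process with the standard-library `sched` module. The interval, run count, and job body below are illustrative placeholders, not part of the proposal itself.

```python
import sched
import time

def run_scraper():
    # Placeholder for the real scraping job (hypothetical scraper.py logic).
    print("scraping at", time.strftime("%H:%M:%S"))

def schedule_repeating(scheduler, interval_s, job, runs):
    """Run `job` every `interval_s` seconds, `runs` times in total."""
    if runs <= 0:
        return

    def wrapper():
        job()
        # Re-schedule the next run after this one completes.
        schedule_repeating(scheduler, interval_s, job, runs - 1)

    scheduler.enter(interval_s, 1, wrapper)

if __name__ == "__main__":
    s = sched.scheduler(time.monotonic, time.sleep)
    # Short interval for the demo; a real deployment would use cron or
    # Task Scheduler with a much longer period.
    schedule_repeating(s, 0.5, run_scraper, runs=2)
    s.run()  # blocks until all scheduled runs finish
```

An equivalent cron entry would invoke `python scraper.py` on the chosen schedule, leaving the script itself single-shot.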
Please assign this issue to me so I can begin implementation.
#### Full Name
Sanchit Chauhan

#### Participant Role
GSSOC, HACKTOBERFEST