[Code Addition Request]: Automate Workflows through Web Scraping #736

sanchitc05 · 2024-10-20T06:55:50Z

Have you completed your first issue?

I have completed my first issue

Guidelines

I have read the guidelines
I have the link to my latest merged PR

Latest Merged PR Link

Project Description

Description:
I propose developing a web scraping tool to automate workflows within PyVerse. This tool will extract data (e.g., prices, news, or stock data) from static and dynamic websites, store it for analysis, and run periodically using schedulers like cron or Task Scheduler.

Tech Stack:

Python Libraries: requests, BeautifulSoup, Selenium
Scheduling: cron (Linux) / Task Scheduler (Windows)
Error Handling and Logs: logging module

Approach:

Identify websites to scrape based on user needs (e.g., financial or product data).
Build modules for scraping both static and dynamic pages.
Implement automated scheduling with logging for status tracking.
Create detailed documentation and examples to guide users.

How This Helps Users:

Automates repetitive data extraction tasks.
Saves time and ensures users always have the latest data for analysis.
Encourages contributions from beginners and advanced users alike through modular code and documentation.

Please assign this issue to me so I can begin implementation.

Full Name

Sanchit Chauhan

Participant Role

GSSOC, HACKTOBERFEST

The text was updated successfully, but these errors were encountered:

github-actions · 2024-10-20T06:56:01Z

🙌 Thank you for bringing this issue to our attention! We appreciate your input and will investigate it as soon as possible.

Feel free to join our community on Discord to discuss more!

sanchitc05 · 2024-10-20T07:32:33Z

Hi @UTSAVS26 ,

Thank you for assigning the issue! I’d like to kindly request an update to a level 2 label. The project involves web scraping, which requires handling both static and dynamic content using requests, BeautifulSoup, and Selenium. Additionally, I'll implement error handling and logging to ensure robustness. Given these complexities, I believe a level 2 label would be more appropriate.

Thank you for your understanding!

UTSAVS26 · 2024-10-20T08:01:01Z

Based on work done i will change the level.

Fixes UTSAVS26#736

sanchitc05 · 2024-10-20T09:00:30Z

Hi again @UTSAVS26 I am done with the work on this issue and pulled a request for the same. Please check it whenever you feel comfortable and let me know if you face any doubts regarding anything in the PR feel free to ask me.

Also can you please go through it well and based on my work, I would like to request you to please give me at least level 2 for the work.

Regards,
Sanchit Chauhan

Fixes #736 ## Pull Request for PyVerse 💡 ### Requesting to submit a pull request to the PyVerse repository. --- #### Issue Title *Add Web Scraping Workflow Automation* - [YES] I have provided the issue title. --- #### Name *Sanchit Chauhan* - [YES] I have provided my name. --- #### GitHub ID *sanchitc05* - [YES] I have provided my GitHub ID. --- #### Email ID *[email protected]* - [YES] I have provided my email ID. --- #### Identify Yourself **Mention in which program you are contributing (e.g., WoB, GSSOC, SSOC, SWOC).** *GSSOC, HACKTOBERFEST* - [YES] I have mentioned my participant role. --- #### Closes *Closes: #736 * - [YES] I have provided the issue number. --- #### Describe the Add-ons or Changes You've Made *### **Description** This PR introduces an automated web scraping workflow to extract data from static and dynamic web pages. The solution uses `requests` and `BeautifulSoup` for static pages, and `Selenium` for dynamic content. The scraped data is logged for easy tracking and error management. This feature streamlines repetitive data collection tasks and enables automated scheduling for regular scraping. ### **Technical Implementation** - **Libraries Used**: - `requests`: Fetch web pages for static content. - `BeautifulSoup`: Parse and extract relevant data from HTML. - `Selenium`: Automate browser interaction for dynamic content. - **Logging Module**: Tracks activities and errors in `scraper.log`. - **Project Structure**: - `scraper.py`: Main script containing scraping logic. - `requirements.txt`: Dependency list for easy setup. ### **Usage** 1. Clone the repository and install dependencies: ```bash git clone https://github.com/yourusername/web_scraper.git cd web_scraper pip install -r requirements.txt ``` 2. Update `static_url` and `dynamic_url` variables in `scraper.py`. 3. Run the scraper: ```bash python scraper.py ``` 4. Check logs in `scraper.log` for activity status. ### **Benefits** - **Automates data collection**, saving time and effort. - **Handles dynamic content**, making it adaptable to complex websites. - **Error tracking** ensures smooth, continuous scraping. ### **Testing** - Successfully tested scraping both static and dynamic pages. - Verified proper logging of activities and error handling.* - [YES] I have described my changes. --- #### Type of Change **Select the type of change:** - [YES] Bug fix (non-breaking change which fixes an issue) - [YES] New feature (non-breaking change which adds functionality) - [YES] Code style update (formatting, local variables) - [YES] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [YES] This change requires a documentation update --- #### How Has This Been Tested? **Describe how your changes have been tested.** *Describe your testing process here.* - [YES] I have described my testing process. --- #### Checklist **Please confirm the following:** - [YES] My code follows the guidelines of this project. - [YES] I have performed a self-review of my code. - [YES] I have commented on my code, particularly wherever it was hard to understand. - [YES] I have made corresponding changes to the documentation. - [YES] My changes generate no new warnings. - [YES] I have added things that prove my fix is effective or that my feature works. - [NO] Any dependent changes have been merged and published in downstream modules.

github-actions · 2024-10-22T03:21:40Z

✅ This issue has been closed. Thank you for your contribution! If you have any further questions or issues, feel free to join our community on Discord to discuss more!

UTSAVS26 assigned sanchitc05 Oct 20, 2024

UTSAVS26 added Contributor Denotes issues or PRs submitted by contributors to acknowledge their participation. Status: Assigned💻 Indicates an issue has been assigned to a contributor. level1 gssoc-ext hacktoberfest labels Oct 20, 2024

sanchitc05 added a commit to sanchitc05/PyVerse that referenced this issue Oct 20, 2024

[Code Addition Request]: Automate Workflows through Web Scraping

e70c694

Fixes UTSAVS26#736

sanchitc05 mentioned this issue Oct 20, 2024

[Code Addition Request]: Automate Workflows through Web Scraping #738

Merged

UTSAVS26 closed this as completed in #738 Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Code Addition Request]: Automate Workflows through Web Scraping #736

[Code Addition Request]: Automate Workflows through Web Scraping #736

sanchitc05 commented Oct 20, 2024

github-actions bot commented Oct 20, 2024

sanchitc05 commented Oct 20, 2024

UTSAVS26 commented Oct 20, 2024

sanchitc05 commented Oct 20, 2024

github-actions bot commented Oct 22, 2024

[Code Addition Request]: Automate Workflows through Web Scraping #736

[Code Addition Request]: Automate Workflows through Web Scraping #736

Comments

sanchitc05 commented Oct 20, 2024

Have you completed your first issue?

Guidelines

Latest Merged PR Link

Project Description

Tech Stack:

Approach:

How This Helps Users:

Full Name

Participant Role

github-actions bot commented Oct 20, 2024

sanchitc05 commented Oct 20, 2024

UTSAVS26 commented Oct 20, 2024

sanchitc05 commented Oct 20, 2024

github-actions bot commented Oct 22, 2024