Py selenium j script click #1
Merged
Signed-off-by: Mystique <[email protected]>
- Uses both the wrapper display and Xvfb (need to get this right)
- Need to get the browser wait delay right
Signed-off-by: Mystique <[email protected]>
- Scrapes URLs and prints out the list
- Has subroutines to scrape the given page; still has to iterate through to the last page
Signed-off-by: Mystique <[email protected]>
Wait condition works, but it is not enough for this site. To do: working on the next-page click.
Signed-off-by: Mystique <[email protected]>
- Handles the 'Next Page' button click
- Uses 'execute_script' for the JS button click
- Scrapes multiple pages
- Returns the URL list
Signed-off-by: Mystique <[email protected]>
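A minimal sketch of the 'Next Page' loop described in the commit above, assuming `driver` is a Selenium WebDriver created elsewhere. The function name `collect_urls`, the CSS selectors, `max_pages`, and `pause_seconds` are hypothetical and not taken from the PR. The string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`.

```python
import time
from typing import Any, List

CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def collect_urls(driver: Any, link_css: str, next_css: str,
                 max_pages: int = 50, pause_seconds: float = 2.0) -> List[str]:
    """Collect link hrefs across pages, clicking 'Next Page' via JavaScript."""
    urls: List[str] = []
    for _ in range(max_pages):  # hard upper bound so the loop always ends
        for link in driver.find_elements(CSS, link_css):
            href = link.get_attribute("href")
            if href and href not in urls:
                urls.append(href)
        next_buttons = driver.find_elements(CSS, next_css)
        if not next_buttons:
            break  # no 'Next Page' button found: assume last page
        # A JS click can succeed where a native click is blocked by an overlay.
        driver.execute_script("arguments[0].click();", next_buttons[0])
        time.sleep(pause_seconds)  # crude wait; an explicit wait is more robust
    return urls
```

The fixed sleep stands in for the wait condition the earlier commits mention; on a slow site an explicit wait on the next page's content would likely be the better choice.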
- Scrapes and returns a dictionary with the AWS tag, sourceUrl, uri, pages crawled, and crawl success
- Stores it in a JSON output file named 'acloudguru-<awsTag>.json'
Signed-off-by: Mystique <[email protected]>
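The result dictionary and output file could look roughly like the sketch below. Only the file name pattern 'acloudguru-<awsTag>.json' and the list of fields come from the commit message; the exact key spellings, the function name `save_crawl_result`, and the `out_dir` parameter are assumptions.

```python
import json
from pathlib import Path
from typing import List


def save_crawl_result(aws_tag: str, source_url: str, uris: List[str],
                      pages_crawled: int, crawl_success: bool,
                      out_dir: str = ".") -> Path:
    """Write one crawl result to 'acloudguru-<awsTag>.json' and return the path."""
    result = {
        "awsTag": aws_tag,              # key names are assumed, not from the PR
        "sourceUrl": source_url,
        "uri": uris,
        "pagesCrawled": pages_crawled,
        "crawlSuccess": crawl_success,
    }
    out_path = Path(out_dir) / f"acloudguru-{aws_tag}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return out_path
```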
- Output also committed
Signed-off-by: Mystique <[email protected]>
- Added another area to scrape
Signed-off-by: Mystique <[email protected]>
- Ability to read URLs from a file and scrape them
- Each URL's output is dumped to a separate file with a tag and date prefix
Working code can be reused as is.
Signed-off-by: Mystique <[email protected]>
- Scrape the link and store the output as JSON in the "output" directory
- Added a date-time stamp to the JSON
Signed-off-by: Mystique <[email protected]>
- Added timestamp
- Added the page load wait time in the URL; no more hardcoding
Signed-off-by: Mystique <[email protected]>
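Reading the URL list and naming the per-URL output files might be sketched as follows. The input format (one URL per line, blank lines and `#` comments skipped), the helper names `read_urls` and `output_path`, and the `<tag>-<date>.json` naming are all assumptions; the commit only says that each output file carries a tag and date prefix.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def read_urls(path) -> List[str]:
    """Return the non-empty, non-comment lines of a text file as URLs."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


def output_path(tag: str, out_dir: str = "output",
                day: Optional[date] = None) -> Path:
    """Build an output file path prefixed with the tag and an ISO date."""
    day = day or date.today()
    return Path(out_dir) / f"{tag}-{day.isoformat()}.json"
```

Each URL returned by `read_urls` would then be scraped and written to the path that `output_path` builds for its tag.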
- Added the available tags, hard-coded in a dictionary; will probably iterate over them in the future.
Signed-off-by: Mystique <[email protected]>
LGTM
Working: able to scrape the web pages under the forums using Selenium and store them under the "output" directory.