Changelog

All notable changes to the Statement gem will be documented in this file.

[2.3.0] - 2024-11-17

Added

Error Handling & Reliability

  • Comprehensive logging system with configurable log levels (DEBUG, INFO, WARN, ERROR, FATAL)
  • Automatic retry logic with exponential backoff for network errors (max 2 retries)
  • Graceful error handling - scrapers return empty arrays instead of crashing
  • User-agent headers to reduce 403 Forbidden errors
  • Timeout handling (30 second default) for unresponsive sites
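The retry behavior described above can be sketched as follows. This is an illustrative pattern, not the gem's actual implementation: the real logic lives inside ScraperBase#open_html, and the method name and parameters here are assumptions.

```ruby
require 'timeout'

# Sketch of retry-with-exponential-backoff: retry transient network
# failures up to max_retries times, doubling the delay each attempt.
def fetch_with_retries(max_retries: 2, base_delay: 1)
  attempts = 0
  begin
    yield
  rescue IOError, Timeout::Error => e
    attempts += 1
    raise e if attempts > max_retries   # give up after max_retries
    sleep(base_delay * (2**(attempts - 1)))  # 1s, then 2s
    retry
  end
end
```

With the defaults above, a request that fails twice and succeeds on the third attempt adds roughly 3 seconds of backoff, consistent with the retry overhead noted under Performance Impact.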

Command-Line Interface

  • New statement CLI executable for running scrapers from the command line
  • Support for specifying Congress number (defaults to 119th Congress)
  • Filter by type: members, committees, or all
  • Run individual scrapers by name: --scraper shaheen
  • Single URL scraping mode: --url <url>
  • Multiple output formats: JSON, CSV, and pretty-printed text
  • Performance metrics tracking with --performance flag
  • Configurable log levels via --log-level flag
  • List all available scrapers with --list-scrapers
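A few possible invocations combining the flags listed above. Only --scraper, --url, --performance, --log-level, and --list-scrapers are named explicitly in this changelog; the --congress and --type spellings are assumptions, so check the executable's help output for the authoritative syntax.

```shell
statement --list-scrapers                 # enumerate available scrapers
statement --scraper shaheen               # run a single member scraper
statement --type committees --congress 119 --performance --log-level DEBUG
```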

Testing & Performance Framework

  • Automated testing suite (bin/test_scrapers) for all 185 scrapers
  • Performance metrics collection (execution time, success rate, error details)
  • Multiple report formats: text, JSON, CSV, and HTML
  • Identifies slow scrapers (>5 seconds) for optimization
  • Tracks scraper status: success, warning, failed, error
  • Sample results preview in test output

Code Organization

  • New ScraperBase class providing common functionality:
    • open_html(url, retries): Robust HTML fetching with retries
    • safe_scrape(name, &block): Error-wrapped scraper execution
    • parse_date(date_string, format): Safe date parsing with error handling
    • build_result(...): Standardized result hash construction
  • Logger module for consistent logging across all components
  • ScraperRegistry module for future scraper metadata management
  • ScraperTester class for comprehensive scraper testing
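The safe_scrape and parse_date behaviors can be sketched as below. Method names mirror the changelog entries, but the bodies are illustrative re-implementations; the gem's actual signatures and logging calls may differ.

```ruby
require 'date'

# Error-wrapped scraper execution: a failing scraper logs the error
# and yields an empty array instead of crashing the whole run.
def safe_scrape(name)
  yield
rescue StandardError => e
  warn "[#{name}] scrape failed: #{e.class}: #{e.message}"
  []  # graceful degradation: callers always receive an array
end

# Safe date parsing: unparseable input becomes nil rather than raising.
def parse_date(date_string, format = '%m/%d/%Y')
  Date.strptime(date_string, format)
rescue ArgumentError, TypeError
  nil
end
```

Because every scraper returns an array on failure, a single blocked or broken site no longer aborts a batch run, which is the "Stability" improvement described below.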

Documentation

  • New MAINTAINER_GUIDE.md with comprehensive maintenance instructions
  • Performance report generation for tracking scraper health
  • Troubleshooting guide for common issues
  • Scraper pattern documentation for adding new members

Changed

  • Scraper class now inherits from ScraperBase
  • All scraper methods now use improved error handling
  • open_html method now includes:
    • User-agent headers to reduce blocking
    • Automatic retry logic
    • Comprehensive error logging
    • Timeout protection
  • Version bumped to 2.3 (the previous release was implicitly 2.2)

Improved

  • Stability: Gem no longer crashes on individual scraper failures
  • Observability: All errors are logged with full context
  • Performance: Automatic identification of slow scrapers
  • Maintainability: Modular architecture for easier updates
  • Debugging: Debug log level available for troubleshooting

Fixed

  • Scrapers no longer crash the entire process on errors
  • Better handling of 403 Forbidden responses
  • More robust date parsing with error recovery
  • Network timeout handling

Technical Details

New Files

  • lib/statement/logger.rb: Logging system (42 lines)
  • lib/statement/scraper_base.rb: Base class with error handling (107 lines)
  • lib/statement/scraper_registry.rb: Scraper registry (77 lines)
  • lib/statement/scraper_tester.rb: Testing framework (432 lines)
  • bin/statement: CLI executable (321 lines)
  • bin/test_scrapers: Test runner (33 lines)
  • MAINTAINER_GUIDE.md: Comprehensive maintenance documentation
  • CHANGELOG.md: This file

Modified Files

  • lib/statement.rb: Added new module requires
  • lib/statement/scraper.rb: Inherits from ScraperBase, improved open_html
  • statement.gemspec: Updated to include new executables

Known Issues

  • Many congressional websites now block automated scrapers (403 Forbidden errors)
    • This is expected behavior and handled gracefully
    • Consider using RSS feeds where available
  • 2 scrapers not yet implemented: emmer, porter
    • Listed in member_methods but methods not defined
    • Will be added in future update

Migration Guide

For users upgrading from version 2.2 or earlier:

  1. No breaking changes - All existing code continues to work
  2. New features are additive - Logging is enabled automatically but fully configurable
  3. CLI is optional - Can still use gem programmatically as before

Optional: Configure logging in your application:

# Set custom log level
Statement::Logger.setup(log_level: Logger::INFO)

# Or disable logging output
Statement::Logger.setup(output: File.open('/dev/null', 'w'))

Performance Impact

  • Average scraper execution time: ~3.4 seconds (includes retries)
  • Retry logic adds ~2-4 seconds for failed requests
  • Logging overhead: negligible (<10ms per scraper)
  • Memory usage: unchanged

Testing

All changes tested with:

  • Ruby 3.3.6
  • 185 scraper methods (129 members, 56 committees)
  • Multiple site structures and response codes
  • Various error conditions (timeout, 403, 404, network errors)

Credits

Updates and improvements by Derek Willis with assistance from Claude (Anthropic).

Recommendations for Future Versions

  1. Move to RSS feeds primarily: More reliable than HTML scraping
  2. Add member metadata: Include name, party, state, district
  3. Content extraction: Pull full text and summaries
  4. Caching layer: Reduce repeated requests
  5. Rate limiting: Respect congressional website servers
  6. OAuth integration: For protected APIs
  7. Database backend: Store historical data
  8. API mode: Provide JSON API for scraped data

[2.2.0] and Earlier

See git history for changes in previous versions.