Workflows¶
This document describes the internal workflows and processing patterns used by OPAL parsers. For comprehensive visual diagrams showing the complete data flow, architecture, and decision trees, see Visual Flow Diagrams.
Overview¶
OPAL uses several interconnected workflows:
- Parser Selection: Automatic detection based on URL patterns
- URL Collection: Site-specific pagination and link discovery
- Data Extraction: HTML parsing and content extraction
- Court Searching: Advanced portal navigation and filtering
- Error Handling: Graceful fallbacks and retry logic
Key Workflow Components¶
1. Court ID Discovery¶
The CourtSearchBuilder automatically discovers available courts:
- Navigation: Load the court portal search page
- Element Detection: Locate court selection dropdowns
- Data Extraction: Parse option elements for court names and IDs
- Fallback Strategy: Use known court IDs if discovery fails
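The discovery steps above can be sketched with the standard library alone. This is a minimal illustration, not the actual `CourtSearchBuilder` implementation: it assumes the search page HTML has already been loaded, and `OptionCollector`, `discover_courts`, and the `FALLBACK_COURTS` table are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser

# Hypothetical fallback table; the real known court IDs are not listed here.
FALLBACK_COURTS = {"Example Court": "0"}


class OptionCollector(HTMLParser):
    """Collect {court name: court ID} from <option> elements inside <select>."""

    def __init__(self):
        super().__init__()
        self.courts = {}
        self._in_select = False
        self._value = None  # value attribute of the <option> being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            name = "".join(self._text).strip()
            # Skip placeholder entries such as <option value="">Select</option>.
            if self._value and name:
                self.courts[name] = self._value
            self._value = None
        elif tag == "select":
            self._in_select = False


def discover_courts(page_html):
    """Return discovered courts, or the fallback table if none were found."""
    collector = OptionCollector()
    collector.feed(page_html)
    return collector.courts or dict(FALLBACK_COURTS)
```

Returning a copy of the fallback table keeps callers from mutating the shared default when discovery fails.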
2. Search URL Building¶
For court extractor operations:
- Parameter Validation: Verify court type, dates, filters
- URL Construction: Build search URLs with proper parameters
- Session Management: Handle portal session requirements
- Result Pagination: Generate URLs for all result pages
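As a rough sketch of validation, URL construction, and pagination, the function below builds one URL per result page. Everything specific in it is an assumption made for illustration: the `BASE_URL`, the query parameter names, the page size, and the court types other than `civil`. Session handling is left out because it depends on the portal.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Hypothetical values: the real portal URL and parameter names may differ.
BASE_URL = "https://portal.example.invalid/search"
VALID_COURT_TYPES = {"civil", "criminal", "supreme"}


def build_search_urls(court_type, days_back, total_results, page_size=25, today=None):
    """Validate inputs, then return one search URL per result page."""
    if court_type not in VALID_COURT_TYPES:
        raise ValueError(f"Unknown court type: {court_type!r}")
    if days_back <= 0 or page_size <= 0 or total_results < 0:
        raise ValueError("days_back and page_size must be positive")

    end = today or date.today()
    start = end - timedelta(days=days_back)

    # Ceiling division; always request at least the first page.
    pages = max(1, -(-total_results // page_size))

    urls = []
    for page in range(1, pages + 1):
        query = urlencode({
            "court": court_type,
            "from": start.isoformat(),
            "to": end.isoformat(),
            "page": page,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

Validating before building any URL means a bad filter fails fast instead of producing a search that silently returns nothing.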
3. Data Processing Pipeline¶
For all parser types:
- URL Collection: Gather all target URLs for processing
- Content Extraction: Parse HTML and extract structured data
- Data Validation: Verify extracted fields meet requirements
- Output Generation: Format data as JSON/CSV with timestamps
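The validation and output stages might look roughly like the sketch below. The field names in `REQUIRED_FIELDS` and the shape of the JSON payload are assumptions for illustration; the actual OPAL output schema is not described in this document.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = ("title", "url", "date")


def is_valid(record):
    """A record is valid when every required field is present and non-blank."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


def to_json(records):
    """Serialize valid records with a UTC extraction timestamp."""
    valid = [r for r in records if is_valid(r)]
    payload = {
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "count": len(valid),
        "records": valid,
    }
    return json.dumps(payload, indent=2)


def to_csv(records):
    """Serialize valid records as CSV, ignoring fields outside the schema."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(r for r in records if is_valid(r))
    return buffer.getvalue()
```

Filtering invalid records at the output boundary keeps partial extractions out of the result files while leaving the extraction step free to return whatever it found.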
Implementation Details¶
Parser Factory Pattern¶
def get_parser(url, parser_name):
    """Select the appropriate parser based on URL and name."""
    if 'appealscourts.gov' in url:
        return ParserAppealsAL()
    elif '1819news.com' in url:
        return Parser1819()
    # ... other parsers
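One possible variation on this factory, shown here only as a sketch, matches on the parsed hostname instead of a substring of the whole URL, so a domain that merely appears in a path or query string does not select the wrong parser. The two classes below are empty stand-ins for the real parsers, and `PARSERS_BY_DOMAIN` and `get_parser_by_host` are hypothetical names.

```python
from urllib.parse import urlparse


class ParserAppealsAL:
    """Stand-in for the real parser class, used only to keep this sketch runnable."""


class Parser1819:
    """Stand-in for the real parser class, used only to keep this sketch runnable."""


# Registry mapping a domain to the parser class that handles it.
PARSERS_BY_DOMAIN = {
    "appealscourts.gov": ParserAppealsAL,
    "1819news.com": Parser1819,
}


def get_parser_by_host(url):
    """Return a parser instance for the URL's host, or raise ValueError."""
    host = (urlparse(url).hostname or "").lower()
    for domain, parser_cls in PARSERS_BY_DOMAIN.items():
        # Accept the bare domain and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return parser_cls()
    raise ValueError(f"No parser registered for {url!r}")
```

Raising on an unknown host, rather than returning `None`, surfaces an unsupported URL at selection time instead of later in the pipeline.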
Error Recovery¶
- Network Issues: Exponential backoff with retry limits
- Element Detection: Alternative selector strategies
- Browser Crashes: Automatic driver restart
- Rate Limiting: Adaptive delay mechanisms
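Exponential backoff with a retry limit can be sketched as follows. This is a generic illustration rather than OPAL's own retry code: the helper name `retry_with_backoff`, its default values, and the choice of retryable exceptions are all assumptions.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying on the given exceptions with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Retry limit reached: let the caller see the error.
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # A little random jitter avoids retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the delay schedule can be checked in tests without actually waiting.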
Visual References¶
For complete visual representations of these workflows:
- 📊 Visual Flow Diagrams - Comprehensive system diagrams
- 🔧 Architecture Overview - System components and relationships
- ⚠️ Error Handling - Error recovery strategies
Development Guidelines¶
When extending workflows:
- Follow the established parser inheritance pattern
- Implement proper error handling with fallbacks
- Add appropriate logging for debugging
- Consider rate limiting for respectful scraping
- Update visual diagrams when adding new flows
Testing Workflows¶
# Test court discovery
python -c "from opal.configurable_court_extractor import CourtSearchBuilder; CourtSearchBuilder().discover_court_ids()"
# Test parser selection
python -c "from opal.main import get_parser; print(get_parser('https://1819news.com/', 'Parser1819'))"
# Validate search URL building
python -m opal.configurable_court_extractor --court civil --date-period 7d --dry-run