Workflows¶

This document describes the internal workflows and processing patterns used by OPAL parsers. For diagrams of the complete data flow, architecture, and decision trees, see Visual Flow Diagrams.

Overview¶

OPAL uses several interconnected workflows:

Parser Selection: Automatic detection based on URL patterns
URL Collection: Site-specific pagination and link discovery
Data Extraction: HTML parsing and content extraction
Court Searching: Advanced portal navigation and filtering
Error Handling: Graceful fallbacks and retry logic

Key Workflow Components¶

1. Court ID Discovery¶

The CourtSearchBuilder automatically discovers available courts:

Navigation: Load the court portal search page
Element Detection: Locate court selection dropdowns
Data Extraction: Parse option elements for court names and IDs
Fallback Strategy: Use known court IDs if discovery fails
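
The extraction and fallback steps above can be sketched with the standard library alone. This is a minimal illustration, not the CourtSearchBuilder implementation: the names parse_court_ids, CourtOptionParser, and FALLBACK_COURT_IDS are hypothetical, the fallback values are placeholders, and the page HTML is assumed to have been loaded already by the navigation step.

```python
from html.parser import HTMLParser

# Hypothetical fallback table; the real known court IDs are not listed here.
FALLBACK_COURT_IDS = {"civil": "court-id-civil", "criminal": "court-id-criminal"}


class CourtOptionParser(HTMLParser):
    """Collects {court name: court ID} from <option> elements inside <select>."""

    def __init__(self):
        super().__init__()
        self.courts = {}
        self._in_select = False
        self._current_value = None  # value attribute of the open <option>, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "select":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._current_value = dict(attrs).get("value")
            self._text = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            name = "".join(self._text).strip()
            # Skip placeholder entries such as <option value="">Select</option>.
            if name and self._current_value:
                self.courts[name] = self._current_value
            self._current_value = None
        elif tag == "select":
            self._in_select = False


def parse_court_ids(page_html):
    """Return court IDs parsed from page_html, or the fallback table if none are found."""
    parser = CourtOptionParser()
    parser.feed(page_html)
    return parser.courts or dict(FALLBACK_COURT_IDS)
```

The sketch assumes each option has an explicit closing tag. Returning a copy of the fallback table keeps callers from mutating the shared default.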

2. Search URL Building¶

For court extractor operations:

Parameter Validation: Verify court type, dates, filters
URL Construction: Build search URLs with proper parameters
Session Management: Handle portal session requirements
Result Pagination: Generate URLs for all result pages
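
As a rough sketch of the validation, construction, and pagination steps above (session handling is left out): the endpoint, the query parameter names, the set of court types, and the build_search_urls function are all assumptions made for illustration, not the portal's real interface.

```python
from datetime import date
from urllib.parse import urlencode

# Placeholder endpoint and court types, not the real portal values.
BASE_URL = "https://portal.example/search"
VALID_COURTS = {"civil", "criminal", "supreme"}


def build_search_urls(court, start, end, total_results, page_size=25):
    """Validate inputs, then return one search URL per result page."""
    if court not in VALID_COURTS:
        raise ValueError(f"unknown court type: {court!r}")
    if start > end:
        raise ValueError("start date must not be after end date")
    if page_size <= 0:
        raise ValueError("page_size must be positive")

    # Ceiling division; always emit at least one page so empty searches still run.
    page_count = max(1, -(-total_results // page_size))
    urls = []
    for page in range(1, page_count + 1):
        query = urlencode({
            "court": court,
            "from": start.isoformat(),
            "to": end.isoformat(),
            "page": page,
        })
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

For example, build_search_urls("civil", date(2024, 1, 1), date(2024, 1, 7), total_results=60) yields three URLs, one per page of 25 results.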

3. Data Processing Pipeline¶

For all parser types:

URL Collection: Gather all target URLs for processing
Content Extraction: Parse HTML and extract structured data
Data Validation: Verify extracted fields meet requirements
Output Generation: Format data as JSON/CSV with timestamps
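
The validation and output steps might look like the following sketch. The required field names and the helper functions are hypothetical, since the actual record schema is not given in this document; the timestamp is written in UTC as an ISO 8601 string.

```python
import csv
import io
import json
from datetime import datetime, timezone

# Hypothetical required fields; the real schema may differ.
REQUIRED_FIELDS = ("title", "url", "date")


def validate_record(record):
    """Return the names of required fields that are missing or blank."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def to_json(records):
    """Serialize the valid records together with an extraction timestamp."""
    valid = [r for r in records if not validate_record(r)]
    payload = {
        "extracted_at": datetime.now(timezone.utc).isoformat(),
        "count": len(valid),
        "records": valid,
    }
    return json.dumps(payload, indent=2)


def to_csv(records):
    """Write the valid records as CSV, keeping only the required columns."""
    valid = [r for r in records if not validate_record(r)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(valid)
    return buffer.getvalue()
```

Invalid records are dropped silently here to keep the sketch short; a real pipeline would more likely log them so that extraction problems stay visible.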

Implementation Details¶

Parser Factory Pattern¶

def get_parser(url, parser_name):
    """Select the appropriate parser based on the URL and parser name."""
    # Match on a distinctive domain fragment in the URL.
    if 'appealscourts.gov' in url:
        return ParserAppealsAL()
    elif '1819news.com' in url:
        return Parser1819()
    # ... other parsers
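
One possible table-driven variant of the same idea is sketched below. It is not OPAL's code: the classes are empty stand-ins, select_parser and PARSER_BY_DOMAIN are hypothetical names, and letting an explicit parser name take precedence over the URL is an assumption.

```python
class ParserAppealsAL:
    """Stand-in for the real parser class."""


class Parser1819:
    """Stand-in for the real parser class."""


# Domain fragment -> parser class, checked in insertion order.
PARSER_BY_DOMAIN = {
    "appealscourts.gov": ParserAppealsAL,
    "1819news.com": Parser1819,
}


def select_parser(url, parser_name=None):
    """Pick a parser by class name if one matches, otherwise by URL pattern."""
    if parser_name is not None:
        for parser_cls in PARSER_BY_DOMAIN.values():
            if parser_cls.__name__ == parser_name:
                return parser_cls()
    # No name given, or the name was not recognized: fall back to the URL.
    for fragment, parser_cls in PARSER_BY_DOMAIN.items():
        if fragment in url:
            return parser_cls()
    raise ValueError(f"no parser available for {url!r}")
```

Raising on an unmatched URL, instead of returning None, makes an unsupported site fail at selection time and not later during extraction.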

Error Recovery¶

Network Issues: Exponential backoff with retry limits
Element Detection: Alternative selector strategies
Browser Crashes: Automatic driver restart
Rate Limiting: Adaptive delay mechanisms
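
The first strategy, exponential backoff with a retry limit, can be illustrated as follows. The function name, default values, and the choice of retryable exceptions are assumptions for the sketch; the delays OPAL actually uses are not stated here.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(); on a retryable error wait base_delay * 2**n (capped) and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # retry limit reached; surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so repeated failures do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The sleep parameter exists only so the delay can be replaced in tests; in normal use the default time.sleep applies.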

Visual References¶

For complete visual representations of these workflows:

📊 Visual Flow Diagrams - Comprehensive system diagrams
🔧 Architecture Overview - System components and relationships
⚠️ Error Handling - Error recovery strategies

Development Guidelines¶

When extending workflows:

Follow the established parser inheritance pattern
Implement proper error handling with fallbacks
Add appropriate logging for debugging
Consider rate limiting for respectful scraping
Update visual diagrams when adding new flows

Testing Workflows¶

# Test court discovery
python -c "from opal.configurable_court_extractor import CourtSearchBuilder; CourtSearchBuilder().discover_court_ids()"

# Test parser selection
python -c "from opal.main import get_parser; print(get_parser('https://1819news.com/', 'Parser1819'))"

# Validate search URL building
python -m opal.configurable_court_extractor --court civil --date-period 7d --dry-run