Configurable Court Extractor Design¶

Problem Statement¶

An earlier version of extract_all_court_cases.py had hardcoded search parameters in the URL, making it inflexible for different search criteria. Users couldn't dynamically change:

- Date ranges
- Case number filters
- Case title filters
- Whether to include or exclude closed cases

Solution Overview¶

I designed a configurable court extractor that separates URL construction from data extraction, allowing users to specify search parameters via command line arguments or function parameters.

Architecture¶

1. CourtSearchBuilder Class¶

Purpose: Encapsulates the complex URL building logic for Alabama Appeals Court searches.

Why I designed it this way:¶

Separation of Concerns: URL building is separate from data extraction
Maintainability: Changes to URL structure only affect one class
Reusability: Can be used by different scripts or tools
Readability: Clear methods for each search parameter

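class CourtSearchBuilder:
    """Builds search URLs for the Alabama Appeals Court public portal."""

    def __init__(self):
        # Results endpoint that every search URL is built on
        self.base_url = "https://publicportal.alappeals.gov/portal/search/case/results"
        # Initial court ID; it can be replaced at runtime (see Dynamic Court ID Discovery)
        self.court_id = "68f021c4-6a44-4735-9a76-5360b2e8af13"
        # Start from the default search parameters
        self.reset_params()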

2. Key Methods Explained¶

`set_date_range()`¶

Purpose: Handle different date range options

Design rationale:

- Supports both predefined periods (sent to the portal as values such as -1y or -6m) and custom date ranges
- Automatically converts dates to the portal's expected format, in which each "/" is written as *2f (for example, 06*2f11*2f2024)
- Provides sensible defaults
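The date handling described above can be sketched roughly as follows. This is a minimal illustration, not the actual set_date_range() code: the helper names (to_portal_date, months_back, period_window) are hypothetical, and the MM/DD/YYYY field order is an assumption inferred from the example URL later in this document.

```python
import calendar
from datetime import date, timedelta


def to_portal_date(iso_date: str) -> str:
    """Convert 'YYYY-MM-DD' to the portal form, e.g. '06*2f11*2f2024'.

    Assumption: the portal expects MM/DD/YYYY with each '/' written as '*2f'.
    """
    parsed = date.fromisoformat(iso_date)
    return parsed.strftime("%m/%d/%Y").replace("/", "*2f")


def months_back(day: date, months: int) -> date:
    """Move `day` back by whole months, clamping to the month's last day."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def period_window(period: str, today: date) -> tuple[date, date]:
    """Return (start, end) for a period such as '7d', '3m' or '1y'."""
    amount, unit = int(period[:-1]), period[-1]
    if unit == "d":
        start = today - timedelta(days=amount)
    elif unit == "m":
        start = months_back(today, amount)
    elif unit == "y":
        start = months_back(today, 12 * amount)
    else:
        raise ValueError(f"Unsupported period: {period!r}")
    return start, today


start, end = period_window("1y", today=date(2025, 6, 11))
print(to_portal_date(start.isoformat()), to_portal_date(end.isoformat()))
# prints: 06*2f11*2f2024 06*2f11*2f2025
```

Passing today explicitly keeps the helper deterministic, which makes it easy to unit test.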

`build_criteria_string()`¶

Purpose: Construct the complex URL-encoded criteria parameter

Design rationale:

- Handles the intricate URL encoding required by the portal
- Builds the nested parameter structure programmatically
- Reduces human error in URL construction

`build_url()`¶

Purpose: Create complete search URLs with pagination

Design rationale:

- Updates page numbers dynamically
- Maintains other search parameters across pages
- Returns ready-to-use URLs
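One way to picture the pagination step is a helper that rewrites only the page number inside an existing search URL and leaves every other parameter untouched. The sketch below is an illustration under assumptions, not the actual build_url() logic: with_page and page_urls are hypothetical names, and zero-based page numbering is inferred from the "number~0" field in the example URL later in this document.

```python
import re


def with_page(url: str, page_number: int) -> str:
    """Return `url` with the page number inside its criteria replaced."""
    # The criteria string carries the page as "...~number~<n>~...".
    new_url, replaced = re.subn(r"(?<=~number~)\d+", str(page_number), url, count=1)
    if not replaced:
        raise ValueError("URL has no '~number~<n>' page field")
    return new_url


def page_urls(first_url: str, total_pages: int) -> list[str]:
    """Build one URL per page, assuming pages are numbered from 0."""
    return [with_page(first_url, number) for number in range(total_pages)]


# Shortened, illustrative criteria string (not a complete portal URL)
example = (
    "https://publicportal.alappeals.gov/portal/search/case/results"
    "?criteria=~%28advanced~false~page~%28size~25~number~0~totalElements~0%29%29"
)
for url in page_urls(example, 3):
    print(url)
```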

Configuration Options¶

Court Selection¶

# Available courts
courts = {
    'civil': 'Alabama Court of Civil Appeals',
    'criminal': 'Alabama Court of Criminal Appeals',
    'supreme': 'Alabama Supreme Court'
}

# Select court
search_builder.set_court('civil')  # or 'criminal', 'supreme'

Case Number Formats¶

# Open-ended search
search_builder.set_case_number_filter('2024-001')

# Court-specific formats
search_builder.set_case_number_filter('CL-2024-0001')  # Civil Appeals
search_builder.set_case_number_filter('CR-2024-0001')  # Criminal Appeals  
search_builder.set_case_number_filter('SC-2024-0001')  # Supreme Court

Case Categories¶

# For Civil Appeals and Criminal Appeals
categories = ['Appeal', 'Certiorari', 'Original Proceeding', 'Petition']

# For Supreme Court (includes additional option)
supreme_categories = ['Appeal', 'Certiorari', 'Original Proceeding', 'Petition', 'Certified Question']

# Set category
search_builder.set_case_category('Appeal')
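Because the Supreme Court accepts one category the other courts do not, a setter could check the category against the selected court before building the URL. The following is a minimal sketch of such a check. The names CATEGORIES_BY_COURT and validate_category are hypothetical, and whether set_case_category() performs this validation is an assumption.

```python
BASE_CATEGORIES = ("Appeal", "Certiorari", "Original Proceeding", "Petition")

# Category lists as given above; only the Supreme Court adds "Certified Question".
CATEGORIES_BY_COURT = {
    "civil": BASE_CATEGORIES,
    "criminal": BASE_CATEGORIES,
    "supreme": BASE_CATEGORIES + ("Certified Question",),
}


def validate_category(court: str, category: str) -> str:
    """Return `category` if it is valid for `court`, else raise ValueError."""
    try:
        allowed = CATEGORIES_BY_COURT[court]
    except KeyError:
        raise ValueError(f"Unknown court: {court!r}") from None
    if category not in allowed:
        raise ValueError(
            f"{category!r} is not available for {court!r}; choose one of {allowed}"
        )
    return category


print(validate_category("supreme", "Certified Question"))
```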

Date Filters¶

# Predefined periods (matching portal options)
search_builder.set_date_range(period='7d')   # Last 7 days
search_builder.set_date_range(period='1m')   # Last month
search_builder.set_date_range(period='3m')   # Last 3 months
search_builder.set_date_range(period='6m')   # Last 6 months
search_builder.set_date_range(period='1y')   # Last year

# Custom date range
search_builder.set_date_range('2024-01-01', '2024-12-31', 'custom')

Case Title and Status Filters¶

# Filter by case title (partial match)
search_builder.set_case_title_filter('Smith v Jones')

# Exclude closed cases
search_builder.set_exclude_closed(True)

Command Line Interface¶

Why I included CLI arguments:

- User-friendly: No need to modify code for different searches
- Scriptable: Can be integrated into automated workflows
- Documented: Built-in help shows all options
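A parser for the flags used in this document could be declared with argparse roughly as follows. The flag names come from the usage examples in this document. The defaults, help texts and the build_parser name are assumptions made for this sketch and may differ from the real script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Extract cases from the Alabama Appeals Court public portal."
    )
    parser.add_argument("--court", choices=["civil", "criminal", "supreme"],
                        help="Court to search")
    # Assumed default: the last year, as in the first usage example
    parser.add_argument("--date-period", default="1y",
                        choices=["7d", "1m", "3m", "6m", "1y", "custom"],
                        help="Predefined period, or 'custom' with start/end dates")
    parser.add_argument("--start-date", help="Start date (YYYY-MM-DD)")
    parser.add_argument("--end-date", help="End date (YYYY-MM-DD)")
    parser.add_argument("--case-number", help="Full or partial case number")
    parser.add_argument("--case-title", help="Partial case title")
    parser.add_argument("--case-category", help="Case category, e.g. Appeal")
    parser.add_argument("--exclude-closed", action="store_true",
                        help="Skip closed cases")
    parser.add_argument("--url", help="Pre-built search URL; bypasses the builder")
    parser.add_argument("--max-pages", type=int, help="Stop after this many pages")
    parser.add_argument("--output-prefix", help="Prefix for output file names")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["--court", "criminal", "--date-period", "7d"])
print(args.court, args.date_period, args.exclude_closed)
# prints: criminal 7d False
```

Running the script with --help then lists every option, which is the built-in documentation mentioned above.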

Usage Examples¶

Option 1: Use Built-in Search Parameters (Recommended)¶

# Extract all cases from the last year (default: all courts)
python configurable_court_extractor.py

# Extract cases from Alabama Supreme Court only
python configurable_court_extractor.py --court supreme

# Extract cases from last 7 days from Criminal Appeals
python configurable_court_extractor.py --court criminal --date-period 7d

# Extract Appeal cases from Civil Court
python configurable_court_extractor.py --court civil --case-category Appeal

# Extract cases with custom date range from Supreme Court
python configurable_court_extractor.py --court supreme --date-period custom --start-date 2024-01-01 --end-date 2024-06-30

# Filter by specific case number format
python configurable_court_extractor.py --court civil --case-number "CL-2024-"

# Filter by case title in Criminal Appeals
python configurable_court_extractor.py --court criminal --case-title "State v"

# Exclude closed cases from Supreme Court
python configurable_court_extractor.py --court supreme --exclude-closed

# Extract Certified Questions from Supreme Court (unique to Supreme Court)
python configurable_court_extractor.py --court supreme --case-category "Certified Question"

# Comprehensive search with multiple filters
python configurable_court_extractor.py --court civil --case-category Appeal --date-period 3m --exclude-closed --output-prefix "civil_appeals_q1"

Option 2: Use Pre-built URL with Embedded Search Terms¶

⚠️ WARNING: Custom URLs are temporary and session-based. They may stop working when the website session expires.

# Use your existing URL with search terms already embedded
python configurable_court_extractor.py --url "https://publicportal.alappeals.gov/portal/search/case/results?criteria=~%28advanced~false~courtID~%2768f021c4-6a44-4735-9a76-5360b2e8af13~page~%28size~25~number~0~totalElements~0~totalPages~0%29~sort~%28sortBy~%27caseHeader.filedDate~sortDesc~true%29~case~%28caseCategoryID~1000000~caseNumberQueryTypeID~10463~caseTitleQueryTypeID~300054~filedDateChoice~%27-1y~filedDateStart~%2706%2a2f11%2a2f2024~filedDateEnd~%2706%2a2f11%2a2f2025~excludeClosed~false%29%29"

# Use custom URL with limited pages and custom output prefix
python configurable_court_extractor.py --url "https://publicportal.alappeals.gov/portal/search/case/results?criteria=..." --max-pages 5 --output-prefix "my_custom_search"

# Any URL from the portal search interface works
python configurable_court_extractor.py --url "https://publicportal.alappeals.gov/portal/search/case/results?criteria=YOUR_CUSTOM_SEARCH_CRITERIA"

Hybrid Approach¶

# You can also programmatically call the function with a custom URL
from configurable_court_extractor import extract_court_cases_with_params

# Use your existing URL
your_url = "https://publicportal.alappeals.gov/portal/search/case/results?criteria=..."
result = extract_court_cases_with_params(custom_url=your_url, max_pages=10)

Dynamic Court ID Discovery¶

The Problem with Dynamic IDs¶

Modern web applications often generate session-specific or dynamic identifiers that change between visits. The Alabama Appeals Court portal appears to use dynamic court IDs that are assigned during the user's session rather than being static, predictable values.

Solution¶

Option 1 (Chosen Solution): Automatic Discovery

The discover_court_ids() method navigates to the court's search interface and programmatically extracts the current court IDs by:

Loading the search page - Navigates to the main case search interface
Inspecting form elements - Locates the court selection dropdown or form elements
Extracting ID mappings - Parses the HTML to find court names and their corresponding dynamic IDs
Caching for session - Stores the discovered IDs for the duration of the session

Option 2: Manual Discovery

If automatic discovery fails, users can:

Inspect browser network traffic - Use browser developer tools to monitor the search requests
Extract court ID from URL - Copy a working search URL and extract the court ID parameter
Set manually - Use set_court_id_manually() to override the discovered ID
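For the "extract court ID from URL" step, a small helper can pull the ID out of a copied search URL instead of reading it by eye. This is a sketch under assumptions: extract_court_id is a hypothetical name, and the pattern relies on the "courtID~'<id>~" layout seen in the example URL earlier in this document.

```python
import re
from urllib.parse import parse_qs, urlsplit


def extract_court_id(url: str):
    """Return the courtID value embedded in a portal search URL, or None."""
    # parse_qs percent-decodes the value, so %27 becomes ' and %28 becomes (.
    query = parse_qs(urlsplit(url).query)
    criteria = query.get("criteria", [""])[0]
    # The ID runs from the quote marker up to the next "~" separator.
    match = re.search(r"courtID~'([^~]+)", criteria)
    return match.group(1) if match else None


# Shortened version of the example URL shown earlier
working_url = (
    "https://publicportal.alappeals.gov/portal/search/case/results"
    "?criteria=~%28advanced~false~courtID~%2768f021c4-6a44-4735-9a76-5360b2e8af13"
    "~page~%28size~25~number~0%29%29"
)
print(extract_court_id(working_url))
# prints: 68f021c4-6a44-4735-9a76-5360b2e8af13
```

The returned value is what would then be passed to set_court_id_manually().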

Option 3: URL Bypass (Fallback)

When court ID discovery completely fails, users can:

Use browser to build URL - Manually configure search on the website
Copy complete URL - Get the full URL with embedded parameters
Use --url option - Pass the pre-built URL directly, bypassing all parameter building

Implementation Benefits¶

Resilient to changes - Automatically adapts to new court ID schemes
Fallback options - Multiple strategies when automatic discovery fails
User-friendly - Handles complexity behind the scenes
Transparent - Shows discovered IDs to user for verification

Usage Examples with Dynamic IDs¶

# Let the system discover court IDs automatically
python configurable_court_extractor.py --court civil --date-period 1m

# If discovery fails, fall back to custom URL
python configurable_court_extractor.py --url "https://publicportal.alappeals.gov/portal/search/case/results?criteria=..."

# For debugging: manually set a court ID
search_builder = CourtSearchBuilder()
search_builder.set_court_id_manually('civil', 'discovered-session-id-12345')

Technical Implementation Details¶

URL Encoding Strategy¶

The Alabama Appeals Court portal uses a complex nested URL structure:

?criteria=~%28advanced~false~courtID~%27{court_id}~page~%28...%29~sort~%28...%29~case~%28...%29%29

My approach:

Build parameters as nested dictionaries
Convert to the portal's specific encoding format
Handle special characters and escaping automatically
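The three steps above can be sketched as a small recursive encoder. This is an illustration inferred from the example URL in this document, not the actual build_criteria_string() code. The function names are hypothetical, and the escaping rule (every character outside letters, digits, ".", "_" and "-" becomes "*" plus two hex digits) is an assumption generalized from the single observed case of "/" becoming "*2f". It only covers ASCII text.

```python
from urllib.parse import quote, unquote

_PLAIN = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-")


def escape_text(text: str) -> str:
    """Escape special characters as '*xx' (assumed rule, ASCII only)."""
    return "".join(ch if ch in _PLAIN else f"*{ord(ch):02x}" for ch in text)


def encode_value(value) -> str:
    """Encode one value: bools and numbers bare, strings with a leading quote."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, dict):
        inner = "~".join(f"{key}~{encode_value(item)}" for key, item in value.items())
        return f"({inner})"
    return "'" + escape_text(str(value))


def build_criteria(params: dict) -> str:
    """Return the percent-encoded value for the `criteria` query parameter."""
    return quote("~" + encode_value(params), safe="")


params = {
    "advanced": False,
    "courtID": "68f021c4-6a44-4735-9a76-5360b2e8af13",
    "page": {"size": 25, "number": 0},
    "case": {"filedDateStart": "06/11/2024", "excludeClosed": False},
}
print(unquote(build_criteria(params)))
```

Python's quote() emits uppercase hex digits (for example %2A), while the portal's own URLs use lowercase (%2a). Percent-encoding is case-insensitive, so the two forms are equivalent.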

Error Handling¶

Graceful degradation:

- If total page count can't be determined, process incrementally
- Continue processing if individual pages fail
- Provide detailed error messages with stack traces
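A page loop with these properties might look like the sketch below. It is illustrative only: collect_cases and its parameters are hypothetical, fetch_page stands in for whatever function loads and parses one results page, and the stop-after-repeated-failures guard is an addition of this sketch rather than something described above.

```python
import traceback


def collect_cases(fetch_page, total_pages=None, max_pages=500,
                  max_consecutive_failures=3):
    """Collect rows page by page, surviving failures of individual pages.

    fetch_page(page_number) is a hypothetical callable returning a list of rows.
    """
    cases, failed_pages = [], []
    consecutive_failures = 0
    # Without a known page count, walk forward until an empty page appears.
    limit = total_pages if total_pages is not None else max_pages
    for page in range(limit):
        try:
            rows = fetch_page(page)
        except Exception:
            traceback.print_exc()  # full stack trace for diagnosis
            failed_pages.append(page)
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                break  # the site is probably unreachable; stop retrying
            continue
        consecutive_failures = 0
        if not rows and total_pages is None:
            break  # empty page marks the end when the total is unknown
        cases.extend(rows)
    return cases, failed_pages


def fake_fetch(page):
    """Stand-in for a real page fetch: page 1 fails, page 3 is empty."""
    if page == 1:
        raise RuntimeError("simulated timeout")
    return {0: [{"id": "a"}], 2: [{"id": "b"}]}.get(page, [])


cases, failed = collect_cases(fake_fetch)
print(cases, failed)
```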

Performance Considerations¶

Rate limiting:

- Configurable delays between requests
- Respectful of server resources

Memory efficiency:

- Process pages incrementally
- Don't load all data into memory at once

Progress reporting:

- Real-time feedback on processing status
- Clear indication of completion
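These three concerns fit naturally into a generator that pauses between requests, reports progress, and hands rows to a writer one at a time. The sketch below is an assumption about how this could be done, not the extractor's actual code: iter_cases, write_csv and fetch_page are hypothetical names, and the sleep function is injectable so the example runs without waiting.

```python
import csv
import io
import time


def iter_cases(fetch_page, page_numbers, delay_seconds=1.0, sleep=time.sleep):
    """Yield rows page by page, pausing between requests."""
    for index, page in enumerate(page_numbers):
        if index:  # no delay before the first request
            sleep(delay_seconds)
        rows = fetch_page(page)
        print(f"Page {page}: {len(rows)} cases")  # progress feedback
        yield from rows


def write_csv(rows, fileobj, fieldnames):
    """Write rows as they arrive; only one row is held in memory at a time."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Usage with stand-in data and a recorded (not real) sleep
pauses = []
pages = {
    0: [{"case_number": "CL-2024-0001", "title": "A v B"}],
    1: [{"case_number": "CL-2024-0002", "title": "C v D"}],
}
buffer = io.StringIO()
written = write_csv(
    iter_cases(pages.get, [0, 1], delay_seconds=2.0, sleep=pauses.append),
    buffer,
    fieldnames=["case_number", "title"],
)
print(f"Done: {written} cases written")
```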

Advantages Over Former Implementation¶

1. Flexibility¶

Before: Fixed search parameters in hardcoded URL
After: Configurable search criteria via parameters OR custom URLs

2. Maintainability¶

Before: URL changes require code modification
After: URL structure centralized in builder class with dynamic discovery

3. Usability¶

Before: Developers need to understand complex URL structure
After: Simple method calls and CLI arguments

4. Reusability¶

Before: Single-purpose script
After: Reusable components for different use cases

5. Documentation¶

Before: Search parameters hidden in URL
After: Clear parameter documentation and examples

6. Resilience to Changes¶

Before: Hardcoded court IDs break when website changes
After: Automatic discovery adapts to dynamic court ID schemes

7. Multiple Fallback Options¶

Before: Script fails completely if URL structure changes
After: Automatic discovery → manual discovery → custom URL bypass

Integration with Existing Code¶

The new extractor coexists with the current implementation:

- Uses the same ParserAppealsAL class
- Produces the same JSON/CSV output format
- Follows the same error handling patterns

Future Enhancements¶

Advanced Features¶

Save/load search configurations
Scheduled extractions
Differential updates (only new cases)
Export to additional formats (Excel, XML)

Performance Improvements¶

Parallel page processing
Caching of search results
Resume interrupted extractions

Code Structure¶

configurable_court_extractor.py
├── CourtSearchBuilder class
│   ├── Parameter management methods
│   ├── URL building methods
│   └── Validation methods
├── extract_court_cases_with_params() function
│   ├── Search execution logic
│   ├── Progress reporting
│   └── Output generation
└── main() function
    ├── CLI argument parsing
    ├── Parameter validation
    └── Function orchestration

Why This Design is Better¶

Single Responsibility: Each class/function has one clear purpose
Open/Closed Principle: Easy to extend without modifying existing code
DRY (Don't Repeat Yourself): URL logic is centralized
User-Centered: Designed around user needs, not technical constraints
Testable: Components can be unit tested independently
Documented: Self-documenting code with clear method names

Implementation Reference¶

Note: The complete implementation code has been moved to the reference documentation for better organization. Please refer to the Configurable Court Extractor Reference for the full source code, API documentation, and detailed implementation examples.

Key Design Decisions Explained¶

Builder Pattern: Separates URL construction complexity from business logic
Dual-Mode Operation: Supports both parameter-based search and pre-built URL input
Parameter Validation: Ensures required combinations are provided (custom dates)
Progressive Enhancement: Starts with defaults, allows selective customization
Error Recovery: Graceful handling when page counts can't be determined
Output Consistency: Maintains same format as original extractor
User Feedback: Real-time progress and parameter confirmation
URL Flexibility: Custom URLs override all search parameters for maximum flexibility

Summary¶

This design transforms a rigid, single-purpose script into a flexible, user-friendly tool that can adapt to various research needs while maintaining the reliability and performance of the original implementation. The modular architecture ensures that each component has a single responsibility, making the system easy to extend, test, and maintain.