Class Reference¶
This section provides comprehensive documentation for all core classes in the OPAL system. Each class is designed with specific responsibilities following object-oriented principles.
Core Classes Overview¶
Parser Classes¶
Class | Module | Description |
---|---|---|
BaseParser | opal.base_parser |
Abstract base class for all parsers |
ParserAppealsAL | opal.parser_module |
Alabama Appeals Court parser |
Parser1819 | opal.parser_module |
1819 News parser |
ParserDailyNews | opal.parser_module |
Daily news sites parser |
Extractor Classes¶
Class | Module | Description |
---|---|---|
ConfigurableCourtExtractor | opal.configurable_court_extractor |
Flexible court data extractor |
CourtExtractor | opal.court_case_parser |
Core extraction logic |
URLPaginator | opal.court_url_paginator |
Handles pagination |
Data Structures¶
Class | Module | Description |
---|---|---|
CourtCase | opal.data_structures |
Court case data model |
ParseResult | opal.data_structures |
Parser result container |
ExtractorConfig | opal.data_structures |
Configuration model |
Class Hierarchy¶
BaseParser (Abstract)
├── ParserAppealsAL
├── Parser1819
└── ParserDailyNews
Extractor (Abstract)
├── ConfigurableCourtExtractor
└── CourtExtractor
DataModel (Abstract)
├── CourtCase
├── ParseResult
└── ExtractorConfig
Design Patterns¶
Factory Pattern¶
Used for creating parser instances based on command-line arguments:
def create_parser(parser_type: str) -> BaseParser:
parsers = {
'appeals-al': ParserAppealsAL,
'parser-1819': Parser1819,
'daily-news': ParserDailyNews
}
return parsers[parser_type]()
Strategy Pattern¶
Extractors use different strategies for different content types:
class ExtractionStrategy:
def extract(self, soup: BeautifulSoup) -> dict:
pass
class TableExtractionStrategy(ExtractionStrategy):
def extract(self, soup: BeautifulSoup) -> dict:
# Table-specific extraction
pass
Template Method Pattern¶
BaseParser defines the parsing workflow:
class BaseParser:
def parse(self):
self.setup()
data = self.extract_data()
self.validate(data)
return self.format_output(data)
Common Interfaces¶
Parser Interface¶
All parsers implement these methods:
parse(start_date, end_date)
- Main parsing methodvalidate_date_range(start_date, end_date)
- Date validationformat_output(data)
- Output formattingsave_results(data, format, filename)
- Result persistence
Extractor Interface¶
All extractors implement:
extract(url)
- Extract data from URLparse_content(html)
- Parse HTML contentvalidate_data(data)
- Validate extracted datatransform_data(data)
- Transform to standard format
Usage Examples¶
Creating a Parser¶
from opal.parser_module import ParserAppealsAL
parser = ParserAppealsAL()
results = parser.parse(
start_date="2024-01-01",
end_date="2024-12-31"
)
Using an Extractor¶
from opal.configurable_court_extractor import ConfigurableCourtExtractor
extractor = ConfigurableCourtExtractor(config)
court_cases = extractor.extract("https://example.com/cases")
Extension Points¶
Custom Parser Creation¶
- Inherit from
BaseParser
- Override required methods
- Register in parser factory
- Add command-line integration
Custom Extractor Creation¶
- Inherit from base extractor
- Implement extraction logic
- Define data mappings
- Add validation rules
Best Practices¶
Error Handling¶
- Use specific exception types
- Provide meaningful error messages
- Implement retry logic for network issues
- Log errors appropriately
Performance¶
- Implement caching where appropriate
- Use connection pooling
- Respect rate limits
- Optimize parsing algorithms
Testing¶
- Mock external dependencies
- Test edge cases
- Validate data transformations
- Ensure thread safety
Next Steps¶
- Review individual class documentation
- Learn about creating custom parsers
- Explore architecture patterns