Common Use Cases¶
This guide shows you how to use OPAL for specific real-world scenarios. Each use case includes the exact commands, expected outputs, and practical tips.
News Monitoring Use Cases¶
Use Case 1: Track Weekly Political News¶
Scenario: You want to monitor Alabama political news from both major news sources for the past week.
Commands:
# Scrape 1819 News for political articles
python -m opal --url https://1819news.com/ --parser Parser1819 --suffix /news/item --max_pages 10 --output weekly_1819_politics.json
# Scrape Alabama Daily News for political articles
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --suffix /news/item --max_pages 10 --output weekly_daily_politics.json
Expected Output:
- Two JSON files with 50-100 articles each
- Articles from the past week (news sites typically show recent content first)
- Processing time: 5-10 minutes per source
Pro Tips:
- Run these commands on the same day each week for consistency
- Use --max_pages 5 for faster results if you just want recent highlights
- Save files with dates: weekly_1819_2024-01-15.json
Use Case 2: Research Specific Topics¶
Scenario: You're researching coverage of education policy in Alabama news.
Commands:
# Collect all recent articles from both sources
python -m opal --url https://1819news.com/ --parser Parser1819 --max_pages 20 --output education_research_1819.json
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --max_pages 20 --output education_research_daily.json
Post-Processing (filter for education topics):
import json

# Load the data
with open('education_research_1819.json', 'r') as f:
    data = json.load(f)

# Filter for education-related articles
education_keywords = ['education', 'school', 'teacher', 'student', 'classroom', 'university', 'college']
education_articles = []

for article in data['articles']:
    title_lower = article['title'].lower()
    content_lower = article['content'].lower()
    if any(keyword in title_lower or keyword in content_lower for keyword in education_keywords):
        education_articles.append(article)

print(f"Found {len(education_articles)} education-related articles out of {len(data['articles'])} total")

# Save filtered results
filtered_data = {
    'articles': education_articles,
    'metadata': data['metadata'],
    'filter_applied': 'education keywords',
    'original_count': len(data['articles']),
    'filtered_count': len(education_articles)
}

with open('education_articles_filtered.json', 'w') as f:
    json.dump(filtered_data, f, indent=2)
Expected Output:
- 200-500 total articles collected
- 20-50 education-related articles after filtering
- Research-ready dataset for analysis
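If you want the filtered articles in a spreadsheet for manual review, here is a short follow-up sketch (assumes pandas is installed and uses only the fields shown above):

import json
import pandas as pd

# Load the filtered file produced by the script above
with open('education_articles_filtered.json', 'r') as f:
    filtered = json.load(f)

# Flatten just the title and content fields into a CSV for review
df = pd.DataFrame(
    [{'title': a['title'], 'content': a['content']} for a in filtered['articles']]
)
df.to_csv('education_articles_filtered.csv', index=False)
print(f"Wrote {len(df)} rows to education_articles_filtered.csv")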
Use Case 3: Daily News Digest¶
Scenario: You want a daily digest of top stories from Alabama news sources.
Commands:
# Get just the latest articles (first 2-3 pages usually contain today's news)
python -m opal --url https://1819news.com/ --parser Parser1819 --suffix /news/item --max_pages 3 --output daily_digest_1819.json
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --suffix /news/item --max_pages 3 --output daily_digest_daily.json
Automation Setup (Linux/Mac):
# Create a daily script
cat << 'EOF' > daily_digest.sh
#!/bin/bash
DATE=$(date +%Y-%m-%d)
cd /path/to/opal_project
# Activate virtual environment
source venv/bin/activate
# Run daily scrapes
python -m opal --url https://1819news.com/ --parser Parser1819 --suffix /news/item --max_pages 3 --output "daily_${DATE}_1819.json"
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --suffix /news/item --max_pages 3 --output "daily_${DATE}_daily.json"
echo "Daily digest complete for $DATE"
EOF
# Make executable and add to cron
chmod +x daily_digest.sh
# Add to crontab (runs every day at 8 AM; append so existing cron jobs are kept)
(crontab -l 2>/dev/null; echo "0 8 * * * /path/to/daily_digest.sh") | crontab -
Expected Output:
- 20-40 articles per source
- Processing time: 2-3 minutes total
- Consistent daily data collection
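To turn the two files into a single list of headlines, here is a minimal sketch (assumes the 'articles'/'title' structure used in the examples above):

import json

# Combine both daily files into one headline digest
digest = []
for path in ['daily_digest_1819.json', 'daily_digest_daily.json']:
    with open(path, 'r') as f:
        data = json.load(f)
    # 'articles' and 'title' follow the structure used elsewhere in this guide
    digest.extend(article['title'] for article in data['articles'])

print(f"Daily digest: {len(digest)} headlines")
for title in digest:
    print(f"- {title}")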
Court Monitoring Use Cases¶
Use Case 4: Weekly Court Case Review¶
Scenario: You want to track new court cases filed each week.
Commands:
# Basic weekly court scraping
python -m opal --url https://publicportal.alappeals.gov/portal/search/case/results --parser ParserAppealsAL --max_pages 5 --output weekly_court_cases.json
# Or use the configurable extractor for more control
python -m opal.configurable_court_extractor --court civil --date-period 7d --exclude-closed --max-pages 10 --output-prefix weekly_civil
Expected Output:
- 100-300 court cases
- Both JSON and CSV formats
- Cases from the past week
- Processing time: 10-15 minutes (court scraping is slower due to JavaScript)
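For a quick sanity check of the weekly results, here is a minimal sketch (assumes the court JSON uses the 'cases' list with 'case_title' fields shown in Use Case 5; adjust if your output differs):

import json

with open('weekly_court_cases.json', 'r') as f:
    data = json.load(f)

print(f"Collected {len(data['cases'])} cases this week")
# Preview the first ten case titles
for case in data['cases'][:10]:
    print(f"- {case['case_title']}")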
Use Case 5: Research Specific Case Types¶
Scenario: You're researching civil appeals related to business disputes.
Commands:
# Search civil court for appeals
python -m opal.configurable_court_extractor \
--court civil \
--date-period 6m \
--case-category Appeal \
--exclude-closed \
--max-pages 15 \
--output-prefix business_appeals_research
Post-Processing (filter for business cases):
import json

# Load court case data
with open('business_appeals_research_civil.json', 'r') as f:
    data = json.load(f)

# Filter for business-related cases
business_keywords = ['llc', 'corp', 'company', 'business', 'contract', 'commercial', 'partnership']
business_cases = []

for case in data['cases']:
    title_lower = case['case_title'].lower()
    if any(keyword in title_lower for keyword in business_keywords):
        business_cases.append(case)

print(f"Found {len(business_cases)} business-related cases out of {len(data['cases'])} total")

# Save filtered results
filtered_data = {
    'cases': business_cases,
    'metadata': data.get('metadata', {}),
    'filter_applied': 'business keywords',
    'search_parameters': data.get('search_parameters', {}),
    'original_count': len(data['cases']),
    'filtered_count': len(business_cases)
}

with open('business_cases_filtered.json', 'w') as f:
    json.dump(filtered_data, f, indent=2)
Expected Output:
- 500-1000 total civil appeals
- 50-150 business-related cases after filtering
- Detailed case information for legal research
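To see which keywords drive the matches (useful when refining the keyword list), here is a short follow-up that uses only the fields already used above:

import json
from collections import Counter

with open('business_cases_filtered.json', 'r') as f:
    filtered = json.load(f)

business_keywords = ['llc', 'corp', 'company', 'business', 'contract', 'commercial', 'partnership']
keyword_counts = Counter()
for case in filtered['cases']:
    title_lower = case['case_title'].lower()
    for keyword in business_keywords:
        if keyword in title_lower:
            keyword_counts[keyword] += 1

for keyword, count in keyword_counts.most_common():
    print(f"{keyword}: {count}")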
Use Case 6: Monitor Specific Courts¶
Scenario: You need to track all activity in the Criminal Appeals Court.
Commands:
# Criminal court cases from the last month
python -m opal.configurable_court_extractor \
--court criminal \
--date-period 1m \
--max-pages 20 \
--output-prefix monthly_criminal
Expected Output:
- 200-600 criminal cases
- All case types (appeals, petitions, etc.)
- Complete case information
- Processing time: 15-25 minutes
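Here is a post-processing sketch for a monthly breakdown by case type. The output filename and the 'case_type' field are assumptions; check which files the extractor actually writes for your --output-prefix and which field names it uses, and adjust accordingly:

import json
from collections import Counter

# Filename is an assumption based on the --output-prefix above; adjust to the file actually written
with open('monthly_criminal_criminal.json', 'r') as f:
    data = json.load(f)

# 'case_type' is a hypothetical field name; substitute whichever field your output provides
type_counts = Counter(case.get('case_type', 'unknown') for case in data['cases'])

print(f"Total cases: {len(data['cases'])}")
for case_type, count in type_counts.most_common():
    print(f"{case_type}: {count}")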
Analysis and Research Use Cases¶
Use Case 7: Comparative News Analysis¶
Scenario: Compare how different news sources cover the same topics.
Commands:
# Collect comprehensive data from both sources
python -m opal --url https://1819news.com/ --parser Parser1819 --max_pages 25 --output analysis_1819_full.json
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --max_pages 25 --output analysis_daily_full.json
Analysis Script:
import json
from collections import Counter
import re

def analyze_news_coverage(file1, file2, source1_name, source2_name):
    # Load both datasets
    with open(file1, 'r') as f:
        data1 = json.load(f)
    with open(file2, 'r') as f:
        data2 = json.load(f)

    # Extract common keywords
    def extract_keywords(articles):
        all_text = ' '.join([article['title'] + ' ' + article['content'] for article in articles])
        words = re.findall(r'\b[a-zA-Z]{4,}\b', all_text.lower())
        return Counter(words)

    keywords1 = extract_keywords(data1['articles'])
    keywords2 = extract_keywords(data2['articles'])

    # Find common topics
    common_keywords = set(keywords1.keys()) & set(keywords2.keys())

    print("\n=== NEWS COVERAGE COMPARISON ===")
    print(f"{source1_name}: {len(data1['articles'])} articles")
    print(f"{source2_name}: {len(data2['articles'])} articles")
    print(f"Common topics: {len(common_keywords)}")

    # Top topics by source
    print(f"\nTop topics in {source1_name}:")
    for word, count in keywords1.most_common(10):
        print(f"  {word}: {count}")

    print(f"\nTop topics in {source2_name}:")
    for word, count in keywords2.most_common(10):
        print(f"  {word}: {count}")

# Run the analysis
analyze_news_coverage('analysis_1819_full.json', 'analysis_daily_full.json', '1819 News', 'Alabama Daily News')
Expected Output:
- 500-1000 articles per source
- Keyword frequency analysis
- Topic comparison between sources
- Research insights
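Because extract_keywords counts every word of four or more letters, common words such as "that" and "with" will dominate the rankings. Here is a minimal refinement you can swap into analyze_news_coverage (the stopword list is a small hand-picked starting point; extend it as needed):

from collections import Counter
import re

STOPWORDS = {
    'that', 'with', 'from', 'this', 'have', 'will', 'said', 'were',
    'been', 'their', 'would', 'which', 'about', 'there', 'they', 'also'
}

def extract_keywords(articles):
    all_text = ' '.join(article['title'] + ' ' + article['content'] for article in articles)
    words = re.findall(r'\b[a-zA-Z]{4,}\b', all_text.lower())
    # Drop the stopwords before counting
    return Counter(word for word in words if word not in STOPWORDS)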
Use Case 8: Long-term Trend Monitoring¶
Scenario: Track political discourse trends over time.
Setup: Run this monthly for trend analysis:
#!/bin/bash
# Monthly data collection script
MONTH=$(date +%Y-%m)
# Create monthly folder
mkdir -p "monthly_data/$MONTH"
cd "monthly_data/$MONTH"
# Collect comprehensive data
python -m opal --url https://1819news.com/ --parser Parser1819 --max_pages 50 --output "1819_${MONTH}.json"
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --max_pages 50 --output "daily_${MONTH}.json"
# Court data
python -m opal.configurable_court_extractor --court civil --date-period 1m --max-pages 20 --output-prefix "court_${MONTH}"
echo "Monthly data collection complete for $MONTH"
Trend Analysis:
import json
import os
from datetime import datetime
import matplotlib.pyplot as plt

def analyze_monthly_trends(data_folder):
    monthly_data = {}

    # Load all monthly files
    for month_folder in os.listdir(data_folder):
        month_path = os.path.join(data_folder, month_folder)
        if os.path.isdir(month_path):
            # Load 1819 News data
            file_path = os.path.join(month_path, f"1819_{month_folder}.json")
            if os.path.exists(file_path):
                with open(file_path, 'r') as f:
                    data = json.load(f)
                monthly_data[month_folder] = len(data['articles'])

    # Create trend chart
    months = sorted(monthly_data.keys())
    article_counts = [monthly_data[month] for month in months]

    plt.figure(figsize=(12, 6))
    plt.plot(months, article_counts, marker='o')
    plt.title('Monthly Article Count Trends')
    plt.xlabel('Month')
    plt.ylabel('Number of Articles')
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig('monthly_trends.png')
    plt.show()

    print("Trend analysis complete. Chart saved as 'monthly_trends.png'")

# Run trend analysis
analyze_monthly_trends('monthly_data')
Performance and Efficiency Tips¶
Optimization Strategies¶
- Start Small: Always test with --max_pages 2 or 3 first
- Peak Hours: Avoid scraping during business hours (9 AM - 5 PM CST) for better performance
- Batch Processing: Run multiple scrapers in sequence, not parallel
- Storage: Use dated folders to organize output files
Resource Management¶
# Good practice: organized data collection
DATE=$(date +%Y-%m-%d)
mkdir -p "data/$DATE"
cd "data/$DATE"
# Run scrapers with reasonable limits
python -m opal --url https://1819news.com/ --parser Parser1819 --max_pages 10 --output "1819_${DATE}.json"
# Wait between scrapers to be respectful
sleep 30
python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --max_pages 10 --output "daily_${DATE}.json"
Error Recovery¶
#!/bin/bash
# Robust scraping with retry logic
MAX_RETRIES=3
RETRY_COUNT=0

while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
    if python -m opal --url https://1819news.com/ --parser Parser1819 --max_pages 5 --output "test_output.json"; then
        echo "Scraping successful!"
        break
    else
        RETRY_COUNT=$((RETRY_COUNT + 1))
        echo "Attempt $RETRY_COUNT failed. Retrying in 60 seconds..."
        sleep 60
    fi
done

if [ $RETRY_COUNT -eq $MAX_RETRIES ]; then
    echo "Scraping failed after $MAX_RETRIES attempts"
    exit 1
fi
Next Steps¶
After implementing these use cases:
- Automate Regular Collection: Set up cron jobs or scheduled tasks
- Data Analysis: Use Python libraries like pandas for deeper analysis
- Visualization: Create charts and graphs from your collected data
- Integration: Connect OPAL data to databases or other analysis tools (see the sketch below)
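For the integration step, here is a minimal sketch that loads one of the news JSON files into SQLite using only the standard library (assumes the 'articles'/'title'/'content' structure shown above; court data would need different columns):

import json
import sqlite3

with open('weekly_1819_politics.json', 'r') as f:
    data = json.load(f)

conn = sqlite3.connect('opal_data.db')
conn.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT, content TEXT)")
conn.executemany(
    "INSERT INTO articles (title, content) VALUES (?, ?)",
    [(a['title'], a['content']) for a in data['articles']],
)
conn.commit()
conn.close()
print(f"Loaded {len(data['articles'])} articles into opal_data.db")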
For troubleshooting specific issues, see Understanding Errors.
For working with the collected data, see Working with Output Data.