Output Examples

This page shows you exactly what data OPAL produces when scraping different sources. Understanding the output format helps you plan how to use the data.

Output File Naming

OPAL automatically names output files with timestamps:

  • Format: YYYY-MM-DD_ParserName.json
  • Example: 2024-01-15_Parser1819.json
  • CSV files (court data): YYYY-MM-DD_HH-MM-SS_court_cases_[court_type].csv
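If you need to sort or group output files by when they were produced, the timestamp can be read back out of the filename. The following is a minimal sketch, not part of OPAL: the function name `parse_output_name` and both regular expressions are assumptions derived only from the naming formats listed above.

```python
import re
from datetime import datetime

# Patterns inferred from the documented naming formats (an assumption,
# not taken from OPAL's source code).
JSON_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})_(\w+)\.json$")
CSV_NAME = re.compile(
    r"^(\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})_court_cases_(\w+)\.csv$"
)


def parse_output_name(filename):
    """Return (timestamp, label) for an OPAL-style filename, or None."""
    match = JSON_NAME.match(filename)
    if match:
        # JSON names carry a date only; the label is the parser name.
        return datetime.strptime(match.group(1), "%Y-%m-%d"), match.group(2)
    match = CSV_NAME.match(filename)
    if match:
        # CSV names carry date and time; the label is the court type.
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d_%H-%M-%S")
        return stamp, match.group(2)
    return None


print(parse_output_name("2024-01-15_Parser1819.json"))
```

Filenames that match neither pattern return None, so the function can be used to filter a directory listing.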

News Article Output

1819 News Example

When you run:

python -m opal --url https://1819news.com/ --parser Parser1819 --suffix /news/item --max_pages 2

You get a JSON file like this:

{
  "articles": [
    {
      "title": "Alabama lawmakers consider education reform bill",
      "author": "Jane Smith",
      "date": "January 15, 2024",
      "line_count": 45,
      "content": "Full article text appears here...\n\nThe article continues with multiple paragraphs...\n\nAll the content from the webpage is captured."
    },
    {
      "title": "Local community rallies to support food bank",
      "author": "John Doe", 
      "date": "January 14, 2024",
      "line_count": 32,
      "content": "The complete article text...\n\nEvery paragraph is preserved..."
    }
  ],
  "metadata": {
    "source": "https://1819news.com/",
    "parser": "Parser1819",
    "total_articles": 2,
    "scrape_date": "2024-01-15T10:30:45"
  }
}

Alabama Daily News Example

When you run:

python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --suffix /articles --max_pages 1

The output structure is identical, but the content comes from Alabama Daily News:

{
  "articles": [
    {
      "title": "Birmingham announces new business development initiative",
      "author": "Sarah Johnson",
      "date": "January 15, 2024", 
      "line_count": 38,
      "content": "Birmingham city officials announced today...\n\nThe full article text appears here..."
    }
  ],
  "metadata": {
    "source": "https://www.aldailynews.com/",
    "parser": "ParserDailyNews",
    "total_articles": 1,
    "scrape_date": "2024-01-15T11:15:22"
  }
}
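Because both news parsers share this structure, one sanity check can cover files from either source. The sketch below compares the reported article count with the actual list length. The helper name `check_article_count` is hypothetical, and it assumes `total_articles` is meant to equal the number of entries in `articles`, as it does in both examples.

```python
def check_article_count(data):
    """Return (matches, actual, reported) for a loaded news output file."""
    actual = len(data["articles"])
    # Assumes metadata.total_articles is meant to equal the list length,
    # as it does in the examples on this page.
    reported = data["metadata"]["total_articles"]
    return actual == reported, actual, reported


# Trimmed-down sample with the same shape as the output shown above.
sample = {
    "articles": [
        {"title": "Birmingham announces new business development initiative"}
    ],
    "metadata": {"parser": "ParserDailyNews", "total_articles": 1},
}
print(check_article_count(sample))  # (True, 1, 1)
```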

Understanding News Output Fields

Field | Description | Example
--- | --- | ---
title | Article headline | "Alabama lawmakers consider..."
author | Article author | "Jane Smith"
date | Publication date | "January 15, 2024"
line_count | Number of lines in content | 45
content | Full article text with line breaks | "Full article text..."
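The date field is a human-readable string, so sorting or filtering by date usually means parsing it first. This is a minimal sketch under two assumptions: every date follows the "January 15, 2024" pattern shown in the examples, and every article carries the five fields listed above. The names `EXPECTED_FIELDS`, `normalize_article`, and `date_iso` are illustrative and not part of OPAL.

```python
from datetime import datetime

EXPECTED_FIELDS = {"title", "author", "date", "line_count", "content"}


def normalize_article(article):
    """Check an article record and add an ISO 8601 copy of its date."""
    missing = EXPECTED_FIELDS - set(article)
    if missing:
        raise ValueError(f"article is missing fields: {sorted(missing)}")
    # Assumes dates look like "January 15, 2024" (English month names).
    parsed = datetime.strptime(article["date"], "%B %d, %Y")
    # Keep the original fields and add the parsed date under a new key.
    return {**article, "date_iso": parsed.date().isoformat()}


sample = {
    "title": "Alabama lawmakers consider education reform bill",
    "author": "Jane Smith",
    "date": "January 15, 2024",
    "line_count": 45,
    "content": "Full article text appears here...",
}
print(normalize_article(sample)["date_iso"])  # 2024-01-15
```

ISO dates sort correctly as plain strings, which is convenient when loading the data into a spreadsheet.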

Court Case Output

JSON Format

When you run:

python -m opal --url https://publicportal.alappeals.gov/portal/search/case/results --parser ParserAppealsAL

You get both JSON and CSV files. Here's the JSON structure:

{
  "status": "success",
  "court": "civil",
  "extraction_time": "2024-01-15 14:30:22",
  "total_cases": 150,
  "pages_processed": 5,
  "cases": [
    {
      "court": "Court of Civil Appeals",
      "case_number": {
        "text": "CL-2024-0001",
        "link": "https://publicportal.alappeals.gov/portal/home/case/caseid/CL-2024-0001"
      },
      "case_title": "Smith v. Jones Construction Company, LLC",
      "classification": "Appeal",
      "filed_date": "01/10/2024",
      "status": "Pending"
    },
    {
      "court": "Court of Civil Appeals",
      "case_number": {
        "text": "CL-2024-0002", 
        "link": "https://publicportal.alappeals.gov/portal/home/case/caseid/CL-2024-0002"
      },
      "case_title": "Johnson Family Trust v. State of Alabama Department of Revenue",
      "classification": "Petition",
      "filed_date": "01/11/2024",
      "status": "Active"
    }
  ],
  "search_parameters": {
    "court_type": "civil",
    "date_range": "last_30_days",
    "exclude_closed": true
  }
}

CSV Format

The same data in CSV format (easier to work with in Excel):

Court,Case Number,Case Title,Classification,Filed Date,Status,Case Link
Court of Civil Appeals,CL-2024-0001,"Smith v. Jones Construction Company, LLC",Appeal,01/10/2024,Pending,https://publicportal.alappeals.gov/portal/home/case/caseid/CL-2024-0001
Court of Civil Appeals,CL-2024-0002,"Johnson Family Trust v. State of Alabama Department of Revenue",Petition,01/11/2024,Active,https://publicportal.alappeals.gov/portal/home/case/caseid/CL-2024-0002
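The CSV columns map onto the JSON fields, with the nested case_number object split into the Case Number and Case Link columns. The sketch below shows one way to reproduce that flattening from the JSON file using Python's standard csv module. It is not OPAL's own export code, and the names `case_to_row` and `cases_to_csv` are hypothetical.

```python
import csv
import io

CSV_COLUMNS = ["Court", "Case Number", "Case Title", "Classification",
               "Filed Date", "Status", "Case Link"]


def case_to_row(case):
    """Flatten one JSON case record into the CSV column layout."""
    return {
        "Court": case["court"],
        "Case Number": case["case_number"]["text"],
        "Case Title": case["case_title"],
        "Classification": case["classification"],
        "Filed Date": case["filed_date"],
        "Status": case["status"],
        "Case Link": case["case_number"]["link"],
    }


def cases_to_csv(cases):
    """Render a list of case records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(case_to_row(case) for case in cases)
    return buffer.getvalue()


sample_case = {
    "court": "Court of Civil Appeals",
    "case_number": {
        "text": "CL-2024-0001",
        "link": "https://publicportal.alappeals.gov/portal/home/case/caseid/CL-2024-0001",
    },
    "case_title": "Smith v. Jones Construction Company, LLC",
    "classification": "Appeal",
    "filed_date": "01/10/2024",
    "status": "Pending",
}
print(cases_to_csv([sample_case]))
```

The csv module quotes only the fields that need it, which is why a case title containing a comma comes out wrapped in double quotes.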

Understanding Court Output Fields

Field | Description | Example
--- | --- | ---
court | Which court | "Court of Civil Appeals"
case_number | Case identifier with link | {"text": "CL-2024-0001", "link": "..."}
case_title | Full case name | "Smith v. Jones Construction..."
classification | Type of case | "Appeal", "Petition", "Writ"
filed_date | Date case was filed | "01/10/2024"
status | Current case status | "Active", "Pending", "Closed"
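A common use of these fields is to drop closed cases and list the rest by filing date. The sketch below assumes filed_date is written as MM/DD/YYYY, which fits the examples but is not confirmed here. The function names and the third sample case (CL-2023-0900) are hypothetical.

```python
from datetime import datetime


def filed_on(case):
    """Parse a case's filed_date into a date object."""
    # Assumes MM/DD/YYYY; the examples are consistent with this format.
    return datetime.strptime(case["filed_date"], "%m/%d/%Y").date()


def open_cases_newest_first(cases):
    """Drop cases whose status is "Closed" and sort the rest by filing date."""
    still_open = [case for case in cases if case["status"] != "Closed"]
    return sorted(still_open, key=filed_on, reverse=True)


cases = [
    {"case_number": {"text": "CL-2024-0001"}, "filed_date": "01/10/2024", "status": "Pending"},
    {"case_number": {"text": "CL-2024-0002"}, "filed_date": "01/11/2024", "status": "Active"},
    # Hypothetical closed case, added only to show the filter at work.
    {"case_number": {"text": "CL-2023-0900"}, "filed_date": "12/20/2023", "status": "Closed"},
]
print([case["case_number"]["text"] for case in open_cases_newest_first(cases)])
# ['CL-2024-0002', 'CL-2024-0001']
```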

Working with Output Files

Opening JSON Files

In a Text Editor:

  • Right-click the file → Open with → Notepad (Windows) or TextEdit (Mac)
  • For better formatting, use Notepad++ or VS Code

In a Web Browser:

  • Drag the JSON file into Chrome or Firefox
  • Many browsers will format it nicely

In Python:

import json

# Read the file
with open('2024-01-15_Parser1819.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

# Access the data
for article in data['articles']:
    print(f"Title: {article['title']}")
    print(f"Author: {article['author']}")
    print(f"Date: {article['date']}")
    print("---")

Opening CSV Files

In Excel:

  1. Double-click the CSV file
  2. Excel will open it automatically
  3. Columns will be properly separated

In Google Sheets:

  1. Go to sheets.google.com
  2. File → Import → Upload
  3. Select your CSV file
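CSV files can also be read in Python with the standard csv module. The sketch below counts cases per status. It uses an inline sample in place of a real file so that it runs on its own, and the helper name `count_by_status` is hypothetical.

```python
import csv
import io
from collections import Counter


def count_by_status(csv_file):
    """Count rows per Status value from an open CSV file object."""
    reader = csv.DictReader(csv_file)  # uses the header row as keys
    return Counter(row["Status"] for row in reader)


# Inline sample standing in for a real file. With a file on disk, pass
# open(path, newline="", encoding="utf-8") instead; the UTF-8 encoding
# is an assumption about how the file was written.
sample = io.StringIO(
    "Court,Case Number,Case Title,Classification,Filed Date,Status,Case Link\n"
    'Court of Civil Appeals,CL-2024-0001,"Smith v. Jones Construction Company, LLC",'
    "Appeal,01/10/2024,Pending,"
    "https://publicportal.alappeals.gov/portal/home/case/caseid/CL-2024-0001\n"
)
print(count_by_status(sample))
```

csv.DictReader handles the quoted case title correctly, so the comma inside it does not shift the columns.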

Common Output Scenarios

Scenario 1: No Articles Found

{
  "articles": [],
  "metadata": {
    "source": "https://example.com/",
    "parser": "Parser1819",
    "total_articles": 0,
    "scrape_date": "2024-01-15T10:30:45",
    "note": "No articles found matching the criteria"
  }
}

Scenario 2: Partial Data (Missing Author)

{
  "articles": [
    {
      "title": "Breaking News Article",
      "author": "Unknown",
      "date": "January 15, 2024",
      "line_count": 25,
      "content": "Article content here..."
    }
  ]
}

Scenario 3: Error During Scraping

{
  "status": "partial_success",
  "message": "Completed 3 of 5 pages before encountering error",
  "cases": [...],
  "error_details": "Connection timeout on page 4"
}
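Code that consumes OPAL output may want to handle all three scenarios before processing records. The following is a minimal sketch based only on the field names shown in the scenarios above. The function `summarize_result` is hypothetical, and real output may include statuses or fields not covered here.

```python
def summarize_result(data):
    """Return a list of short notes describing a loaded output file."""
    notes = []
    status = data.get("status")
    if status and status != "success":
        # Scenario 3: the scrape stopped early; partial records may remain.
        details = data.get("error_details", "no details given")
        notes.append(f"{status}: {details}")
    # News files use "articles"; court files use "cases".
    records = data.get("articles", data.get("cases", []))
    if not records:
        # Scenario 1: nothing matched the criteria.
        notes.append("no records found")
    # Scenario 2: a missing author is reported as the string "Unknown".
    unknown = sum(1 for record in records if record.get("author") == "Unknown")
    if unknown:
        notes.append(f"{unknown} record(s) with unknown author")
    return notes or ["ok"]


print(summarize_result({"articles": [], "metadata": {"total_articles": 0}}))
# ['no records found']
```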

File Size Expectations

  • News Articles:
      • ~1-3 KB per article
      • 100 articles ≈ 200-300 KB

  • Court Cases:
      • ~500 bytes per case
      • 1000 cases ≈ 500 KB

Next Steps

Now that you understand the output format:

  1. Try the Quick Start Tutorial to generate your own output
  2. Learn about Working with Output Data
  3. Explore Common Use Cases for practical applications