
Quick Start Tutorial

Welcome! This hands-on tutorial will walk you through your first OPAL scraping tasks. By the end, you'll have successfully scraped both news articles and court cases.

Before You Begin

Make sure you've completed the Complete Setup Guide. You should have:

  • ✅ Python installed
  • ✅ Virtual environment activated (you see (venv) in your terminal)
  • ✅ OPAL installed
  • ✅ Google Chrome installed (for court scraping)

Tutorial 1: Your First News Scrape

Let's start by scraping a few articles from 1819 News.

Step 1: Understand the Command

Here's what we'll run:

python -m opal --url https://1819news.com/ --parser Parser1819 --suffix /news/item --max_pages 2

Let's break this down:

  • python -m opal - Runs OPAL as a module
  • --url https://1819news.com/ - The website to scrape
  • --parser Parser1819 - Which parser to use (specific to this news site)
  • --suffix /news/item - Only scrape URLs containing this pattern (helps identify articles)
  • --max_pages 2 - Limit to 2 pages (keeps it quick for testing)
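
To picture what the --suffix filter does, here is a minimal sketch in plain Python. This is an illustration only, not OPAL's actual implementation: it simply keeps the URLs whose path contains the pattern.

# Hypothetical example: filter a URL list by a path pattern
urls = [
    "https://1819news.com/news/item/example-story",
    "https://1819news.com/about",
    "https://1819news.com/news/item/another-story",
]

suffix = "/news/item"
article_urls = [u for u in urls if suffix in u]
print(article_urls)  # only the two /news/item/... links remain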

Step 2: Run the Command

  1. Make sure your virtual environment is activated
  2. Copy and paste the command above
  3. Press Enter

Step 3: What You'll See

Starting OPAL web scraper...
Using Parser1819 for https://1819news.com/
Collecting article URLs...
Found 15 article URLs
Processing articles...
[1/15] Scraping: https://1819news.com/news/item/...
[2/15] Scraping: https://1819news.com/news/item/...
...
Scraping complete!
Output saved to: 2024-01-15_Parser1819.json
Total articles scraped: 15

Step 4: Check Your Output

  1. Look in your project folder for a file like 2024-01-15_Parser1819.json
  2. Open it with a text editor
  3. You'll see structured data for each article

Tip: If the file looks messy, try opening it in a web browser for better formatting!
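
You can also inspect the output programmatically. Here is a short sketch that assumes the JSON file is a list of article records (the filename and field layout are illustrative; match them to your own output):

import json

# Adjust the filename to whatever OPAL actually produced for you
with open("2024-01-15_Parser1819.json", encoding="utf-8") as f:
    articles = json.load(f)

print(f"Scraped {len(articles)} articles")
print(json.dumps(articles[0], indent=2)[:500])  # preview the first record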

Common Issues and Solutions

"No articles found" - The website might have changed its structure - Try without the --suffix parameter - Check if the website is accessible in your browser

"Connection error" - Check your internet connection - The website might be temporarily down - Try again in a few minutes

Tutorial 2: Scraping Court Cases

Now let's scrape some court case data. This uses Selenium, so it might take a bit longer.

Step 1: Basic Court Scrape

Run this command:

python -m opal --url https://publicportal.alappeals.gov/portal/search/case/results --parser ParserAppealsAL --max_pages 2

Parameters explained:

  • --parser ParserAppealsAL - Uses the Alabama Appeals Court parser
  • --max_pages 2 - Limits to 2 pages of results

Step 2: What Happens During Court Scraping

You'll see:

Starting OPAL web scraper...
Initializing Chrome browser (headless mode)...
Loading court portal...
Waiting for page to render...
Found 30 cases per page
Processing page 1 of 2...
Processing page 2 of 2...
Saving results...
Output saved to: 2024-01-15_court_cases.json
CSV output saved to: 2024-01-15_143022_court_cases_all.csv
Total cases scraped: 60

Note: Court scraping is slower because:

  • It launches a real Chrome browser (in hidden mode)
  • It waits for JavaScript to load
  • The court website has rate limiting
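
If you are curious what "a real Chrome browser in hidden mode" means in practice, here is a minimal Selenium sketch of the general pattern. OPAL's real configuration and waits may differ; this just shows the idea.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)

driver.get("https://publicportal.alappeals.gov/portal/search/case/results")
driver.implicitly_wait(10)  # give the JavaScript-rendered page time to load
print(driver.title)
driver.quit()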

Step 3: Check Both Output Files

You now have two files:

  1. JSON file: Complete data with all details
  2. CSV file: Same data in spreadsheet format

Open the CSV file in Excel to see a nice table of court cases!
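
If you'd rather stay in Python than open Excel, a quick peek at the CSV might look like this (the timestamped filename is illustrative; use the one OPAL printed):

import csv

with open("2024-01-15_143022_court_cases_all.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"{len(rows)} cases")
print(list(rows[0].keys()))  # column names from the CSV header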

Tutorial 3: Advanced Court Searches

Let's search for specific types of cases using the configurable extractor.

Step 1: Search Recent Civil Appeals

python -m opal.configurable_court_extractor --court civil --date-period 7d --exclude-closed

This searches for:

  • Civil court cases only
  • Filed in the last 7 days
  • Excluding closed cases

Step 2: Understanding Date Periods

You can use these date period options:

  • 7d - Last 7 days
  • 1m - Last month
  • 3m - Last 3 months
  • 6m - Last 6 months
  • 1y - Last year
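
As a rough mental model, each code is just a lookback window from today. The sketch below approximates a month as 30 days; OPAL's actual date arithmetic may differ.

from datetime import date, timedelta

# Hypothetical mapping from period codes to lookback windows in days
PERIODS = {"7d": 7, "1m": 30, "3m": 90, "6m": 180, "1y": 365}

def start_date_for(period, today=None):
    today = today or date.today()
    return today - timedelta(days=PERIODS[period])

print(start_date_for("7d"))  # the date one week ago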

Step 3: Custom Date Range

For specific dates:

python -m opal.configurable_court_extractor --court criminal --start-date 2024-01-01 --end-date 2024-01-15
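
Custom dates use the YYYY-MM-DD format shown above. If you build commands programmatically, a quick way to catch a malformed date before running the scraper (a small illustration, not part of OPAL):

from datetime import datetime

for d in ("2024-01-01", "2024-01-15"):
    datetime.strptime(d, "%Y-%m-%d")  # raises ValueError if the format is wrong
print("dates look valid")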

Understanding Success

You know your scrape was successful when:

  1. ✅ No error messages appear
  2. ✅ Output files are created
  3. ✅ The files contain data (not empty)
  4. ✅ File sizes are reasonable (>1KB)
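
You can automate the last three checks with a few lines of Python (the filename here is illustrative):

from pathlib import Path

path = Path("2024-01-15_Parser1819.json")

assert path.exists(), "output file was not created"
assert path.stat().st_size > 1024, "file is suspiciously small (<1KB)"
print(f"{path.name}: {path.stat().st_size} bytes - looks good")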

Practice Exercises

Try these on your own:

Exercise 1: Scrape More Pages

python -m opal --url https://1819news.com/ --parser Parser1819 --max_pages 5

  • How many articles did you get?
  • How long did it take?

Exercise 2: Different News Source

python -m opal --url https://www.aldailynews.com/ --parser ParserDailyNews --max_pages 3

  • Compare the output structure
  • Are the fields the same?

Exercise 3: Search Specific Court Cases

python -m opal.configurable_court_extractor --court supreme --date-period 1m --max-pages 3

  • How many Supreme Court cases did you find?
  • What classifications do you see?

What's Next?

Congratulations! You've successfully:

  • ✅ Scraped news articles from two sources
  • ✅ Extracted court case data
  • ✅ Used advanced search parameters
  • ✅ Generated both JSON and CSV output

Next Steps:

  1. Review the Output Examples to better understand your data
  2. Learn about Common Use Cases
  3. Set up Automated Daily Scraping

Pro Tips:

  • Start with small --max_pages values while learning
  • Always check the first few results before scraping everything
  • Save important command variations in a text file for reuse
  • Court scraping is slower - be patient!

Need Help?

If something isn't working:

  1. Make sure your virtual environment is activated
  2. Check your internet connection
  3. Verify the website is accessible in your browser
  4. Review error messages carefully - they often explain the issue
  5. Try with --max_pages 1 first to isolate problems

Remember: Every expert was once a beginner. Keep experimenting, and you'll be a pro in no time!