ParserDailyNews API Reference¶
Parser for Alabama Daily News website.
Class Definition¶
class ParserDailyNews(BaseParser):
def __init__(self, url="https://www.aldailynews.com/", suffix="/news/item", max_pages=5)
Constructor Parameters¶
Parameter | Type | Default | Description |
---|---|---|---|
url |
str | "https://www.aldailynews.com/" |
Base URL for Alabama Daily News |
suffix |
str | "/news/item" |
URL suffix for article links |
max_pages |
int | 5 |
Maximum pages to scrape |
Methods¶
extract_article_data()¶
Extracts articles from Alabama Daily News.Returns: List of article dictionaries with keys:
- title
: Article headline
- content
: Full article text
- date
: Publication date
- author
: Article author
- url
: Article URL
- category
: Article category
get_article_links()¶
Extracts article URLs from an Alabama Daily News page.Parameters:
- page_url
: URL of the page to scrape
Returns: List of article URLs
parse_article()¶
Parses individual Alabama Daily News article.Parameters:
- article_url
: URL of the article
Returns: Dictionary with article data
Usage Example¶
from opal.ParserDailyNews import ParserDailyNews
# Create parser instance
parser = ParserDailyNews(
url="https://www.aldailynews.com/",
suffix="/news/item",
max_pages=5
)
# Extract articles
articles = parser.extract_article_data()
# Save to JSON
parser.save_to_json(articles, "adn_articles.json")
# Process articles
for article in articles:
print(f"Title: {article['title']}")
print(f"Category: {article['category']}")
print(f"Date: {article['date']}")
Output Format¶
{
"title": "Governor Announces Infrastructure Plan",
"content": "Full article text...",
"date": "2024-01-15",
"author": "Jane Doe",
"url": "https://www.aldailynews.com/news/item/infrastructure-plan",
"category": "Politics"
}
Notes¶
- Handles Alabama Daily News specific HTML structure
- Extracts category information when available
- Supports pagination through page parameters
- Includes robust error handling for missing elements