Pro Football Reference¶
The PFR SDK scrapes Pro Football Reference pages for historical NFL data. It uses a headless browser via Browserless to render JavaScript-heavy pages and then parses the HTML into Pydantic models.
Setup¶
Browserless Configuration¶
The PFR client requires a Browserless instance to render pages. Set the following environment variables:
export BROWSERLESS_HOST="your-browserless-host.example.com"
export BROWSERLESS_TOKEN="your_browserless_api_token"
Then create the client:
Note
PFR does not require authentication — Pro Football Reference is a public website. However, it does require a Browserless instance because PFR uses JavaScript to render some tables.
Custom Browserless Config¶
You can customize the Browserless connection settings:
from griddy.pfr import GriddyPFR
from griddy.pfr.backends import BrowserlessConfig
pfr = GriddyPFR(
browserless_config=BrowserlessConfig(
proxy="residential",
request_timeout=60000,
ttl=30000,
),
)
| Config Field | Default | Description |
|---|---|---|
proxy |
"residential" |
Proxy type for Browserless requests |
request_timeout |
60000 |
Request timeout in milliseconds |
ttl |
30000 |
Time-to-live for browser sessions |
Game Details¶
Fetch full box score data for a specific game:
game = pfr.games.get_game_details(game_id="202502090kan")
print(f"Score: {game.home_score} - {game.away_score}")
The game_id is the PFR game identifier, which follows the format YYYYMMDD0<team_abbr> (e.g., 202502090kan for the Super Bowl on February 9, 2025 at Kansas City).
Season Schedule¶
Retrieve the complete schedule for a season:
schedule = pfr.schedule.get_season_schedule(season=2024)
for game in schedule:
print(f"Week {game.week}: {game.away_team} @ {game.home_team}")
Player Profiles¶
Look up player profiles by their PFR player ID:
Player IDs use PFR's format: first four letters of last name + first two of first name + a two-digit disambiguator (e.g., MahoPa00 for Patrick Mahomes).
Team Data¶
Team Season Stats¶
Team Franchise History¶
Team abbreviations use PFR's format (e.g., kan for Kansas City, phi for Philadelphia, nwe for New England).
Other Endpoints¶
The PFR SDK provides 17 endpoint categories:
| Sub-SDK | Description |
|---|---|
pfr.awards |
NFL awards data |
pfr.coaches |
Coach records and history |
pfr.draft |
Historical draft data |
pfr.executives |
Front office personnel |
pfr.fantasy |
Fantasy football data |
pfr.frivolities |
Fun stats and records |
pfr.games |
Game box scores |
pfr.hof |
Hall of Fame data |
pfr.leaders |
Statistical leaders |
pfr.officials |
Game officials |
pfr.players |
Player profiles |
pfr.probowl |
Pro Bowl rosters |
pfr.schedule |
Season schedules |
pfr.schools |
College/school data |
pfr.seasons |
Season-level summaries |
pfr.stadiums |
Stadium information |
pfr.superbowl |
Super Bowl data |
pfr.teams |
Team stats and history |
How It Works¶
The PFR scraping pipeline follows these steps:
- URL Construction — The endpoint builds a URL from a path template and parameters
- HTML Fetching — Browserless renders the page using a residential proxy and returns the fully-rendered HTML
- Preprocessing — Hidden
<table>elements (wrapped in HTML comments by PFR) are unmasked - Parsing — A dedicated parser extracts structured data from the HTML using BeautifulSoup
- Validation — The parsed data is validated into Pydantic models via
model_validate()
Each endpoint has a dedicated parser in griddy.pfr.parsers tailored to the specific HTML structure of that PFR page.
Rate Limiting¶
Pro Football Reference has rate limiting in place. The SDK does not implement automatic rate limiting, so keep these guidelines in mind:
- Space requests apart when making many consecutive calls
- Use reasonable timeouts to avoid overloading the service
- Cache responses locally when working with historical data that doesn't change
import time
seasons = [2020, 2021, 2022, 2023, 2024]
schedules = []
for season in seasons:
schedule = pfr.schedule.get_season_schedule(season=season)
schedules.append(schedule)
time.sleep(2) # Be respectful of PFR's servers
Error Handling¶
PFR-specific errors live in griddy.pfr.errors:
from griddy.pfr.errors import ParsingError, NoResponseError
try:
game = pfr.games.get_game_details(game_id="202502090kan")
except ParsingError as e:
print(f"Failed to parse page at {e.url}: {e}")
except NoResponseError:
print("Browserless returned an empty response")
| Exception | Description |
|---|---|
GriddyPFRError |
Base exception for all PFR errors |
GriddyPFRDefaultError |
Default PFR scraping error |
ParsingError |
HTML parser failure (includes the url field) |
NoResponseError |
Empty response from Browserless |
ResponseValidationError |
Pydantic model validation failure |
Context Manager¶
Clean up Browserless resources when done: