
Building a sports-odds pipeline with oddsapiR
Saiem Gilani
Source: vignettes/articles/building-an-odds-pipeline.Rmd
This article walks through a realistic end-to-end pipeline for
collecting, tidying, analysing, and persisting sports-odds data using
oddsapiR. Every chunk that calls the live API is set to
eval = FALSE, so prose and comments describe realistic output
shapes rather than live results.
The companion beginner guide lives at oddsapiR: Getting started.
library(oddsapiR)
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(gt)
library(DBI)
library(RSQLite)
1. Setup — key handling and budgeting your quota
Before pulling any odds, check that a key is configured and inspect
the current quota balance. toa_requests()
hits the free /v4/sports endpoint, which does not consume a
credit, and returns your balance as a one-row tibble.
# Confirm the key is present (errors loudly if not)
check_toa_key()
# Budget check: how many credits remain before we start pulling?
budget_before <- toa_requests()
budget_before
# # A tibble: 1 x 2
# requests_remaining requests_used
# <int> <int>
# 480 20
After each pull, call toa_quota()
to read the headers cached from the most recent response without making
a new network request:
toa_quota()
# # A tibble: 1 x 3
# requests_remaining requests_used requests_last
# <int> <int> <int>
# 477 23 3
The requests_last column tells you exactly how many
credits the last call consumed. This is the correct way to audit costs
during development: pull once, inspect toa_quota(), then
multiply out your full pipeline’s total cost before scheduling it.
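As a rough illustration of that multiplication, the sketch below projects a month of pulls from a single measured call. The helper project_credits() is hypothetical (it is not part of oddsapiR), and the input values are the example figures from the output above rather than real quota data.

```r
# Hypothetical helper, not part of oddsapiR:
# total credits = credits per pull x pulls per day x number of days
project_credits <- function(credits_per_pull, pulls_per_day, days = 30) {
  credits_per_pull * pulls_per_day * days
}

# Example: one test pull cost 3 credits (requests_last), scheduled hourly
projected <- project_credits(credits_per_pull = 3, pulls_per_day = 24)
projected
# 2160

# Compare against the example balance of 477 remaining credits
fits_budget <- projected <= 477
fits_budget
# FALSE
```

In this example an hourly schedule would overrun the remaining balance, so the pull frequency or the number of markets would need to come down.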
2. Discover — find active sports and upcoming events
Find active sport keys
toa_sports()
lists every sport the API covers. It is free. Filter to
active == TRUE for sports that are currently in season:
all_sports <- toa_sports(all_sports = TRUE)
# Sports currently in season
in_season <- all_sports %>%
filter(active) %>%
select(key, group, title)
in_season
# key group title
# basketball_nba Basketball NBA
# americanfootball_nfl American Football NFL
# soccer_epl Soccer EPL
# icehockey_nhl Ice Hockey NHL
# ...
The key column supplies the sport_key argument
for every downstream function. The bundled dataset toa_sports_keys
provides the same mapping without a network call, as of the date it was last updated:
head(oddsapiR::toa_sports_keys, 10)
#> ── Sports coverage data from the-odds-api.com ──────── oddsapiR 1.0.0 ──
#> ℹ Data updated: 2022-06-17 05:54:44 UTC
#> # A tibble: 10 × 5
#> key group title description has_outrights
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 americanfootball_ncaaf Amer… NCAAF US College… FALSE
#> 2 americanfootball_nfl Amer… NFL US Football FALSE
#> 3 americanfootball_nfl_super_bow… Amer… NFL … Super Bowl… TRUE
#> 4 aussierules_afl Auss… AFL Aussie Foo… FALSE
#> 5 baseball_mlb Base… MLB Major Leag… FALSE
#> 6 baseball_mlb_world_series_winn… Base… MLB … World Seri… TRUE
#> 7 basketball_euroleague Bask… Bask… Basketball… FALSE
#> 8 basketball_nba Bask… NBA US Basketb… FALSE
#> 9 basketball_nba_championship_wi… Bask… NBA … Championsh… TRUE
#> 10 basketball_ncaab Bask… NCAAB US College… FALSE
List upcoming events for a sport
toa_sports_events()
lists in-play and pre-match events without odds. It is also free. The
id column is the event_id used by
toa_event_odds() and toa_event_markets().
# Events for the next 24 hours
now <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
tomorrow <- format(Sys.time() + 86400, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
nba_events <- toa_sports_events(
sport_key = "basketball_nba",
date_format = "iso",
commence_time_from = now,
commence_time_to = tomorrow
)
nba_events
# # A tibble: 6 x 6
# id sport_key sport_title commence_time home_team away_team
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 48db9c3293a52baab881d95d38f37a98 basketball_nba NBA 2025-01-15T00:10:00Z Los Angeles Lakers Boston Celtics
# 2 9a1c3f2e8b7d4e5f6c0a2b3d4e5f6a7b basketball_nba NBA 2025-01-15T01:00:00Z Golden State Warriors Milwaukee Bucks
# ...
List participants (teams) for a sport
toa_sports_participants()
returns the whitelist of teams or individual competitors. Cost: 1
credit.
nba_teams <- toa_sports_participants(sport_key = "basketball_nba")
nba_teams
# # A tibble: 30 x 3
# sport_key id full_name
# <chr> <chr> <chr>
# 1 basketball_nba par_01hqmkq6fdf1pvq7jgdd7hdmpf Los Angeles Lakers
# 2 basketball_nba par_01hqmkq6fdf1pvq7jgdd7hdmpe Boston Celtics
# ...
3. Pull odds across bookmakers
Featured markets for an entire sport
toa_sports_odds()
returns a long-format tibble — one row per (event,
bookmaker, market, outcome). The cost equals
markets × regions.
# Pull moneyline + spreads + totals from US books
# Cost: 3 markets x 1 region = 3 credits
nba_odds <- toa_sports_odds(
sport_key = "basketball_nba",
regions = "us",
markets = "h2h,spreads,totals",
odds_format = "decimal",
date_format = "iso"
)
glimpse(nba_odds)
# Rows: ~180 (6 games x ~5 bookmakers x 3 markets x 2 outcomes per market)
# Columns: id, sport_key, sport_title, commence_time, home_team, away_team,
# bookmaker_key, bookmaker, bookmaker_last_update,
# market_key, market_last_update, outcomes_name, outcomes_price,
# outcomes_point
toa_quota()
# requests_remaining = 477, requests_used = 23, requests_last = 3
Detailed props for a single game
toa_event_markets()
lists every market key a bookmaker has opened for a game (1 credit).
Then toa_event_odds()
fetches the odds for specific market keys. Cost = unique markets
returned × regions.
game_id <- "48db9c3293a52baab881d95d38f37a98" # from toa_sports_events()
# Discover available markets for this game (1 credit)
available_markets <- toa_event_markets(
sport_key = "basketball_nba",
event_id = game_id,
regions = "us"
)
available_markets %>%
distinct(bookmaker, market_key) %>%
arrange(bookmaker, market_key)
# bookmaker market_key
# DraftKings h2h
# DraftKings player_points
# DraftKings player_rebounds
# DraftKings spreads
# DraftKings totals
# FanDuel h2h
# FanDuel player_points
# ...
# Fetch player-points props (cost: 1 market x 1 region = 1 credit)
player_pts <- toa_event_odds(
sport_key = "basketball_nba",
event_id = game_id,
regions = "us",
markets = "player_points",
odds_format = "decimal",
date_format = "iso"
)
player_pts %>%
select(bookmaker, outcomes_name, outcomes_description, outcomes_price, outcomes_point) %>%
head(10)
# bookmaker outcomes_name outcomes_description outcomes_price outcomes_point
# DraftKings Over LeBron James 1.91 25.5
# DraftKings Under LeBron James 1.91 25.5
# FanDuel Over LeBron James 1.87 25.5
# FanDuel Under LeBron James 1.95 25.5
4. Tidy and analyse: de-vig, implied probability, line shopping
The raw data from toa_sports_odds() is already tidy (one
row per outcome), but some light wrangling makes analysis easier.
De-duplicate and compute implied probability
For two-sided markets (spreads, totals) each game has two
rows per bookmaker, one for each side; NBA moneylines (h2h) have
the same shape. Compute the implied probability from decimal odds
(1 / price), then remove the overround (vig) by dividing each
probability by the sum over all outcomes for the same event and bookmaker:
# Work with h2h (moneyline) only
h2h <- nba_odds %>%
filter(market_key == "h2h")
# Implied probability per outcome, before removing vig
h2h_prob <- h2h %>%
group_by(id, bookmaker_key) %>%
mutate(
implied_prob_raw = 1 / outcomes_price,
total_overround = sum(implied_prob_raw),
# Remove vig: normalise so probs sum to 1
implied_prob_fair = implied_prob_raw / total_overround
) %>%
ungroup()
h2h_prob %>%
select(home_team, away_team, bookmaker, outcomes_name,
outcomes_price, implied_prob_raw, implied_prob_fair) %>%
head(6)
# home_team away_team bookmaker outcomes_name price raw_prob fair_prob
# Los Angeles Lakers Boston Celtics DraftKings Los Angeles Lakers 2.05 0.488 0.481
# Los Angeles Lakers Boston Celtics DraftKings Boston Celtics 1.90 0.526 0.519
# Los Angeles Lakers Boston Celtics FanDuel Los Angeles Lakers 2.00 0.500 0.483
# Los Angeles Lakers Boston Celtics FanDuel Boston Celtics 1.87 0.535 0.517
The total_overround column tells you each bookmaker’s
vig. A value of 1.05 means the raw implied probabilities sum to 105%, i.e. a 5% overround (a bookmaker margin of roughly 4.8%).
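As a quick sanity check on the arithmetic, the sketch below de-vigs a single two-way moneyline by hand in base R. The prices are the illustrative DraftKings figures from the sample output (2.05 and 1.90), not live data, and the variable names are made up for this example.

```r
# Illustrative decimal prices for one bookmaker's two-way moneyline
prices <- c(lakers = 2.05, celtics = 1.90)

# Raw implied probabilities still include the bookmaker's vig
raw_prob <- 1 / prices

# Overround: the sum of raw probabilities (exceeds 1 when there is vig)
overround <- sum(raw_prob)

# Fair probabilities: normalise so the two sides sum to exactly 1
fair_prob <- raw_prob / overround

round(raw_prob, 3)    # lakers 0.488, celtics 0.526
round(overround, 4)   # 1.0141
round(fair_prob, 3)   # lakers 0.481, celtics 0.519
```

Proportional normalisation is the simplest de-vig method and is the one used in this article; other methods exist and can give slightly different fair probabilities.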
Line shopping: best available price per outcome
best_h2h <- h2h_prob %>%
group_by(id, home_team, away_team, outcomes_name) %>%
slice_max(outcomes_price, n = 1, with_ties = FALSE) %>%
ungroup() %>%
select(home_team, away_team, outcomes_name, bookmaker, outcomes_price, implied_prob_fair) %>%
arrange(home_team, outcomes_name)
best_h2h
# home_team away_team outcomes_name bookmaker price fair_prob
# Los Angeles Lakers Boston Celtics Boston Celtics Bet365 1.95 0.505
# Los Angeles Lakers Boston Celtics Los Angeles Lakers DraftKings 2.05 0.481
Handling spreads and totals
Spreads and totals each have two rows per (event, bookmaker). The
outcomes_point column carries the handicap or total
line:
spreads <- nba_odds %>%
filter(market_key == "spreads") %>%
# Label each side more clearly
mutate(side = if_else(outcomes_point < 0, "favourite", "underdog"))
# Best spread price per team across bookmakers
best_spreads <- spreads %>%
group_by(id, home_team, away_team, outcomes_name) %>%
slice_max(outcomes_price, n = 1, with_ties = FALSE) %>%
ungroup() %>%
select(home_team, away_team, outcomes_name, outcomes_point, bookmaker, outcomes_price)
best_spreads
# home_team away_team outcomes_name outcomes_point bookmaker price
# Los Angeles Lakers Boston Celtics Los Angeles Lakers -3.5 BetMGM 1.95
# Los Angeles Lakers Boston Celtics Boston Celtics 3.5 FanDuel 1.95
5. Historical odds — tracking line movement
The historical endpoints require a paid plan. They use a snapshot
model: supply an ISO 8601 date and get back the state of the data
at the closest snapshot at or before that time.
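The "closest snapshot at or before" rule can be mimicked locally in base R. The sketch below uses made-up snapshot times (not real API data) and findInterval() to pick the latest snapshot that does not come after the requested time. It only illustrates the lookup rule and makes no claim about how the API stores its snapshots.

```r
iso <- "%Y-%m-%dT%H:%M:%SZ"

# Made-up snapshot times, sorted ascending (illustrative only)
snapshot_times <- as.POSIXct(
  c("2024-01-14T23:45:00Z", "2024-01-14T23:50:00Z",
    "2024-01-14T23:55:00Z", "2024-01-15T00:05:00Z"),
  format = iso, tz = "UTC"
)
requested <- as.POSIXct("2024-01-15T00:00:00Z", format = iso, tz = "UTC")

# Index of the last snapshot <= requested (0 would mean none exists)
idx <- findInterval(as.numeric(requested), as.numeric(snapshot_times))

matched <- format(snapshot_times[idx], iso, tz = "UTC")
matched
# "2024-01-14T23:55:00Z"
```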
Fetch a single historical snapshot
# Spreads for all NBA events in one snapshot (e.g. about 48 hours before a game of interest)
hist_odds_48h <- toa_sports_odds_history(
sport_key = "basketball_nba",
date = "2024-01-15T00:00:00Z",
regions = "us",
markets = "spreads",
odds_format = "decimal",
date_format = "iso"
)
hist_odds_48h %>%
select(timestamp, previous_timestamp, next_timestamp,
home_team, away_team, bookmaker, outcomes_name,
outcomes_price, outcomes_point) %>%
head(4)
# timestamp prev_ts next_ts home_team ...
# 2024-01-14T23:58:00Z 2024-01-14T23:53:00Z 2024-01-15T00:03:00Z Los Angeles Lakers
Page through snapshots to build a line-movement dataset
Use the next_timestamp from one response as the
date parameter of the next call to walk forward in time.
The cost for toa_sports_odds_history() is 10× the standard
rate (10 × markets × regions per call), so plan accordingly.
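Before running a paging loop it helps to price it. The sketch below applies the 10 × markets × regions rule to a series of calls. The helper history_cost() is hypothetical (not part of oddsapiR), and applying the same 10× multiplier to event-level history calls is an assumption you should confirm against toa_quota() after a single test call.

```r
# Hypothetical helper, not part of oddsapiR.
# credits = calls x multiplier x number of markets x number of regions
history_cost <- function(markets, regions, n_calls = 1, multiplier = 10) {
  n_markets <- length(strsplit(markets, ",", fixed = TRUE)[[1]])
  n_regions <- length(strsplit(regions, ",", fixed = TRUE)[[1]])
  n_calls * multiplier * n_markets * n_regions
}

# Six snapshots of one market in one region
six_snapshots <- history_cost("spreads", "us", n_calls = 6)
six_snapshots
# 60

# A single snapshot of three markets in one region
one_wide_snapshot <- history_cost("h2h,spreads,totals", "us")
one_wide_snapshot
# 30
```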
# Build a 6-snapshot series for a specific event
event_id <- "93af4b300a4c0dded909234ea32e9abd"
start_date <- "2024-01-13T12:00:00Z"
n_snapshots <- 6
snapshots <- list()
current_date <- start_date
for (i in seq_len(n_snapshots)) {
snap <- toa_event_odds_history(
sport_key = "basketball_nba",
event_id = event_id,
date = current_date,
regions = "us",
markets = "spreads",
odds_format = "decimal",
date_format = "iso"
)
if (nrow(snap) == 0) break
snapshots[[i]] <- snap
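current_date <- snap$next_timestamp[[1]]
# Stop when there is no later snapshot to walk to
if (is.null(current_date) || is.na(current_date)) break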
# Be kind to the API rate limit
Sys.sleep(0.5)
}
line_movement <- bind_rows(snapshots) %>%
filter(market_key == "spreads", bookmaker_key == "draftkings") %>%
select(timestamp, home_team, away_team, outcomes_name,
outcomes_price, outcomes_point)
line_movement
# timestamp home_team away_team outcomes_name price point
# 2024-01-13T12:00:00Z Los Angeles Lakers Boston Celtics Los Angeles Lakers 1.91 -4.5
# 2024-01-13T17:00:00Z Los Angeles Lakers Boston Celtics Los Angeles Lakers 1.91 -5.0
# 2024-01-14T00:00:00Z Los Angeles Lakers Boston Celtics Los Angeles Lakers 1.95 -5.5
# ...
6. Persist and schedule — SQLite storage + quota-aware scheduling
Store snapshots to SQLite
Use DBI + RSQLite to keep a running
archive. Key each row by pulled_at so you can track when
each snapshot was collected.
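To make the role of pulled_at concrete, here is a minimal, self-contained sketch that uses an in-memory SQLite database and two made-up rows rather than the real archive. The names demo_con and demo_rows, the reduced column set, and the query are illustrative assumptions.

```r
library(DBI)

# In-memory database, so nothing touches a file on disk
demo_con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Two made-up pulls of the same outcome, five minutes apart (illustrative data)
demo_rows <- data.frame(
  pulled_at      = c("2025-01-14T18:00:00Z", "2025-01-14T18:05:00Z"),
  event_id       = "demo_event",
  bookmaker_key  = "draftkings",
  market_key     = "h2h",
  outcomes_name  = "Los Angeles Lakers",
  outcomes_price = c(2.05, 2.10),
  stringsAsFactors = FALSE
)
dbWriteTable(demo_con, "odds_snapshots", demo_rows)

# ISO 8601 UTC strings sort chronologically as text, so MAX() finds the latest pull
latest <- dbGetQuery(demo_con, "
  SELECT *
  FROM odds_snapshots
  WHERE pulled_at = (SELECT MAX(pulled_at) FROM odds_snapshots)
")
latest$outcomes_price
# 2.1

dbDisconnect(demo_con)
```

Storing timestamps as fixed-width UTC text keeps this kind of comparison simple; mixing time zones or formats in the same column would break the ordering.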
# --- one-time setup ---
con <- dbConnect(RSQLite::SQLite(), "odds_archive.sqlite")
# Create the table if it doesn't exist yet
dbExecute(con, "
CREATE TABLE IF NOT EXISTS odds_snapshots (
pulled_at TEXT,
snapshot_ts TEXT,
event_id TEXT,
sport_key TEXT,
sport_title TEXT,
commence_time TEXT,
home_team TEXT,
away_team TEXT,
bookmaker_key TEXT,
bookmaker TEXT,
market_key TEXT,
outcomes_name TEXT,
outcomes_price REAL,
outcomes_point REAL
)
")
# --- called on each scheduled pull ---
pull_and_store <- function(sport_key, regions = "us", markets = "h2h,spreads") {
pulled_at <- format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
raw <- toa_sports_odds(
sport_key = sport_key,
regions = regions,
markets = markets,
odds_format = "decimal",
date_format = "iso"
)
if (nrow(raw) == 0) return(invisible(NULL))
to_store <- raw %>%
mutate(pulled_at = pulled_at) %>%
select(
pulled_at, snapshot_ts = bookmaker_last_update,
event_id = id, sport_key, sport_title,
commence_time, home_team, away_team,
bookmaker_key, bookmaker, market_key,
outcomes_name, outcomes_price,
# outcomes_point may be missing (e.g. h2h-only pulls), hence any_of()
outcomes_point = dplyr::any_of("outcomes_point")
)
dbAppendTable(con, "odds_snapshots", to_store)
cli::cli_alert_success("Stored {nrow(to_store)} rows at {pulled_at}.")
}
# Run the pull
pull_and_store("basketball_nba")
# When done for the session, close the connection
dbDisconnect(con)
Quota-aware scheduling
The Odds API’s quota resets monthly. Before each scheduled pull, check your remaining balance and bail early if you are running low:
safe_pull <- function(sport_key, min_remaining = 50, ...) {
budget <- toa_requests()
if (budget$requests_remaining < min_remaining) {
cli::cli_alert_warning(
"Only {budget$requests_remaining} credits remain -- skipping pull."
)
return(invisible(NULL))
}
pull_and_store(sport_key, ...)
}
For cron-based scheduling (e.g. via cronR on Linux/macOS
or Task Scheduler on Windows), a pull every 5 minutes for two markets
across one region costs 2 credits per call. At 12 calls/hour × 24 hours × 30
days, that is 8,640 calls, or 17,280 credits, per month. Plan your pull
frequency according to your
plan’s monthly quota.
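Working backwards from a quota is often more useful than working forwards from a frequency. The sketch below computes the shortest pull interval that stays within a monthly allowance. The helper min_pull_interval() is hypothetical (not part of oddsapiR), and the 500-credit quota is only an example figure, so substitute the allowance of your own plan.

```r
# Hypothetical helper, not part of oddsapiR:
# shortest interval (in minutes) between pulls that fits a monthly quota
min_pull_interval <- function(monthly_quota, credits_per_pull, days = 30) {
  max_pulls <- floor(monthly_quota / credits_per_pull)
  (days * 24 * 60) / max_pulls
}

# Example quota of 500 credits, 2 credits per pull
interval_minutes <- min_pull_interval(monthly_quota = 500, credits_per_pull = 2)
interval_minutes
# 172.8
```

With these example numbers a pull roughly every three hours is the most frequent schedule that fits, far from the 5-minute cadence costed above.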
7. Visualize — best lines table and line-movement chart
Best-lines summary table with gt
best_lines_table <- best_h2h %>%
select(
Game = home_team,
Opponent = away_team,
Side = outcomes_name,
Book = bookmaker,
`Best Price` = outcomes_price,
`Fair Prob` = implied_prob_fair
) %>%
mutate(`Fair Prob` = scales::percent(`Fair Prob`, accuracy = 0.1)) %>%
gt() %>%
tab_header(
title = "Best Available Moneyline (Decimal)",
subtitle = paste("Pulled:", format(Sys.time(), "%Y-%m-%d %H:%M UTC", tz = "UTC"))
) %>%
fmt_number(columns = `Best Price`, decimals = 2) %>%
cols_align("center", columns = c(`Best Price`, `Fair Prob`))
best_lines_table
Line-movement chart with ggplot2
line_movement_chart <- line_movement %>%
mutate(timestamp = as.POSIXct(timestamp, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")) %>%
ggplot(aes(x = timestamp, y = outcomes_point,
colour = outcomes_name, group = outcomes_name)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
scale_colour_brewer(palette = "Set1", name = "Side") +
scale_x_datetime(date_labels = "%b %d\n%H:%M", date_breaks = "12 hours") +
labs(
title = "DraftKings Spread Line Movement",
subtitle = paste(
unique(line_movement$home_team),
"vs.",
unique(line_movement$away_team)
),
x = "Snapshot Time (UTC)",
y = "Spread Point (negative = favourite)",
caption = "Source: The Odds API via oddsapiR"
) +
theme_minimal(base_size = 13) +
theme(legend.position = "bottom")
line_movement_chart
Closing notes: SportsDataverse
oddsapiR is part of the SportsDataverse, a collection of
open-source R and Python packages for sports analytics. Companion
packages include:
- cfbfastR — college football play-by-play
- hoopR — men’s basketball (NBA + MBB)
- wehoop — women’s basketball (WNBA + WBB)
- fastRhockey — hockey (NHL + PHF/PWHL)
- baseballr — baseball (MLB + NCAA)
Each package follows the same tidy-data conventions, making it
straightforward to join odds data from oddsapiR with
play-by-play or box-score data from the other packages.
File issues and feature requests at the oddsapiR GitHub repository.