Support: [email protected]
This repository is an R package to easily stream StatsBomb data into R using your login credentials for the API or free data from our GitHub page. API access is for paying customers only.
This package offers a parallel option for most of its computationally expensive functions. However, parallel execution is currently only designed for Windows.
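Because the parallel option is only designed for Windows, it can help to decide the flag once from the operating system and reuse it. The sketch below uses only base R and the bundled parallel package. The names use_parallel and n_cores are illustrative, and the commented call assumes the parallel argument shown for allineups() later in this document.

```r
# Decide once whether to request parallel execution.
# .Platform$OS.type is "windows" on Windows and "unix" on macOS and Linux.
use_parallel <- identical(.Platform$OS.type, "windows")

# Number of cores the machine reports; detectCores() can return NA,
# so fall back to a single core in that case.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

message("Parallel requested: ", use_parallel, " (", n_cores, " cores detected)")

# Illustrative use, following the allineups() call shown in this document:
# StatsBombLineups <- allineups(username, password, matchids, parallel = use_parallel)
```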
- Please first make sure you are on version 3.6.2 or later of R before attempting to install.
- If devtools is not yet installed, run:
install.packages("devtools")
- Then, install this R package as:
devtools::install_github("statsbomb/StatsBombR")
- Finally, load the package:
library(StatsBombR)
This package depends on several other packages in order for all functions to run. Therefore, if you have problems with any functions or with installing the package, it is likely due to package dependencies.
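When installation fails, a quick first check is whether the R version and the two packages named above are in place. This is a minimal sketch using base R only. It reports what is missing rather than installing anything, and it does not inspect the package's own dependencies.

```r
# Minimum R version stated in the installation notes.
min_r <- "3.6.2"
r_ok <- getRversion() >= min_r

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
needed <- c("devtools", "StatsBombR")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

if (!r_ok) {
  message("R ", as.character(getRversion()), " is older than ", min_r,
          "; please upgrade first.")
}
for (pkg in needed[!installed]) {
  message("Package not installed: ", pkg)
}
```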
Welcome to the Free Data Offerings from StatsBomb Services.
This package reads in the open access data found on https://github.com/statsbomb/open-data. Below you will find a list of the functions used to quickly read in all open data currently available. Check back often as new data is regularly added.
StatsBomb are committed to sharing new data and research publicly to enhance understanding of the game of Football. We want to actively encourage new research and analysis at all levels. Therefore we have made certain leagues of StatsBomb Data freely available for public use for research projects and genuine interest in football analytics.
StatsBomb are hoping that by making data freely available, we will extend the wider football analytics community and attract new talent to the industry. We would like to collect some basic personal information about users of our data. If you give us your email address, we will let you know when we make more data, tutorials and research available. We will store the information in accordance with our Privacy Policy and the GDPR.
Whilst we are keen to share data and facilitate research, we also urge you to be responsible with the data. Please register your details on https://www.statsbomb.com/resource-centre and read our User Agreement carefully.
By using this repository, you are agreeing to the user agreement.
If you publish, share or distribute any research, analysis or insights based on this data, please state the data source as StatsBomb and use our logo.
StatsBombData <- free_allevents()
FreeCompetitions()
or, for use in other functions, store it as a data frame object:
Comp <- FreeCompetitions()
Matches <- FreeMatches(Comp)
get.matchFree(Matches[1,])
Note that the argument here is the entire row returned by FreeMatches, because get.matchFree needs information from each match observation.
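The difference between passing a whole row and passing a single value can be seen on a toy data frame. The sketch below does not call the package. The column names are hypothetical stand-ins for whatever FreeMatches() returns.

```r
# Toy stand-in for the data frame returned by FreeMatches();
# the column names and values are hypothetical.
Matches <- data.frame(
  match_id = c(101L, 102L),
  competition_id = c(1L, 1L),
  season_id = c(10L, 10L)
)

# Row subsetting with an empty column index keeps every column,
# so the result is still a one-row data frame.
one_match <- Matches[1, ]

# Extracting a single column gives a bare vector and drops the other fields.
only_id <- Matches$match_id[1]

str(one_match)
str(only_id)
```

The first form is the shape used in the get.matchFree(Matches[1,]) call above.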
API access is for paying customers only
competitions <- competitions(username, password)
matches <- get.matches(username, password, season_id, competition_id)
- Pull Competitions From the API:
comps <- competitions(username, password)
- Filter for the competitions you want:
EuropeComps <- comps %>% filter(country_name == "Europe")
- Create a matrix of the competition and season ids:
competitionmatrix <- as.matrix(EuropeComps[,1:2])
- Pull all of the matches:
Matches <- MultiCompMatches(username, password, competitionmatrix)
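The filter and matrix steps above can be tried without API access on a toy data frame. This sketch uses base R only and selects the id columns by name rather than by position. It assumes the two columns picked up by [,1:2] are called competition_id and season_id, and all values are made up.

```r
# Toy stand-in for the output of competitions(); the column names are
# assumed and the values are invented for illustration.
comps <- data.frame(
  competition_id = c(16L, 2L, 16L),
  season_id = c(4L, 44L, 1L),
  country_name = c("Europe", "England", "Europe"),
  stringsAsFactors = FALSE
)

# Base R equivalent of: comps %>% filter(country_name == "Europe")
EuropeComps <- comps[comps$country_name == "Europe", ]

# Select the id columns by name so the result does not depend on column order.
competitionmatrix <- as.matrix(EuropeComps[, c("competition_id", "season_id")])

print(competitionmatrix)
```

Selecting by name is a defensive choice: the positional form only works while the id columns stay first in the data frame.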
StatsBombData <- get.events(username, password, match_id)
Note: A previous version of this function was named get.match(); get.match() is now deprecated.
- Create a vector of match IDs:
matchids <- matchesvector(username, password, season_id, competition_id)
StatsBombData <- allevents(username, password, matchids)
Note: See the documentation for additional parameters that let you access different API versions, run in parallel or not, and choose a specific number of cores. (A previous version of this function was named allmatches(); allmatches() is now deprecated.)
- Pull Competitions From the API:
comps <- competitions(username, password)
- Filter for the competitions you want:
EuropeComps <- comps %>% filter(country_name == "Europe")
- Create a matrix of the competition and season ids:
competitionmatrix <- as.matrix(EuropeComps[,1:2])
- Pull all of the events:
Events <- MultiCompEvents(username, password, competitionmatrix)
lineups <- get.lineups(username, password, match_id)
matchids <- matchesvector(username, password, season_id, competition_id)
StatsBombLineups <- allineups(username, password, matchids, parallel = TRUE)
StatsBombLineups <- cleanlineups(StatsBombLineups)
Although JSON files can often be a pain to clean, especially due to nested data frames, these helper functions may make your data wrangling much easier.
StatsBombData <- allclean(StatsBombData)
This function cleans the data in one line of code by running each of the functions below sequentially.
StatsBombData <- cleanlocations(StatsBombData)
Please note all location variables must be present in the data set. This function will not work with a subset of variables (i.e. if any location variables are missing).
StatsBombData <- goalkeeperinfo(StatsBombData)
Please note that additional information is located under type.name == "Goal Keeper" and within the Freeze Frames.
StatsBombData <- shotinfo(StatsBombData)
StatsBombData <- freezeframeinfo(StatsBombData)
Description of these variables:
- Density is calculated as the aggregated inverse distance for each defender behind the ball.
- Density in the cone is the density restricted to defenders who are inside the cone between the shooter and the two goal posts.
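The two density definitions can be illustrated on a toy freeze frame. This is a sketch under stated assumptions, not the package's implementation: it assumes a 120 x 80 pitch with goal posts at (120, 36) and (120, 44), it reads "behind the ball" as "closer to the goal line than the shooter", and it treats the cone as the triangle formed by the shooter and the two posts. All coordinates are made up.

```r
# Toy freeze frame; shooter and defender coordinates are invented.
shooter_x <- 100
shooter_y <- 40
defenders <- data.frame(x = c(110, 105, 95, 115), y = c(40, 30, 40, 42))

# Assumed goal post positions on the goal line x = 120.
post_x <- 120
post_y <- c(36, 44)

# Assumption: "behind the ball" means closer to the goal line than the shooter.
behind <- defenders[defenders$x > shooter_x, ]

# Aggregated inverse Euclidean distance from the shooter to those defenders.
dist <- sqrt((behind$x - shooter_x)^2 + (behind$y - shooter_y)^2)
density <- sum(1 / dist)

# Signed area test: which side of the line a -> b the point p lies on.
side <- function(px, py, ax, ay, bx, by) {
  (bx - ax) * (py - ay) - (by - ay) * (px - ax)
}

# A point is inside the triangle shooter / post 1 / post 2 when it lies on the
# same side of all three edges (points exactly on an edge count as inside).
d1 <- side(behind$x, behind$y, shooter_x, shooter_y, post_x, post_y[1])
d2 <- side(behind$x, behind$y, post_x, post_y[1], post_x, post_y[2])
d3 <- side(behind$x, behind$y, post_x, post_y[2], shooter_x, shooter_y)
in_cone <- !((d1 < 0 | d2 < 0 | d3 < 0) & (d1 > 0 | d2 > 0 | d3 > 0))

density_in_cone <- sum(1 / dist[in_cone])

print(c(density = density, density_in_cone = density_in_cone))
```

Because the cone value sums over a subset of the same defenders, it can never exceed the overall density.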
StatsBombData <- formatelapsedtime(StatsBombData)
StatsBombData <- possessioninfo(StatsBombData)
- Some of the cleaning functions above depend on variables created in the functions presented before them. In order to be safe, please clean your data in the order that is presented in this document.
- Please re-install frequently, as new functions and bug fixes will be added regularly.
- As always, check the R documentation for each function, for example
?StatsBombFreeEvents
for a more specific description.
- Please contact [email protected] with bugs and suggestions.