Mismatch between available comps in Readme and in rds files #374
RE: load functions. I like the idea of having a function to list possible cups, comps, and years. We'll have to think of the right way to automate it. Right now, it wouldn't be hard to write a function to list the available country + gender + tier combinations for a given data set:
DATA_REPO <- 'JaseZiv/worldfootballR_data'
get_possible_stashed_data <- function(tag, include_years = FALSE) {
  # List every file attached to the release tag.
  raw <- piggyback::pb_list(DATA_REPO, tag = tag)
  grid <- raw |>
    tibble::as_tibble() |>
    dplyr::filter(
      tools::file_ext(file_name) == 'rds'
    ) |>
    dplyr::select(file_name) |>
    # File names start with <country>_<gender>_<tier>_, so split on that.
    tidyr::separate_wider_regex(
      file_name,
      c(country = '^[A-Z]+', '_', gender = '[MF]', '_', tier = '1st|2nd', '_', extra = '.*$'),
      cols_remove = FALSE
    ) |>
    dplyr::select(
      file_name,
      country,
      gender,
      tier
    )
  if (isFALSE(include_years)) {
    return(grid |> dplyr::select(-file_name))
  }
  ## would have to read in files to identify years;
  ## until then, return the grid with file names still attached
  grid
}
possible_data <- get_possible_stashed_data(
  tag = 'fb_match_summary'
)
possible_data
#> # A tibble: 13 × 3
#>    country gender tier
#>    <chr>   <chr>  <chr>
#>  1 BRA     M      1st
#>  2 ENG     F      1st
#>  3 ENG     M      1st
#>  4 ENG     M      2nd
#>  5 ESP     M      1st
#>  6 FRA     M      1st
#>  7 GER     M      1st
#>  8 ITA     M      1st
#>  9 MEX     M      1st
#> 10 NED     M      1st
#> 11 POR     M      1st
#> 12 USA     F      1st
#> 13 USA     M      1st
It becomes more involved if you want to list seasons as well, since, as of now, we don't store that in a CSV anywhere, nor in the name of the stashed data files (which is why it's not hard to extract country, gender, and tier). As things stand now, you'd have to read in the data file, then extract the unique seasons. The data files can be slow to load, so this is not ideal. I'd have to think of a robust solution to this.
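As a rough illustration of the "read in the file, then extract the unique seasons" step, here is a minimal base R sketch that builds a file-to-season index once so it could be cached rather than recomputed. The helper names (`list_seasons`, `build_season_index`), the `Season_End_Year` column name, and the toy file names are assumptions for illustration, not something confirmed for the stashed files; the toy data frames stand in for data that would really come from reading each rds file.

```r
# Hypothetical helper: unique seasons found in one loaded data set.
# `season_col` is an assumed column name; check the actual stashed files.
list_seasons <- function(data, season_col = 'Season_End_Year') {
  stopifnot(is.data.frame(data), season_col %in% names(data))
  sort(unique(data[[season_col]]))
}

# Hypothetical helper: one row per (file, season), built from a named list
# of already-loaded data frames. The result could be written to a CSV so the
# slow files only need to be read when the index is refreshed.
build_season_index <- function(data_by_file, season_col = 'Season_End_Year') {
  rows <- lapply(names(data_by_file), function(file_name) {
    seasons <- list_seasons(data_by_file[[file_name]], season_col)
    data.frame(
      file_name = rep(file_name, length(seasons)),
      season = seasons,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy stand-ins for the stashed data (illustrative names and values only).
toy <- list(
  'ENG_M_1st_example.rds' = data.frame(Season_End_Year = c(2021, 2021, 2022)),
  'USA_F_1st_example.rds' = data.frame(Season_End_Year = c(2022, 2023))
)
season_index <- build_season_index(toy)
```

The index still has to be produced by reading every file at least once, so this only moves the cost to whenever the index is rebuilt; it does not avoid it.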
RE: mismatched names. Yes, I've seen this kind of thing with MLS team names, where the name of a team was changed at some point (e.g. 'Sporting Kansas City' -> 'Sporting KC'), either in the middle of a season or between seasons. I'm not sure what the best general solution is for ensuring name consistency over time. Perhaps we could re-scrape data about a year after it occurred, assuming that names are no longer being changed at that point. Obviously this would take a lot of time. Perhaps there are shortcuts for checking self-consistency.
Maybe tying the names to a team ID? In the URLs on FBref, teams appear to have an ID. For example, Grenoble Foot appears to have the ID 40aa7280: https://fbref.com/en/squads/40aa7280/Grenoble-Foot-Stats Maybe that could be scraped and used to track changes to names?
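A minimal sketch of pulling that ID out of a squad URL in base R. The helper name `extract_fbref_team_id` is hypothetical, and the sketch assumes the ID is always the path segment immediately after `/squads/`, which is inferred only from the example URL above.

```r
# Hypothetical helper: extract the squad ID from FBref squad URLs.
# Assumes the ID is the path segment right after '/squads/'.
extract_fbref_team_id <- function(url) {
  matches <- regmatches(url, regexec('/squads/([^/]+)/', url))
  # Each element is c(full_match, id) on success, character(0) otherwise.
  vapply(
    matches,
    function(m) if (length(m) == 2) m[2] else NA_character_,
    character(1)
  )
}

extract_fbref_team_id('https://fbref.com/en/squads/40aa7280/Grenoble-Foot-Stats')
#> [1] "40aa7280"
```

URLs that do not match the assumed pattern come back as `NA`, so a stored name could be keyed on the ID and any rename would show up as two names sharing one ID.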
Competition list is outdated in the README for load_match_comp_results
Some competitions have changed name in the rds files
e.g.
English Football League Cup
is now EFL Cup
Copa America too
and UEFA Euro comps
More generally, it would be nice to have a function listing all the competitions/countries/league IDs that are available for each load function, so we could load the data programmatically.
Thanks for this great package!