Mismatch between available comps in Readme and in rds files #374

BenoitLondon · 2024-05-03T11:24:53Z

Competition list is outdated in the README for load_match_comp_results

Some competitions have changed name in the rds files
e.g. English Football League Cup is now EFL Cup
Copa America too
and UEFA Euro comps

More generally, it would be nice to have a function listing all the competition/country/league IDs that are available for each load function, so we could load the data programmatically.

Thanks for this great package!


tonyelhabr · 2024-06-18T03:12:41Z

RE: load functions. I like the idea of having a function to list possible cups, comps, and years. We'll have to think of the right way to automate it. Right now, it wouldn't be hard to write a function to list available country+gender+tier for a given data set.

DATA_REPO <- 'JaseZiv/worldfootballR_data'
get_possible_stashed_data <- function(tag, include_years = FALSE) {
  raw <- piggyback::pb_list(DATA_REPO, tag = tag)
  
  grid <- raw |> 
    tibble::as_tibble() |> 
    dplyr::filter(
      tools::file_ext(file_name) == 'rds'
    ) |> 
    dplyr::select(file_name) |> 
    tidyr::separate_wider_regex(
      file_name,
      c(country = '^[A-Z]+', '_', gender = '[MF]', '_', tier = '1st|2nd', '_', extra = '.*$'),
      cols_remove = FALSE
    ) |> 
    dplyr::select(
      file_name,
      country,
      gender,
      tier
    )
  
  if (isFALSE(include_years)) {
    return(grid |> dplyr::select(-file_name))
  }
  
  ## would have to read in files to identify years;
  ## until then, return the grid with file names instead of an invisible NULL
  grid
}

possible_data <- get_possible_stashed_data(
  tag = 'fb_match_summary'
)
possible_data
#> # A tibble: 13 × 3
#>    country gender tier 
#>    <chr>   <chr>  <chr>
#>  1 BRA     M      1st  
#>  2 ENG     F      1st  
#>  3 ENG     M      1st  
#>  4 ENG     M      2nd  
#>  5 ESP     M      1st  
#>  6 FRA     M      1st  
#>  7 GER     M      1st  
#>  8 ITA     M      1st  
#>  9 MEX     M      1st  
#> 10 NED     M      1st  
#> 11 POR     M      1st  
#> 12 USA     F      1st  
#> 13 USA     M      1st

It becomes more involved if you want to list seasons as well, since, as of now, we don't store that in a CSV anywhere, nor in the name of the stashed data files (which is why it's not hard to extract country, gender, and tier). As things stand now, you'd have to read in the data file, then extract the unique seasons. The data files can be slow to load, so this is not ideal.

I'd have to think of a robust solution to this.
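
To make the "read in the data file, then extract the unique seasons" step concrete, here is a minimal base R sketch. The helper name `list_seasons_in_rds` and the season column name `Season_End_Year` are assumptions for illustration, not part of the package; check the actual columns of each stashed file. A toy data frame stands in for a downloaded file so the example runs offline.

```r
# Hypothetical helper: list the unique seasons found in one stashed .rds file.
# `season_col` is an assumed column name; adjust it to the file's real schema.
list_seasons_in_rds <- function(path, season_col = "Season_End_Year") {
  dat <- readRDS(path)
  if (!season_col %in% names(dat)) {
    stop("Column '", season_col, "' not found in ", path)
  }
  sort(unique(dat[[season_col]]))
}

# Toy stand-in for a downloaded data file (made-up rows, not real data).
toy <- data.frame(
  Season_End_Year = c(2023L, 2022L, 2023L, 2024L),
  Home = c("A", "B", "C", "D")
)
toy_path <- tempfile(fileext = ".rds")
saveRDS(toy, toy_path)

seasons <- list_seasons_in_rds(toy_path)
seasons
```

This still reads the whole file just to get a handful of values, which is the slowness described above; storing the result once per file in a small index would avoid repeating that cost.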

tonyelhabr · 2024-06-18T03:15:31Z

RE: mismatched names. Yes, I've seen this kind of thing with MLS team names, where they changed the name of a team at some point (e.g. 'Sporting Kansas City' -> 'Sporting KC'), either in the middle of a season or between seasons.

I'm not sure what the best general solution is for ensuring name consistency over time. Perhaps we could re-scrape data about a year after it occurred, assuming that names are no longer being changed at that point. Obviously this would take a lot of time. Perhaps there are shortcuts for checking self-consistency.

DDE1989 · 2024-07-28T15:34:38Z

Maybe tie the names to a team ID? In FBref URLs, teams appear to have an ID.

For example, Grenoble Foot appears to have the id: 40aa7280

https://fbref.com/en/squads/40aa7280/Grenoble-Foot-Stats

Maybe that can be scraped and used to track changes to names?
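
A rough base R sketch of that idea: pull the ID out of the squad URL, then flag IDs that map to more than one name. The helper name `extract_fbref_team_id` and the 8-character hex pattern are assumptions inferred from the single example URL above, and the ID `aaaa1111` in the toy table is a made-up placeholder, not a real FBref ID.

```r
# Hypothetical helper: extract the team ID from an FBref squad URL.
# Assumes IDs are 8 lowercase hex characters, as in the example URL.
extract_fbref_team_id <- function(url) {
  m <- regmatches(url, regexec("/squads/([0-9a-f]{8})/", url))
  # Each element holds the full match plus the captured ID, or is empty.
  vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_, character(1))
}

urls <- c(
  "https://fbref.com/en/squads/40aa7280/Grenoble-Foot-Stats",
  "https://fbref.com/en/not-a-squad-page"
)
ids <- extract_fbref_team_id(urls)

# Toy table of (ID, name) pairs; 'aaaa1111' is a placeholder ID.
teams <- data.frame(
  team_id = c("aaaa1111", "aaaa1111", "40aa7280"),
  team_name = c("Sporting Kansas City", "Sporting KC", "Grenoble Foot"),
  stringsAsFactors = FALSE
)

# Count distinct names per ID and keep the IDs with more than one.
n_names <- vapply(
  split(teams$team_name, teams$team_id),
  function(x) length(unique(x)),
  integer(1)
)
renamed_ids <- names(n_names)[n_names > 1]
renamed_ids
```

Keying on the ID rather than the name means a rename shows up as one ID with several names, which could also serve as a cheap self-consistency check on already-stashed data.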

BenoitLondon changed the title ~~Mismatch between available comps in REadme and in rds files~~ Mismatch between available comps in Readme and in rds files May 3, 2024
