xg data errors using load_fb_match_shooting #404

Open
1Colin11 opened this issue Nov 25, 2024 · 4 comments

Comments

@1Colin11

For example,

library(tidyverse)
library(worldfootballR)

shot_times =
  load_fb_match_shooting(
    country = "ENG",
    gender = "M",
    tier = "1st"
  ) %>%
  janitor::clean_names()

shot_times %>%
  filter(
    match_url == "https://fbref.com/en/matches/29e4d6ac/Liverpool-Burnley-February-10-2024-Premier-League",
    player == "David Datro Fofana"
  ) %>%
  View()

This gives two rows, with xG values of 0.09 and 0.08.
On the match page at that URL, the shots section records his two shots as 0.47 and 0.36 xG.
There are many examples of this. I have not checked other leagues.
Would it be possible to update the stored data in the repo with the correct values?
Many thanks for the helpful package.
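
To help collect more examples, here is a minimal sketch of how mismatches like this could be flagged in bulk. It assumes you have two data frames with the same shots: one from `load_fb_match_shooting()` and one obtained fresh from the match page (for instance by re-scraping). The toy data, the helper name `find_xg_mismatches()`, and the column names `match_url`, `player`, `minute` and `x_g` are assumptions for illustration, so check them against the real output before relying on this.

```r
# Hypothetical toy data standing in for the stored (loaded) and freshly
# scraped shot logs. Column names are assumptions, not confirmed output.
stored <- data.frame(
  match_url = "match-1",
  player    = c("Player A", "Player A", "Player B"),
  minute    = c("12", "67", "80"),
  x_g       = c(0.09, 0.08, 0.30),
  stringsAsFactors = FALSE
)
fresh <- data.frame(
  match_url = "match-1",
  player    = c("Player A", "Player A", "Player B"),
  minute    = c("12", "67", "80"),
  x_g       = c(0.47, 0.36, 0.30),
  stringsAsFactors = FALSE
)

# Join the two versions shot by shot and keep rows whose xG differs
# by more than a small tolerance.
find_xg_mismatches <- function(stored, fresh, tol = 0.005) {
  keys <- c("match_url", "player", "minute")
  joined <- merge(stored, fresh, by = keys, suffixes = c("_stored", "_fresh"))
  joined$diff <- joined$x_g_fresh - joined$x_g_stored
  joined[abs(joined$diff) > tol, , drop = FALSE]
}

mismatches <- find_xg_mismatches(stored, fresh)
print(mismatches)
```

One caveat: joining on player and minute will duplicate rows if a player takes two shots in the same minute, so a real check would probably need a stricter key.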

@JaseZiv
Owner

JaseZiv commented Nov 25, 2024

Not sure what's happened here... can you provide a few more examples to give me a sense of where to start? I've done a spot check of some La Liga matches and they all look fine.

The processing to enable rapid loading of data is pretty challenging, and needing to backfill potentially many seasons' worth of games will be even more so...

@tonyelhabr
Collaborator

i wonder if this may be due to fbref updates to the shot log after we have scraped the data? i think Opta often makes adjustments a day or so after a match is completed.

if this were the case, then maybe we should have the scraper wait ~5 days before attempting to scrape a given match week. it seems sort of difficult to confirm whether this is actually the case though.
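
a rough sketch of what that waiting rule could look like, assuming a fixture table with one row per match. the table, its column names (`match_url`, `match_date`) and the helper `matches_ready_to_scrape()` are hypothetical and not part of the package or the current scraper:

```r
# Hypothetical fixture table; names and dates are made up for illustration.
fixtures <- data.frame(
  match_url  = c("match-1", "match-2", "match-3"),
  match_date = as.Date(c("2024-11-10", "2024-11-22", "2024-11-24")),
  stringsAsFactors = FALSE
)

# Keep only matches played at least `wait_days` days before `today`,
# giving the data provider time to post its corrections.
matches_ready_to_scrape <- function(fixtures, today = Sys.Date(), wait_days = 5) {
  cutoff <- today - wait_days
  fixtures[fixtures$match_date <= cutoff, , drop = FALSE]
}

ready <- matches_ready_to_scrape(fixtures, today = as.Date("2024-11-25"))
print(ready$match_url)
```

the 5-day default is only the guess from above, not a known Opta turnaround time.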

@JaseZiv
Owner

JaseZiv commented Nov 25, 2024

Yeah I think that might be the case @tonyelhabr... The possible solution you've proposed seems like a great idea... I've also been wondering about the whole loading functionality, but maybe we take that discussion offline.

@1Colin11
Author

Hey, yeah I'd guess @tonyelhabr is right. Just guessing... Opta will have some sort of review process for ambiguous situations that makes changes to the data post-match.

I'd also guess they'd probably want to complete this before a team plays another match (and push it to fbref), so perhaps you could use fixture data and scrape the log 1 day before a given team's next match. Maybe this is too much or too fragile. However, I'd say every 5 days is a bad solution, since teams often play twice in a week and you'd want users to have the data available from a team's most recent matches (if that makes sense).
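
Something like the following is what I have in mind, as a rough sketch only. The fixture table, its column names, and the helpers `next_match_date()` and `scrape_dates()` are all hypothetical. The fallback of waiting a fixed number of days when neither team has a later fixture is an extra assumption on my part.

```r
# Hypothetical fixtures; team names and dates are made up for illustration.
fixtures <- data.frame(
  match_id   = c("m1", "m2", "m3"),
  match_date = as.Date(c("2024-11-23", "2024-11-26", "2024-11-30")),
  home       = c("Team A", "Team A", "Team B"),
  away       = c("Team B", "Team C", "Team C"),
  stringsAsFactors = FALSE
)

# Date of a team's first fixture strictly after `after`, or NA if none.
next_match_date <- function(fixtures, team, after) {
  dates <- fixtures$match_date[
    (fixtures$home == team | fixtures$away == team) & fixtures$match_date > after
  ]
  if (length(dates) == 0) as.Date(NA) else min(dates)
}

# For each match, schedule the scrape one day before the earlier of the
# two teams' next fixtures; fall back to a fixed wait if there is none.
scrape_dates <- function(fixtures, fallback_days = 5) {
  out <- fixtures
  out$scrape_on <- as.Date(NA)
  for (i in seq_len(nrow(fixtures))) {
    played <- fixtures$match_date[i]
    nxt <- c(
      next_match_date(fixtures, fixtures$home[i], played),
      next_match_date(fixtures, fixtures$away[i], played)
    )
    out$scrape_on[i] <- if (all(is.na(nxt))) {
      played + fallback_days
    } else {
      min(nxt, na.rm = TRUE) - 1
    }
  }
  out
}

schedule <- scrape_dates(fixtures)
print(schedule[, c("match_id", "match_date", "scrape_on")])
```

This only works if the corrections really do land before the next match, which I can't confirm.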

It will be cumbersome to rescrape the entire dataset, sorry! But it is a fantastic resource for people, perhaps the most important in the package since it is the most granular. Interested to see what you do.
