An open source model for projecting MLB season outcomes
Satchel is a simple MLB projection model meant to provide a basic forecast with relatively little effort. Much like the Marcel The Monkey Forecasting System, it is not the most advanced projection method and its projections shouldn't be treated as the word of God. I like to describe it as a minimum effort projection. Satchel essentially boils each game in a given season down to a weighted coin flip based on the amount of talent on each team's 40-man roster and then flips that coin thousands of times.
Team talent is calculated by summing the WAR that FanGraphs' Depth Charts projections assign to each team's 40-man roster, then dividing that total by the league mean (or median, if you so choose). By default, Satchel uses the Bradley-Terry model to calculate probabilities. With this model, the probability that team A beats team B is:
$$P(A \text{ wins}) = \frac{\exp(T_A)}{\exp(T_A) + \exp(T_B)}$$
where $T_i$ is the talent level of team $i$.
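As a rough illustration of the two steps above (scaling WAR totals, then turning two talent values into a win probability), here is a minimal sketch. It is not Satchel's actual code: the function names `team_talent` and `bradley_terry`, the team abbreviations, and the WAR totals are all made up for the example.

```python
import math
from statistics import mean, median


def team_talent(team_war, talent_measure="mean"):
    """Divide each team's total projected WAR by the league mean or median."""
    center = mean if talent_measure == "mean" else median
    baseline = center(team_war.values())
    return {team: war / baseline for team, war in team_war.items()}


def bradley_terry(t_a, t_b):
    """P(A beats B) = exp(T_A) / (exp(T_A) + exp(T_B))."""
    return math.exp(t_a) / (math.exp(t_a) + math.exp(t_b))


# Hypothetical WAR totals for three made-up teams (illustration only).
team_war = {"AAA": 50.0, "BBB": 40.0, "CCC": 30.0}
talent = team_talent(team_war)  # league mean is 40, so BBB's talent is 1.0
p = bradley_terry(talent["AAA"], talent["BBB"])
print(f"P(AAA beats BBB) = {p:.3f}")
```

Two teams with equal talent come out at exactly 0.5, which is the "fair coin" case.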
Satchel also supports Elo-style probability calculations where the probability team A beats team B is instead given by:
$$P(A \text{ wins}) = \frac{1}{1 + 10^{(T_B - T_A)/400}}$$
Once each team's talent has been calculated, Satchel simulates the season 10,000 times by default (the user can change that number), adding random noise to each team's talent each time. The results are then averaged to produce the final projected win/loss totals and the probability that each team wins a wild card, its division, the World Series, etc.
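The simulation loop can be sketched in a few lines. This is only an illustration of the idea, not Satchel's implementation: the Elo-style function is written so that a higher-rated team A wins more often, and the Gaussian noise, the `noise_sd` parameter, the Elo-scale ratings, and the two-team schedule are assumptions invented for the example.

```python
import random


def elo_prob(t_a, t_b):
    """Elo-style probability that A beats B; rises as t_a exceeds t_b."""
    return 1.0 / (1.0 + 10 ** ((t_b - t_a) / 400))


def simulate_seasons(talent, schedule, n=10_000, noise_sd=25.0, seed=None):
    """Flip a weighted coin for every game, n seasons over; return mean wins.

    Gaussian noise with standard deviation noise_sd is an assumption here.
    """
    rng = random.Random(seed)
    total_wins = {team: 0 for team in talent}
    for _ in range(n):
        # Perturb each team's talent once per simulated season.
        noisy = {team: t + rng.gauss(0, noise_sd) for team, t in talent.items()}
        for team_a, team_b in schedule:
            if rng.random() < elo_prob(noisy[team_a], noisy[team_b]):
                total_wins[team_a] += 1
            else:
                total_wins[team_b] += 1
    return {team: wins / n for team, wins in total_wins.items()}


# Hypothetical ratings and a made-up 162-game schedule between two teams.
ratings = {"AAA": 1550.0, "BBB": 1450.0}
schedule = [("AAA", "BBB")] * 162
avg_wins = simulate_seasons(ratings, schedule, n=500, seed=1)
print(avg_wins)
```

Passing a fixed `seed` makes the run repeatable, which is handy when comparing two sets of inputs.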
Using Satchel is straightforward. Create an instance of the model class, tell it to run, then wait. Running the model returns a SatchelResults object, which contains the results of the simulation and a few fun methods for analyzing those results. To get started, you only need three lines of code:
from satchel.model import Satchel
model = Satchel()
results = model.simulate()
The Satchel class supports a number of optional arguments to customize your model.
talent_measure: str: "mean" or "median". Each team's total WAR is compared to the league's talent_measure to determine its talent value.
transactions: dict: Dictionary containing any transactions to include in the simulation. The format of the dictionary should be {player_fangraphs_id: {"team": new_team, "date": effective_date}}. player_fangraphs_id is the ID FanGraphs has assigned to that player, new_team is the three-letter abbreviation for the team the player is being traded to, and effective_date should be formatted YYYY-MM-DD.
noise: bool: If true, random noise is added to each team's talent measure during the simulation.
seed: int or float: Seed used for random draws; None by default.
steamer_p_wt: float: Weight placed on Steamer pitcher projections.
zips_p_wt: float: Weight placed on ZiPS pitcher projections.
steamer_b_wt: float: Weight placed on Steamer batter projections.
zips_b_wt: float: Weight placed on ZiPS batter projections.
schedule: Path, str: Path to a CSV with the season schedule.
pitcher_proj: Path, str: Path to a CSV with pitcher WAR projections suitable for Satchel.
batter_proj: Path, str: Path to a CSV with batter WAR projections suitable for Satchel.
use_current_results: bool: If true, Satchel simulates the season from today's date and adds those results to each team's current record. This includes using both the teams' records and the players' season-to-date stats in the talent calculations. If false, Satchel simulates the full season using the provided schedule and pre-season projections.
war_method: str: Method used for calculating each player's remaining WAR. If only_projections, a player's final WAR is their WAR to date plus their projected WAR multiplied by the fraction of the season remaining. If current_pace, it is their WAR to date plus their projected WAR multiplied by the remaining fraction of the season and by their relative production rate. The relative production rate is their WAR to date divided by the product of their projection and the fraction of the season already played.
cache: bool: If true, the newly generated schedule is cached.
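The two war_method options described above are easier to follow as arithmetic. The sketch below is one reading of that description, not Satchel's own code: the function name `final_war`, its argument names, and the fallback to a rate of 1.0 when no WAR was expected to date are all assumptions made for the example.

```python
def final_war(war_to_date, projected_war, frac_played, war_method="only_projections"):
    """Estimate a player's full-season WAR part-way through the season."""
    frac_left = 1.0 - frac_played
    if war_method == "only_projections":
        # WAR to date plus the projection scaled to the season remaining.
        return war_to_date + projected_war * frac_left
    if war_method == "current_pace":
        # WAR the projection expected by now; the rate compares actual to expected.
        expected_to_date = projected_war * frac_played
        # Assumption: fall back to a neutral rate if nothing was expected yet.
        rate = war_to_date / expected_to_date if expected_to_date else 1.0
        return war_to_date + projected_war * frac_left * rate
    raise ValueError(f"unknown war_method: {war_method!r}")


# Hypothetical player: projected for 4.0 WAR, has 2.5 WAR at the halfway mark.
print(final_war(2.5, 4.0, 0.5, "only_projections"))  # 2.5 + 4.0 * 0.5
print(final_war(2.5, 4.0, 0.5, "current_pace"))      # rate = 2.5 / 2.0 = 1.25
```

In this example the player is running 25% ahead of projection, so current_pace credits the remaining half-season at that higher rate, while only_projections assumes a return to the pre-season projection.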
After running the model, the SatchelResults class has a number of methods and attributes for summarizing the results.
Attributes
al_central: A DataFrame summarizing the results of the AL Central teams.
al_east: A DataFrame summarizing the results of the AL East teams.
al_west: A DataFrame summarizing the results of the AL West teams.
base_talent: A DataFrame containing each team's base talent, i.e., their talent without any noise added.
div_counter: Counts the number of times each team won their division in the simulations.
full_seasons: A list of DataFrames, each representing a simulated season.
league_counter: Counts the number of times each team won their league in the simulations.
merged_schedule: If the simulations are run mid-season and current results are merged in, this will be a DataFrame with the season-to-date results and remaining schedule merged together.
n: The number of times the simulations were run.
nl_central: A DataFrame summarizing the results of the NL Central teams.
nl_east: A DataFrame summarizing the results of the NL East teams.
nl_west: A DataFrame summarizing the results of the NL West teams.
noise: The noise added to each team's talent in each simulation.
playoff_counter: Counts the number of times each team made the playoffs in the simulations.
playoff_matchups: A DataFrame containing all the playoff matchups in each simulation.
results_df: A DataFrame containing the full results of each simulation.
schedule: A DataFrame with the schedule used in the simulations.
season_summary: A DataFrame summarizing the results.
seed: The random seed used in the simulation.
trades: A dictionary with all the trades used in the simulations.
wc_counter: Counts the number of times each team won a wild card spot in the simulations.
ws_counter: Counts the number of times each team won the World Series in the simulations.
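The counter attributes become probabilities once divided by n. The sketch below assumes they behave like a `collections.Counter` keyed by team, which the description suggests but does not confirm; the helper `counter_to_probs`, the team abbreviations, and the counts are made up.

```python
from collections import Counter


def counter_to_probs(counter, n):
    """Convert simulation counts to probabilities, most likely team first."""
    return {team: count / n for team, count in counter.most_common()}


# Hypothetical division-title counts from 10,000 simulated seasons.
div_counter = Counter({"AAA": 6200, "BBB": 3100, "CCC": 700})
n = 10_000
probs = counter_to_probs(div_counter, n)
for team, prob in probs.items():
    print(f"{team}: {prob:.1%}")
```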
Methods
boxplot(): Creates a boxplot that shows the distribution of wins for each team.
season_to_date(): Creates a table that shows season results to date and the projected remainder of the season.
season_percentile(team, wins): Returns the percentile of the specified team's wins distribution at which the given number of wins falls.
results_dist_chart(team, cmap): Creates a bar chart showing the distribution of results for the specified team.
results_grid(): Creates a grid of bar charts showing the distribution of results for each team.
results_scatter(team, offset, y): Creates a scatter plot that shows the distribution of results for the specified team.
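To give a feel for what a percentile lookup like season_percentile(team, wins) reports, here is a small standalone sketch. It does not use Satchel and may not match its exact definition: the helper `wins_percentile` is hypothetical, it takes "percentile" to mean the share of simulated seasons with at most that many wins, and the win totals are invented.

```python
from bisect import bisect_right


def wins_percentile(simulated_wins, wins):
    """Percent of simulated seasons in which the team won `wins` games or fewer."""
    ordered = sorted(simulated_wins)
    return 100.0 * bisect_right(ordered, wins) / len(ordered)


# Hypothetical win totals for one team across ten simulated seasons.
simulated = [78, 81, 84, 84, 88, 90, 92, 95, 97, 101]
print(wins_percentile(simulated, 90))  # 6 of the 10 seasons had 90 wins or fewer
```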
Despite its methodological simplicity, Satchel does a pretty good job. When I compared it to FanGraphs' projections in 2021, it held its own. And it can be used for fun experiments like trying to get the Angels to the postseason (a very hard task). Is this whole thing a bit overdone for how simple the model is? Possibly. And I don't care.