10 Player Contribution

The goal of this chapter is to build models to account for individual contributions toward a team.

library("tidyverse"); theme_set(theme_bw())

Warning: package 'purrr' was built under R version 4.4.1

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4.9000     ✔ readr     2.1.5     
✔ forcats   1.0.0          ✔ stringr   1.5.1     
✔ ggplot2   3.5.1          ✔ tibble    3.2.1     
✔ lubridate 1.9.3          ✔ tidyr     1.3.1     
✔ purrr     1.0.4          
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library("DT")

Load necessary code

source("../../R/construct_matrix.R")

10.1 Basic

10.1.1 Plus-minus

The plus-minus statistic is the most basic statistic for estimating player contribution. For each player, you

sum the points scored by that player's team while that player was playing and
subtract the sum of points scored by the opposing team while that player was playing.
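The two steps above can be sketched in a few lines of R. The `stints` data frame and its column names are hypothetical, made up purely for illustration.

```r
# Hypothetical stint data: one row per stretch of play with a fixed lineup
stints <- data.frame(
  team_points     = c(10, 4, 7),          # points scored by the player's team
  opponent_points = c( 6, 9, 7),          # points scored by the opposing team
  on_court        = c(TRUE, TRUE, FALSE)  # was the player on the court?
)

# Plus-minus: team points minus opponent points while the player was playing
plus_minus <- with(
  stints[stints$on_court, ],
  sum(team_points) - sum(opponent_points)
)
plus_minus
```

Only the first two stints count for this player, so the third row has no effect on the result.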

One drawback of the plus-minus statistic is that it doesn’t take into account who is playing with or against you. For example, if you are the best defender on your team and are always subbed in and out so that you defend the best offensive player on the opposing team, your overall and defensive contribution may look poor.

10.1.2 Adjusted plus-minus

Adjusted plus-minus statistics, like those discussed below, keep track of all the other players on the court and adjust accordingly. Recall that multiple regression models “adjust for other explanatory variables”.

10.2 Models

Many of the models we have introduced earlier can be used to estimate player contribution. Here we will introduce how to use margin of victory, win-loss, and offense-defense models in the context of measuring player contribution.

10.2.1 Margin of Victory

Recall that the margin of victory models use the difference between two teams' scores, i.e. the margin of victory, to estimate the overall strength of a team. Here we will use the margin of victory to compute the contribution of players.

To do so, we need to organize the data by every combination of players that are on the court, calculate the margin while those players are on the court, and then build a model for those margins.

Let \(M_g\) be the margin of points scored while a set of players are on the court. Assume \[M_g = \sum_{p \in H[g]} \theta_p - \sum_{p\in A[g]} \theta_p+ \epsilon_g\] where

\(H[g]\) is the set of players on the home team during this time and
\(A[g]\) is the set of players on the away team during this time.

We have the following parameters

\(\theta_p\) is the strength of player \(p\).

In this model, we only observe noisy estimates of the difference between collections of strengths. Thus we have an identifiability issue that will manifest itself in one of the estimates not being estimable. We will treat this non-estimable strength as 0. When both sides field the same number of players, adding the same constant to every \(\theta\) leaves the model unchanged, so we are free to shift the \(\theta\)s by a constant.

The parameters are transitive and thus the model is not capable of estimating complementary play among players on a team or match ups with opposing players.

If we assume \(\epsilon_g \stackrel{ind}{\sim} N(0,\sigma^2)\), then we have a regression model. To estimate the parameters in this regression model, we will need to construct a matrix that has a row for each combination of players and a column for each player. Each row contains a unique combination of home and away players with a 1 if the home player is included in the combination and a -1 if the away player is included in the combination.
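A minimal sketch of this construction on simulated data follows. The players, lineups, and true strengths are all hypothetical, and the margins are generated from the model itself, so this only illustrates the mechanics.

```r
set.seed(1)

# Hypothetical rosters: A, B, C play for the home team; D, E, F for the away team
players <- c("A", "B", "C", "D", "E", "F")
home <- list(c("A","B"), c("A","C"), c("B","C"), c("A","B"),
             c("A","C"), c("B","C"), c("A","B"), c("B","C"))
away <- list(c("D","E"), c("D","F"), c("E","F"), c("E","F"),
             c("D","E"), c("D","F"), c("D","F"), c("D","E"))

# One row per combination: +1 for home players on the court, -1 for away players
X <- matrix(0, nrow = length(home), ncol = length(players),
            dimnames = list(NULL, players))
for (g in seq_along(home)) {
  X[g, home[[g]]] <-  1
  X[g, away[[g]]] <- -1
}

# Simulate margins from assumed strengths and fit the regression
theta  <- c(A = 3, B = 1, C = 0, D = 2, E = -1, F = 0)
margin <- drop(X %*% theta) + rnorm(nrow(X), sd = 1)
m <- lm(margin ~ 0 + X)
coef(m)
```

Because every row of `X` sums to zero, the columns are linearly dependent and `lm()` reports `NA` for one coefficient. This is the non-estimable strength that is treated as 0.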

10.2.2 Win-(tie-)loss

A win-loss or win-tie-loss model modifies the margin of victory model by only recording whether, in this combination of players, the home team won, tied, or lost to the opposing players. Then a logistic (or ordinal logistic) regression model is fit using the margin of victory matrix.

For high-scoring sports, using the margin provides much more information about the contribution of players. In low-scoring sports, e.g. hockey or soccer, win-(tie-)loss models may be useful.
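As a rough sketch of the win-loss version, the code below simulates stints with hypothetical player strengths and fits a logistic regression using the same kind of +1/-1 matrix. Ties are left out here; handling them would need an ordinal model instead.

```r
set.seed(2)
n_stints <- 200

# Hypothetical setup: players 1-3 are home, players 4-6 are away, 2 per side
X <- matrix(0, nrow = n_stints, ncol = 6)
for (g in seq_len(n_stints)) {
  X[g, sample(1:3, 2)] <-  1
  X[g, sample(4:6, 2)] <- -1
}

# Simulate win (1) or loss (0) for the home players from assumed strengths
theta    <- c(0.8, 0.2, -0.3, 0.5, 0, -0.4)
home_win <- rbinom(n_stints, size = 1, prob = plogis(drop(X %*% theta)))

# Logistic regression on the margin-of-victory style matrix
m <- glm(home_win ~ 0 + X, family = binomial)
coef(m)
```

As in the margin of victory model, the matrix is rank deficient, so one strength is not estimable.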

10.2.3 Offense-defense

Rather than only using the margin of victory (or the win/tie/loss result) we can also estimate offense and defense contribution of the players by utilizing both home and away scores.

Let \(S^H_g\) (\(S^A_g\)) be the points scored by the home (away) team while a combination of players is on the court. Assume \[S^H_g = \sum_{p \in H[g]} \theta_p - \sum_{p\in A[g]} \delta_p + \epsilon^H_g\] and \[S^A_g = \sum_{p \in A[g]} \theta_p - \sum_{p\in H[g]} \delta_p+ \epsilon^A_g\]

We have the following parameters

\(\theta_p\) is the offensive rating of player \(p\) and
\(\delta_p\) is the defensive rating of player \(p\).

In this model, we only observe noisy estimates of the difference between offense and defense ratings. Thus we have an identifiability issue that will manifest itself in one of the estimates not being estimable. We will treat this non-estimable rating as 0 and we are allowed to add a constant to the \(\theta\)s and (the same constant) to the \(\delta\)s.

The parameters are transitive and thus the model is not capable of estimating complementary play among players on a team or match ups with opposing players.

If we assume \(\epsilon^H_g,\epsilon^A_g \stackrel{ind}{\sim} N(0,\sigma^2)\), then we have a regression model. To estimate the parameters in this regression model, we will need to construct two matrices that each have a row for each combination of players and a column for each player. In the home matrix, each cell is a 1 if that home player is in the combination and a 0 otherwise. In the away matrix, each cell is a 1 if that away player is in the combination and a 0 otherwise.

The margin of victory (and win-tie-loss) matrix is actually just the difference between the home and away matrices just described.
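The relationship between these matrices can be sketched with a tiny hypothetical example. The `indicator()` helper and the lineups are assumptions made for illustration only.

```r
# Hypothetical rosters: A, B, C are home players; D, E, F are away players
players <- c("A", "B", "C", "D", "E", "F")
home <- list(c("A","B"), c("A","C"), c("B","C"))
away <- list(c("D","E"), c("D","F"), c("E","F"))

# Illustrative helper: one row per combination, 1 if the player is on the court
indicator <- function(lineups) {
  t(sapply(lineups, function(on_court) as.numeric(players %in% on_court)))
}

X_home <- indicator(home)
X_away <- indicator(away)

# The margin of victory matrix is the difference of the two
X_margin <- X_home - X_away

# Stacked design for c(home scores, away scores):
# offense columns first, then defense columns with a negative sign
X_od <- rbind(
  cbind(X_home, -X_away),
  cbind(X_away, -X_home)
)
dim(X_od)
```

The stacked matrix has twice as many rows (home and away scores) and twice as many columns (an offense and a defense rating per player) as the margin matrix.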

10.3 Examples

10.3.1 FRC MOSE 2025

The FIRST Robotics Competition (FRC) is a high school robotics competition. Each match in an FRC competition pits two alliances (red and blue) against each other, each composed of 3 robots from different teams. The robots within an alliance cooperate to score points and compete against the robots in the other alliance. The alliance with more points at the end wins the match.

tmp <- read_csv("../../data/frc_mose2025.csv")

Rows: 66 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Match, Start Time
dbl (8): Red 1, Red 2, Red 3, Blue 1, Blue 2, Blue 3, Red Final, Blue Final

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Create factor for all teams
teams <- tmp |>
  select(`Red 1`:`Blue 3`) |>
  pivot_longer(everything()) |>
  pull(value) |>
  unique() |>
  sort()

mose2025 <- tmp |>
  mutate(
    `Red 1`  = factor(`Red 1`,  levels = teams),
    `Red 2`  = factor(`Red 2`,  levels = teams),
    `Red 3`  = factor(`Red 3`,  levels = teams),
    `Blue 1` = factor(`Blue 1`, levels = teams),
    `Blue 2` = factor(`Blue 2`, levels = teams),
    `Blue 3` = factor(`Blue 3`, levels = teams)
  )

mose2025 |> datatable(filter = "top", rownames = FALSE)

Construct the model matrices.

# Construct model matrices
X_red <- construct_matrix(mose2025, "Red 1", 36) +
  construct_matrix(mose2025, "Red 2", 36) +
  construct_matrix(mose2025, "Red 3", 36) 

X_blue <- construct_matrix(mose2025, "Blue 1", 36) +
  construct_matrix(mose2025, "Blue 2", 36) +
  construct_matrix(mose2025, "Blue 3", 36) 

# Team 8112 did not arrive on time and thus is missing in its first two matches
table(rowSums(X_red))


 2  3 
 2 64

colSums(X_red) + colSums(X_blue)

 [1] 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
[26] 11 11 11 11 11 11  9 11 11 11 11
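The `construct_matrix()` helper is sourced from a file that is not shown here. The function below is only a guess at an equivalent: a hypothetical stand-in named `indicator_matrix()` that builds a 0/1 matrix from a factor column and leaves a row of zeros when the team is missing. The sourced version may well differ.

```r
# Hypothetical stand-in for construct_matrix(); the sourced version may differ
indicator_matrix <- function(d, column, n_levels) {
  idx <- as.integer(d[[column]])     # factor codes, NA when the team is missing
  X   <- matrix(0, nrow = nrow(d), ncol = n_levels)
  ok  <- !is.na(idx)
  X[cbind(which(ok), idx[ok])] <- 1  # a single 1 in each row with a known team
  X
}

# Toy data with made-up team numbers; the third match has a missing team
toy <- data.frame(
  `Red 1` = factor(c(11, 22, NA), levels = c(11, 22, 33)),
  check.names = FALSE
)
X_toy <- indicator_matrix(toy, "Red 1", 3)
X_toy
```

Summing such matrices over the three red (or blue) columns would give one row per match with a 1 for each team on that alliance, which is consistent with the row sums of 2 and 3 reported above.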

10.3.1.1 Margin-of-victory

# Margin of victory
margin <- mose2025$`Red Final` - mose2025$`Blue Final`
X      <- X_red - X_blue

# Fit model
m <- lm(margin ~ 0 + X)
summary(m)


Call:
lm(formula = margin ~ 0 + X)

Residuals:
    Min      1Q  Median      3Q     Max 
-32.072 -12.012  -0.991   9.348  28.648 

Coefficients:
     Estimate Std. Error t value Pr(>|t|)   
X1   32.78327   22.03586   1.488  0.14726   
X2   35.32327   22.13049   1.596  0.12094   
X3  -13.11301   21.10238  -0.621  0.53903   
X4   34.70001   19.76075   1.756  0.08929 . 
X5   53.45324   18.23908   2.931  0.00641 **
X6   -7.14437   21.22680  -0.337  0.73878   
X7   -2.67667   21.56301  -0.124  0.90204   
X8   -8.61222   21.24984  -0.405  0.68814   
X9   12.79973   21.38853   0.598  0.55404   
X10   0.40000   20.33682   0.020  0.98444   
X11  33.17262   19.32958   1.716  0.09644 . 
X12  -2.92442   20.55895  -0.142  0.88784   
X13  21.59756   20.91798   1.032  0.31010   
X14  24.78412   21.66550   1.144  0.26169   
X15  34.92238   21.42684   1.630  0.11359   
X16   8.41969   22.25842   0.378  0.70789   
X17  13.16429   21.66096   0.608  0.54793   
X18  16.34456   23.47253   0.696  0.49158   
X19  -4.75531   20.89914  -0.228  0.82155   
X20  22.66517   21.43216   1.058  0.29871   
X21  45.81500   21.40169   2.141  0.04054 * 
X22  31.37322   24.05015   1.304  0.20199   
X23  15.74974   19.99957   0.788  0.43716   
X24   7.17833   21.62800   0.332  0.74227   
X25   3.44937   19.26132   0.179  0.85908   
X26 -23.00223   21.66266  -1.062  0.29678   
X27  39.36746   20.85684   1.888  0.06880 . 
X28  14.98666   18.28304   0.820  0.41885   
X29   8.88407   20.87734   0.426  0.67348   
X30   0.31866   21.28422   0.015  0.98815   
X31  -0.01757   20.92329  -0.001  0.99934   
X32  -7.57706   21.74861  -0.348  0.72998   
X33   0.13821   21.78107   0.006  0.99498   
X34 -13.09630   22.12502  -0.592  0.55834   
X35  -5.65949   20.83653  -0.272  0.78778   
X36   9.70302   20.85504   0.465  0.64511   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 21.05 on 30 degrees of freedom
Multiple R-squared:  0.9216,    Adjusted R-squared:  0.8275 
F-statistic: 9.793 on 36 and 30 DF,  p-value: 4.045e-09

Extract parameters

mose2025_margin <- data.frame(
  team   = teams,
  rating = coef(m)
) |>
  mutate(
    rating = ifelse(is.na(rating), 0, rating),
    rating = rating - mean(rating),
    team   = factor(team, levels = team[order(rating)])
  ) |>
  arrange(desc(team))

mose2025_margin |> 
  datatable(filter = "top", 
            rownames = FALSE) |>
  formatRound(columns = "rating", digits = 1)

Let’s plot the teams

ggplot(mose2025_margin,
       aes(
         x = rating,
         y = team
       )) +
  geom_bar(stat="identity") +
  labs(
    x = "Rating",
    y = "Team",
    title = "FRC MOSE 2025 - Qualification"
  )

10.3.1.2 Offense-defense rating

# Offense-defense analysis
Y <- c(mose2025$`Red Final`, mose2025$`Blue Final`)

X <- rbind(
  cbind(X_red, -X_blue),
  cbind(X_blue, -X_red)
)

# Fit model
m <- lm(Y ~ 0 + X)
summary(m)


Call:
lm(formula = Y ~ 0 + X)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.207  -7.766  -0.071   7.573  38.986 

Coefficients:
    Estimate Std. Error t value Pr(>|t|)    
X1   43.8951    13.3222   3.295 0.001656 ** 
X2   34.9304    13.5500   2.578 0.012414 *  
X3   10.8295    12.8455   0.843 0.402546    
X4   35.9272    12.2269   2.938 0.004677 ** 
X5   62.3330    11.3390   5.497 8.34e-07 ***
X6   -1.4760    13.0057  -0.113 0.910021    
X7    5.4824    13.2685   0.413 0.680939    
X8   -7.0574    13.0420  -0.541 0.590422    
X9   15.3925    13.2219   1.164 0.248964    
X10  19.5784    12.5320   1.562 0.123483    
X11  39.9903    11.9146   3.356 0.001375 ** 
X12   6.4936    12.5972   0.515 0.608110    
X13  29.0438    12.7899   2.271 0.026760 *  
X14  25.7603    13.3206   1.934 0.057850 .  
X15  49.6175    13.1305   3.779 0.000365 ***
X16  10.3956    13.5956   0.765 0.447488    
X17  27.6079    13.2784   2.079 0.041886 *  
X18  29.0033    14.2419   2.036 0.046123 *  
X19   6.7156    12.7538   0.527 0.600445    
X20  30.6782    13.1763   2.328 0.023285 *  
X21  52.6416    13.0133   4.045 0.000152 ***
X22  43.4943    14.5655   2.986 0.004086 ** 
X23  29.7568    12.2956   2.420 0.018561 *  
X24  15.8235    13.1610   1.202 0.233972    
X25   3.2810    11.9871   0.274 0.785246    
X26 -11.6460    13.2949  -0.876 0.384539    
X27  43.3526    12.7749   3.394 0.001227 ** 
X28  19.7888    11.4961   1.721 0.090342 .  
X29  21.3382    12.8701   1.658 0.102544    
X30  -1.9155    13.1388  -0.146 0.884578    
X31   4.2237    12.7994   0.330 0.742553    
X32  -6.1444    13.5890  -0.452 0.652785    
X33  14.3661    13.1934   1.089 0.280559    
X34  -2.0014    13.3977  -0.149 0.881753    
X35   6.2746    12.7327   0.493 0.623956    
X36   1.7197    12.7955   0.134 0.893538    
X37 -11.1118    13.3222  -0.834 0.407541    
X38   0.3929    13.5500   0.029 0.976964    
X39 -23.9425    12.8455  -1.864 0.067233 .  
X40  -1.2272    12.2269  -0.100 0.920387    
X41  -8.8797    11.3390  -0.783 0.436642    
X42  -5.6684    13.0057  -0.436 0.664519    
X43  -8.1591    13.2685  -0.615 0.540931    
X44  -1.5548    13.0420  -0.119 0.905503    
X45  -2.5927    13.2219  -0.196 0.845199    
X46 -19.1784    12.5320  -1.530 0.131184    
X47  -6.8177    11.9146  -0.572 0.569315    
X48  -9.4181    12.5972  -0.748 0.457603    
X49  -7.4462    12.7899  -0.582 0.562617    
X50  -0.9761    13.3206  -0.073 0.941827    
X51 -14.6952    13.1305  -1.119 0.267533    
X52  -1.9759    13.5956  -0.145 0.884934    
X53 -14.4436    13.2784  -1.088 0.281057    
X54 -12.6587    14.2419  -0.889 0.377640    
X55 -11.4709    12.7538  -0.899 0.372031    
X56  -8.0130    13.1763  -0.608 0.545389    
X57  -6.8266    13.0133  -0.525 0.601801    
X58 -12.1211    14.5655  -0.832 0.408608    
X59 -14.0071    12.2956  -1.139 0.259152    
X60  -8.6452    13.1610  -0.657 0.513773    
X61   0.1684    11.9871   0.014 0.988841    
X62 -11.3563    13.2949  -0.854 0.396403    
X63  -3.9851    12.7749  -0.312 0.756161    
X64  -4.8021    11.4961  -0.418 0.677646    
X65 -12.4542    12.8701  -0.968 0.337089    
X66   2.2341    13.1388   0.170 0.865551    
X67  -4.2413    12.7994  -0.331 0.741521    
X68  -1.4327    13.5890  -0.105 0.916386    
X69 -14.2278    13.1934  -1.078 0.285168    
X70 -11.0949    13.3977  -0.828 0.410885    
X71 -11.9341    12.7327  -0.937 0.352372    
X72   7.9833    12.7955   0.624 0.535045    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.99 on 60 degrees of freedom
Multiple R-squared:  0.9834,    Adjusted R-squared:  0.9634 
F-statistic: 49.23 on 72 and 60 DF,  p-value: < 2.2e-16

Extract parameters

mose2025_offense_defense <- data.frame(
  team   = rep(teams, times = 2),
  type   = rep(c("offense","defense"), each = length(teams)),
  rating = coef(m)
) |>
  mutate(
    rating = ifelse(is.na(rating), 0, rating),
    rating = rating - mean(rating)
  ) |>
  pivot_wider(
    names_from = "type", 
    values_from = "rating") |>
  mutate(
    strength = offense + defense,
    team    = factor(team, team[order(strength)])
  ) |>
  arrange(desc(team))

mose2025_offense_defense |> 
  datatable(filter = "top", 
            rownames = FALSE) |>
  formatRound(columns = c("offense","defense","strength"), digits = 1)

Let’s plot the teams

ggplot(mose2025_offense_defense |>
         pivot_longer(
           offense:strength,
           names_to = "type",
           values_to = "rating"),
       aes(
         x = rating,
         y = team
       )) +
  geom_point(aes(color = type)) +
  # geom_bar(stat="identity") +
  # facet_wrap(~type) +
  labs(
    x = "Rating",
    y = "Team",
    title = "FRC MOSE 2025 - Qualification"
  )

10.4 Player Contribution Systems

Winval
WAR

10.5 Summary

These models could be extended or modified in many different directions.

10.5.1 Time

In the FRC example, the time is fixed by the match duration. In other sports, the time may be variable as we are looking at how long a combination of players is on the field. Incorporating the duration on the field would have two impacts: 1) the expected margin (or scores) would grow in magnitude with the duration and 2) the variability would increase.

If \(T_g\) is the time associated with margin \(M_g\), then we can modify our margin of victory model accordingly

\[M_g = T_g\left(\sum_{p \in H[g]} \theta_p - \sum_{p\in A[g]} \theta_p \right)+ \epsilon_g, \quad \epsilon_g \stackrel{ind}{\sim} N(0,T_g \sigma^2)\]

Now, we need each row in our model matrix to be multiplied by the corresponding \(T_g\), e.g. multiply row 1 by \(T_1\). Then we can use weighted regression to incorporate \(T_g\) into the variance.
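A minimal sketch of this time-adjusted fit on simulated data is given below. The durations and per-minute strengths are hypothetical. Since the variance is \(T_g\sigma^2\), the weights passed to `lm()` are \(1/T_g\).

```r
set.seed(3)
n_stints <- 60

# Hypothetical setup: players 1-3 are home, players 4-6 are away, 2 per side
X <- matrix(0, nrow = n_stints, ncol = 6)
for (g in seq_len(n_stints)) {
  X[g, sample(1:3, 2)] <-  1
  X[g, sample(4:6, 2)] <- -1
}

time  <- runif(n_stints, min = 1, max = 10)  # hypothetical T_g, in minutes
theta <- c(0.5, 0.1, -0.2, 0.3, 0, -0.1)     # hypothetical per-minute strengths

# Simulate margins with mean T_g * (difference) and variance T_g * sigma^2
margin <- time * drop(X %*% theta) + rnorm(n_stints, sd = sqrt(time))

# Multiply row g by T_g (recycling runs down the columns of X)
X_time <- X * time

# Weighted regression: weights are inversely proportional to the variance
m <- lm(margin ~ 0 + X_time, weights = 1 / time)
coef(m)
```

The coefficients are then interpreted per unit of time, here per minute.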

Our offense-defense contribution would be modified by

\[S^H_g = T_g\left(\sum_{p \in H[g]} \theta_p - \sum_{p\in A[g]} \delta_p \right)+ \epsilon^H_g, \quad \epsilon^H_g \stackrel{ind}{\sim} N(0,T_g\sigma^2)\] and \[S^A_g = T_g\left(\sum_{p \in A[g]} \theta_p - \sum_{p\in H[g]} \delta_p \right)+ \epsilon^A_g, \quad \epsilon^A_g \stackrel{ind}{\sim} N(0,T_g\sigma^2).\] To incorporate these changes, we would multiply each row in our matrices by the corresponding \(T_g\) and use weighted regression for the \(T_g\) in the variance.

10.5.2 Home advantage

The presentation here has ignored the home advantage. If we want to incorporate the home advantage (and we also have time) then we could add the home advantage into our model. In the margin of victory model \[M_g = T_g\left(\eta + \sum_{p \in H[g]} \theta_p - \sum_{p\in A[g]} \theta_p \right)+ \epsilon_g, \quad \epsilon_g \stackrel{ind}{\sim} N(0,T_g \sigma^2).\] The interpretation of \(\eta\) would be per unit time using whatever time units were used for \(T_g\). To get the entire home advantage, we would calculate \(\eta \sum_{g=1}^G T_g\). A similar change can be made in the offense-defense model for \(S^H_g\).
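In regression terms, the home advantage \(\eta\) amounts to one extra column equal to \(T_g\). The sketch below uses simulated data with hypothetical durations, strengths, and home advantage.

```r
set.seed(5)
n_stints <- 60

# Hypothetical setup: players 1-3 are home, players 4-6 are away, 2 per side
X <- matrix(0, nrow = n_stints, ncol = 6)
for (g in seq_len(n_stints)) {
  X[g, sample(1:3, 2)] <-  1
  X[g, sample(4:6, 2)] <- -1
}

time  <- runif(n_stints, min = 1, max = 10)  # hypothetical T_g, in minutes
theta <- c(0.5, 0.1, -0.2, 0.3, 0, -0.1)     # hypothetical per-minute strengths
eta   <- 0.2                                 # hypothetical home advantage per minute

margin <- time * (eta + drop(X %*% theta)) + rnorm(n_stints, sd = sqrt(time))

# The coefficient on `time` estimates eta; X_time holds the player columns
X_time <- X * time
m <- lm(margin ~ 0 + time + X_time, weights = 1 / time)
coef(m)[["time"]]

# Entire home advantage over all stints: eta times the total time
total_home_advantage <- coef(m)[["time"]] * sum(time)
```

The player columns are still rank deficient, so one player strength remains non-estimable even though \(\eta\) itself is estimated.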

10.5.3 Probability

Probability calculations in these models are performed similarly to probability calculations in the associated margin of victory, win-(tie-)loss, or offense-defense models. Since in these models we are looking at the contribution of players, for the probability calculation we will need to determine which combination of players we are looking at. Once we have determined those players, we will calculate the appropriate difference in ratings.
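For the margin of victory model, a plug-in version of this calculation might look like the sketch below. The ratings and residual standard deviation are hypothetical, and the calculation ignores uncertainty in the estimated ratings.

```r
# Hypothetical estimated ratings and residual standard deviation
theta <- c(A = 12, B = 5, C = -3, D = 8, E = 0, F = -6)
sigma <- 20

# The combination of players we are interested in
home <- c("A", "B", "C")
away <- c("D", "E", "F")

# Difference in ratings: sum over home players minus sum over away players
d <- sum(theta[home]) - sum(theta[away])

# P(margin > 0) when margin ~ N(d, sigma^2)
p_home_win <- pnorm(d / sigma)
p_home_win
```

Here the difference in ratings is 12 points, so the home combination is favored with probability `pnorm(0.6)`.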

10.5.4 Small counts

I have previously mentioned that in sports with small counts, e.g. soccer or hockey, it may be preferable to utilize a win-(tie-)loss model. Even in high-scoring sports, e.g. basketball, when we are looking at a specific combination of players on the court, we may have a small number of points scored (for both sides) while that specific combination is on the court. Thus, we may want to look into using Poisson (or negative binomial) based models for the offense-defense rating.

A Poisson model may look something like

\[S^H_g \stackrel{ind}{\sim} Po(\lambda^H_g)\] with \[\log(\lambda^H_g) = \sum_{p \in H[g]} \theta_p - \sum_{p\in A[g]} \delta_p.\] In these models, the interpretation of the parameters changes and they are most easily understood through their multiplicative effect. For example, \[E[S^H_g] = \left. \prod_{p \in H[g]} e^{\theta_p} \right/ \prod_{p \in A[g]} e^{\delta_p}.\]
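A sketch of such a Poisson offense-defense fit using `glm()` on simulated counts is shown below. The rosters and ratings are hypothetical, and the away scores are assumed to follow the analogous model with the roles of the two teams swapped.

```r
set.seed(4)
n_stints <- 100

# Hypothetical setup: players 1-3 are home, players 4-6 are away, 2 per side
X_home <- X_away <- matrix(0, nrow = n_stints, ncol = 6)
for (g in seq_len(n_stints)) {
  X_home[g, sample(1:3, 2)] <- 1
  X_away[g, sample(4:6, 2)] <- 1
}

theta <- c(1.2, 1.0, 0.8, 1.1, 0.9, 1.0)  # hypothetical offense ratings (log scale)
delta <- c(0.2, 0.1, 0.0, 0.1, 0.3, 0.0)  # hypothetical defense ratings (log scale)

# Simulate home and away scores from the log-linear model
home_score <- rpois(n_stints, exp(drop(X_home %*% theta - X_away %*% delta)))
away_score <- rpois(n_stints, exp(drop(X_away %*% theta - X_home %*% delta)))

# Stack scores and design: offense columns, then negated defense columns
Y <- c(home_score, away_score)
X <- rbind(
  cbind(X_home, -X_away),
  cbind(X_away, -X_home)
)

m <- glm(Y ~ 0 + X, family = poisson)

# Multiplicative effects of the offense ratings
exp(coef(m)[1:6])
```

The exponentiated coefficients are read as factors multiplying the expected score, and non-estimable ratings appear as `NA`.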

10.5.5 Estimability

Recall that identifiability is based on whether the parameters can be estimated with any data, i.e. the best data, while estimability is based on whether parameters can be estimated with a particular data set. Player contributions in some sports are very difficult to estimate with these models because insufficient combinations of players are available. Baseball and softball generally have the same set of players playing the field and the batting order is the same. Soccer generally has the same set of players on the field, and the sport is low scoring. Volleyball generally has the same players in the same order on the court. Even a sport like basketball can be affected by this issue if one player is immediately subbed in or out when another player is subbed (either on the same team or the other team).
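One way to check estimability for a particular data set is to compare the rank of the model matrix with its number of columns. In the hypothetical example below, home players A and B are always on the court together, so their columns are identical and their individual contributions cannot be separated.

```r
# Hypothetical rosters: A-D are home players, E-H are away players, 3 per side
players <- LETTERS[1:8]

# A and B always play together, so only two home lineups ever occur
home_lineups <- list(c("A", "B", "C"), c("A", "B", "D"))
away_lineups <- combn(c("E", "F", "G", "H"), 3, simplify = FALSE)

# Every home lineup faces every away lineup once
stints <- expand.grid(h = seq_along(home_lineups), a = seq_along(away_lineups))

X <- matrix(0, nrow = nrow(stints), ncol = length(players),
            dimnames = list(NULL, players))
for (g in seq_len(nrow(stints))) {
  X[g, home_lineups[[stints$h[g]]]] <-  1
  X[g, away_lineups[[stints$a[g]]]] <- -1
}

# Rank well below the number of columns signals non-estimable strengths
c(rank = qr(X)$rank, columns = ncol(X))
```

With a rank of 5 for 8 columns, three strengths are non-estimable rather than the single one lost to identifiability.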