-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path06-linear_models-plots.qmd
1727 lines (1401 loc) · 80.4 KB
/
06-linear_models-plots.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
```{r include=FALSE}
source("R/common.R")
knitr::opts_chunk$set(fig.path = "figs/ch06/")
```
::: {.content-visible unless-format="pdf"}
{{< include latex/latex-commands.qmd >}}
:::
# Plots for univariate response models {#sec-linear-models-plots}
For a univariate linear model fit using `lm()`, `glm()` and similar functions, the standard `plot()`
method gives basic versions of _diagnostic_ plots of residuals and other calculated quantities for assessing
possible violations of the model assumptions.
Some of these can be considerably enhanced using other packages.
Beyond this,
* tables of model coefficients, standard errors and test statistics can often be usefully
supplemented or even replaced by suitable _coeficient plots_ providing essentially the same information.
* when there are two or more predictors, standard plots of $y$ vs. each $x$ can be misleading because they ignore the other predictors in the model.
_Added-variable plots_ (also called _partial regression plots_) allow you to visualize _partial_ relations
between the response and a predictor by adjusting (controlling for) all other predictors. _Component + residual_ plots help to diagnose nonlinearity.
* in this same situation, you can more easily understand their separate impact
on the response by plotting the marginal _predicted effects_ of one or more focal variables, averaging
over other variables not shown in a given plot.
* when there are highly correlated predictors, some specialized plots are useful to
understand the nature of _multicolinearity_.
The classic reference on regression diagnostics is @Belsley-etal:80.
My favorite modern texts are the brief @Fox2020 and the more complete @FoxWeisberg:2018,
both of which are supported by the `r package("car", cite=TRUE)`. Some of the examples in
this chapter are inspired by @FoxWeisberg:2018.
**Packages**
In this chapter I use the following packages. Load them now:
```{r load-pkgs}
library(car)
library(dplyr)
library(easystats)
library(effects)
library(ggeffects)
library(ggplot2)
library(ggstats)
library(marginaleffects)
library(modelsummary)
library(performance)
library(tidyr)
```
```{r bib-pkgs}
#| echo: false
# to generate .bib items
library(ggeffects)
library(marginaleffects)
```
## The "regression quartet"
For a fitted model, plotting the model object with `plot(model)` provides for any of six basic plots,
of which four are produced by default, giving rise to the term _regression quartet_ for this collection.
These are:
* **Residuals vs. Fitted**: For well-behaved data, the points should hover around a horizontal line at residual = 0,
with no obvious pattern or trend.
* **Normal Q-Q plot**: A plot of sorted standardized residuals $e_i$ (obtained from`rstudent(model)`) against the theoretical values those values would have in a standard normal $\mathcal{N}(0, 1)$ distribution.
* **Scale-Location**: Plots the square-root of the absolute values of the standardized residuals $\sqrt{| e_i |}$
as a measure of "scale" against the fitted values $\hat{y}_i$ as a measure of "location". This provides an assessment
of homogeneity of variance. Violations appear as a tendency for scale (variability) to vary with location.
* **Residuals vs. Leverage**: Plots standardized residuals against leverage to help identify possibly influential observations.
Leverage, or "hat" values (given by `hat(model)`) are proportional to the squared Mahalanobis distances of
the predictor values $\mathbf{x}_i$ from the means, and measure the potential of an observation to
change the fitted coefficients if that observation was deleted. Actual influence is measured by Cooks's distance
(`cooks.distance(model)`) and is proportional to the product of residual times leverage. Contours of constant
Cook's $D$ are added to the plot.
One key feature of these plots is providing **reference** lines or smoothed curves for ease of judging the
extent to which a plot conforms to the expected pattern; another is the **labeling** of observations which
deviate from an assumption.
The base-R `plot(model)` plots are done much better in a variety of packages.
I illustrate some versions from the **car** [@R-car] and **performance** [@Ludecke-etal-performance] packages,
part of the **easystats** [@R-easystats] suite of packages.
### Example: Duncan's occupational prestige {#sec-example-duncan}
In a classic study in sociology, @Duncan:61 used data from the U.S. Census in 1950 to study how one could
predict the prestige of occupational categories --- which is hard to measure ---
from available information in the census for those occupations. His data is available in `carData:Duncan`, and contains
* `type`: the category of occupation, one of `prof` (professional), `wc` (white collar) or `bc` (blue collar);
* `income`: the percentage of occupational incumbents with a reported income > 3500 (about 40,000 in current dollars);
* `education`: the percentage of occupational incumbents who were high school graduates;
* `prestige`: the percentage of respondents in a social survey who rated the occupation as “good” or better in prestige.
These variables are a bit quirky in they are measured in percents, 0-100, rather dollars for `income` and
years for `education`, but this common scale permitted Duncan to ask an interesting sociological question:
Assuming that both income and education predict prestige, are they equally important, as might be
assessed by testing the hypothesis $\mathcal{H}_0: \beta_{\text{income}} = \beta_{\text{education}}$.
If so, this would provide a very simple theory for occupational prestige.
A quick look at the data shows the variables and a selection of the occupational categories, which are
the `row.names()` of the dataset.
```{r duncan}
data(Duncan, package = "carData")
set.seed(42)
car::some(Duncan)
```
Let's start by fitting a simple model using just income and education as predictors. The results look very good!
Both `income` and `education` are highly significant and the $R^2 = 0.828$ for the model indicates
that `prestige` is very well predicted by just these variables.
```{r duncan-mod}
duncan.mod <- lm(prestige ~ income + education, data=Duncan)
summary(duncan.mod)
```
Beyond this, Duncan was interested in the coefficients and whether income and education could be said
to have equal impacts on predicting occupational prestige. A nice display of model coefficients with
confidence intervals is provided by `parameters::model_parameters()`.
```{r duncan-coef}
parameters::model_parameters(duncan.mod)
```
We can also test Duncan's hypothesis that income and education have equal effects on prestige
with `car::linearHypothesis()`. This is constructed as a test of a restricted model in which
the two coefficients are forced to be equal against the unrestricted model. Duncan was very happy with
this result.
```{r duncan-linhyp}
car::linearHypothesis(duncan.mod, "income = education")
```
Equivalently, the linear hypothesis that $\beta_{\text{Inc}} = \beta_{\text{Educ}}$ can be tested
with a Wald test for difference between these coefficients, expressed as
$\mathcal{H}_0 : \mathbf{C} \mathbf{\beta} = 0$, using $\mathbf{C} = (0, -1, 1)$.
The estimated value, -0.053 has a confidence interval [-0.462, 0.356], consistent with
Duncan's hypothesis.
```{r duncan-wald}
wtest <- spida2::wald(duncan.mod, c(0, -1, 1))[[1]]
wtest$estimate
```
We can visualize this test and confidence intervals using a joint confidence ellipse for the
coefficients for income and education in the model `duncan.mod`. In @fig-duncan-beta-diff
the size of the ellipse is set to $\sqrt{F^{0.95}_{1,\nu}} = t^{0.95}_{\nu}$, so that
its shadows on the horizontal and vertical axes correspond to 1D 95% confidence intervals.
In this plot, the line through the origin with slope $= 1$ corresponds to
equal coefficients for income and education and the line with slope $= -1$
corresponds to their difference, $\beta_{\text{Educ}} - \beta_{\text{Inc}}$.
The orthogonal projection of the coefficient vector
$(\widehat{\beta}_{\text{Inc}}, \widehat{\beta}_{\text{Educ}})$ (the center of the ellipse)
is the point estimate of $\widehat{\beta}_{\text{Educ}} - \widehat{\beta}_{\text{Inc}}$
and the shadow of the ellipse along this axis is the 95% confidence interval for the difference
in slopes.
<!-- figure-code: R/Duncan/dunc-conf-ellipse.R -->
```{r echo=-1}
#| label: fig-duncan-beta-diff
#| code-fold: true
#| code-summary: See the code
#| out-width: "80%"
#| fig-width: 8
#| fig-height: 7
#| fig-cap: "Joint 95% confidence ellipse for $(\\beta_{\\text{Inc}}, \\beta_{\\text{Educ}})$, together with their 1D shadows, which give 95% confidence intervals for the separate coefficients and the linear hypothesis that the coefficients are equal. Projecting the confidence ellipse along the line with unit slope gives a confidence interval for the difference between coefficients, shown by the dark red line."
op <- par(mar = c(4,4, 1, 0)+ .3)
confidenceEllipse(duncan.mod, col = "blue",
levels = 0.95, dfn = 1,
fill = TRUE, fill.alpha = 0.2,
xlim = c(-.4, 1),
ylim = c(-.4, 1), asp = 1,
cex.lab = 1.3,
grid = FALSE,
xlab = expression(paste("Income coefficient, ", beta[Inc])),
ylab = expression(paste("Education coefficient, ", beta[Educ])))
abline(h=0, v=0, lwd = 2)
# confidence intervals for each coefficient
beta <- coef( duncan.mod )[-1]
CI <- confint(duncan.mod) # confidence intervals
lines( y = c(0,0), x = CI["income",] , lwd = 5, col = 'blue')
lines( x = c(0,0), y = CI["education",] , lwd = 5, col = 'blue')
points(rbind(beta), col = 'black', pch = 16, cex=1.5)
points(diag(beta) , col = 'black', pch = 16, cex=1.4)
arrows(beta[1], beta[2], beta[1], 0, angle=8, len=0.2)
arrows(beta[1], beta[2], 0, beta[2], angle=8, len=0.2)
# add line for equal slopes
abline(a=0, b = 1, lwd = 2, col = "darkgreen")
text(0.8, 0.8, expression(beta[Educ] == beta[Inc]),
srt=45, pos=3, cex = 1.5, col = "darkgreen")
# add line for difference in slopes
col <- "darkred"
x <- c(-1.5, .5)
lines(x=x, y=-x)
text(-.15, -.15, expression(~beta["Educ"] - ~beta["Inc"]),
col=col, cex=1.5, srt=-45)
# confidence interval for b1 - b2
wtest <- spida2::wald(duncan.mod, c(0, -1, 1))[[1]]
lower <- wtest$estimate$Lower /2
upper <- wtest$estimate$Upper / 2
lines(-c(lower, upper), c(lower,upper), lwd=6, col=col)
# projection of (b1, b2) on b1-b2 axis
beta <- coef( duncan.mod )[-1]
bdiff <- beta %*% c(1, -1)/2
points(bdiff, -bdiff, pch=16, cex=1.3)
arrows(beta[1], beta[2], bdiff, -bdiff,
angle=8, len=0.2, col=col, lwd = 2)
# calibrate the diff axis
ticks <- seq(-0.3, 0.3, by=0.2)
ticklen <- 0.02
segments(ticks, -ticks, ticks-sqrt(2)*ticklen, -ticks-sqrt(2)*ticklen)
text(ticks-2.4*ticklen, -ticks-2.4*ticklen, ticks, srt=-45)
```
<!-- ```{r} -->
<!-- #| label: fig-duncan-beta-diff-include -->
<!-- #| echo: false -->
<!-- #| out-width: "80%" -->
<!-- #| fig-cap: "Joint 95% confidence ellipse for $(\beta_{\text{Inc}}, \beta_{\text{Educ}}), together with their 1D shadows, which give 95% confidence intervals for the separate coefficients and the linear hypothesis that the coefficients are equal." -->
<!-- knitr::include_graphics("images/Duncan-beta-diff.png") -->
<!-- ``` -->
#### Diagnostic plots
But, should Duncan be **so** happy? It is unlikely that he ran any model diagnostics or plotted
his model; we do so now. Here is the regression quartet (@fig-duncan-plot-model) for this model. Each plot shows
some trend lines, and importantly, labels some observations that stand out and might deserve attention.
```{r}
#| label: fig-duncan-plot-model
#| out-width: "100%"
#| fig-cap: "Regression quartet of diagnostic plots for the `Duncan` data. Several possibly unusual observations are labeled."
op <- par(mfrow = c(2,2),
mar = c(4,4,3,1)+.1)
plot(duncan.mod, lwd=2, pch=16)
par(op)
```
Some points to note:
* A few observations (minister, reporter, conductor, contractor) are flagged in multiple panels.
It turns out (@sec-duncan-influence)
that the observations for minister and reporter noted in the residuals vs. leverage plot are highly influential and largely responsible for Duncan's finding that the slopes for income and
education could be considered equal.
* The `r colorize("red")` trend line in the scale-location plot indicates that residual variance is not constant, but rather increases from both ends. This is a consequence of the fact that `prestige` is measured as a percentage, bounded at [0, 100], and the standard deviation of a percentage $p$ is proportional to $\sqrt{p \times (1-p)}$ which is maximal at $p = 0.5.
Similar, but nicer-looking diagnostic plots are provided by `performance::check_model()`
which uses `ggplot2` for graphics. These include helpful captions indicating
what should be observed for each for a good-fitting model. However, they don't have
as good facilities for labeling unusual observations as the base R `plot()` or
functions in the `car` package.
```{r}
#| label: fig-duncan-check-model
#| out-width: "100%"
#| fig-cap: "Diagnostic plots for the `Duncan` data, using `check_model()`."
check_model(duncan.mod, check=c("linearity", "qq", "homogeneity", "outliers"))
```
### Example: Canadian occupational prestige {#sec-example-prestige}
<!-- **TODO**: Already introduced in @sec-prestige Pair this down & integrate with that -->
Following @Duncan:61, occupational prestige was studied in a Canadian context by Bernard Blishen
and others at York University, giving the dataset `carData::Prestige` which we looked at
in @sec-prestige.
It differs from the `Duncan` dataset primarily in that the main variables---prestige, income and education
were revamped to better reflect the underlying constructs in more meaningful units.
* `prestige`: Rather than a simple percentage of "good+" ratings, this uses a wider and more reliable scale
from @Pineo-Porter-1967 on a scale from 10--90.
* `income` is measured as the average income of incumbents in each occupation, in 1971 dollars, rather than
percent exceeding a given threshold ($3500)
* `education` is measured as the average education of occupational incumbents, years.
The dataset again includes `type` of occupation with the same levels `"bc"` (blue collar), `"wc"` (white collar) and `"prof"` (professional)[^reorder-type],
but in addition includes the percent of `women` in these
occupational categories.
[^reorder-type]: Note that the factor `type` in the dataset has its levels ordered alphabetically.
For analysis and graphing it is useful to reorder the levels in the natural increasing order.
An alternative is to make `type` an _ordered_ factor, but this would represent it
using polynomial contrasts for linear and quadratic trends, which seems unuseful in this context.
Our interest again is in understanding how `prestige`
is related to census measures of the average education, income, percent women of incumbents in those
occupations, but with attention to the scales of measurement and possibly more complex relationships.
```{r prestige}
data(Prestige, package="carData")
# Reorder levels of type
Prestige$type <- factor(Prestige$type,
levels=c("bc", "wc", "prof"))
str(Prestige)
```
We fit a main-effects model using all predictors (ignoring `census`, the Canadian Census occupational code):
```{r prestige-mod}
prestige.mod <- lm(prestige ~ education + income + women + type,
data=Prestige)
```
`plot(model)` produces four separate plots. For a quick look, I like to arrange them in a single 2x2 figure.
```{r}
#| label: fig-plot-prestige-mod
#| out-width: "100%"
#| fig-show: hold
#| fig-cap: "Regression quartet of diagnostic plots for the `Prestige` data. Several possibly unusual observations are labeled in each plot."
op <- par(mfrow = c(2,2),
mar=c(4,4,3,1)+.1)
plot(prestige.mod, lwd=2, cex.lab=1.4)
par(op)
```
## Other Model plots
**TODO**: What goes here?
## Coefficient displays
The results of linear models are most often reported in tables and typically with "significance stars"
(`*, **, ***`) to indicate the outcome of hypothesis tests. These are useful for looking up precise values and you can use this format to compare a small number of competing models side-by-side.
However, as illustrated by @KastellecLeoni:2007, plots of coefficients can increase the clarity of presentation
and make it easier to draw correct conclusions. Yet, when you need to present tables, there is a variety of tools
in R that can help make them attractive in publications.
For illustration, I'll consider three models for the `Prestige` data of increasing complexity:
* `mod1` fits the main effects of the three quantitative predictors;
* `mod2` adds the categorical variable `type` of occupation;
* `mod3` allows an interaction of `income` with `type`.
```{r prestige-models}
mod1 <- lm(prestige ~ education + income + women,
data=Prestige)
mod2 <- lm(prestige ~ education + women + income + type,
data=Prestige)
mod3 <- lm(prestige ~ education + women + income * type,
data=Prestige)
```
From our earlier analyses (@sec-prestige) we saw that the marginal relationship between `income` and `prestige` was nonlinear @fig-Prestige-scatterplot-income1), and was better represented in a linear model using
`log(income)` (@sec-log-scale) shown in @fig-Prestige-scatterplot2. However, this possibly non-linear relationship could also be explained by stratifying (@sec-stratifying) the data by `type` of occupation (@fig-Prestige-scatterplot3).
### Displaying coefficients
`summary()` gives the complete precis of a fitted model, with information about the estimated coefficients, residuals and goodness-of fit statistics like $R^2$. But if you only want to see the coefficients,
standard errors, etc. `lmtest::coeftest()` gives these results in the familiar format for console output.
`broom::tidy()` places these in a tidy format common to many modeling functions which is useful for
futher processing (e.g., comparing models).
```{r coeftest}
lmtest::coeftest(mod1)
broom::tidy(mod1)
```
The `r package("modelsummary", cite=TRUE)` is an easy to use,
very general package to summarize data and statistical models in R. The main function
`modelsummary()` can produce highly customizable tables of coefficients
in a wide variety of output formats, including HTML, PDF, LaTeX, Markdown, and MS Word.
You can select the statistics displayed for any model term with the `estimate` and
`statistic` arguments.
```{r}
#| label: tbl-modelsummary1
#| tbl-cap: Table of coefficients for the main effects model.
modelsummary(list("Model1" = mod1),
coef_omit = "Intercept",
shape = term ~ statistic,
estimate = "{estimate} [{conf.low}, {conf.high}]",
statistic = c("std.error", "p.value"),
fmt = fmt_statistic("estimate" = 3, "conf.low" = 4, "conf.high" = 4),
gof_omit = ".")
```
`gof_omit` allows you to omit or select the goodness-of-fit statistics and other model information
available from those listed by `get_gof()`:
```{r}
get_gof(mod1)
```
### Visualizing coefficients
`modelplot()` is the companion function. It allows you to plot model estimates and confidence intervals. It makes it easy to subset, rename, reorder, and customize plots using same mechanics as in `modelsummary()`.
```{r}
#| label: fig-modelplot1
#| out-width: "80%"
#| fig-cap: "Plot of coefficients and their standard error bar for the simple main effects model"
theme_set(theme_minimal(base_size = 14))
modelplot(mod1, coef_omit="Intercept",
color="red", size=1, linewidth=2) +
labs(title="Raw coefficients for mod1")
```
But this plot is disappointing and misleading because it show the **raw** coefficients.
From the plot, it looks like only `education` has a non-zero effect, but the effect of `income` is also
highly significant. The problem is that the magnitude of the coefficient $\hat{b}_{\text{education}}$
is more than 40,000 times that of the other coefficients, because education is measured years,
while income is measured in dollars.
The 95% confidence interval for $\hat{b}_{\text{income}} = [0.0008, 0.0019]$, but this is invisible in the plot.
Before figuring out how to fix this issue, I show the comparable displays from `modelsummary()`
and `modelplot()` for all three models.
When you give `modelsummary()` a list of models, it displays their coefficients side-by-side
as shown in @tbl-modelsummary2.
```{r}
#| label: tbl-modelsummary2
#| tbl-cap: Table of coefficients for three models.
models <- list("Model1" = mod1, "Model2" = mod2, "Model3" = mod3)
modelsummary(models,
coef_omit = "Intercept",
fmt = 2,
stars = TRUE,
shape = term ~ statistic,
statistic = c("std.error", "p.value"),
gof_map = c("rmse", "r.squared")
)
```
Note that a factor predictor (like `type` here) with $d$ levels is represented by $d-1$ coefficients
in main effects and in interactions with quantitative variables. These levels are coded with _treatment contrasts_ by default. Also by default, the first level is set as the reference level in alphabetical order.
Here the reference level is blue collar (`bc`), so the coefficient `typeprof = 5.91`
indicates that professional occupations on average are rated 5.91 greater on the Prestige scale
than blue collar workers.
Note also that unlike the table,
the coefficients in @fig-modelplot1 are ordered from bottom to top, because the Y axis starts
at the lower left corner. In @fig-modelplot2 I use `scale_y_discrete()` to reverse the order.
It is also useful to add a vertical reference line at $\beta = 0$.
```{r}
#| label: fig-modelplot2
#| out-width: "90%"
#| fig-cap: "Plot of raw coefficients and their confidence intervals for all three models"
modelplot(models,
coef_omit="Intercept",
size=1.3, linewidth=2) +
ggtitle("Raw coefficients") +
geom_vline(xintercept = 0, linewidth=1.5) +
scale_y_discrete(limits=rev) +
theme(legend.position = "inside",
legend.position.inside = c(0.85, 0.2))
```
### More useful coefficient plots
The problem with plots of raw coefficients shown in @fig-modelplot1 and @fig-modelplot2
is that the coefficients for different predictors are not directly comparable because
they are measured in different units.
One alternative is to plot the _standardized coefficients_.
Another way is to re-scale the predictors into more comparable and meaningful units.
I illustrate these ideas below.
#### Standardized coefficients {.unnumbered}
The simplest way to do this is to transform all variables to standardized ($z$) scores.
The coefficients are then interpreted
as the standardized change in prestige for a one standard deviation change in the
predictors.
The syntax below uses `scale` to transform all the numeric variables.
Then, we re-fit the models using the standardized data.
```{r standardize}
Prestige_std <- Prestige |>
as_tibble() |>
mutate(across(where(is.numeric), scale))
mod1_std <- lm(prestige ~ education + income + women,
data=Prestige_std)
mod2_std <- lm(prestige ~ education + women + income + type,
data=Prestige_std)
mod3_std <- lm(prestige ~ education + women + income * type,
data=Prestige_std)
```
The plot in @fig-modelplot3 now shows the significant effect of income in all three models.
As well, it offers a more sensitive comparison of the coefficients of other terms
across models; for example `women` is not significant in models 1 and 2, but becomes
significant in Model 3 when the interaction of `income * type` is included.
```{r}
#| label: fig-modelplot3
#| out-width: "90%"
#| fig-cap: "Plot of standardized coefficients and their confidence intervals for all three models"
models <- list("Model1" = mod1_std, "Model2" = mod2_std, "Model3" = mod3_std)
modelplot(models,
coef_omit="Intercept", size=1.3) +
ggtitle("Standardized coefficients") +
geom_vline(xintercept = 0, linewidth = 1.5) +
scale_y_discrete(limits=rev) +
theme(legend.position = "inside",
legend.position.inside = c(0.85, 0.2))
```
It turns out there is an easier way to get plots of standardized coefficients.
`modelsummary()` extracts coefficients from model objects using the `parameters` package,
and that package offers several options for standardization:
See [model parameters documentation](https://easystats.github.io/parameters/reference/model_parameters.default.html).
We can pass the `standardize="refit"` (or other) argument directly to `modelsummary()` or `modelplot()`, and that argument will be forwarded to `parameters`. The plot produced by the code below is identical to @fig-modelplot3 and is not shown.
```{r}
#| eval: false
modelplot(list("mod1" = mod1, "mod2" = mod2, "mod3" = mod3),
standardize = "refit",
coef_omit="Intercept", size=1.3) +
ggtitle("Standardized coefficients") +
geom_vline(xintercept = 0, linewidth=1.5) +
scale_y_discrete(limits=rev) +
theme(legend.position = "inside",
legend.position.inside = c(0.85, 0.2))
```
The `r package("ggstats", cite=TRUE)` provides even nicer versions
of coefficient plots that handle factors in a more reasonable way, as levels within
the factor. `ggcoef_model()` plots a single model and `ggcoef_compare()` plots
a list of models using sensible defaults. A small but nice feature is that it explicitly
shows the 0 value for the reference level of a factor (`type = "bc"` here)
and uses better labels for factors and their interactions.
```{r}
#| label: fig-ggcoef-compare
#| out-width: "90%"
#| fig-cap: "Model comparison plot from `ggcoef_compare()`"
models <- list(
"Base model" = mod1_std,
"Add type" = mod2_std,
"Add interaction" = mod3_std)
ggcoef_compare(models) +
labs(x = "Standarized Coefficient")
```
#### More meaningful units {.unnumbered}
Standardizing the variables makes the coefficients directly comparable, but it may be harder
to understand what they mean in terms of the variables. For example, the coefficient of
`income` in `mod2_std` is 0.25. A literal interpretation is that occupational prestige
is expected to increase 0.25 standard deviations for each standard deviation increase in income,
but it may be difficult to appreciate what this means.
A better substantive comparison of the coefficients could use understandable scales for the
predictors, e.g., months of education, $100,000 of income or 10% of women's participation.
Note that the effect of this is just to multiply the coefficients and their standard errors by a factor.
The statistical conclusions of significance are unchanged.
For simplicity, I do this just for Model 1.
```{r scaled-units}
Prestige_scaled <- Prestige |>
mutate(education = 12 * education,
income = income / 100,
women = women / 10)
mod1_scaled <- lm(prestige ~ education + income + women,
data=Prestige_scaled)
```
When we plot this with `ggcoef_model()`, there are many options to control how variables are
labeled and other details.
```{r}
#| label: fig-ggcoef-compare2
#| out-width: "90%"
#| fig-cap: "Plot of coefficients for prestige with scaled predictors for Model 1."
ggcoef_model(mod1_scaled,
signif_stars = FALSE,
variable_labels = c(education = "education\n(months)",
income = "income\n(/$100K)",
women = "women\n(/10%)")) +
xlab("Coefficients for prestige with scaled predictors")
```
So, on average, each additional month of education increases the prestige rating by 0.34 units,
while an additional $100,000 of income increases it by 0.13 units. While these are significant
effects, they are not large in relation to the scale of `prestige` which ranges 14.8---87.2.
<!-- ## Spread-level plot -->
## Added-variable and related plots {#sec-avplots}
In multiple regression problems, it is most often useful to construct a scatterplot matrix
and examine the plot of the response vs. each of the predictors as well as those
of the predictors against each other.
However, the simple, _marginal_ scatterplots of a response $y$ against _each_ of several predictors $x_1, x_2, \dots$ can be misleading because each one ignores the other predictors.
To see this consider a toy dataset, `coffee`, giving measures of coffee consumption,
occupational stress and an index of heart problems in a sample of $n=20$ graduate students
and professors.
```{r}
#| label: fig-coffee-spm
#| fig-width: 7
#| fig-height: 6
#| out-width: "100%"
#| fig-cap: "Scatterplot matrix showing pairwise relations among `Heart` ($y$), `Coffee` consumption ($x_1$) and `Stress` ($x_2$), with linear regression lines and 68% data ellipses for the bivariate relations"
data(coffee, package="matlib")
scatterplotMatrix(~ Heart + Coffee + Stress,
data=coffee,
smooth = FALSE,
ellipse = list(levels=0.68, fill.alpha = 0.1),
pch = 19, cex.labels = 2.5)
```
The message from these marginal plots in @fig-coffee-spm
seems to be that coffee is bad for your heart,
stress is bad for your heart, and stress is also strongly
related to coffee consumption. Yet, when we fit a model with both variables together,
we get the following results:
```{r fit-both}
fit.both <- lm(Heart ~ Coffee + Stress, data=coffee)
lmtest::coeftest(fit.both)
```
The coefficients suggest that stress is indeed bad for your heart,
but the negative (though non-significant) coefficient for coffee suggests that coffee is good for you.How can this be? Does that mean I should drink more coffee, while avoiding stress?
The reason for this apparent paradox is that
the general linear model fit by `lm()` estimates all effects together and so the coefficients pertain to the _partial_ effect of a given predictor, _adjusting_ for the effects of all others.
That is, the coefficient for coffee ($\beta_{\text{Coffee}} = -0.41$) estimates the effect of coffee for
people with _same_ level of stress. In the marginal scatterplot, the positive slope for coffee
(1.10) ignores the correlation of coffee and stress.
This is an example of _confounding_ in regression when an important predictor is omitted.
Stress is positively associated with both coffee consumption and heart damage.
When stress is omitted, the coefficient for coffee is biased because it "picks up"
the relation with the omitted variable.
A solution to this problem is
the _added-variable_ plot ("AV plot", also called _partial regression_ plot, MostellerTukey-1977).
This is a multivariate analog of a
a simple marginal scatterplot, designed to visualize directly the partial relation between $y$ and the predictors
$x_1, x_2, \dots$ in a multiple regression model.
You can think of this as a magic window that hides
the relations of all other variables with each of the $y$ and $x_i$ shown in a given added-variable plot.
This gives an unobstructed view of the net relation between $y$ and $x_i$ with the effect of
all other variables removed. In effect, it reduces the problem of viewing the complete model
in $p$-dimensional space to a sequence of $p$ 2D plots, each of which tells the story of one
predictor, unentangled from the others. This is essentially the same idea as
the partial variables plot (@sec-pvPlot) used to understand partial correlations.
The construction of an AV plot is conceptually very simple. For variable $x_i$, imagine that we fit two supplementary regressions:
* Regress $\mathbf{y}$ on $\mathbf{X_{(-i)}}$, the model matrix of all of the regressors except $x_i$.
By definition, the residuals from this regression,
$\mathbf{y}^\star \equiv \mathbf{y} \,\vert\, \text{others} = \mathbf{y} - \widehat{\mathbf{y}} \,\vert\, \mathbf{X_{(-i)}}$,
<!-- $r_y(x_i) = y - \widehat{y}_{\mathbf{X_{(-i)}}} \equiv y \,\vert\, \text{others}$, -->
are the part of $\mathbf{y}$ that cannot be
explained by all the other regression terms. These residuals are necessarily uncorrelated
with the other predictors.
* Regress $x_i$ on the other predictors, $\mathbf{X_{(-i)}}$ and again obtain the residuals.
These residuals,
$\mathbf{x}_i^\star \equiv \mathbf{x}_i \,\vert\, \text{others} = \mathbf{x}_i - \widehat{\mathbf{x}}_i \,\vert\, \mathbf{X_{(-i)}}$
give the part of $x_i$ that cannot be explained by the others, and so are uncorrelated with them.
The AV plot is then just a simple scatterplot of these residuals, $\mathbf{y}^\star$
on the vertical axis, and $\mathbf{x}^\star$ on the horizontal.
In practice, it is unnecessary to run the auxilliary regressions this way [@VellemanWelsh:81].
Both can be calculated using `stats::lsfit()` roughly as follows:
```{r AVcalc}
#| eval: false
AVcalc <- function(model, variable)
X <- model.matrix(model)
response <- model.response(model)
x <- X[, -variable]
y <- cbind(X[, variable], response)
fit <- lsfit(x, y, intercept = FALSE)
resids <- residuals(fit)
return(resids)
```
Note that `y` here contains both the current predictor, $\mathbf{x}_i$ and the response $\mathbf{y}$, so the
residuals `resids` have two columns, one for $x_i \,\vert\, \text{others}$
and one for $y \,\vert\, \text{others}$.
Added-variable plots are produced using `car::avPlot()` for one predictor or
`avPlots()` for any number of model terms. The `id` argument controls
which points are identified in the plots; `n=2` labels the two points
that are furthest from the mean on the horizontal axis _and_ the two with the largest
absolute residuals. For instance, in @fig-coffee-avPlots, observations 5 and 13 are flagged
because their conditional $\mathbf{x}_i^\star$ values are extreme; observation 17 has a
large absolute residual, $\mathbf{y}^\star = \text{Heart} \,\vert\, \text{others}$.
```{r}
#| label: fig-coffee-avPlots
#| out-width: "100%"
#| fig-cap: "Added-variable plots for Coffee and Stress in the multiple regression model"
avPlots(fit.both,
ellipse = list(levels = 0.68, fill=TRUE, fill.alpha = 0.1),
pch = 19,
id = list(n = 2),
cex.lab = 1.5,
main = "Added-variable plots for Coffee data")
```
The data ellipses for $\mathbf{x}_i^\star$ and $\mathbf{y}^\star$ summarize the conditional
(or partial) relations of the response to each predictor controlling for all other predictors
in each plot. The essential idea is that the data ellipse for $(\mathbf{x}_i^\star, \mathbf{y}^\star)$ has the identical relation to the estimate $\hat{b}_i$ in a
multiple regression as the data ellipse of $(\mathbf{x}, \mathbf{y})$ has to the slope in
a simple regression.
### Properties of AV plots
AV plots are particularly interesting and useful for the following noteworthy properties:
* The slope of the simple regression in the AV plot for variable $x_i$ is identical to the slope
$b_i$ for that variable in the full multiple regression model.
* The residuals in this plot are the same as the residuals using all predictors. This means you can
see the degree of fit for observations directly in relation to the various predictors, which is not
the case for marginal scatterplots.
* Consequentially, the standard deviation of the (vertical) residuals in the AV plot is the same as $s = \sqrt(MSE)$ in the full model and the standard error of a coefficient is $\text{SE}(b_i) = s / \sqrt{\Sigma (\mathbf{x}_i^\star)^2}$. This is shown by the size of the shadow of the data ellipses on
the vertical axis in @fig-coffee-avPlots.
* The horizontal positions, $\mathbf{x}_i^\star$, of points adjust for all other predictors, and so we can see points at the extreme left and right as unusual in relation to the others. If these points are also badly fitted (large residuals), we can see their influence on the fitted relation in the full model. AV plots thus provide visual displays of (partial) _leverage_ and _influence_ on each of the regression coefficients.
* The correlation of $\mathbf{x}_i^\star$ and $\mathbf{y}^\star$ shown by the shape of the data ellipses is the _partial correlation_ between $\mathbf{x}_i$ and $\mathbf{y}_i$
with other predictors partialled out.
### Marginal - conditional plots
The relation of the _conditional_ data ellipses in AV plots to those in _marginal_ plots of the same variables
provides further insight into what it means to "control for" other variables.
@fig-coffee-mcplot shows the same added-variable plots for Heart disease on Stress and Coffee
as in @fig-coffee-avPlots (with a zoomed-out scaling), but here we also overlay the
marginal data ellipses for $(\mathbf{x}_i, \mathbf{y})$ (centered at the means),
and marginal regression lines for Stress and Coffee separately. Drawing arrows connecting the
original data points to their positions in the AV plot shows what happens when we condition on
or partial out the other variable.
These _marginal - conditional plots_ are produced by `car::mcPlot()` (for one regressor)
and `car::mcPlots()` (for several). The plots for the marginal and conditional relations
can be compared separately using the same scales for both, or overlaid as shown here.
The points labeled here are only those with large absolute residuals $\mathbf{y}^\star$ in the vertical
direction.
```{r}
#| label: fig-coffee-mcplot
#| fig-width: 12
#| fig-height: 7
#| out-width: "100%"
#| fig-cap: Marginal $+$ conditional (added-variable) plots for Coffee and Stress in the multiple regression predicting Heart disease. Each panel shows the 68% conditional data ellipse for $x_i^\star, y^\star$ residuals (shaded, blue) as well as the marginal 68% data ellipse for the $(x_i, y)$ variables, shifted to the origin. Arrows connect the mean-centered marginal points (red) to the residual points (blue).
mcPlots(fit.both,
ellipse = list(levels=0.68, fill=TRUE, fill.alpha=0.2),
id = list(n=2),
pch = c(16, 16),
col.marginal = "red", col.conditional = "blue",
col.arrows = "black",
cex.lab = 1.5)
```
The most obvious feature of @fig-coffee-mcplot
is that Coffee has a negative slope in the conditional AV plot but a positive slope in the marginal
plot. This is an example of Simpson's paradox in a regression context: marginal and conditional relations
can have opposite signs.
\index{Simpson's paradox}
Less obvious is the relation between the marginal and AVP ellipses. In 3D, the marginal
data ellipse is the shadow of the ellipsoid for $(\mathbf{y}, \mathbf{x}_1, \mathbf{x}_2)$ on one of the coordinate planes,
while the AV plot is a slice through the ellipsoid where either $\mathbf{x}_1$ or $\mathbf{x}_2$
is held constant.
Thus, the AVP ellipse must be contained in the marginal ellipse, as we can see in @fig-coffee-mcplot.
If there are only two $x$s, then the AVP ellipse must touch the marginal ellipse at two points.
Finally, @fig-coffee-mcplot also shows how conditioning on other predictors works for individual
observations, where each point of $(\mathbf{x}_i^\star, \mathbf{y}^\star)$ is the image of $(\mathbf{x}_i, \mathbf{y})$ along the path of the marginal regression. The variability in the response and in the focal
predictor are both reduced, leaving only the uncontaminated relation of $\mathbf{y}$ with $\mathbf{x}_i$.
These plots are similar in spirit to the ARES plot ("Adding REgressors Smoothly") proposed
by @CookWeisberg-1994, but their idea was an interactive animation,
displaying a smooth transition between the fit of a marginal model and
the fit of a larger model. They used linear interpolation,
$$
(\mathbf{x}_i, \mathbf{y})_{\text{interp}} = (\mathbf{x}_i, \mathbf{y}) + \lambda [(\mathbf{x}_i^\star, \mathbf{y}^\star) - (\mathbf{x}_i, \mathbf{y})] \comma
$$
controlled by a slider
whose value, $\lambda \in [0, 1]$, was the weight given to the smaller marginal model.
See [this animation](https://www.datavis.ca/gallery/animation/duncanAV/) for an example
using the Duncan data.
### Prestige data
For a substantive example, let's return to the model for income, education and women in the
`Prestige` data. The plot in @fig-prestige-avplot shows the strong positive relations of
income and education to prestige in the full model, and the negligible relation of
percent women. But, in the plot for income, two occupations (physicians and general managers)
with high income strongly pull the regression line down from what can be seen in
the orientation of the conditional data ellipse.
```{r}
#| label: fig-prestige-avplot
#| fig-width: 10
#| fig-height: 10
#| out-width: "100%"
#| fig-cap: "Added-variable plot for the quantitative predictors in the `Prestige` data."
prestige.mod1 <- lm(prestige ~ education + income + women,
data=Prestige)
avPlots(prestige.mod1,
ellipse = list(levels = 0.68),
id = list(n = 2, cex = 1.2),
pch = 19,
cex.lab = 1.5,
main = "Added-variable plots for prestige")
```
The influential points for physicians and general managers could just be unusual, or
suggest that the relation of income to prestige is nonlinear. A rough test of this
is to fit a smoothed curve through the points in the AV plot as shown in @fig-prestige-av-income.
```{r}
#| label: fig-prestige-av-income
#| fig-width: 7
#| fig-height: 6
#| out-width: "75%"
#| fig-cap: "Added-variable plot for income, with a loess smooth."
op <- par(mar=c(4, 4, 1, 0) + 0.5)
res <- avPlot(prestige.mod1, "income",
ellipse = list(levels = 0.68),
pch = 19,
cex.lab = 1.5)
smooth <- loess.smooth(res[,1], res[,2])
lines(smooth, col = "red", lwd = 2.5)
```
However, this use of AV plots to diagnose nonlinearity or suggest transformations can be misleading
[@Cook-1996]. Curvature in these plots is an indication of _some_ model deficiency,
but unless the predictors are uncorrelated, they cannot determine the form of a possible
transformation of the predictors.
### Component + Residual plots
A plot more suited to detecting the need to transform a predictor $\mathbf{x}_i$
to a form $f(\mathbf{x}_i)$ to make it's relationship with the response $\mathbf{y}$
more nearly linear is the _component + residual plot_ ("C+R plot", also called _partial residual plot_, @LarsenMcCleary:72; @Cook:93). This plot displays
the partial residual $\mathbf{e} + \hat{b}_i \mathbf{x}_i$ on the vertical axis
against $\mathbf{x}_i$ on the horizontal, where $\mathbf{e}$ are the residuals from the full model.
A smoothed curve through the points will often suggest the form of the transformation $f()$.
The fact that the horizontal axis is $\mathbf{x}_i$ itself rather than $\mathbf{x}^\star_i$
makes it easier to see the functional form.
The C+R plot has the same desirable properties as the AV plot: The slope $\hat{b}_i$
and residuals $\mathbf{e}$ in this plot are the same as those in the full model.
C+R plots are produced by `car::crPlots()` and `car::crPlot()`.
@fig-prestige-crplot shows this just for income in the model `prestige.mod1`.
(These plots for education and women show no strong evidence of curvilinearity.)
The dashed `r colorize("blue")` line is the linear _partial fit_, $\hat{b}_i \mathbf{x}_i$,
whose slope $\hat{b}_2 = 0.0013$ is the same as that for income in `prestige.mod1`.
The solid `r colorize("red")` curve is the loess smooth through the points.
The same points are identified as noteworthy as in AV plot in @fig-prestige-av-income.
```{r}
#| label: fig-prestige-crplot
#| fig-width: 7
#| fig-height: 6
#| out-width: "75%"
#| results: hide
#| fig-keep: all
#| fig-cap: !expr paste("Component + residual plot for income in the model for the quantitative predictors of prestige. The dashed", blue, "line is the partial linear fit for income. The solid", red, "curve is the loess smooth.")
crPlot(prestige.mod1, "income",
smooth = TRUE,
order = 2,
pch = 19,
col.lines = c("blue", "red"),
id = list(n=2, cex = 1.2),
cex.lab = 1.5)
```
The partial relation between prestige and income is clearly curved, so it would be appropriate
to transform income or to include a polynomial (quadratic) term and refit
the model. As suggested earlier (@sec-prestige) it makes sense statistically and substantively
to model the effect of income on a log scale, so then the slope for `log(income)`
would measure the increment in prestige for a constant _percentage increase_ in income.
The effect of percent women on prestige seen in @fig-prestige-avplot appears very small
and essentially linear. However, if we wished to examine this more closely,
we could use the C+R plot in @fig-prestige-crplot-women.
```{r}
#| label: fig-prestige-crplot-women
#| fig-width: 7
#| fig-height: 6
#| out-width: "75%"
#| results: hide
#| fig-keep: all
#| fig-cap: "Component + residual plot for women in the model for the quantitative predictors of prestige."
crPlot(prestige.mod1, "women",
pch = 19,
col.lines = c("blue", "red"),
id = list(n=2, cex = 1.2),
cex.lab = 1.5)
```
This shows a slight degree of curvature, with modestly larger values in the extremes.
If we wished to test this statistically, we could fit a model with a quadratic effect of
women, and compare that to the linear-only effect using `anova()`.
```{r prestige-mod2}
prestige.mod2 <- lm(prestige ~ education + income + poly(women,2),
data=Prestige)
anova(prestige.mod1, prestige.mod2)
```
This model ignores the `type` of occupation ("bc", "wc", "prof")
as well as any possible interactions of type with other predictors. We examine this next,
using effect displays.
## Effect displays
For two predictors it is possible, even if awkward,
to display the fitted response surface in a 3D plot or faceted 2D views
in what I call a _full model plot_.
For more than two predictors
such displays become cumbersome if not impractical, particularly when there are interactions
in the model, when some effects are curvilinear,
or when the main substantive interest is focused understanding on one or more
main effects or interaction terms in the presence of others.
The method of _effect displays_, largely introduced by John Fox [@Fox:87;@Fox:03:effects;@FoxWeisberg2018]
is a generally useful solution to this problem. [^effects]
These plots are nearly always easier to understand than
tables of coefficients.
[^effects]: Earlier, but less general expression of these ideas go back to the use of
**adjusted means** in analysis of covariance [@Fisher-1936]
or **least squares means** or **population marginal means**
in analysis of variance [@Searle-etal:80]
The idea of effect displays is quite simple, but very general and handles models of arbitrary complexity.
Imagine that in a model we have a particular subset of predictors (_focal predictors_) whose effects
on the response variable we wish to visualize. The essence of an effect display is that we calculate
the predicted values (and standard errors) of the response for the model term(s) involving the
focal predictors (and all low-order relatives, e.g, main effects that are marginal to an interaction)
as those predictors are allowed to vary over a grid covering their range.
For a given plot, the other, non-focal variables are "controlled"
by being fixed at typical values.
For example, a quantitative predictor could be fixed at it's mean, median or some representative value.
A factor could be fixed at equal proportions of its levels or its proportions in the data.
The result, when plotted, shows the predicted effects of the focal variables,
either with multiple lines or in a faceted display, but with all the other variables controlled,
adjusted for or averaged over. For interaction effects all low-order relatives are typically included
in the fitted values for the term being graphed.
In practical terms, a scoring matrix $\mathbf{X}^\bullet$ is defined by
the focal variables varied over their ranges and the other variables held
fixed. The fitted values for a model term are then calculated as
$\widehat{\mathbf{y}}^\bullet = \mathbf{X}^\bullet \; \widehat{\mathbf{b}}$
using the equivalent of:
```{r}
#| eval: false
predict(model, newdata = X, se.fit = TRUE)
```