mrk17.qmd

---
title: "RePsychLing Masson, Rabe, & Kliegl, 2017) with Julia: Specification and selection"
jupyter: julia-1.9
---

# Setup

Packages we (might) use.

```{julia}
using CategoricalArrays
using DataFrames
using MixedModels
using MixedModelsMakie
using SMLP2023: dataset
using Statistics: mean, std
```

```{julia}
dat = DataFrame(dataset(:mrk17_exp1))
describe(dat)
```
# Specification

This section covers the general terminology and advice for model specification.

## Response, covariates, and factors

Linear mixed models (LMMs), like many other types of statistical models, describe a relationship between a *response* variable and *covariates* that have been measured or observed along with the response. The statistical model assumes that the residuals of the fitted response (i.e., not the responses) are normally -- also identically and independently -- distributed. This is the *first assumption* of normality in the LMM. It is standard practice that model residuals are inspected and, if serious skew is indicated, that the response is Box-Cox transformed (unless not justified for theoretical reasons) to fulfill this model assumption.

In the following we distinguish between *categorical covariates* and *numerical covariates*. Categorical covariates are  *factors*. The important characteristic of a factor is that, for each observed value of the response, the factor takes on the value of one of a set of discrete levels.  The levels can be unordered (nominal) or ordered (ordinal). We use the term *covariate* when we refer to *numerical covariates*, that is to continuous measures with some distribution. In principle, statistical models are not constrained by the distribution of observations across levels of factors and covariates, but the distribution may lead to problems of model identification and it does implications for statistical power.

Statistical power, especially for the detection of interactions, is best when observations are uniformly distributed across levels of factors or uniform across the values of covariates. In experimental designs, uniform distributions may be achieved by balanced assignment of subjects (or other carriers of responses) to the levels of factors or combinations of factor levels. In observational contexts, we achieve uniform distributions by stratification (e..g., on age, gender, or IQ scores). Statistical power is worse for skewed than normal distributions (I think ...). Therefore, although it is *not* required to meet an assumption of the statistical model, it may be useful to consider Box-Cox transformations of covariates.

## Nested and crossed random (grouping) factors

In LMMs the levels of at least one of the factors represents *units* in the data set that are assumed to be sampled, ideally randomly, from a population that is normally distributed with respect to the response. *This is the second assumption of normal distribution in LMMs.*  In psychology and linguistics the observational units are often the subjects or items (e..g., texts, sentences, words, pictures) in the study. We may use numbers, such as subject identifiers, to designate the particular levels that we observed; we recommend to prepend these numbers with "S" or "I" to avoid confusion with numeric variables.

Random sampling is the basis of generalization from the sample to the population. The core statistics we will estimate in this context are variances and correlations of grand means and (quasi-)experimental effects. These terms will be explained below. What we want to stress here is that the estimation of (co-)variances / correlations requires a larger number of units (levels) than the estimation of means. Therefore, from a practical perspective, it is important that random factors are represented with many units.

When there is more than one random factor, we must be clear about their relation. The two prototypical cases are that the factors are *nested* or *crossed*.  In multilevel models, a special case of mixed models, the levels of the random factors are strictly nested. For example, at a given time, every student attends a specific class in a specific school. Students, classes, and schools could be three random factors. As soon as we look at this scenario across several school years, the nesting quickly falls apart because students may move between classes and between schools.

In psychology and linguistics, random factors are often crossed, for example, when every subject reads every word in every sentence in a word-by-word self-paced reading experiment (or alternatively: when every word in every sentence elicits a response from every subject). However, in an eye-movement experiment (for example), the perfect crossing on a measure like fixation duration is not attainable because of blinks or skipping of words.

In summary, the typical situation in experimental and observational studies with more than one random factor is _partial crossing_ or _partial nesting_ of levels of the random factors. Linear mixed models handle these situations very well.

## Experimental and quasi-experimental fixed factors / covariates

*Fixed experimental factor or covariate*. In experiments the units (or levels) of the random factor(s) are assigned to manipulations implemented in their design. The researcher controls the assignment of units of the random factor(s) (e.g., subjects, items) to experimental manipulations. These manipulations are represented as factors with a fixed and discrete set of levels (e.g., training vs. control group) or as covariates associated with continuous numeric values (e.g., presentation times).

*Fixed quasi-experimental factor or covariate*. In observational studies (which can also be experiments) the units (or levels) of random factors may "bring along" characteristics that represent the levels of quasi-experimental factors or covariates beyond the control of the researcher. Whether a a subject is female, male, or diverse or whether a word is a noun, a verb, or an adjective are examples of quasi-experimental factors of gender or word type, respectively. Subject-related covariates are body height, body mass, and IQ scores; word-related covariates are their lengths, frequency, and cloze predictability.

## Between-unit and within-unit factors / covariates

The distinction between between-unit and within-unit factors is always relative to a random (grouping) factor of an experimental design. A between-unit factor / covariate is a factor for which every unit of the random factor is assigned to or characterized by only one level of the factor. A within-unit factor is a factor for which units of the random factor appear at every level of the factor.

For the typical random factor, say *Subject*, there is little ambiguity because we are used to the between-within distinction from ANOVAs, more specifically the F1-ANOVA. In psycholinguistics, there is the tradition to test effects also for the second random factor *Item* in an F2-ANOVA. Importantly, for a given fixed factor all four combinations are possible. For example, *Gender* is a fixed quasi-experimental between-subject / within-item factor; word frequency is a fixed quasi-experimental within-subject / between-item covariate; *Prime-target relation* is a fixed experimental  within-subject / within-item factor (assuming that targets are presented both in a primed and in an unprimed situation); and when a training manipulation is defined by the items used in the training, then in a training-control group design, the fixed factor *Group* is a fixed experimental between-subject / between-item factor.

These distinctions are critical for setting up LMMs because variance components for (quasi-)experimental effects can only be specified for within-unit effects. Note also that loss of data (within limits), counterbalancing or blocking of items are irrelevant for these definitions.

## Factor-based contrasts and covariate-based trends

The simplest fixed factor has two levels and the model estimates the difference between them. When we move to factors with *k*  levels, we must decide on how we *spend* the *k-1* degrees of freedom, that is we must specify a set of contrasts. (If we don't do it, the program chooses DummyCoding contrasts for us.)

The simplest specification of a covariate is to include its linear trend, that is its slope. The slope (like a contrast) represents a difference score, that is the change in response to a one-unit change on the covariate. For covariates we must decide on the order of the trend we want to model.

## Contrast- and trend-based fixed-effect model parameters

Fixed factors and covariates are expected to have effects on the response. Fixed-effect model parameters estimate the hypothesized main and interaction effects of the study. The estimates of factors are based on contrasts; the estimates of covariates are based on trends. Conceptually, they correspond to unstandardized regression coefficients in multiple regression.

The intercept is a special regression coefficient; it estimates the value of the dependent variable when all fixed effects associated with factors and trends associated with covariates are zero. In experimental designs with higher-order interactions there is an advantage of specifying the LMM in such a way that the intercept estimates the grand mean (GM; mean of the means of design cells). This happens if (a) contrasts for factors are chosen such that the intercept estimates the GM (positive: EffectsCoding, SeqDifferenceCoding, or HelmertCoding contrasts; negative: DummyCoding), (b) orthogonal polynomial trends are used (Helmert, anova-based), and (c) covariates are centered on their mean before inclusion in the model. As always, there may be good theoretical reasons to depart from the default recommendation.

The specification of contrasts / trends does not depend on the status of the fixed factor / covariate. It does not matter whether a factor varies between or within the units of a random factor or whether it is an experimental or quasi-experimental factor. Contrasts are *not* specified for random (grouping) factors.

## Variance components (VCs) and correlation parameters (CPs)

Variance components (VCs) and correlation parameters (CPs) are within-group model parameters; they correspond to (some of the) *within-unit* (quasi-)experimental fixed-effect model parameters. Thus, we may be able to estimate a subject-related VC for word frequency. If we included a linear trend for word frequency, the VC estimates the subject-related variance in these slopes. We cannot estimate an item-related VC for the word-frequency slopes because there is only one frequency associated with words. Analogously, we may able to estimate an item-related VC for the effect of `Group (training vs. control)`, but we cannot estimate a subject-related VC for this effect.

The within-between characteristics of fixed factors and covariates relative to the random factor(s) are features of the design of the experiment or observational study. They fundamentally constrain the specification of the LMM. That's why it is of upmost importance to be absolutely clear about their status.

## Conditional modes of random effects

In this outline of the dimensions underlying the specification of an LMM, we have said nothing so far about the conditional modes of random effects (i.e., the results shown in caterpillar and shrinkage plots). They are not needed for model specification or model selection.

The VC is the prior variance of the random effects, whereas `var(ranef(model))` is the variance of the posterior means/modes of the random effects. See Kliegl et al. (2010, VisualCognition); [Rizopoulos (2019, stackexchange](https://stats.stackexchange.com/questions/392283/interpreting-blups-or-varcorr-estimates-in-mixed-models/392307#392307).

# Background for data

## Abstract
This semantic-priming experiment was reported in Masson, Rabe, & Kliegl (2017, Exp. 1, Memory & Cognition). It is a direct replication of an experiment reported in Masson & Kliegl (2013, Exp. 1, JEPLMC). Following a prime word a related or unrelated high- or low-frequency target word or a nonword was presented in clear or dim font. The subject's task was to decide as quickly as possible whether the target was a word or a nonword, that is subjects performed a lexical decision task (LDT). The reaction time and the accuracy of the response were recorded. Only correct reaction times to words are included. After filtering there were 16,409 observations recorded from 73 subjects and 240 items.

##  Codebook

The data (variables and observations) used by Masson et al. (2017) are available in file `MRK17_Exp1.RDS`

|Variable | Description|
|---------|----------- |
|Subj     | Subject identifier |
|Item     | Target (non-)word  |
|trial    | Trial number |
|F        | Target frequency is _high_  or _low_ |
|P        | Prime is _related_ or _unrelated_ to target |
|Q        | Target quality is _clear_ or _degraded_   |
|lQ       | Last-trial target quality is _clear_ or _degraded_  |
|lT       | Last-trail target requires _word_ or _nonword_ response |
|rt       | Reaction time [ms] |

`lagQlty` and `lagTrgt` refer to experimental conditions in the last trial.

Corresponding indicator variables (-1/+1):

# Setup

Packages we (might) use.

```{julia}
using CategoricalArrays
using DataFrames
using MixedModels
using MixedModelsMakie
using SMLP2023: dataset
using Statistics: mean, std

```

```{julia}
dat = DataFrame(dataset(:mrk17_exp1))
describe(dat)
```

```{julia}
cells = combine(
  groupby(dat, [:F, :P, :Q, :lQ, :lT]),
  nrow => :n,
  :rt => mean => :rt_m,
  :rt => std => :rt_sd
 # :rt => (c -> mean(log, c)) => :lrt_m,
)
#dat_subj.CTR = categorical(dat_subj.CTR, levels=levels(dat.CTR))
cells
```

# Complex LMM

The following LMM is *not* the maximal factorial LMM because we do not include interaction terms and associated correlation parameters in the RE structure.

## Model fit

```{julia}
contrasts =
    Dict( :F => EffectsCoding(; levels=["LF", "HF"]) ,
          :P => EffectsCoding(; levels=["unr", "rel"]),
          :Q => EffectsCoding(; levels=["deg", "clr"]),
          :lQ =>EffectsCoding(; levels=["deg", "clr"]),
          :lT =>EffectsCoding(; levels=["NW", "WD"])
          );

m_cpx = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                               (1+F+P+Q+lQ+lT | subj) +
                               (1  +P+Q+lQ+lT | item);
    fit(MixedModel, form, dat; contrasts)
end

VarCorr(m_cpx)
```

```{julia}
issingular(m_cpx)
```

```{julia}
MixedModels.PCA(m_cpx)
```
Variance-covariance matrix of random-effect structure suggests overparameterization
for both subject-related and item-related components.

We don't look at fixed effects before model selection.

## VCs and CPs

We can also look separately at item- and subj-related VCs and CPs for subjects and items.

```{julia}
first(m_cpx.λ)
```
VP is zero for last diagonal entry; not supported by data.

```{julia}
last(m_cpx.λ)
```

VP is zero for fourth diagonal entry; not supported by data.

# Zero-correlation parameter LMM

## Model fit

We take out correlation parameters.

```{julia}
m_zcp = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                       zerocorr(1+F+P+Q+lQ+lT | subj) +
                       zerocorr(1  +P+Q+lQ+lT | item);
    fit(MixedModel, form, dat; contrasts)
end

VarCorr(m_zcp)
issingular(m_zcp)
MixedModels.PCA(m_zcp)
```

```{julia}
MixedModels.likelihoodratiotest(m_zcp, m_cpx)
```

Looks ok. It might be a good idea to prune the LMM by removing small VCs.

```{julia}
m_zcp2 = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                       zerocorr(1  +P+Q+lQ+lT | subj) +
                       zerocorr(1  +P+Q   +lT | item);
    fit(MixedModel, form, dat; contrasts)
end
VarCorr(m_zcp2)
```

```{julia}
MixedModels.likelihoodratiotest(m_zcp2, m_zcp, m_cpx)
```

We can perhaps remove some more.

```{julia}
m_zcp3 = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                       zerocorr(1    +Q   +lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end
VarCorr(m_zcp3)
```

```{julia}
MixedModels.likelihoodratiotest(m_zcp3, m_zcp2, m_zcp, m_cpx)
```

And another iteration.

```{julia}
m_zcp4 = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                       zerocorr(1         +lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end
VarCorr(m_zcp4)
```

```{julia}
MixedModels.likelihoodratiotest(m_zcp4, m_zcp3, m_zcp2, m_zcp, m_cpx)
```

Too much removed. Stay with `m_zcp3`, but extend with CPs.

```{julia}
m_prm = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                               (1+    Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end
VarCorr(m_prm)
```
### post-hoc LMM
```{julia}
m_prm = let
    form = @formula (1000/rt) ~ 1+F*P*Q*lQ*lT +
                               (1+    Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end
VarCorr(m_prm)
```

## VCs and CPs

```{julia}
MixedModels.likelihoodratiotest(m_zcp3, m_prm, m_cpx)
```

The LRT favors the complex LMM, but not that  χ² < 2*(χ²-dof); AIC and BIC suggest against selection.

```{julia}
gof_summary = let
  nms = [:m_zcp3, :m_prm, :m_cpx]
  mods = eval.(nms)
  lrt = MixedModels.likelihoodratiotest(m_zcp3, m_prm, m_cpx)
  DataFrame(;
    name = nms,
    dof=dof.(mods),
    deviance=round.(deviance.(mods), digits=0),
    AIC=round.(aic.(mods),digits=0),
    AICc=round.(aicc.(mods),digits=0),
    BIC=round.(bic.(mods),digits=0),
    χ²=vcat(:., round.(lrt.tests.deviancediff, digits=0)),
    χ²_dof=vcat(:., round.(lrt.tests.dofdiff, digits=0)),
    pvalue=vcat(:., round.(lrt.tests.pvalues, digits=3))
  )
end
```

# Parsimonious LMM - replication of MRK17 LMM

The LMM is not nested in the previous sequence.

## Crossed fixed effects

```{julia}
m_mrk17_crossed =let
   form = @formula (1000/rt) ~ 1 + F*P*Q*lQ*lT +
        (1+Q | subj) + zerocorr(0+lT | subj) + zerocorr(1 + P | item) ;
    fit(MixedModel, form, dat; contrasts)
end

VarCorr(m_prm)
```

```{julia}
show(m_mrk17_crossed)
```

Finally, a look at the fixed effects. The four-factor interaction reported in Masson & Kliegl (2013) was not replicated.

## Nested fixed effects

```{julia}
m_mrk17_nested =let
   form = @formula (1000/rt) ~ 1 + Q/(F/P) +
        (1+Q | subj) + zerocorr(0+lT | subj) + zerocorr(1 + P | item) ;
    fit(MixedModel, form, dat; contrasts)
end
```

# Questions from the Discussion forum

## Nesting within products of factors

Include parenthesis

```{julia}
m_mrk17_nested =let
   form = @formula (1000/rt) ~ 1 + Q/(F/P) +
        (1+Q | subj) + zerocorr(0+lT | subj) + zerocorr(1 + P | item) ;
    fit(MixedModel, form, dat; contrasts)
end
```

## Selection in fixed effects

```{julia}
using RegressionFormulae
# m_prm_5 is equivalent to m_prm
m_prm_5 = let
    form = @formula (1000/rt) ~ 1+(F+P+Q+lQ+lT)^5 + (1+Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end;

m_prm_4 = let
    form = @formula (1000/rt) ~ 1+(F+P+Q+lQ+lT)^4 + (1+Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end;

m_prm_3 = let
    form = @formula (1000/rt) ~ 1+(F+P+Q+lQ+lT)^3 + (1+Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end;

m_prm_2 = let
    form = @formula (1000/rt) ~ 1+(F+P+Q+lQ+lT)^2 + (1+Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end;

m_prm_1 = let
    form = @formula (1000/rt) ~ 1+ F+P+Q+lQ+lT +   (1+Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end;

# Compare the fits
gof_summary = let
  nms = [:m_prm_1, :m_prm_2, :m_prm_3, :m_prm_4, :m_prm_5]
  mods = eval.(nms)
  lrt = MixedModels.likelihoodratiotest(m_prm_1, m_prm_2, m_prm_3, m_prm_4, m_prm_5)
  DataFrame(;
    name = nms,
    dof=dof.(mods),
    deviance=deviance.(mods),
    AIC=aic.(mods),
    AICc=aicc.(mods),
    BIC=bic.(mods),
    χ²=vcat(:.,lrt.tests.deviancediff),
    χ²_dof=vcat(:.,lrt.tests.dofdiff),
    pvalue=vcat(:., round.(lrt.tests.pvalues, digits=3))
  )
end
```

Depending on the level of precision of your hypothesis. You could stay with main effect (BIC), include 2-factor interactions (AIC; also called _simple_ interactions) or include 3-factor interactions [χ² < 2*(χ²-dof); also called _2-way_ interactions].

## Posthoc LMM

We are using only three factors for the illustruation.

```{julia}
m_prm3 = let
    form = @formula (1000/rt) ~ 1 + lT*lQ*Q +
                               (1+    Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end
```

The `lT & lQ & Q` interactions is significant. Let's follow it up with a post-hoc LMM, that is looking at the `lQ & Q` interaction in the two levels of whether the last word was a target or not.

```{julia}
m_prm3_posthoc = let
    form = @formula (1000/rt) ~ 1 + lT/(lQ*Q) +
                               (1+    Q+lT | subj) + (1 | item);
    fit(MixedModel, form, dat; contrasts)
end
```

The source of the interaction are trials where the last trial was a word target; there is no evidence for the interaction when the last trial was a nonword target.

The original and post-hoc LMM have the same goodness of fit.

```{julia}
[objective(m_prm3), objective(m_prm3_posthoc)]
```

## Info

```{julia}
versioninfo()
```