mod_multivar-viz.qmd

---
title: "Multivariate Visualization"
---

## Overview

Under construction; check back later!

## Learning Objectives

After completing this module you will be able to: 

- <u>Define</u> some common multivariate visualization methods

## Preparation

This is a "bonus" module and thus was created for asynchronous learners. There is no suggested preparatory work.

## Networking Session

::: panel-tabset
### 2024 Guests

This was a bonus module for the 2024-25 cohort and so did not include a networking session.
:::

## Needed Packages

If you'd like to follow along with the code chunks included throughout this module, you'll need to install the following packages:

```{r install-pkgs}
#| eval: false

# Note that these lines only need to be run once per computer
## So you can skip this step if you've installed these before
install.packages("vegan")
install.packages("ape")
install.packages("supportR")
```

## Ordination

Ordination is the general term for many types of multivariate visualization but typically is used to refer to visualizing a distance or dissimiliarity measure of the data. Such measures collapse multiple columns of response variables into fewer (typically two) index values that are easier to visualize.

This is a common approach particularly in answering questions in community ecology or considering a suite of traits (e.g., life history, landscape, etc.) together. While the math behind reducing the dimensionality of your data is interesting, this module is focused on only the visualization facet of ordination so we'll avoid deeper discussion of the internal mechanics that underpin ordination.

In order to demonstrate two types of ordination we'll use a lichen community composition dataset included in the `vegan` package. However, ordination approaches are most often used on data with multiple groups so we'll need to make a simulated grouping column to divide the lichen community data.

```{r ord-prep}
#| message: false

# Load library
library(vegan)

# Grab data
utils::data("varespec", package = "vegan")

# Create a faux group column
treatment <- c(rep.int("Treatment A", nrow(varespec) / 2),
               rep.int("Treatment B", nrow(varespec) / 2))

# Combine into one dataframe
lichen_df <- cbind(treatment, varespec)

# Check structure of first few columns
str(lichen_df[1:5])
```

:::{.panel-tabset}

### Metric Ordination

Metric ordinations are typically used when you are concerned with retaining quantitative differences among particular points, _even after you've collapsed many response variables into just one or two_. For example, this is a common approach if you have a table of traits and want to compare the whole set of traits among groups while still being able to interpret the effect of a particular effect on the whole.

Two of the more common methods for metric ordination are <u>P</u>rincipal <u>C</u>omponents <u>A</u>nalysis (PCA), and <u>P</u>rincipal <u>Co</u>ordinates <u>A</u>nalysis (PCoA / "metric multidimensional scaling"). The primary difference is that PCA works on the data directly while PCoA works on a distance matrix of the data. We'll use PCoA in this example because it is closer analog to the non-metric ordination discussed in the other tab. **If the holistic difference among groups is of interest,** (rather than metric point-to-point comparisons), **consider a non-metric ordination approach.**

In order to perform a PCoA ordination we first need to get a distance matrix of our response variables and then we can actually do the PCoA step. The distance matrix can be calculated with the `vegdist` function from the `vegan` package and the `pcoa` function in the `ape` package can do the actual PCoA.

```{r pcoa-prep}
#| message: false
#| output: false

# Load needed libraries
library(vegan); library(ape)

# Get distance matrix
lichen_dist <- vegan::vegdist(x = lichen_df[-1], method = "kulczynski") # <1>

# Do PCoA
pcoa_points <- ape::pcoa(D = lichen_dist)
```
1. The `method` argument requires a distance/dissimilarity measure. Note that **if you use a non-metric measure** (e.g., Bray Curtis, etc.) **you lose many of the advantages conferred by using a metric ordination approach**.

With that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centorid of all groups).

```{r pcoa-ord}
#| fig-align: center
#| fig-width: 7
#| fig-height: 5

# Load the library
library(supportR)

# Make the ordination
supportR::ordination(mod = pcoa_points, grps = lichen_df$treatment, 
                     x = "topleft", legend = c("A", "B")) #<1>
```
1. This function allows several base plot arguments to be supplied to alter non-critical plot elements (e.g., legend position, point size, etc.)

The percentages included in parentheses on either axis label are the percent of the total variation in the data explained by each axis on its own. Use this information in combination with what the graph looks like to determine how different the groups truly are.

### Non-Metric Ordination

Non-metric ordinations are typically used when you care more about the relative differences among groups rather than specific measurements between particular points. For instance, you may want to assess whether the composition of insect communities differs between two experimental treatments. In such a case, your hypothesis likely depends more on the holistic difference between the treatments rather than some quantitative difference on one of the axes.

The most common non-metric ordination type is called <u>N</u>onmetric <u>M</u>ultidimensional <u>S</u>caling (NMS / NMDS). This approach prioritizes making groups that are "more different" further apart than those that are less different. However, <u>NMS uses a dissimilarity matrix which means that the _distance_ between any two specific points cannot be interpreted meaningfully</u>. It _is_ appropriate though to interpret which cloud of points is closer to/further from another in aggregate. **If specific distances among points are of interest, consider a metric ordination approach.**

In order to perform an NMS ordination we'll first need to calculate a dissimilarity matrix for our response data. The vegan function `metaMDS` is useful for this. This function has many arguments but the most fundamental are the following:

- `comm` = the dataframe of response variables (minus any non-numeric / grouping columns)
- `distance` = the distance/dissimilarity metric to use
    - Note that there is no benefit to using a metric distance because when we make the ordination it will become non-metric
- `k` = number of axes to decompose to -- typically two so the graph can be simple
- `try` = number of attempts at minimizing "stress"
    - Stress is how NMS evaluates how good of a job it did at representing the true differences among groups (lower stress is better)

```{r nms-prep}
#| message: false
#| output: false

# Load needed libraries
library(vegan)

# Get dissimilarity matrix
dissim_mat <- vegan::metaMDS(comm = lichen_df[-1], distance = "bray", k = 2,
                             autotransform = F, expand = F, try = 50)
```

With that in hand, we can make our ordination! While you could make this step-by-step on your own, we'll use the `ordination` function from the `supportR` package for convenience. This function automatically uses colorblind safe colors for up to 10 groups and has some useful base plot defaults (as well as including ellipses around the standard deviation of the centorid of all groups).

```{r nms-ord}
#| fig-align: center
#| fig-width: 7
#| fig-height: 5

# Load the library
library(supportR)

# Make the ordination
supportR::ordination(mod = dissim_mat, grps = lichen_df$treatment, 
                     x = "bottomright", legend = c("A", "B")) #<1>
```
1. This function allows several base plot arguments to be supplied to alter non-critical plot elements (e.g., legend position, point size, etc.)

If the stress is less than 0.15 it is generally considered a good representation of the data. We can see that the ellipses do not overlap which indicates that the community composition of our two groups does seem to differ. We'd need to do real multivariate analysis if we wanted a _p_-value or AIC score to support that but as a visual tool this is still useful.

:::

## Additional Resources

### Papers & Documents

- Abdi, H. & Williams, L.J., [Principal Coponent Analysis](https://wires.onlinelibrary.wiley.com/doi/10.1002/wics.101). WIREs Computational Statistics. **2010**.
- Ramette, A., [Multivariate Analyses in Microbial Ecology](https://pubmed.ncbi.nlm.nih.gov/17892477/). FEMS Microbial Ecology. **2007**.
- Jackson, D.A., [Stopping Rules in Principal Components Analysis: A Comparison of Heuristical and Statistical Approaches](https://esajournals.onlinelibrary.wiley.com/doi/10.2307/1939574). Ecology. **1993**.
- Clarke, K.R., [Non-Parametric Multivariate Analyses of Changes in Community Structure](https://onlinelibrary.wiley.com/doi/10.1111/j.1442-9993.1993.tb00438.x). Australian Journal of Ecology. **1993**.
- Kenkel, N.C. & Oroloci, L., [Applying Metric and Nonmetric Multidimensional Scaling to Ecological Studies: Some New Results](https://esajournals.onlinelibrary.wiley.com/doi/10.2307/1939814). Ecology. **1986**.

### Workshops & Courses

- 

### Websites

-