Multivariate analysis with correlated outcomes in R
2021-11-14
Chapter 1 Background
Do you have correlated outcomes but unsure of how to perform multivariate analysis in R? Then you’re in the right place!
Previous literature has shown the importance of multivariate over univariate analyses when one has multiple correlated outcomes. Multivariate approaches increase the power of the model as well as reducing type I error. (Johnson, Wichern, et al. 2014; Mishra et al. 2021)
While multivariate analyses have been around for a while, they are not often used in research, perhaps due to lack of awareness and complex mathematical and statistical methods. So, let’s increase awareness and learn how to perform a multivariate analysis in R! Step by step.
1.1 Running the analysis
1.2 Use case
In this use case, we are examining the association between social support and seven highly correlated outcomes: volumes of subfields within the hippocampus collected on brain MRI.
1.3 Let’s specify the packages needed:
library(reshape2)
(for the ‘melt’ function)
library(nlme)
(for our multivariate models)
1.4 Let’s set our working directory:
setwd()
1.5 Load in our data:
coroutcomes <- read.csv(file = "coroutcomes.csv", header = TRUE)
1.6 Convert our data into a format relevant for multivariate analysis
coroutcomes_melt <- melt(coroutcomes[, c(".imp", "id", "AgeMRI7T", "sex", "ZICV_mm3", "socsup3","depsymptoms", "emo_ab", "psy_ab", "phys_ab", "sex_ab", "LE", "BAI_cat1", "child_mal", "ZCA1sum", "ZCA2sum", "ZCA3sum", "ZDGsum", "ZERCsum", "ZSUBsum", "ZTailsum", "Zbrain_corrected_mm3", "brain_corrected_mm3", "Tailsum", "SUBsum", "ERCsum", "DGsum", "CA3sum", "CA2sum", "CA1sum", "ZHIPPsum", "HIPPsum", "HIPP_L", "HIPP_R")], measure.vars = c("ZCA1sum", "ZCA2sum", "ZCA3sum", "ZDGsum", "ZERCsum", "ZSUBsum", "ZTailsum"))
Here, we are telling R to select our outcomes, the seven hippocampal subfields (“ZCA1sum,” “ZCA2sum,” “ZCA3sum,” “ZDGsum,” “ZERCsum,” “ZSUBsum,” “ZTailsum”) and creating a new categorical variable called “variable” for them as well as our new outcome variable “value” that holds the values for all subfields.
1.7 Model building
socialsupport <- gls(value ~ -1 + variable + variable:factor(socsup3) + variable:AgeMRI7T + variable:sex + variable:ZICV_mm3, data = coroutcomes_melt, method = "ML", correlation=corSymm(form = ~1 | id), weights = varIdent(form = ~1 | variable))
Here, we add ‘-1’ to withhold the intercept for ease of interpretation. We add here our main predictor, in this case social support or ‘socsup3,’ as well as our covariates: age, sex, and intracranial volume. We specify an unstructured correlation matrix per individual (‘id’) and for each outcome (‘variable’).
Now let’s compare our results to performing a univariate analysis:
socialsupport.univariate <- lm(ZCA1sum ~ factor(socsup3) + AgeMRI7T + sex + ZICV_mm3, data = coroutcomes)
Let’s compare our estimates:
Model | Estimate | SE |
---|---|---|
Univariate linear regression | -0.1155 | 0.1636 |
Linear regression with unstructured correlation | -0.1183 | 0.1425 |
As we see here, our univariate analysis resulted in larger standard errors than our multivariate analysis that considers the correlation between outcomes. Therefore, we see more precision and less probability of making a type-I error.
I hope this tutorial was helpful for you and inspires you to consider performing multivariate analyses over univariate ones when your outcomes are correlated!
Cheers!
- Emma