A major issue in matching procedures is missing data in the covariates or the outcome indicator. Matching either compares covariate values between units in the control and treated subgroups or relies on predictions from a logistic regression model; when a unit has missing values in the covariates of that model, neither the comparison nor the prediction can be made for that unit. Several solutions have been proposed (including complete-case analysis), but given the flaws and limitations of these approaches, multiply imputing the missing data is becoming a popular alternative.
The mice and Amelia packages are widely accepted statistical tools for imputing ignorable missing data in R. The MatchThem package simplifies matching the datasets imputed by these packages and enables credible adoption of the two matching approaches (within and across) and several matching methods in practice.
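The missing-data problem described above can be seen in a small base R sketch. The data are simulated and the variable names (`age`, `bmi`, `treat`) are hypothetical, chosen only for illustration; they do not come from the package.

```r
# Simulated data; all names here are illustrative assumptions.
set.seed(1)
n <- 200
d <- data.frame(age = rnorm(n, 60, 10), bmi = rnorm(n, 27, 4))
d$treat <- rbinom(n, 1, plogis(-3 + 0.05 * d$age))

# Introduce missing values in one covariate for 30 units.
d$bmi[sample(n, 30)] <- NA

# glm() silently drops incomplete rows when fitting the propensity model.
ps.model <- glm(treat ~ age + bmi, data = d, family = binomial())
n.used <- nobs(ps.model)

# Predicted propensity scores are NA for units with a missing covariate,
# so these units cannot be matched on the score.
ps <- predict(ps.model, newdata = d, type = "response")
n.unmatched <- sum(is.na(ps))
```

Here `n.used` counts the units that contributed to the model fit, and `n.unmatched` counts the units for which no propensity score is available.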
The MatchThem package can be installed from the Comprehensive R Archive Network (CRAN) repository as follows:
install.packages("MatchThem")
The latest (though unstable) version of the package can be installed from GitHub as follows:
devtools::install_github(repo = "FarhadPishgar/MatchThem")
Multiply imputing the missing data and then matching the imputed datasets may seem complicated. The suggested workflow breaks the process into five steps (please see the package cheat sheet for more details):
- Imputing the Missing Data in the Dataset: The mice and Amelia packages should be used to multiply impute the missing data in the dataset.
- Matching the Imputed Datasets: matchthem() from the MatchThem package should be used to select matched units from the control and treated subgroups of each imputed dataset.
- Assessing Balance on the Matched Datasets: The cobalt package should be used to assess the extent of balance for all covariates in the imputed datasets after matching.
- Analyzing the Matched Datasets: with() from the MatchThem package should be used to estimate causal effects in each matched dataset.
- Pooling the Causal Effect Estimates: pool() from the MatchThem package should be used to pool the causal effect estimates obtained from analyzing each dataset.
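The five steps above can be sketched as follows. This is a minimal sketch, not a definitive analysis: it assumes the `osteoarthritis` example dataset bundled with MatchThem and its variable names (`OSP` as the treatment, `KOA` as the outcome), and the number of imputations, matching method, and outcome model are arbitrary choices made for illustration.

```r
library(mice)       # multiple imputation
library(MatchThem)  # matchthem(), with(), pool()
library(cobalt)     # balance assessment
library(survey)     # svyglm() for the outcome model

# Assumed example data shipped with MatchThem.
data("osteoarthritis", package = "MatchThem")

# 1. Multiply impute the missing data (m = 5 is an arbitrary choice).
imputed.datasets <- mice(osteoarthritis, m = 5, seed = 123, printFlag = FALSE)

# 2. Match within each imputed dataset (nearest-neighbor matching).
matched.datasets <- matchthem(OSP ~ AGE + SEX + BMI + RAC + SMK,
                              datasets = imputed.datasets,
                              approach = "within",
                              method = "nearest")

# 3. Assess covariate balance across the matched datasets.
bal.tab(matched.datasets)

# 4. Estimate the effect in each matched dataset.
matched.models <- with(matched.datasets,
                       svyglm(KOA ~ OSP, family = quasibinomial()),
                       cluster = TRUE)

# 5. Pool the estimates across the matched datasets.
matched.results <- pool(matched.models)
summary(matched.results, conf.int = TRUE)
```

Setting `approach = "across"` in `matchthem()` would select the other matching approach mentioned earlier; the rest of the sketch stays the same.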
The logo for this package, a trip to the Arctic, was designed and kindly provided by Max Josino (please see his website and Dribbble for his work).
We would like to thank the CRAN team members for their technical support and comments on the package performance. This package relies on the MatchIt, mice, and WeightIt packages. Please cite their reference manuals and vignettes in your work, in addition to the reference manual and vignette of this package.
Farhad Pishgar
Noah Greifer
Clémence Leyrat
Elizabeth Stuart