Mijke Rhemtulla

• Home • Research • Publications •

Structural Equation Modeling with Difficult Data

The ideal design from an analysis perspective – one which gathers complete, multivariate normal data from a large simple random sample of participants – is virtually never feasible to implement. Instead, researchers are forced to compromise on some or all of these features to reduce participant burden and research costs. One line of my research focuses on testing and comparing methods for fitting SEM models to ordinal measurement scales, missing data, and long scales, and developing feasible designs that allow researchers to make optimal use of their resources. This line explores the consequences of modeling these prevalent types of data using common methods (e.g., software defaults that are not technically correct) compared to specialized methods (e.g., methods that are technically correct but often more complicated to use). For example, I have studied methods for modeling ordinal data in SEM, examining the circumstances under which ordinal data can be safely treated as continuous without affecting parameter estimates, confidence intervals, or test statistics.
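As a rough illustration of why treating ordinal data as continuous can matter, the sketch below simulates two correlated normal variables, cuts them into a small number of ordered categories, and compares the Pearson correlations. Everything here is an assumption made for illustration (the population correlation of 0.5, the thresholds, the sample size, and the helper names `pearson`, `categorize`, and `simulate`); it is not one of the methods studied in this research, and it looks only at a single correlation rather than at SEM parameter estimates, confidence intervals, or test statistics.

```python
import random
from bisect import bisect_right
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def categorize(values, thresholds):
    """Map each continuous value to an ordered category 0..len(thresholds)."""
    return [bisect_right(thresholds, v) for v in values]


def simulate(rho=0.5, n=20000, seed=1):
    """Compare correlations of continuous scores and their ordinal versions.

    rho, n, seed and the thresholds below are hypothetical choices.
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x.append(z1)
        # Construct y so that its population correlation with x is rho.
        y.append(rho * z1 + sqrt(1 - rho ** 2) * z2)

    results = {"continuous": pearson(x, y)}
    schemes = {
        "2 categories": [0.0],
        "5 categories": [-1.5, -0.5, 0.5, 1.5],
    }
    for label, cuts in schemes.items():
        # Treat the category codes as if they were continuous scores.
        results[label] = pearson(categorize(x, cuts), categorize(y, cuts))
    return results


if __name__ == "__main__":
    for label, r in simulate().items():
        print(f"{label}: r = {r:.3f}")
```

Under these assumptions the correlation shrinks as the number of categories drops, which is the kind of distortion that motivates asking when the continuous treatment is safe.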

A second interest within this line is the theoretical and practical implications of parceling long scales (i.e., summing several items on a scale to reduce the number of variables in a model) in SEM models. Parceling is often promoted as a strategy that can vastly simplify a model, but it is just as frequently derided for its potential to mask poor-quality data. My approach is to examine how parceling can affect parameter estimates in a model by redefining the model’s latent constructs. Understanding this process will give researchers the tools to decide whether parceling is an appropriate strategy for their particular data, model, and theory.
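The mechanics of parceling can be sketched in a few lines. The item names, the allocation of items to parcels, and the helper `make_parcels` below are hypothetical; the sketch shows only the summing step described above, not how to choose an allocation or how the choice redefines the latent construct.

```python
def make_parcels(item_scores, parcel_map):
    """Sum each participant's item scores within each parcel.

    item_scores: list of dicts, one per participant, mapping item -> score.
    parcel_map:  dict mapping parcel name -> list of item names (hypothetical).
    """
    return [
        {name: sum(row[item] for item in items)
         for name, items in parcel_map.items()}
        for row in item_scores
    ]


if __name__ == "__main__":
    # Six made-up items reduced to three two-item parcels.
    scores = [
        {"i1": 3, "i2": 4, "i3": 2, "i4": 5, "i5": 1, "i6": 3},
        {"i1": 1, "i2": 2, "i3": 2, "i4": 3, "i5": 4, "i6": 4},
    ]
    allocation = {"p1": ["i1", "i2"], "p2": ["i3", "i4"], "p3": ["i5", "i6"]}
    print(make_parcels(scores, allocation))
```

A model fit to the parcels has three indicators instead of six, which is the simplification that makes the strategy attractive and also what can hide item-level problems.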

A third project within this line of research uses the concept of missing information to find optimal planned missing data designs for SEM models. In planned missing data designs, participants are randomly assigned to a particular pattern of missing data (e.g., missing a subset of items, or time points, or both) to reduce research expense and participant burden with no sacrifice to validity. Because the missingness is randomly assigned, it cannot lead to parameter bias (unlike unplanned missing data); however, it will still affect the efficiency with which parameters are estimated (i.e., confidence intervals will be wider). Missing information is a function of the pattern of missing data and the strength of relations among variables; it determines the extent to which parameter efficiency suffers. For example, when two variables are each missing data on large, non-overlapping subsets of cases, there will be very little information left to estimate the correlation between them. But when these variables are very highly correlated with other variables in the dataset that have no missing data, information can be borrowed from those related variables to estimate the correlation, so less information is lost. Using analytic methods to examine population outcomes and simulation to examine sample performance, my research investigates the extent to which particular planned missing data patterns (e.g., 3-form missingness) affect the efficiency of parameter estimates in SEM models.
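To make the idea of a missing data pattern concrete, here is a minimal sketch of a 3-form design as it is commonly described: items are split into a common block X and three blocks A, B, and C, and each form omits one of A, B, or C. The block sizes, item names, and helper functions are assumptions made for illustration. The sketch computes only the proportion of participants who observe each pair of items, a crude stand-in for missing information, which in the sense used above also depends on the strength of relations among variables.

```python
import random
from itertools import combinations

# Hypothetical item blocks; a real design would use the study's own items.
BLOCKS = {
    "X": ["x1", "x2"],
    "A": ["a1", "a2"],
    "B": ["b1", "b2"],
    "C": ["c1", "c2"],
}
# Each form administers the common block plus two of the three other blocks.
FORMS = {1: ("X", "A", "B"), 2: ("X", "A", "C"), 3: ("X", "B", "C")}


def assign_forms(n, seed=0):
    """Randomly assign n participants to forms in (near-)equal numbers."""
    rng = random.Random(seed)
    forms = [(i % 3) + 1 for i in range(n)]
    rng.shuffle(forms)
    return forms


def observed_items(form):
    """Set of items administered on a given form."""
    return {item for block in FORMS[form] for item in BLOCKS[block]}


def pairwise_coverage(forms):
    """Proportion of participants who observe both items in each pair."""
    items = [item for block in BLOCKS.values() for item in block]
    seen = [observed_items(f) for f in forms]
    n = len(forms)
    return {
        (i, j): sum(1 for s in seen if i in s and j in s) / n
        for i, j in combinations(items, 2)
    }


if __name__ == "__main__":
    coverage = pairwise_coverage(assign_forms(300))
    for pair in [("x1", "x2"), ("x1", "a1"), ("a1", "a2"), ("a1", "b1")]:
        print(pair, round(coverage[pair], 3))
```

With equal form sizes, pairs within the common block are observed by everyone, pairs within A, B, or C (or between X and one of them) by two thirds of participants, and pairs spanning two of A, B, and C by only one third, so the latter are where efficiency is most at risk.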

Modeling Psychological Constructs

SEM relies on a latent variable representation of psychological constructs, in which a set of variables (e.g., items on a scale) is represented as observed indicators of an underlying psychological construct (e.g., a personality trait). The underlying-construct representation is pervasive; even when items are summed to form scale scores, rather than modeled as indicators of latent variables, the assumption of an underlying psychological trait remains. A competing representation has recently arisen in the form of network models, in which psychological constructs are seen as emergent properties that arise from interactions among individual variables; for example, depression is construed as a state characterized by feedback loops among symptoms (e.g., insomnia → fatigue → depressed mood), rather than as an unobserved state that independently gives rise to these symptoms. Network models can lead to new insights about the relations among psychological variables, and thus they provide an interesting alternative to latent variable models. I am involved with several projects that apply network models to gain insight into constructs including substance abuse, quality of life, personality, and psychopathology. Although network models are a compelling alternative to latent variable models, there is a need for methods to determine whether a network model or a latent variable model is the more accurate representation of a particular set of measures of a particular construct. I am currently working on developing such methods.


This site was last updated 09/30/15
