PsyResearch
ψ   Psychology Research on the Web   



Psychological Methods - Vol 29, Iss 5

Psychological Methods is devoted to the development and dissemination of methods for collecting, analyzing, understanding, and interpreting psychological data. Its purpose is the dissemination of innovations in research design, measurement, methodology, and quantitative and qualitative analysis to the psychological community; its further purpose is to promote effective communication about related substantive and methodological issues.
Copyright 2024 American Psychological Association
  • Evaluating classification performance: Receiver operating characteristic and expected utility.
    One primary advantage of receiver operating characteristic (ROC) analysis is considered to be its ability to quantify classification performance independently of factors such as prior probabilities and utilities of classification outcomes. This article argues the opposite. When evaluating classification performance, ROC analysis should consider prior probabilities and utilities. By developing expected utility lines (EU lines), this article shows the connection between a classifier’s ROC curve and the expected utility of classification. In particular, EU lines can be used to estimate expected utilities when classifiers operate at any ROC point for any given prior probabilities and utilities. EU lines are useful across all situations: whether one examines a single classifier or compares multiple classifiers, whether one compares classifiers’ potential to maximize expected utilities or their actual expected utilities, and whether the ROC curves are full or partial, continuous or discrete. The connection between ROC and expected utility analyses reveals the common objective underlying these two methods: to maximize the expected utility of classification. Notably, ROC analysis is useful in choosing an optimal classifier and its optimal operating point to maximize expected utility. Yet choosing a classifier and its operating point (i.e., changing conditional probabilities) is not the only way to increase expected utility. Inspired by the parameters involved in estimating expected utility, this article also discusses other approaches to increasing expected utility beyond ROC analysis. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
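    A small worked sketch of the core quantity may help: the R function below computes the expected utility of operating a classifier at a given ROC point for a given prevalence and set of outcome utilities. All numeric values are illustrative assumptions, not figures from the article.
      # Expected utility of operating at ROC point (TPR, FPR), given the prevalence
      # (prior probability of a positive case) and the utilities of the four outcomes.
      expected_utility <- function(tpr, fpr, prev, u_tp, u_fn, u_fp, u_tn) {
        prev       * (tpr * u_tp + (1 - tpr) * u_fn) +   # hits and misses among positives
        (1 - prev) * (fpr * u_fp + (1 - fpr) * u_tn)     # false alarms and correct rejections among negatives
      }
      # The same ROC point can yield very different expected utilities under different priors:
      expected_utility(tpr = .80, fpr = .20, prev = .10, u_tp = 1, u_fn = -10, u_fp = -1, u_tn = 0)
      expected_utility(tpr = .80, fpr = .20, prev = .50, u_tp = 1, u_fn = -10, u_fp = -1, u_tn = 0)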

  • Sample size planning for replication studies: The devil is in the design.
    Replication is central to scientific progress. Because of widely reported replication failures, replication has received increased attention in psychology, sociology, education, management, and related fields in recent years. Replication studies have generally been assessed dichotomously, designated either a “success” or “failure” based entirely on the outcome of a null hypothesis significance test (i.e., p < .05 or p > .05, respectively). However, alternative definitions of success depend on researchers’ goals for the replication. Previous work on alternative definitions of success has focused on the analysis phase of replication. However, the design of the replication is also important, as emphasized by the adage, “an ounce of prevention is better than a pound of cure.” One critical component of design often ignored or oversimplified in replication studies is sample size planning; indeed, the details here are crucial. Sample size planning for replication studies should correspond to the method by which success will be evaluated. Researchers have received little guidance, some of which is misguided, on sample size planning for replication goals other than the aforementioned dichotomous null hypothesis significance testing approach. In this article, we describe four different replication goals. Then, we formalize sample size planning methods for each of the four goals. This article aims to provide clarity on the procedures for sample size planning for each goal, with examples and syntax provided to show how each procedure can be used in practice. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
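    As a deliberately narrow illustration of one such goal, planning the replication sample size to detect the original standardized effect with a two-sample t test, a one-line base R calculation suffices; the effect size, power, and alpha below are assumed values, and the article's other replication goals call for different calculations.
      # Sample size per group to detect an assumed original effect of d = 0.40
      # with 90% power at alpha = .05 in a two-sample, two-sided t test (base R).
      power.t.test(delta = 0.40, sd = 1, sig.level = .05, power = .90,
                   type = "two.sample", alternative = "two.sided")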

  • Selecting scaling indicators in structural equation models (SEMs).
    It is common practice for psychologists to specify models with latent variables to represent concepts that are difficult to directly measure. Each latent variable needs a scale, and the most popular method of scaling as well as the default in most structural equation modeling (SEM) software uses a scaling or reference indicator. Much of the time, the choice of which indicator to use for this purpose receives little attention, and many analysts use the first indicator without considering whether there are better choices. When all indicators of the latent variable have essentially the same properties, then the choice matters less. But when this is not true, we could benefit from scaling indicator guidelines. Our article first demonstrates why latent variables need a scale. We then propose a set of criteria and accompanying diagnostic tools that can assist researchers in making informed decisions about scaling indicators. The criteria for a good scaling indicator include high face validity, high correlation with the latent variable, factor complexity of one, no correlated errors, no direct effects with other indicators, a minimal number of significant overidentification equation tests and modification indices, and invariance across groups and time. We demonstrate these criteria and diagnostics using two empirical examples and provide guidance on navigating conflicting results among criteria. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
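    To make the scaling-indicator choice concrete, here is a minimal lavaan sketch contrasting the software default (first indicator) with an explicitly chosen scaling indicator; it uses lavaan's bundled HolzingerSwineford1939 example data, not data from this article.
      library(lavaan)
      # Default: lavaan fixes the loading of the first listed indicator (x1) to 1.
      default_scale <- 'visual =~ x1 + x2 + x3'
      # Explicit choice: free x1 and put the latent variable on the scale of x3 instead.
      chosen_scale  <- 'visual =~ NA*x1 + x2 + 1*x3'
      fit_default <- cfa(default_scale, data = HolzingerSwineford1939)
      fit_chosen  <- cfa(chosen_scale,  data = HolzingerSwineford1939)
      # Overall fit is identical; what changes is the metric of the latent variable,
      # and hence the interpretation and standard errors of individual loadings.
      c(fitMeasures(fit_default, "chisq"), fitMeasures(fit_chosen, "chisq"))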

  • Extending the actor-partner interdependence model to accommodate multivariate dyadic data using latent variables.
    This study extends the traditional Actor-Partner Interdependence model (APIM; Kenny, 1996) to incorporate dyadic data with multiple indicators reflecting latent constructs. Although the APIM has been widely used to model interdependence in dyads, the method and its applications have largely been limited to single sets of manifest variables. This article presents three extensions of the APIM that can be applied to multivariate dyadic data: a manifest APIM linking multiple indicators as manifest variables, a composite-score APIM relating univariate sums of multiple variables, and a latent APIM connecting underlying constructs of multiple indicators. The properties of the three methods in analyzing data with various dyadic patterns are investigated through a simulation study. It is found that the latent APIM adequately estimates dyadic relationships and holds reasonable power when measurement reliability is not too low, whereas the manifest APIM yields poor power and high Type I error rates in general. The composite-score APIM, even though it is found to be a better alternative to the manifest APIM, fails to correctly reflect latent dyadic interdependence, raising inferential concerns. We illustrate the APIM extensions for multivariate dyadic data analysis with an example study on relationship commitment and happiness among married couples in Wisconsin. In cases where the measures are reliable reflections of psychological constructs, we suggest using the latent APIM for examining research hypotheses that discuss implications beyond observed variables. We conclude by stressing the importance of carefully examining measurement models when designing and conducting dyadic data analyses. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
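    A rough lavaan-style sketch of a latent APIM along the lines described above; the construct and indicator names are hypothetical placeholders (loosely echoing the commitment/happiness example), not the authors' specification.
      library(lavaan)
      latent_apim <- '
        # latent commitment and happiness for each partner (three indicators each; names hypothetical)
        commit_1 =~ c1_1 + c2_1 + c3_1
        commit_2 =~ c1_2 + c2_2 + c3_2
        happy_1  =~ h1_1 + h2_1 + h3_1
        happy_2  =~ h1_2 + h2_2 + h3_2
        # actor effects (a) and partner effects (p)
        happy_1 ~ a1*commit_1 + p21*commit_2
        happy_2 ~ a2*commit_2 + p12*commit_1
        # nonindependence: predictors and outcome residuals covary across partners
        commit_1 ~~ commit_2
        happy_1  ~~ happy_2
      '
      # fit <- sem(latent_apim, data = couples)   # 'couples': one row per dyad (hypothetical data frame)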

  • Estimating and investigating multiple constructs multiple indicators social relations models with and without roles within the traditional structural equation modeling framework: A tutorial.
    The present contribution provides a tutorial for the estimation of the social relations model (SRM) by means of structural equation modeling (SEM). In the overarching SEM framework, the SRM without roles (with interchangeable dyads) is derived as a more restrictive form of the SRM with roles (with noninterchangeable dyads). Starting with the simplest type of the SRM for one latent construct assessed by one manifest round-robin indicator, we show how the model can be extended to multiple constructs each measured by multiple indicators. We illustrate a multiple constructs multiple indicators SEM SRM both with and without roles with simulated data and explain the parameter interpretations. We present how testing the substantial model assumptions can be disentangled from testing the interchangeability of dyads. Additionally, we point out modeling strategies that address cases in which only some members of a group can be differentiated with regard to their roles (i.e., only some group members are noninterchangeable). In the online supplemental materials, we provide concrete examples of specific modeling problems and their implementation into statistical software (Mplus, lavaan, and OpenMx). Advantages, caveats, possible extensions, and limitations in comparison with alternative modeling options are discussed. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
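    For readers new to the SRM, a tiny simulation of the decomposition it assumes (not the authors' SEM specification) may help fix ideas: each round-robin rating is a group mean plus an actor effect, a partner effect, and a relationship residual. All parameter values below are arbitrary assumptions.
      # Simulate one round-robin group under the SRM decomposition (interchangeable members).
      set.seed(1)
      n <- 6                                    # group size (arbitrary)
      actor   <- rnorm(n, 0, 1.0)               # actor (perceiver) effects
      partner <- rnorm(n, 0, 0.8)               # partner (target) effects
      y <- matrix(NA, n, n)                     # y[i, j]: rating of target j by perceiver i
      for (i in 1:n) for (j in 1:n) if (i != j)
        y[i, j] <- 3 + actor[i] + partner[j] + rnorm(1, 0, 0.5)   # plus relationship residual
      round(y, 2)                               # diagonal stays NA (no self-ratings)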

  • Data-driven covariate selection for confounding adjustment by focusing on the stability of the effect estimator.
    Valid inference of cause-and-effect relations in observational studies necessitates adjusting for common causes of the focal predictor (i.e., treatment) and the outcome. When such common causes, henceforth termed confounders, remain unadjusted for, they generate spurious correlations that lead to biased causal effect estimates. But routine adjustment for all available covariates, when only a subset are truly confounders, is known to yield potentially inefficient and unstable estimators. In this article, we introduce a data-driven confounder selection strategy that focuses on stable estimation of the treatment effect. The approach exploits the causal knowledge that after adjusting for confounders to eliminate all confounding biases, adding any remaining non-confounding covariates associated with only treatment or outcome, but not both, should not systematically change the effect estimator. The strategy proceeds in two steps. First, we prioritize covariates for adjustment by probing how strongly each covariate is associated with treatment and outcome. Next, we gauge the stability of the effect estimator by evaluating its trajectory adjusting for different covariate subsets. The smallest subset that yields a stable effect estimate is then selected. Thus, the strategy offers direct insight into the (in)sensitivity of the effect estimator to the chosen covariates for adjustment. The ability to correctly select confounders and yield valid causal inferences following data-driven covariate selection is evaluated empirically using extensive simulation studies. Furthermore, we compare the introduced method empirically with routine variable selection methods. Finally, we demonstrate the procedure using two publicly available real-world datasets. A step-by-step practical guide with user-friendly R functions is included. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
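    The two-step logic lends itself to a compact sketch. The base R code below is a rough rendering of the general idea (a crude priority score, then a coefficient path across growing adjustment sets); it is not the user-friendly R functions that accompany the article, and it assumes numeric treatment, outcome, and covariates.
      # Rank covariates by a crude treatment-and-outcome association score, then track
      # the treatment coefficient as covariates are added one at a time in that order.
      stability_path <- function(data, treat, outcome, covars) {
        score <- sapply(covars, function(v)
          abs(cor(data[[v]], data[[treat]])) * abs(cor(data[[v]], data[[outcome]])))
        ordered <- covars[order(score, decreasing = TRUE)]
        sapply(seq_along(ordered), function(k) {
          f <- reformulate(c(treat, ordered[1:k]), response = outcome)
          coef(lm(f, data = data))[[treat]]     # effect estimate after adjusting for the top k covariates
        })
      }
      # A plateau in the returned path suggests the smallest adjustment set that stabilizes the estimate.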

  • Updated guidelines on selecting an intraclass correlation coefficient for interrater reliability, with applications to incomplete observational designs.
    Several intraclass correlation coefficients (ICCs) are available to assess the interrater reliability (IRR) of observational measurements. Selecting an ICC is complicated, and existing guidelines have three major limitations. First, they do not discuss incomplete designs, in which raters partially vary across subjects. Second, they provide no coherent perspective on the error variance in an ICC, clouding the choice between the available coefficients. Third, the distinction between fixed or random raters is often misunderstood. Based on generalizability theory (GT), we provide updated guidelines on selecting an ICC for IRR, which are applicable to both complete and incomplete observational designs. We challenge conventional wisdom about ICCs for IRR by claiming that raters should seldom (if ever) be considered fixed. Also, we clarify how to interpret ICCs in the case of unbalanced and incomplete designs. We explain four choices a researcher needs to make when selecting an ICC for IRR, and guide researchers through these choices by means of a flowchart, which we apply to three empirical examples from clinical and developmental domains. In the Discussion, we provide guidance in reporting, interpreting, and estimating ICCs, and propose future directions for research into the ICCs for IRR. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
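    As a small illustration of what an ICC computes in the simplest case (a complete design, one-way model), the base R sketch below derives ICC(1) from ANOVA mean squares; the article's guidelines cover many more design choices than this, and the toy data are invented.
      # One-way ICC(1) for a complete subjects-by-raters matrix, from ANOVA mean squares.
      icc1 <- function(ratings) {               # ratings: subjects in rows, raters in columns
        k    <- ncol(ratings)
        long <- data.frame(score   = as.vector(ratings),
                           subject = factor(rep(seq_len(nrow(ratings)), times = k)))
        ms <- summary(aov(score ~ subject, data = long))[[1]][["Mean Sq"]]
        (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])   # (MS_between - MS_within) / (MS_between + (k - 1) * MS_within)
      }
      icc1(matrix(c(4, 5, 3, 4, 4, 5, 2, 4), ncol = 2))   # toy data: 4 subjects rated by 2 raters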

  • A tutorial on ordinary differential equations in behavioral science: What does physics teach us?
    The present tutorial draws on concepts from physics and mathematics to help behavioral scientists use differential equations in their studies. It focuses on the first-order and the second-order (damped oscillator) differential equation. Simple examples detail the meaning of the coefficients, the conditions under which these differential equations apply, the underlying hypotheses, and their consequences for researchers who wish to use them. More complex psychological examples demonstrate the importance of interpreting the parameters. Particular attention is paid to how potential external perturbations should be considered. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
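    To make the damped-oscillator case tangible, here is a minimal base R integration of x'' = -2*zeta*omega*x' - omega^2*x using a plain Euler scheme; the frequency, damping ratio, and initial conditions are assumed values, and this is illustrative code rather than the tutorial's own.
      # Damped oscillator integrated with a simple explicit Euler scheme.
      omega <- 2; zeta <- 0.15                  # natural frequency and damping ratio (assumed)
      dt <- 0.01; t <- seq(0, 20, by = dt)
      x <- v <- numeric(length(t)); x[1] <- 1; v[1] <- 0   # initial position and velocity
      for (i in seq_along(t)[-1]) {
        a    <- -2 * zeta * omega * v[i - 1] - omega^2 * x[i - 1]   # acceleration from the ODE
        v[i] <- v[i - 1] + a * dt
        x[i] <- x[i - 1] + v[i - 1] * dt
      }
      plot(t, x, type = "l", xlab = "time", ylab = "x(t)")          # decaying oscillation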

  • How survey scoring decisions can influence your study’s results: A trip through the IRT looking glass.
    Though much effort is often put into designing psychological studies, the measurement model and scoring approach employed are often an afterthought, especially when short survey scales are used (Flake & Fried, 2020). One possible reason that measurement gets downplayed is that there is generally little understanding of how calibration/scoring approaches could impact common estimands of interest, including treatment effect estimates, beyond random noise due to measurement error. Another possible reason is that the process of scoring is complicated, involving selecting a suitable measurement model, calibrating its parameters, and then deciding how to generate a score, all steps that occur before the score is even used to examine the desired psychological phenomenon. In this study, we provide three motivating examples where surveys are used to understand individuals’ underlying social-emotional and/or personality constructs to demonstrate the potential consequences of measurement/scoring decisions. These examples also allow us to walk through the different measurement decision stages and, hopefully, begin to demystify them. As we show in our analyses, the decisions researchers make about how to calibrate and score a survey have consequences that are often overlooked, with likely implications both for conclusions drawn from individual psychological studies and replications of studies. (PsycInfo Database Record (c) 2024 APA, all rights reserved)
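    A compact sketch of the calibration-then-scoring pipeline the abstract refers to, assuming the mirt package and a hypothetical persons-by-items matrix of 0/1 responses named resp; none of this is the authors' code, and the comparison with a sum score is only one of many possible scoring contrasts.
      library(mirt)
      sum_score <- rowSums(resp)                            # naive scoring: unweighted sum of item responses
      fit       <- mirt(resp, model = 1, itemtype = "2PL")  # calibrate a unidimensional 2PL IRT model
      irt_score <- fscores(fit, method = "EAP")             # expected a posteriori trait estimates
      cor(sum_score, irt_score)                             # the two scorings need not order people identically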


