Establishing a training plan and estimating inter-rater reliability across the multi-site Texas childhood trauma research network

Jeffrey D. Shahidullah, James Custer, Oscar Widales-Benitez, Nazan Aksan, Carly Hatchell, D. Jeffrey Newport, Karen Dineen Wagner, Eric A. Storch, Cynthia Claassen, Amy Garrett, Irma T. Ugalde, Wade Weber, Charles B. Nemeroff, Paul J. Rathouz

Research output: Contribution to journal › Article › peer-review

3 Scopus citations


Objective: Little guidance is available in the literature for developing protocols to train non-clinician raters to administer semi-structured psychiatric interviews in large, multi-site studies. Previous work has not produced standardized methods for maintaining rater quality control or estimating interrater reliability (IRR) in such studies. Our objective is to describe the multi-site Texas Childhood Trauma Research Network (TX-CTRN) rater training protocol, the activities used to maintain rater calibration, and the evaluation of protocol effectiveness.

Methods: Rater training used synchronous and asynchronous didactic learning modules, and certification involved critique of videotaped mock scale administration. Certified raters attended monthly review meetings and completed ongoing scoring exercises for quality assurance. Training protocol effectiveness was evaluated using individual-measure and pooled estimated IRRs for three key study measures (TESI-C, CAPS-CA-5, and MINI-KID [Major Depressive Episode (MDE) and Posttraumatic Stress Disorder (PTSD) modules]). A random selection of video-recorded administrations of these measures was evaluated by three certified raters to estimate agreement statistics, with the jackknife (over videos) used for confidence interval estimation. Kappa, weighted kappa, and intraclass correlations (ICCs) were calculated for study measure ratings.

Results: IRR agreement across all measures was strong (TESI-C median kappa 0.79, lower 95% CB 0.66; CAPS-CA-5 median weighted kappa 0.71, lower 95% CB 0.62; MINI-MDE median kappa 0.71, lower 95% CB 0.62; MINI-PTSD median kappa 0.91, lower 95% CB 0.9). The combined estimated ICC was ≥0.86 (lower CBs ≥0.69).

Conclusions: The protocol developed by TX-CTRN may serve as a model for other multi-site studies that require comprehensive non-clinician rater training, quality assurance guidelines, and a system for assessing and estimating IRR.
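The agreement statistics described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. It assumes unweighted Cohen's kappa computed for each pair of three raters, summarized by the median across pairs, with a leave-one-video-out jackknife standard error and a one-sided normal multiplier (z = 1.645) for the lower 95% bound. The function names, the choice of multiplier, and the toy ratings are all hypothetical and exist only for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import median


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length sequences of categorical ratings."""
    n = len(a)
    categories = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    if p_exp == 1.0:
        return 1.0  # degenerate case: both raters used one identical category
    return (p_obs - p_exp) / (1.0 - p_exp)


def median_pairwise_kappa(ratings):
    """ratings: list of per-video tuples, one rating per rater."""
    raters = list(zip(*ratings))  # transpose to one sequence per rater
    pairs = combinations(range(len(raters)), 2)
    return median(cohen_kappa(raters[i], raters[j]) for i, j in pairs)


def jackknife_lower_bound(ratings, stat, z=1.645):
    """Point estimate and lower bound, leaving out one video at a time.

    z = 1.645 is an assumed one-sided 95% normal multiplier.
    """
    n = len(ratings)
    theta = stat(ratings)
    loo = [stat(ratings[:i] + ratings[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    se = sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in loo))
    return theta, theta - z * se


# Hypothetical toy data: 10 videos, 3 raters, binary ratings (not study data).
toy_ratings = [
    (1, 1, 1), (0, 0, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1),
    (0, 0, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 0, 0),
]
estimate, lower = jackknife_lower_bound(toy_ratings, median_pairwise_kappa)
print(f"median kappa = {estimate:.2f}, lower bound = {lower:.2f}")
```

Resampling whole videos rather than individual ratings keeps the three ratings of each video together, which matches the description of the jackknife being applied "on the videos". A weighted kappa or ICC could be passed as `stat` in the same way.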

Original language: English (US)
Article number: 115168
Journal: Psychiatry Research
State: Published - May 2023


Keywords

  • Inter-rater reliability
  • Measurement
  • Psychiatry research
  • Reliability
  • Training
  • Trauma

ASJC Scopus subject areas

  • Psychiatry and Mental health
  • Biological Psychiatry

