TY - JOUR
T1 - A Look at Demographics and Transition to Virtual Assessments
T2 - An Analysis of Bias in the American Board of Surgery General Surgery Certifying Exams
AU - Ibáñez, Beatriz
AU - Jones, Andrew T.
AU - Jeyarajah, D. Rohan
AU - Dent, Daniel L.
AU - Prendergast, Caroline
AU - Barry, Carol L.
N1 - Publisher Copyright:
© 2024 Association of Program Directors in Surgery
PY - 2024/4
Y1 - 2024/4
AB - OBJECTIVE: The goals of this study were (1) to assess whether examiner ratings on the American Board of Surgery (ABS) General Surgery Certifying Exam (CE) are biased by the gender, race, or ethnicity of the candidate or the examiner, and (2) to determine whether the exam delivery format (in-person or virtual) affects how examiners rate candidates. DESIGN: We included every candidate-examiner combination for first-time takers of the general surgery oral exam. Total scores and pass/fail outcomes, based on the 4 scores given by examiners to candidates, were analyzed using multilevel models with candidates as random effects. Explanatory variables included the gender, race, and ethnicity of candidates and examiners, and the exam format (in-person or virtual). Candidates’ first-attempt scores on the ABS General Surgery Qualifying Exam (QE) were also included in the models to control for candidates’ baseline knowledge. Because of missing data, a separate set of models was evaluated for each demographic variable (gender, race, ethnicity). p-values and coefficients of determination (R²) were used to quantify the statistical and practical significance of the model coefficients; a relationship between an explored variable and CE scores was considered statistically and practically significant if the p-value was below 0.01 and R² exceeded 1%. PARTICIPANTS: All first-time takers of the ABS General Surgery Certifying Exam from 2016 to 2022 who had demographic data, and the examiners who participated in those exams. RESULTS: The numbers of candidates/examiners for the 3 sets of models were 8665/514 (gender), 5906/465 (race), and 4678/295 (ethnicity). Neither the demographic variables, the exam format, nor their interactions were significantly related to examiner-candidate ratings or pass/fail outcomes.
The only variable significantly related to CE scores was candidates’ QE scores, which were added to the models as a measure of candidates’ initial knowledge. This held for all models of total scores (F[1,8659] = 1069.89, p < 0.01, R² = 5% [gender models]; F[1,5696.3] = 589.13, p < 0.01, R² = 5% [race models]; F[1,4459.5] = 278.33, p < 0.01, R² = 5% [ethnicity models]) and of pass/fail outcomes (CI = 1.61-1.73, p < 0.01, R² = 3% [gender models]; CI = 1.67-1.85, p < 0.01, R² = 3% [race models]; CI = 2.17-2.90, p < 0.01, R² = 3% [ethnicity models]). CONCLUSIONS: Based on statistical models of examiner-candidate ratings and pass/fail outcomes, this study found no relationship between candidate or examiner gender, race, or ethnicity and exam outcomes. In addition, delivering the certifying exam virtually appears to have no statistically significant impact on outcomes compared with in-person delivery. This suggests that the ABS is performing well with respect to both demographic bias and virtual delivery.
KW - assessment
KW - bias
KW - certification
KW - demographics
KW - examiner
KW - scoring
UR - http://www.scopus.com/inward/record.url?scp=85186225103&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85186225103&partnerID=8YFLogxK
U2 - 10.1016/j.jsurg.2024.01.001
DO - 10.1016/j.jsurg.2024.01.001
M3 - Article
C2 - 38402095
AN - SCOPUS:85186225103
SN - 1931-7204
VL - 81
SP - 578
EP - 588
JO - Journal of Surgical Education
JF - Journal of Surgical Education
IS - 4
ER -