TY - JOUR
T1 - Genetic and psychosocial predictors of aggression
T2 - Variable selection and model building with component-wise gradient boosting
AU - Suchting, Robert
AU - Gowin, Joshua L.
AU - Green, Charles E.
AU - Walss-Bass, Consuelo
AU - Lane, Scott D.
N1 - Funding Information:
This work was supported in part by past National Institutes of Health (NIH) grants NIH DA P50 09262 and NIH DA R01 03166.
Publisher Copyright:
© 2018 Suchting, Gowin, Green, Walss-Bass and Lane.
PY - 2018/5/7
Y1 - 2018/5/7
N2 - Rationale: Given datasets with a large or diverse set of predictors of aggression, machine learning (ML) provides efficient tools for identifying the most salient variables and building a parsimonious statistical model. ML techniques permit efficient exploration of data, have not been widely used in aggression research, and may have utility for those seeking prediction of aggressive behavior. Objectives: The present study examined predictors of aggression and constructed an optimized model using ML techniques. Predictors were derived from a dataset that included demographic, psychometric and genetic predictors, specifically FK506 binding protein 5 (FKBP5) polymorphisms, which have been shown to alter response to threatening stimuli, but have not been tested as predictors of aggressive behavior in adults. Methods: The data analysis approach utilized component-wise gradient boosting and model reduction via backward elimination to: (a) select variables from an initial set of 20 to build a model of trait aggression; and then (b) reduce that model to maximize parsimony and generalizability. Results: From a dataset of N = 47 participants, component-wise gradient boosting selected 8 of 20 possible predictors to model Buss-Perry Aggression Questionnaire (BPAQ) total score, with R2 = 0.66. This model was simplified using backward elimination, retaining six predictors: smoking status, psychopathy (interpersonal manipulation and callous affect), childhood trauma (physical abuse and neglect), and the FKBP5_13 gene (rs1360780). The six-factor model approximated the initial eight-factor model at 99.4% of R2. Conclusions: Using an inductive data science approach, the gradient boosting model identified predictors consistent with previous experimental work in aggression; specifically psychopathy and trauma exposure. Additionally, allelic variants in FKBP5 were identified for the first time, but the relatively small sample size limits generality of results and calls for replication. This approach provides utility for the prediction of aggression behavior, particularly in the context of large multivariate datasets.
AB - Rationale: Given datasets with a large or diverse set of predictors of aggression, machine learning (ML) provides efficient tools for identifying the most salient variables and building a parsimonious statistical model. ML techniques permit efficient exploration of data, have not been widely used in aggression research, and may have utility for those seeking prediction of aggressive behavior. Objectives: The present study examined predictors of aggression and constructed an optimized model using ML techniques. Predictors were derived from a dataset that included demographic, psychometric and genetic predictors, specifically FK506 binding protein 5 (FKBP5) polymorphisms, which have been shown to alter response to threatening stimuli, but have not been tested as predictors of aggressive behavior in adults. Methods: The data analysis approach utilized component-wise gradient boosting and model reduction via backward elimination to: (a) select variables from an initial set of 20 to build a model of trait aggression; and then (b) reduce that model to maximize parsimony and generalizability. Results: From a dataset of N = 47 participants, component-wise gradient boosting selected 8 of 20 possible predictors to model Buss-Perry Aggression Questionnaire (BPAQ) total score, with R2 = 0.66. This model was simplified using backward elimination, retaining six predictors: smoking status, psychopathy (interpersonal manipulation and callous affect), childhood trauma (physical abuse and neglect), and the FKBP5_13 gene (rs1360780). The six-factor model approximated the initial eight-factor model at 99.4% of R2. Conclusions: Using an inductive data science approach, the gradient boosting model identified predictors consistent with previous experimental work in aggression; specifically psychopathy and trauma exposure. Additionally, allelic variants in FKBP5 were identified for the first time, but the relatively small sample size limits generality of results and calls for replication. This approach provides utility for the prediction of aggression behavior, particularly in the context of large multivariate datasets.
KW - Aggression
KW - Boosting
KW - Data science
KW - FKBP5
KW - Machine learning
KW - Psychopathy
KW - Trauma
UR - http://www.scopus.com/inward/record.url?scp=85046690463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046690463&partnerID=8YFLogxK
U2 - 10.3389/fnbeh.2018.00089
DO - 10.3389/fnbeh.2018.00089
M3 - Article
AN - SCOPUS:85046690463
VL - 12
JO - Frontiers in Behavioral Neuroscience
JF - Frontiers in Behavioral Neuroscience
SN - 1662-5153
M1 - 89
ER -