TY - JOUR
T1 - Federated learning for cognitive impairment detection using speech data
AU - Blazquez-Folch, Josep
AU - Limones Andrade, María
AU - Calm, Berta
AU - Auñón García, Juan Miguel
AU - Alegret, Montserrat
AU - Muñoz, Nathalia
AU - Cano, Amanda
AU - Fernández, Victoria
AU - García-Gutiérrez, Fernando
AU - De Rojas, Itziar
AU - García-González, Pablo
AU - Olivé, Clàudia
AU - Puerta, Raquel
AU - Capdevila-Bayo, María
AU - Muñoz-Morales, Álvaro
AU - Bayón-Buján, Paula
AU - Miguel, Andrea
AU - Montrreal, Laura
AU - Espinosa, Ana
AU - Sanz-Cartagena, Pilar
AU - Rosende-Roca, Maitee
AU - Zaldua, Carla
AU - Gabirondo, Peru
AU - Cantero-Fortiz, Yahveth
AU - Gurruchaga, Miren Jone
AU - Tarraga, Lluis
AU - Boada, Mercè
AU - Ruiz, Agustín
AU - Marquié, Marta
AU - Valero, Sergi
N1 - Publisher Copyright: Copyright © 2025 Blazquez-Folch, Limones Andrade, Calm, Auñón García, Alegret, Muñoz, Cano, Fernández, García-Gutiérrez, De Rojas, García-González, Olivé, Puerta, Capdevila-Bayo, Muñoz-Morales, Bayón-Buján, Miguel, Montrreal, Espinosa, Sanz-Cartagena, Rosende-Roca, Zaldua, Gabirondo, Cantero-Fortiz, Gurruchaga, Tarraga, Boada, Ruiz, Marquié and Valero.
PY - 2025
Y1 - 2025
N2 - Introduction: In Alzheimer’s disease (AD) research, clinical, neuroimaging, genetic, and biomarker data are vital for advancing its understanding and treatment. However, privacy concerns and limited datasets complicate data sharing. Federated learning (FL) offers a solution by enabling collaborative research while preserving data privacy. Methods: This study analyzed data from patients assessed at the Memory Unit of the Ace Alzheimer Center Barcelona who completed a standardized digital speech protocol. Acoustic features extracted from these recordings were used to distinguish between cognitively unimpaired (CU) and cognitively impaired (CI) individuals. The aim was to evaluate how data heterogeneity impacted the FL model performance across three scenarios: (1) equal contributions and class ratios, (2) unequal contributions, and (3) imbalanced class ratios. In each scenario, the performance of local models trained using an MLP feed-forward neural network on institutional data was analyzed and compared to a global model created by aggregating these local models using Federated Averaging (FedAvg) and Iterative Data Aggregation (IDA). Results: The cohort included 2,239 participants: 221 CU individuals (mean age 66.8, 64.7% female) and 2,018 CI subjects, comprising 1,219 with mild cognitive impairment (mean age 74.3, 61.9% female) and 799 with mild AD dementia (mean age 80.8, 64.8% female). In scenarios 1 and 3, FL provided modest gains in accuracy and AUC. In scenario 2, FL markedly improved performance for the smaller dataset (balanced accuracy rising from 0.51 to 0.80) while preserving 0.86 accuracy in the larger dataset, highlighting scalability across heterogeneous conditions. Conclusion: These findings demonstrate the potential of FL to enable collaborative modeling of speech-based biomarkers for cognitive impairment detection, even under conditions of data imbalance and institutional disparity. This work highlights FL as a scalable and privacy-preserving approach for advancing digital health research in neurodegenerative diseases.
KW - Alzheimer’s disease
KW - cognitive impairments
KW - deep learning
KW - federated learning
KW - speech acoustics
UR - https://www.scopus.com/pages/publications/105019524901
U2 - 10.3389/frai.2025.1662859
DO - 10.3389/frai.2025.1662859
M3 - Article
C2 - 41141907
AN - SCOPUS:105019524901
SN - 2624-8212
VL - 8
JO - Frontiers in Artificial Intelligence
JF - Frontiers in Artificial Intelligence
M1 - 1662859
ER -