Paper information

Clinical characteristics and prognostic factors for Crohn’s disease relapses using natural language processing and machine learning – a pilot study

F. Gomollón, J. P. Gisbert, I. Guerra, R. Plaza, R. Pajares Villarroya, L. Moreno Almazán, M.C. López Martín, M. Domínguez Antonaya, M.I. Vera Mendoza, J. Aparicio, V. Martínez, I. Tagarro, A. Fernández-Nistal, S. Lumbreras, C. Maté, C. Montoto

European Journal of Gastroenterology & Hepatology, Vol. 34, no. 4, pp. 389-397

Summary:

Background 

The impact of relapses on disease burden in Crohn’s disease (CD) warrants searching for predictive factors to anticipate relapses. This requires analysis of large datasets, including elusive free-text annotations from electronic health records. This study aims to describe clinical characteristics and treatment with biologics of CD patients and generate a data-driven predictive model for relapse using natural language processing (NLP) and machine learning (ML).

Methods 

We performed a multicenter, retrospective study that applied NLP to a previously validated corpus of CD patient data from eight hospitals of the Spanish National Healthcare Network, spanning 1 January 2014 to 31 December 2018. Predictive models were created with ML algorithms, namely, logistic regression, decision trees, and random forests.

Results 

CD phenotype, analyzed in 5938 CD patients, was predominantly inflammatory, and tobacco smoking appeared as a risk factor, confirming previous clinical studies. We also documented treatments, treatment switches, and time to discontinuation in biologics-treated CD patients. We found correlations between CD and patient family history of gastrointestinal neoplasms. Our predictive model ranked 25 000 variables for their potential as risk factors for CD relapse. Of highest relative importance were past relapses and patients’ age, as well as leukocyte, hemoglobin, and fibrinogen levels.

Conclusion 

Through NLP, we identified variables such as smoking as a risk factor and described treatment patterns with biologics in CD patients. CD relapse prediction highlighted the importance of patients’ age and some biochemistry values, though it proved highly challenging and merits the assessment of risk factors for relapse in a clinical setting.


Keywords: artificial intelligence, big data, electronic health records, inflammatory bowel disease, natural language processing


JCR Impact Factor and WoS quartile: 2.566 - Q4 (2020)

DOI reference: 10.1097/MEG.0000000000002317

Published in print: April 2022.

Published online: December 2021.



Citation:
F. Gomollón, J. P. Gisbert, I. Guerra, R. Plaza, R. Pajares Villarroya, L. Moreno Almazán, M.C. López Martín, M. Domínguez Antonaya, M.I. Vera Mendoza, J. Aparicio, V. Martínez, I. Tagarro, A. Fernández-Nistal, S. Lumbreras, C. Maté, C. Montoto. Clinical characteristics and prognostic factors for Crohn’s disease relapses using natural language processing and machine learning – a pilot study. European Journal of Gastroenterology & Hepatology. Vol. 34, no. 4, pp. 389-397, April 2022. [Online: December 2021]