Hidden Markov Model for integrating longitudinal, clinical, and microbiome data to predict Growth Faltering in preterm infants

PAS 2021 Virtual

April 30 – May 4, 2021




The gut microbiota of preterm infants is heavily influenced by preterm birth, the NICU environment, feeding, and clinical care. This leads to variable and unpredictable outcomes, most notably growth faltering (GF).

We hypothesized that a predictive framework based on gut bacterial community types (GCTs) could improve personalized nutrition and clinical care for preterm infants and, in turn, enhance their growth. This framework incorporates transitions between GCTs as well as the clinical and feeding decisions that influence these transitions.
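The abstract does not say how clinical and feeding decisions enter the transition model. One common way to express the idea is to make each row of the GCT transition matrix a multinomial-logit (softmax) function of time-varying covariates. The sketch below assumes that parameterization. `transition_matrix`, the baseline logits, and the covariate weights are all hypothetical illustrations, not the authors' model.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transition_matrix(base_logits, covariate_weights, covariates):
    """Row-stochastic GCT transition matrix for one time step (hypothetical model).

    base_logits[i][j]: baseline log-odds of moving from GCT i to GCT j.
    covariate_weights[k][i][j]: assumed effect of covariate k (e.g. a feeding
        or medication indicator) on the i -> j logit.
    covariates[k]: value of covariate k at this time step.
    """
    n = len(base_logits)
    matrix = []
    for i in range(n):
        logits = [
            base_logits[i][j]
            + sum(w[i][j] * x for w, x in zip(covariate_weights, covariates))
            for j in range(n)
        ]
        matrix.append(softmax(logits))
    return matrix


# Hypothetical example: 2 GCTs and one binary covariate (say, a feeding change).
base = [[2.0, 0.0], [0.0, 2.0]]
weights = [[[0.0, 1.5], [0.0, -1.0]]]
no_feed = transition_matrix(base, weights, [0])
feed = transition_matrix(base, weights, [1])
```

In this toy example the covariate raises the probability of moving from GCT 0 to GCT 1 from about 0.12 to about 0.38. A model of this form is what would let one ask how a given feeding or medication decision shifts an infant's likely GCT trajectory.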

We developed novel machine learning methods based on hidden Markov models (HMMs) to predict GF in preterm infants. Our method learns two separate models: one for preterm infants with GF and another for those with normal growth (GN). Each model uses microbiome and clinical data to assign longitudinal samples to GCTs and to learn the transitions between them. Because the models are probabilistic, they can handle noise and uncertainty in the data, differences in sampling rates among infants, and missing data, while still yielding a concise model that can be used to predict the probability of GF. HMMs can also be applied to time series of varying duration, so predictions can be made for infants with different numbers of samples.
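Predicting GF with two class-specific HMMs can be sketched as scoring an infant's sample sequence under each model with the forward algorithm and combining the two likelihoods with Bayes' rule. This is a minimal sketch under simplifying assumptions: each sample is reduced to a single discrete symbol rather than its full microbiome and clinical profile, the model parameters are hypothetical and assumed already learned (all strictly positive), and `log_likelihood` and `prob_gf` are illustrative names. A missing sample is marginalized out by giving it emission probability 1 in every state, which is one standard way an HMM tolerates missing data; irregular sampling intervals are not modeled here.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-emission HMM.

    obs: non-empty list of symbols (ints), with None marking a missing sample.
    start[i], trans[i][j], emit[i][k]: hypothetical, strictly positive probabilities.
    """
    n = len(start)

    def log_emit(state, symbol):
        # A missing sample has emission probability 1 in every state,
        # so it is summed out instead of imputed.
        return 0.0 if symbol is None else math.log(emit[state][symbol])

    alpha = [math.log(start[i]) + log_emit(i, obs[0]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + log_emit(j, symbol)
            for j in range(n)
        ]
    return logsumexp(alpha)


def prob_gf(obs, gf_model, gn_model, prior_gf=0.5):
    """P(GF | obs) from two class-specific HMMs via Bayes' rule."""
    score_gf = log_likelihood(obs, *gf_model) + math.log(prior_gf)
    score_gn = log_likelihood(obs, *gn_model) + math.log(1.0 - prior_gf)
    return math.exp(score_gf - logsumexp([score_gf, score_gn]))


# Hypothetical 2-state models, each with its own transitions and emissions.
# Symbol 1 stands in for a "GF-like" sample profile.
gf_model = ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.4, 0.6], [0.1, 0.9]])
gn_model = ([0.5, 0.5], [[0.8, 0.2], [0.3, 0.7]], [[0.9, 0.1], [0.6, 0.4]])

print(prob_gf([1, 1, None, 1], gf_model, gn_model))  # well above 0.5
```

Because the forward recursion runs over however many samples an infant has, sequences of different lengths are scored with the same pair of models, and a sequence made entirely of missing samples falls back to the prior.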

Longitudinal stool samples and clinical data were collected from 259 preterm infants (97 with GF and 162 with GN) across 3 NICUs. Following standard metagenomic analysis, 440 species-level microbial taxa were identified across 2,923 samples. The HMMs were learned using a clustering algorithm based on Dirichlet multinomial mixtures (DMM), and their performance in predicting GF was assessed with 5-fold cross-validation. This approach improved on models that used only clinical data. The models also allow direct comparison of the GF and GN groups, both in the GCTs to which their samples are assigned and in the transitions between GCTs.
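Assigning a sample to a GCT under a Dirichlet multinomial mixture amounts to computing the Dirichlet-multinomial likelihood of its taxon counts under each community type and normalizing with the mixture weights. The sketch below shows only that assignment step. Fitting the mixture weights and Dirichlet parameters (typically by expectation-maximization) is omitted, and the parameter values, `dm_log_pmf`, and `assign_gct` are hypothetical.

```python
import math


def dm_log_pmf(counts, alpha):
    """Log Dirichlet-multinomial probability of a taxon count vector."""
    n = sum(counts)
    a = sum(alpha)
    # Multinomial coefficient; it is identical for every component and
    # cancels when computing assignment probabilities.
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    return (
        log_coef
        + math.lgamma(a)
        - math.lgamma(n + a)
        + sum(math.lgamma(x + ak) - math.lgamma(ak) for x, ak in zip(counts, alpha))
    )


def assign_gct(counts, weights, alphas):
    """Posterior probability that a sample belongs to each community type."""
    logs = [math.log(w) + dm_log_pmf(counts, a) for w, a in zip(weights, alphas)]
    m = max(logs)
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical mixture over 3 taxa: GCT 0 is dominated by taxon 0, GCT 1 by taxon 2.
weights = [0.5, 0.5]
alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
resp = assign_gct([20, 1, 0], weights, alphas)
gct = max(range(len(resp)), key=resp.__getitem__)  # most probable GCT
```

Under the assumption that GCTs serve as the HMM's hidden states, the responsibility vector (or its most probable entry) for each sample is what the transition model would then be learned over.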

Our results indicate that HMMs combining microbiota composition with clinical data outperform models that use clinical data alone. A key advantage of identifying preterm infants at risk for GF with these methods is that our HMMs can be extended to assess the impact of specific clinical decisions and interventions (e.g., feeding, medications), supporting more personalized nutrition and treatment.


S. Xu¹, J. Lugo-Martinez¹, A. Tandon², D. Genetti², J. Levesque², D. Gallagher², T. Warren², Z. Bar-Joseph¹

¹ Carnegie Mellon University; ² Astarte Medical