Predicting language outcome with network science and machine learning

Borovsky, A., Thal, D., & Leonard, L. B. (2021). Moving towards accurate and early prediction of language delay with network science and machine learning approaches. Scientific Reports, 11(1), 1-12.

Aim of the paper:

There is currently no tool or test that clinicians can use to reliably decide whether a young child is at risk of developing a language disorder. The authors of this paper aimed to develop a model that predicts low language outcomes in children by analysing measures of early language skills with machine learning and network science. To do so, they took early language measures and later language outcomes from longitudinal datasets, and then built semantic networks from each child's early expressive vocabulary (the words the child can say). Finally, they used machine learning to build a model that predicts later language outcome from two kinds of information: the early language measures in the datasets and the measures derived from these semantic networks.


Semantic network: a network graph that describes how vocabulary items relate to each other in terms of meaning.

Semantics: the meaning of words and how they are related to other words.
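To make the idea of "network measures" more concrete, here is a minimal sketch of how a semantic network could be built from a child's vocabulary and summarised with two common network measures. This summary does not say exactly how the authors linked words, so the sketch simply assumes that two words are connected when they share at least one semantic feature. The words, the feature lists in FEATURES and all function names are hypothetical and made up for illustration; mean degree and mean clustering are standard network measures, not necessarily the ones used in the paper.

```python
from itertools import combinations

# Hypothetical toy feature lists: each word maps to a set of semantic features.
# These are invented for illustration and are NOT data from the paper.
FEATURES = {
    "dog": {"animal", "has_legs", "pet"},
    "cat": {"animal", "has_legs", "pet"},
    "cow": {"animal", "has_legs", "farm"},
    "milk": {"drink", "farm"},
    "juice": {"drink"},
    "ball": {"toy", "round"},
}


def build_semantic_network(vocabulary, features):
    """Assumed rule: link two words the child says if they share any feature."""
    graph = {word: set() for word in vocabulary}
    for a, b in combinations(vocabulary, 2):
        if features[a] & features[b]:  # non-empty set of shared features
            graph[a].add(b)
            graph[b].add(a)
    return graph


def mean_degree(graph):
    """Average number of links per word (how connected the vocabulary is)."""
    return sum(len(neighbours) for neighbours in graph.values()) / len(graph)


def clustering_coefficient(graph, word):
    """Share of a word's neighbour pairs that are also linked to each other."""
    neighbours = list(graph[word])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def mean_clustering(graph):
    """Average clustering coefficient across all words in the network."""
    return sum(clustering_coefficient(graph, w) for w in graph) / len(graph)


if __name__ == "__main__":
    # Two hypothetical children who each say three words.
    child_a = build_semantic_network(["dog", "cat", "cow"], FEATURES)
    child_b = build_semantic_network(["dog", "juice", "ball"], FEATURES)
    print(mean_degree(child_a), mean_clustering(child_a))  # 2.0 1.0
    print(mean_degree(child_b), mean_clustering(child_b))  # 0.0 0.0
```

The usage example shows why network measures can add information beyond vocabulary size: both hypothetical children know three words, but one child's words are tightly connected in meaning while the other's are not connected at all. Measures like these could then be fed into a machine learning model alongside other early language measures.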

What they found:

· The predictive model predicted later language outcome with over 90% accuracy when it was tested within a single dataset.

· Network measures based on children’s early expressive vocabulary were the strongest predictors of later language outcome.

· A predictive model built from one dataset showed only modest predictive power when applied to another dataset. This is likely because the datasets differed in their outcome measures and in the ages of the children.

What does this mean?

Overall, the findings of this paper suggest that network science and machine learning approaches may help improve the accuracy of early identification of language disorder. However, while the models were over 90% accurate within the dataset they were built from, their accuracy dropped considerably when they were applied to another dataset. Further research is therefore needed to understand how differences between datasets, such as outcome measures, children's ages and demographics, affect the early identification of language disorder.

The findings also show that network measures are strong predictors of later language outcome. Because these measures reflect the semantic structure of a child's vocabulary, the authors argued that language disorder is likely rooted in differences in how children learn semantics, or the meaning of words. Current screening methods might therefore be improved by including assessments of the semantic structure of children's early vocabulary.

Where can I read this paper?

This paper is open access, which means anyone can read it for free. The full paper is available from Scientific Reports.