{{vm.result.Pagination.TotalResults}} ResultsResult
Home / Blog / Health

Health Research is Time-Consuming and Expensive. Can We Do Better Without Compromising Accuracy?

Sep 28, 2017 6 Min. Read
Machine learning could make document classification easier and more accurate — if we deploy it correctly.

From climate change to opioid addiction, we are facing serious public health crises that put our  research and data management experts to the test. When it comes to scientific evidence, systematic literature reviews—painstaking assessments of all the literature ever produced on a given subject—are often regarded as the gold standard. Though no research method is foolproof, says Vox health correspondent Julia Belluz, “these studies represent the best available syntheses of global evidence about the likely effects of different decisions, therapies and policies.”

That comprehensiveness comes at high price, though, in terms of time and money. It involves sifting through enormous volumes of literature--sometimes hundreds of thousands of scientific abstracts--stored in academic databases. Researchers use broad keywords to query these databases and capture as many results as possible, but the task of wading through those documents falls to subject matter experts in what turns out to be a manual, time-intensive, and expensive process.

Is Automation the Answer?

Machine learning could make the document classification process easier and more accurate, but faces barriers to adoption in certain contexts. Because systematic literature reviews have the power to influence clinical and public health practices--and, by extension, health outcomes for patients and communities--the stakes are much higher than usual. If Netflix fails to recommend a film its users would like, the fallout is pretty minimal. If a widely-disseminated systematic literature review fails to account for key research, though, lives could be at stake.

Examples of machine learning methods include:

  • Document classification technologies, like text analytics, which use computational algorithms to detect and exploit patterns in large volumes of text
  • Natural language processing (NLP), in which  machines use grammar and linguistic structure to analyze text in similar ways to how humans process language

Machine learning practitioners prioritize scientific defensibility and the reliability of their predictions when assessing model performance. They need to ensure that exempting a proportion of results from manual review based on text analytics technology does not result in omission of more than an acceptable level of relevant documents. Typically, regulators require that no more than 5% of relevant articles are omitted from a systematic review of the literature.

Automated document classification technologies can be broadly divided into two categories: supervised and unsupervised machine learning. Supervised machine learning methods use a training dataset—a set of instructions developed by the researcher to help the computer build a predictive model—to classify documents whose relevance status is unknown and also to produce metrics of the machine’s expected classification accuracy. Unsupervised machine learning does not include the time-consuming creation of a training dataset, but it requires users to devise classification rules and cannot explicitly predict model performance.

text analytics machine learning

By Arun Varghese
Improve Public Health Outcomes with ICF
ICF has been at the epicenter of critical global health issues for nearly half a century, aligned with our clients to improve health outcomes and promote well-being.