Converting FDA historical records into searchable openFDA data

ICF modernized the openFDA database, making decades of critical health data searchable, accessible, and actionable for millions of users

Each month, more than 11 million visitors come to the openFDA website. Yet searching this vast resource was cumbersome for users who needed its data for reporting and transparency purposes, and costly for the agency. The U.S. Food and Drug Administration (FDA) turned to ICF to help modernize the openFDA database and deliver a better experience for the website's users.

One example of this hard-to-search content was a collection of news releases and public health alerts dating back to the agency's founding in 1913. These historical documents shed light on the responsibilities and activities of FDA, which has had an outsized impact on the lives of American citizens for more than a century. When the agency's historian wanted to make this collection easier for users to navigate, FDA approached ICF, which had already partnered with the agency on other aspects of the openFDA project, to develop a solution.

Challenge

FDA had already digitized thousands of paper historical documents in the openFDA database. But many of these scanned documents weren't machine-readable, which made it difficult for users to find the information they needed quickly. To solve this problem and support FDA's broader accountability efforts, the agency needed to convert thousands of scanned images into searchable text.

Solution highlights

AI
Digital modernization
Cloud
Human-centered design

Solution

Using open-source optical character recognition (OCR) tools, our team quickly, accurately, and securely converted the openFDA documents to machine-readable formats. These AI-powered tools interpreted handwritten, typewritten, and word-processed documents far faster than human staff could, even when the original text was faded or partially obscured.

We also created a series of charts and other visualizations to highlight useful details about the documents, such as the most frequently reported side effects by decade. Finally, our team built APIs on the openFDA database's back end that let users pull data directly into their own tools and systems for research, reporting, and other purposes.

Results

In all, we helped FDA convert more than 8,500 documents to machine-readable formats. Using AI-powered tools, our team delivered the modernized database with minimal downtime, so the openFDA website could continue to support the agency's mission to maintain transparency, educate the public, and save lives.