Weighting non-probability and probability sample surveys

By Ronaldo Iachan
Oct 24, 2019
Drs. Lew Berman and Ronaldo Iachan discuss their novel hybrid approach to calculating statistical weights for the National Cancer Institute’s Population Health Assessment initiative.

When the National Cancer Institute (NCI) sought to better understand the populations served by its NCI-designated cancer centers, it turned to ICF for help. By using non-probability research methods to design community-based studies for the NCI, and by pooling and comparing these data with national data, we were able to support 15 cancer centers. The program was such a success that the NCI added 14 more NCI-designated cancer centers to the initiative. We sat down with the study authors to find out more about this important work (published here). Transcript below:

Q. Can you please provide a summary of the work you did for the National Cancer Institute?

A. We helped several NCI-designated cancer centers design studies to collect information about the population of people who are serviced by the cancer center. Think of a hospital. A hospital has a broad area from which it draws patients—that’s called a catchment area—and in order to receive funding from NCI, it needs to understand what the catchment area is for cancer treatment. And beyond that, it needs to understand what the population within the catchment area knows about cancer care. How aware are they? What do they know about the continuum of cancer? Have they had cancer? Have they used the cancer center? We helped the cancer centers design studies to capture those data. However, designing a study like this can be challenging.

Q. What are some of the challenges?

A. It can be a challenge to select a sample of people into a study that represents the population who use the cancer center or could potentially use it. For example, if you’re dealing with a big area like Texas with a prominent cancer center, it may cover a large portion of the state or the entire state. You couldn’t possibly do a study on every single person who comes into the cancer center to represent the catchment area, so you need to draw people carefully. It can be very expensive and time-consuming to select a probability sample, especially if you are interested in some subpopulations, which leads to the use of non-probability studies instead. But because they don’t follow a statistical framework, you can’t adjust (weight) non-probability study estimates in the same manner as you would when computing estimates from a traditional statistical approach. So, the purpose of our work was to help these cancer centers design studies—using less expensive methods, tools, and techniques—and then find a way on the backend to account for those differences in design, but still come up with statistical estimates that are relevant and meaningful.

Q. That sounds like significant work. How does it fit into the broader survey research industry context?

A. In general, the survey research industry is struggling with cost, timeliness, and declining response rates. With our approach to non-probability designs, and the potential for combining them, where plausible, with statistically valid and carefully controlled studies, you can potentially draw more people into a study at a lower cost and in a shorter amount of time. That’s where the real bang for the buck in the industry is. If you don’t have the money to do a statistically rigorous study, then you can look at these non-probability designs, and if you’ve done both kinds of study or you want to split your sample between them, we have an approach here where you can combine the data effectively.

Q. What inspired you to do this work?

A. (from Lew): I had been involved in several state and local studies while I worked as a government employee at the Centers for Disease Control and Prevention. I became interested in sub-national studies because of the direct local impact they can have. So, after I started with ICF, we met with colleagues at the National Cancer Institute and listened to the ideas they had about moving their program forward. Those ideas aligned with our in-house experimental work at ICF. In fact, we conducted some internally funded studies in four US counties and then presented our results to NCI. Our work would allow NCI to amplify the impact of the large national study they were already running. They liked the idea because they wanted to do catchment area studies around NCI-designated cancer centers. Our non-probability survey design experience could in part facilitate this effort. As a result, we have been supporting NCI and 29 NCI-designated cancer centers.

Q. That’s fantastic. The next question is for you, Ronaldo. As the senior statistician on the project, can you share how other researchers can apply this weighting methodology to their work?

A. Sure. Our methodology emphasizes calibrating, or raking, as a main tool for standardizing all the study data in a uniform way so that you’re using the same population characteristics across sites. It facilitates the pooling of data so it can be combined and used for more powerful analysis. When you attempt to combine study data across a number of grantees that have completely different study designs, you will encounter challenges in harmonizing the data, the variables, and the concepts, but also in analyzing the data from a weighted-data perspective. Our approach allows you to combine study data effectively.
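
To make the calibration idea concrete, here is a minimal sketch of raking (iterative proportional fitting), a common way to calibrate survey weights so that the weighted sample matches known population shares on several characteristics at once. It is an illustration only, not the authors' implementation: the function names `rake` and `weighted_share`, the respondent records, and the target shares are all hypothetical, and the weighting method actually used in the study is described in the full article.

```python
from collections import defaultdict


def rake(records, margins, max_iter=100, tol=1e-8):
    """Rake unit weights to target population shares (illustrative sketch).

    records: list of dicts holding categorical variables per respondent.
    margins: {variable: {category: target share}}; shares for each variable
             are assumed to sum to 1 and every category must appear in the
             sample.
    Returns a list of weights that sums to len(records).
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            total_w = sum(weights)
            # Scale weights so each category's weighted share hits its target.
            for i, rec in enumerate(records):
                cat = rec[var]
                factor = targets[cat] * total_w / totals[cat]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        # Stop once a full pass over all variables barely changes the weights.
        if max_change < tol:
            break
    return weights


def weighted_share(records, weights, var, cat):
    """Weighted proportion of respondents whose `var` equals `cat`."""
    hit = sum(w for rec, w in zip(records, weights) if rec[var] == cat)
    return hit / sum(weights)


# Hypothetical respondents from a single site (made-up data).
records = [
    {"sex": "F", "age": "18-49"},
    {"sex": "F", "age": "18-49"},
    {"sex": "F", "age": "18-49"},
    {"sex": "F", "age": "50+"},
    {"sex": "M", "age": "18-49"},
    {"sex": "M", "age": "50+"},
    {"sex": "M", "age": "50+"},
]

# Hypothetical population shares for the catchment area (made-up targets).
margins = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"18-49": 0.6, "50+": 0.4},
}

weights = rake(records, margins)
for var, targets in margins.items():
    for cat in targets:
        print(var, cat, round(weighted_share(records, weights, var, cat), 4))
```

Applying the same target characteristics at every site is what puts the weighted data sets on a common footing before they are pooled; a production workflow would also need to handle empty categories, extreme weights, and variance estimation, none of which this sketch attempts.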

Q. Final question. Scientists and researchers build on previous findings. Who or what has inspired you in the field of science or public health?

A. (from Lew): I’ve been fortunate over my career that I’ve been inspired by brilliant and creative people at the Naval Research Laboratory, National Institutes of Health, and the Centers for Disease Control and Prevention. One of my colleagues at the CDC developed the ideas around pediatric growth charts. We take it for granted that our children are carefully tracked for development using these charts when we visit the pediatrician. However, I saw first-hand the care and rigor with which data was collected, the analytic datasets were prepared, and the thoroughness with which the analyses were reviewed—it really put into perspective for me the challenges involved with collecting high quality data and the impact that these data can have on important health outcomes.

For a deep dive into this work, including a description of the data weighting methods Lew and Ronaldo developed, along with their results and conclusions, read the full article.

Meet the author
  1. Ronaldo Iachan, Technical Director and Fellow, Senior Statistician

    Ronaldo is a statistical design and analysis expert with nearly 40 years of experience in survey sampling and evaluation.
