# Prevalence predictor

Predicting the prevalence of an outcome from point-level data

# Rationale

Cross-sectional surveys are widely used to gather data on health outcomes in a population. These include prevalence of infection, prevalence of symptoms, and coverage of vaccines and other interventions. Surveys are expensive, however, and are typically conducted on only a small fraction of the population. This leaves gaps in our understanding of these outcomes across the wider population. Filling these gaps requires spatial modeling, which is often not available to health programs. Even when it is available, it is often expensive and time consuming.

Example of using prevalence predictor to create predictive maps of access to water in Zimbabwe

# Our approach

We have developed algorithms that automate the generation of predictive maps from survey data, making these approaches easier to access. The algorithm can handle data from a single time point or from multiple time points, and can use predictors such as gridded environmental or climatological variables. Users can supply their own predictors, but we also provide pre-processed layers for every country that include distance to roads and rivers, average precipitation and temperature, seasonality of precipitation and temperature, and population density.

We use a geoadditive modeling framework, which allows the algorithm to find non-linear associations with predictors and to include a spatial (or spatio-temporal) effect that lets it borrow strength across space and time. Once the algorithm has been fit, it can predict the probability of the outcome in each grid cell, using the values of the predictors in that cell and the cell's location in space and time.

As well as making a best guess of the probability/prevalence of the outcome at a given location, the algorithm generates a range of possible values. Where the algorithm is more uncertain about a prediction, this range is wider; where it is more certain, the range is narrower. This range can be used to quantify the uncertainty of the predictions, and also to quantify how likely it is that the prevalence is above or below a user-supplied threshold. For example, a disease program might be interested in identifying locations where prevalence is likely to be above 5%. Predictions can be made across a grid, at user-defined points or over polygons.
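To make the threshold idea concrete, here is a minimal Python sketch of how a range of possible values for one location could be turned into an uncertainty interval and a probability of exceeding a threshold. It is an illustration only, not the service's implementation: the function name `summarize_draws`, the use of simulated Beta draws in place of real model output, and the simple empirical quantile rule are all assumptions made for this example.

```python
import random
from statistics import mean


def summarize_draws(draws, threshold=0.05):
    """Summarize simulated prevalence values for a single location.

    `draws` is a list of plausible prevalence values (each between 0 and 1).
    Hypothetical helper for illustration; not part of any published API.
    """
    ordered = sorted(draws)
    n = len(ordered)
    # Simple empirical 2.5% and 97.5% quantiles (no interpolation).
    lower = ordered[int(0.025 * (n - 1))]
    upper = ordered[int(0.975 * (n - 1))]
    # Share of plausible values above the threshold.
    exceedance_prob = sum(d > threshold for d in draws) / n
    return {
        "mean": mean(draws),
        "lower": lower,
        "upper": upper,
        "exceedance_prob": exceedance_prob,
    }


# Simulated draws standing in for model output (assumption for this sketch).
rng = random.Random(42)
draws_low = [rng.betavariate(2, 98) for _ in range(2000)]    # centred near 2%
draws_high = [rng.betavariate(10, 90) for _ in range(2000)]  # centred near 10%

summary_low = summarize_draws(draws_low, threshold=0.05)
summary_high = summarize_draws(draws_high, threshold=0.05)
```

In this sketch, the location whose draws are centred near 10% receives a much higher probability of exceeding the 5% threshold than the location centred near 2%, and the width of `upper - lower` expresses how uncertain each prediction is.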

In addition to making predictions, the algorithm can be used to identify the optimal locations for further surveys when the goal is to improve the predictions. See here for further details of this so-called adaptive sampling, where the goal is to improve predictions of hotspot locations.
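As a rough illustration of the adaptive sampling idea, the sketch below ranks candidate survey sites by how uncertain their hotspot classification is. The selection rule used here (prefer sites whose probability of exceeding the threshold is closest to 0.5) is one simple heuristic chosen for this example and is an assumption, not necessarily the criterion the algorithm uses. The function name `rank_candidates` and the site names and probabilities are hypothetical.

```python
def rank_candidates(exceedance_probs, n_select=2):
    """Pick the candidate sites with the most uncertain hotspot status.

    `exceedance_probs` maps a site ID to the probability that prevalence
    at that site exceeds the threshold. A probability near 0.5 means the
    model cannot tell whether the site is a hotspot, so surveying it is
    assumed (for this sketch) to be the most informative.
    """
    ranked = sorted(
        exceedance_probs,
        key=lambda site: abs(exceedance_probs[site] - 0.5),
    )
    return ranked[:n_select]


# Hypothetical exceedance probabilities for four candidate sites.
candidates = {"site_a": 0.02, "site_b": 0.48, "site_c": 0.91, "site_d": 0.60}
selected = rank_candidates(candidates, n_select=2)
```

Here `site_b` and `site_d` are selected, because the model is least sure whether they are hotspots, while `site_a` and `site_c` are already classified with reasonable confidence.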

# Implementations

This algorithm has been used to map hotspots of lymphatic filariasis (LF) in Samoa and will be used in a similar way in upcoming LF mapping exercises in Mali and Tanzania. It is also being used to help map vaccine coverage, COVID-19 symptoms across the USA, and the risk of severe COVID-19 in Zimbabwe.

Think this sounds useful?

You can reach us at hello@locational.io to ask any questions, request additions or changes, or arrange a demo. We are actively developing these algorithms and would like to hear from you.