Do we really expect to predict the spread – with confidence – of a disease that was unheard of six months ago?

All models are wrong, but some are useful.

George Box, Statistician

This quote, usually attributed to the statistician George Box, illustrates a principle that data scientists generally understand. The public is getting a crash course in modelling as we make our way through COVID-19. Even the GDP figures the Government reports for past periods usually have to be revised. Do we really expect to predict, with confidence, the spread of a disease that was unheard of six months ago? And what should we do with that information?

The core tools available to support good decision-making include lessons from history, open and robust debate, real-world evidence, and modelling. In a world wary of data misuse, data scientists have an opportunity to provide some of the most important insights: not to maximise the number of clicks or likes, but to analyse what may happen under different scenarios that affect life and death.

Modelling the spread of an infectious disease using mathematics and statistics goes right back to Daniel Bernoulli's work on smallpox in the 1760s, but until recently the models were generally retrospective. The data was collected after the fact, and epidemiologists used models to explain what had happened in an effort to understand the dynamics of the disease. Today is a whole new ballgame: near real-time data sharing, and a plethora of professional and amateur data scientists, with time on their hands, interpreting that data for us all.

There have been some excellent articles explaining models in more detail, including why it is so difficult to get them right. Some of my favourites have been “Why It’s So Freaking Hard To Make A Good COVID-19 Model” and “A call to honesty in pandemic modeling”.

Epidemic models track outcomes (infections, deaths, recoveries) by representing the individuals, or the proportion of a population, who are susceptible (have not yet contracted the disease), infected, or removed (have died, or have recovered and are now immune). In its simplest form this is known as the SIR model. People move between these states according to the dynamics of the disease and the behaviour of the population.
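To make the susceptible, infected and removed states concrete, here is a minimal discrete-time sketch in Python. It is only an illustration, not the model used for New Zealand's scenarios. The function name `simulate_sir`, the parameters `beta` (an assumed daily transmission rate) and `gamma` (an assumed daily removal rate), and all the numbers are hypothetical, chosen just to show the shape of an outbreak. Under these simple assumptions the basic reproduction number works out to `beta / gamma`, which is 3 here.

```python
"""A minimal discrete-time SIR model (hypothetical, illustrative parameters only)."""


def simulate_sir(population, initial_infected, beta, gamma, days):
    """Return a list of daily (susceptible, infected, removed) tuples.

    beta:  assumed average number of infecting contacts per infected person per day
    gamma: assumed fraction of infected people removed (recovered or died) each day
    Uses simple one-day steps, so it is a rough approximation, not a precise solver.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contact between infected and susceptible people.
        new_infections = beta * i * s / population
        # A fixed fraction of currently infected people is removed each day.
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical values for illustration, not estimates for COVID-19.
    results = simulate_sir(population=5_000_000, initial_infected=10,
                           beta=0.3, gamma=0.1, days=180)
    peak_day = max(range(len(results)), key=lambda d: results[d][1])
    print(f"Infections peak around day {peak_day}")
    s, i, r = results[-1]
    print(f"After 180 days: susceptible={s:,.0f}, infected={i:,.0f}, removed={r:,.0f}")
```

Every person is always in exactly one state, so the three counts always add up to the population; only the flows between states change as the parameters change.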

The most important number in any of these models is the basic reproduction number, usually written R0. This is essentially the average number of people each infected person goes on to infect when those around them are susceptible. There’s plenty of complexity behind the dynamics, but it all boils down to this simple rule:

If, on average, an infected person transmits the disease to more than one other person, then the epidemic grows; if, on average, they transmit it to fewer than one other person, then the epidemic will subside.

If you’ve studied exponential growth or compound interest, then you will know that a small percentage increase may seem insignificant at first, but compounding eventually makes it grow out of control. This is why, if we want to limit the impact, society has to act before it seems that we have a problem.
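The threshold rule and the compounding effect both show up in a few lines of Python. This sketch assumes, purely for illustration, that every case infects `r` others on average in each generation and that the supply of susceptible people never runs out, which is only roughly true early in an outbreak. The function name `cases_by_generation` and the values 1.2 and 0.8 are hypothetical.

```python
def cases_by_generation(initial_cases, r, generations):
    """Expected new cases in each generation if every case infects r others on average.

    Assumes r stays constant and ignores the depletion of susceptible people,
    which is a reasonable simplification only early in an outbreak.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        # Each generation is the previous one multiplied by r: compound growth.
        cases.append(cases[-1] * r)
    return cases


for r in (1.2, 0.8):
    counts = cases_by_generation(initial_cases=100, r=r, generations=10)
    print(f"R = {r}: generation 10 has about {counts[-1]:.0f} new cases")
```

With `r = 1.2`, 100 cases grow to about 619 new cases per generation after ten generations; with `r = 0.8` they shrink to about 11. The two values of r look close, but compounding turns that small difference into a very large gap.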

This critical reproduction number depends on properties of the disease itself, as well as on our behaviour. COVID-19 is highly contagious. Exactly how contagious cannot be known without more history and testing, but it appears to be more contagious than influenza, and it also happens to be more deadly. It is less contagious than diseases such as measles and polio, but for those we have the advantage of a vaccine. While scientists work as hard and fast as they can on (1) producing a vaccine and (2) producing treatments to minimise the effects of COVID-19, the only real tool at society’s disposal to reduce the reproduction number is what has become known to us as social distancing. The goal of social distancing is to push the reproduction number below one.
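A rough sense of how much distancing is needed comes from a simple calculation. If we assume, as a simplification, that the reproduction number falls in direct proportion to the transmission-relevant contact we cut, then reducing that contact by a fraction c gives R = R0 × (1 − c), and R drops below one only when c exceeds 1 − 1/R0. The sketch below applies that formula; the function name `required_contact_reduction` is hypothetical, and the R0 values are illustrative, not estimates for COVID-19.

```python
def required_contact_reduction(r0):
    """Fraction by which transmission must fall to bring R to one.

    Assumes the reproduction number scales in direct proportion to
    transmission-relevant contact: R = r0 * (1 - reduction).
    Any reduction larger than the returned value pushes R below one.
    """
    if r0 <= 1:
        return 0.0  # Already at or below the epidemic threshold.
    return 1 - 1 / r0


# Hypothetical R0 values, for illustration only.
for r0 in (1.5, 2.5, 4.0):
    print(f"R0 = {r0}: cut transmission by more than {required_contact_reduction(r0):.0%}")
```

Under this assumption an R0 of 2.5 requires cutting transmission by more than 60%. Real contact patterns are far less uniform than the formula assumes, which is one reason the real models are more complicated.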

So far, New Zealand has outperformed the expectations of most models. Isolating society into bubbles should initially have confined the spread to within each bubble: because we spend so much time with the few people in our bubble, if someone in your bubble has COVID-19, you are at high risk of catching it. Community transmission is then limited to interactions between infected and non-infected bubbles.

The possibility of eradication is real, but it’s a long shot. At any given time, a small number of us may have COVID-19 without realising it. We spend time with our bubble, and then one member of our bubble goes to the grocery store. There is some risk that the disease is then passed to an essential worker, who has contact with other people. Some of those people will contract COVID-19 and take it back to their bubbles. Stopping this cycle entirely, while also effectively closing the borders to block any new cases, is harder than we would like to admit.

So why do we use these models? Can anyone know with confidence what will happen? No, but that is the wrong question. We build models to help us make better decisions, not to replace the decision makers. We test scenarios because they could happen, not because they will. This is a real-life, dynamic situation in which we have better data and models than ever before; we just need to know how to use them.

Data scientists usually build forecasts to be accurate, but in health we often build models with the aim of being wrong. We predict hospital admissions in an effort to avoid them. We believe that data-driven decisions are better than the alternative. With COVID-19 you will see data science put to the test, and at times it will seem to be off the mark. Perversely, the more we listen to the models and the better we respond, the more it may seem that we overreacted.

May we learn to live with the uncertainty, listen to experts, embrace the challenges this holds, and use data and models to their best effect. 

Kevin and his team are supporting New Zealand’s modelling for COVID-19 through daily scenarios based on the latest cases and research.