COVID reporting riddles

Data quality and interpretation during pandemic times

This blog comes into your line of vision courtesy of not just me, but also number-crunching colleague Dr Sam Chew, who works on analytics in a different sector but applied those quantitative skills to datasets explained herein. Thank you Sam.

I’ve written previously about the weird ‘deaths each day from COVID-19’ reporting system in the UK. That covered the reporting agency (specifically here, NHS England), the widespread parroting of these numbers by too many in the media, even though they know full well the numbers are wrong for their purposes, and the subsequent interpretations by their viewers and readers.

In short, NHS England report one top headline number each day of new deaths reported by NHS Trusts. This is widely presented as, and assumed to be, ‘people that have died of COVID-19 in the last 24 hours’.

It is not that. Not in any way.

It is ‘deaths that have been reported in the last 24 hours’, in essence a very dull and unhelpful statistic. As we have seen, the actual date of death is overwhelmingly not the previous day, due to (understandable) delays in NHS Trusts getting their data out. Most of those people died 2–5 days earlier, with some deaths taking up to 2 weeks or so to enter the public domain.

The actual ‘date of death’ is a far more interesting statistic that supports decision-making and public understanding. NHS England do now publish some useful spreadsheets, updated daily, showing exactly that. Which does raise the question of why they don’t simply say one day “right everyone, we’re now doing it like this, no more newly-reported numbers at the top of our press releases, here’s the date of death data with a 4-day delay on it”.

Anyway, having laid the groundwork, m’colleague Dr Chew here has looked at numbers of both actual date of death, and the ‘number of newly-reported deaths today’. See figure 1. It looks at just NHS England data of hospital deaths, so not including the devolved nations here, nor other data sources like ONS.

 
Figure 1. Date of death (blue) and daily reporting of new deaths (orange)


 

And from April 8th onwards up until about April 22nd (stopping there so that the most recent days, where Trust reports are still coming in, don’t bias the comparison), the daily reported number is typically much greater than the actual number of deaths that day. To quote a BBC journalist in an email from 7 April, “I guess if they under-report on the way up, they over-report on the way down”. Yes indeed.
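To see why a reporting delay produces exactly that pattern, here is a minimal sketch. The curve is a toy one with invented numbers (not NHS England data), and it assumes every death is reported exactly 3 days late, which is a simplification: real delays vary from death to death.

```python
# Toy curve of deaths by actual date of death.
# These numbers are invented for illustration; they are NOT NHS England data.
actual = [10, 20, 40, 80, 120, 150, 160, 150, 120, 80, 40, 20, 10]

LAG_DAYS = 3  # assumed fixed reporting delay (a simplification)


def reported_series(actual, lag):
    """Deaths announced each day if every death is reported exactly `lag` days late."""
    return [0] * lag + actual[:len(actual) - lag]


reported = reported_series(actual, LAG_DAYS)

for day, (a, r) in enumerate(zip(actual, reported)):
    if r < a:
        label = "under-reports"
    elif r > a:
        label = "over-reports"
    else:
        label = "matches"
    print(f"day {day:2d}: actual {a:3d}, reported {r:3d} -> headline {label}")
```

In this toy run the headline number sits below the true daily count all the way up the curve (and for a day or so past the peak), then above it all the way down.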

For example, the reported number of deaths on 30 March was 159; the actual number of deaths that day is 585.
On April 10th, 866 deaths were reported, the highest daily reported number. 697 people actually died on that day.
And on April 21st, 778 was the reported number. Thus far, actual deaths for that day are 442.
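The size of those gaps is easier to appreciate as a difference and a ratio. A quick sketch using only the three pairs of figures quoted above (the function name `compare` is just mine for illustration):

```python
# (date, deaths reported that day, deaths that actually occurred that day),
# all taken from the figures quoted in this post.
examples = [
    ("30 March", 159, 585),
    ("10 April", 866, 697),
    ("21 April", 778, 442),  # actual count still provisional at time of writing
]


def compare(reported, actual):
    """Return (reported minus actual, reported divided by actual)."""
    return reported - actual, reported / actual


for day, reported, actual in examples:
    diff, ratio = compare(reported, actual)
    print(f"{day}: reported {reported}, actual {actual}, "
          f"difference {diff:+d}, ratio {ratio:.2f}")
```

So the headline figure was roughly a quarter of the real number on 30 March, and about one and three-quarter times the (provisional) real number on 21 April.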

Big differences on those days, and on others, which show the pointlessness of this style of reporting, both from NHS England and the media.

One other interesting aspect that Sam looked at is the day of the week for actual deaths, and daily reporting of death. See figure 2 below.

 
Figure 2. Day of the week, of actual death (blue), and daily reporting of new deaths (orange)


 

Looking at the blue column (actual day of death), it’s intriguingly consistent. The full range is only 2356 (Tuesday) to 2796 (Wednesday). No evidence of a negative weekend effect there, a concept that Jeremy Hunt was such a fan of during his astonishingly-alienating time as Health Secretary here in the UK.

The day of the week with the most newly-reported deaths (orange columns) is Saturday (3248), with Monday (2014) at the lower end of the range. Potentially some evidence of a weekend effect around Trust reporting, but not around the actual day that people die. Again, this is rather a key distinction and an important consideration in how data is reported and publicly considered if, for example, you’re intrigued to know how the NHS performs at the weekend.
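To put a number on "consistent" versus "not so consistent", here is a small sketch comparing the two ranges. It uses only the four weekday totals quoted above; `relative_spread` is simply a name I've picked for the calculation.

```python
def relative_spread(lowest, highest):
    """Percentage by which the highest weekday total exceeds the lowest."""
    return (highest - lowest) / lowest * 100


# Actual day of death: Tuesday (lowest) to Wednesday (highest)
actual_spread = relative_spread(2356, 2796)

# Day of reporting: Monday (lowest) to Saturday (highest)
reported_spread = relative_spread(2014, 3248)

print(f"Actual day of death: highest weekday is {actual_spread:.1f}% above lowest")
print(f"Day of reporting:    highest weekday is {reported_spread:.1f}% above lowest")
```

That works out at roughly 19% for actual day of death against roughly 61% for day of reporting, so the weekday wobble is more than three times larger in the reporting than in the dying.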

One final graph: see figure 3 below. This shows the typical lag time in reporting. Overall, it’s about 3–4 days from the actual date of death to the death being reported.

 
Figure 3. Reporting of actual death, and lagtime with daily reporting
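For the curious, a lag like this can be summarised from pairs of dates. The sketch below is not the analysis behind figure 3. The records are hypothetical, invented purely to show the arithmetic, and `lag_days` is an illustrative helper.

```python
from datetime import date
from statistics import mean, median

# Hypothetical (date of death, date reported) pairs, invented for illustration.
# These are NOT real NHS England records.
records = [
    (date(2020, 4, 1), date(2020, 4, 3)),
    (date(2020, 4, 1), date(2020, 4, 5)),
    (date(2020, 4, 2), date(2020, 4, 5)),
    (date(2020, 4, 2), date(2020, 4, 6)),
    (date(2020, 4, 3), date(2020, 4, 17)),  # one long straggler, two weeks late
]


def lag_days(records):
    """Number of days between each death and its appearance in the reports."""
    return [(reported - died).days for died, reported in records]


lags = lag_days(records)
print("lags in days:", lags)
print("mean lag:", mean(lags))
print("median lag:", median(lags))
```

With a few stragglers taking a fortnight to appear, the median is arguably the kinder summary of a "typical" lag, since the mean gets dragged upwards by the long tail.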


 

Key lessons here –

- How data is reported matters. Starting off with bafflingly weird approaches, as described here and elsewhere, makes it tricky to correct widespread assumptions and inferences later.

- Real-time reporting is tricky to get right. NHS Trusts take a few days to get their data together and reported.

- Lags in reporting, and that style of reporting data, must be improved upon during the inter-pandemic period (between this pandemic and the next one).

- Preparedness. That’s the key ahead of the next time. How do we best report useful data? How do we transfer PPE around the country to ensure all care providers are well-stocked? How do we scale up capacity for testing more rapidly than we have done this time around? And many other questions besides. Preparedness. It’s vital.

And if you got this far and fancy some light relief, do have a look at a previous blog post of mine, about the absurd, amusing and abusive emails in my inbox.


See original blog post here