When does a reduction in injury numbers become statistically significant?

By Dr. Marloes Nitert and Dr. Sidney Dekker

Those who’ve been around safety (and particularly safety differently) long enough know that LTI (Lost Time Injury) is a lousy safety measure. LTI, after all, was once instituted as a productivity measure, not a safety measure. But LTI is also quite a silly measure. This blog shows just how silly it gets, and how foolish (or statistically meaningless) any claims about LTI reduction really are.

We once worked with a company of 85 employees that was very proud of its safety record. Over four years, their injuries dropped from 19, to 7, to 4, and then to 1. What a marvelous accomplishment! Managers were understandably glowing. Of course, any reduction in actually hurting people is a good thing. Any reduction in honest reporting, or any failure to call injuries what they are (rather than creatively case-managed instances of ‘light duties’), is of course not a good thing. But that didn’t seem to be the issue here.

The issue was how the managers felt about their own interventions and actions. They’d done a bunch of things (like putting up posters telling everybody else to be more careful out there, and affixing stickers to bathroom mirrors telling you that you were looking at the person most responsible for your safety) that they believed were responsible for this amazing reduction. They also gave their people some more training, and reminded them of the appropriate protective equipment to wear for various tasks.

As in most industries, however, the injury numbers were so small to begin with, and the differences between before and after the interventions so small too, that it was very difficult (well, impossible, really) for any of them to claim that the reduction was anything other than statistical noise: a random variation that could easily spike up again the next year.

Here is why. Remember, the company had 85 employees. We established that together they worked 170,000 person-hours per year during the four years in which injuries went from 19 to 1. In other words, their injury frequency rate dropped from 0.011% to 0.0006% over four years. When you apply some straightforward statistics (and statistical thinking) to this, however, what you find might well rock your world. This is what it shows (you’ll see the calculations below):

  • There is a 92% probability that the injury reduction is just random noise. In other words, there is a fat chance (very fat: 92% fat) that the managers had nothing, absolutely nothing, to do with the reduction.
  • In science, we typically want to be 95% certain of something before we claim that it happened, or that we know what caused it. So let’s put the managers’ claim, that their actions and interventions were responsible for the injury drop, to that test. To claim with 95% certainty, in any seriousness (or science), that the reduction wasn’t just noise but the result of what managers did, the 85 workers would together have to suffer 20,400 injuries in year 1, down to 1,020 injuries in year 4.
  • You could also look at it the other way around. If the company insisted that a drop from 19 to 1 injuries was 95% certain to be real and not just random variation or noise, it would have to employ an additional 53,129 workers to have a sample big enough to support that claim. That’s an increase of about 62,500% in the workforce: 625 new hires for every worker already employed.
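A quick arithmetic check of the headcount figures. This is a minimal sketch that takes the 53,214 total from the power calculation further down in the post as given. The variable names are ours.

```python
# Figures quoted in the article
current_workforce = 85
workforce_needed = 53_214  # total headcount from the power calculation (taken as given)

additional = workforce_needed - current_workforce

# New hires per worker already employed, and the same thing as a percentage increase
growth_factor = additional / current_workforce
percent_increase = 100 * additional / current_workforce

print(f"Additional workers: {additional:,}")
print(f"New hires per existing worker: {growth_factor:.0f}")
print(f"Increase in workforce: {percent_increase:,.0f}%")
```

An increase of 625 times the workforce is a 62,500% increase, not a 625% one.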

The results show the dire need for managerial and board humility when it comes to claiming credit for LTI reductions. They also show the tyranny of small numbers and the uselessness of LTI as a safety measure. And let’s not forget: LTI says nothing about the severity of an injury or the length of absence from work. If LTI weren’t already useless as a measure of anything, that alone would make it so.

What does statistically significant mean? 

Statistical significance, formally speaking, is about how confident we can be that a reduction in the number of injuries is caused by the intervention rather than by chance. If a manager or board wants to be confident that a reduction in injuries is due to what they did, rather than to mere random variations (which can go as wildly up as they can go down!), then the numbers need to meet some stringent requirements.

Remember the company we started with here:

  • 85 workers
  • Three 8-hour shifts, 5 days a week; each employee worked one shift a day for 50 weeks a year, which amounted to 85 × 8 × 5 × 50 = 170,000 hours worked per year
  • Injury reduction over four years: from 19, to 7, to 4, to 1

Of course, these numbers alone don’t mean much. What if the company did some serious downsizing during those 4 years, and in the last year it had only 1 employee left (not 85), and then that poor sod managed to get him- or herself injured? Or what if the company held on to its 85 people but they had nothing to do because there really wasn’t any work? This is why we need an injury rate: the number of injuries relative to the number of hours worked.

In this company, people worked 170,000 hours annually. Over four years, the injury rate dropped from 0.011% (19 injuries / 170,000 hours worked × 100) to 0.0006% (1 injury / 170,000 hours worked × 100).
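To make the rate arithmetic explicit, here is a minimal sketch. It recomputes the annual hours and each year's injury rate from the figures above: 85 employees, one 8-hour shift a day, 5 days a week, 50 weeks a year, and injuries of 19, 7, 4 and 1. The variable names are ours.

```python
employees = 85
hours_per_shift = 8
days_per_week = 5
weeks_per_year = 50

# Total hours worked by the whole workforce in one year
hours_worked = employees * hours_per_shift * days_per_week * weeks_per_year

injuries_by_year = [19, 7, 4, 1]

# Injury rate as a percentage of hours worked: injuries / hours * 100
rates = {
    year: injuries / hours_worked * 100
    for year, injuries in enumerate(injuries_by_year, start=1)
}

for year, rate in rates.items():
    print(f"Year {year}: {injuries_by_year[year - 1]:>2} injuries, rate = {rate:.4f}%")
```

Year 1 comes out at about 0.011% and year 4 at about 0.0006%, matching the figures in the text.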

A manager would of course love to claim that the drop from 19 injuries to 1 injury is significant. In a sense, of course it is. It means much less suffering, fewer work hours lost, and less cost associated with injury management, medical assistance, insurance and so forth. But is it statistically significant? In other words, can the manager be confident that it isn’t just (good) luck, which might just as easily turn into really bad luck next year? This question is critical. If management practices are determined based on this number, after all, we can merrily be reinforcing some seriously bad ideas simply because their implementation was followed by a bout of low numbers. The low numbers, however, happened because of random variation! Of course, managers don’t like random variations at all. They would rather interpret the low numbers as a clear-cut consequence of their management intervention, which can then become gospel!

What is the probability that the injury rate reduction is just random noise?

If you look at absolute numbers (a drop from 19 injured workers to 1, out of 85 workers), then yes, you might be able to claim that such a drop is statistically significant. A statistical test to ‘prove’ this with is the chi-square test (which isn’t very exact with such small numbers in any case, but we have nothing more to go on here).

The probability that the injury rate drop from 19 to 1 is just random noise is 92%

Of course, absolute numbers are not a rate: they are not an injury frequency rate. A reduction in the absolute number of injured workers out of the 85 from one year to the next is nice (really nice, for sure). But it is extremely likely to be pure chance. In other words, there is no basis to claim that the drop is due to what the manager(s) or the board did. There is, in fact, a way to formally calculate the chance that the reduction is indeed pure chance. (Got that?)

We can calculate this by going back to the injury rate (rather than the absolute number of injuries) and doing the same chi-square test between year 1 and year 4, which gives the following (rates in %):

Year | Injury rate (%) | Non-injury rate (%) | Total (%)
Year 1 | 0.014 | 99.986 | 100
Year 4 | 0.0006 | 99.9994 | 100

P value = 0.92: this means that there is only an 8% chance that the difference in injury rate between years 1 and 4 is real and sustainable if the managers keep doing what they are doing. But there is a 92% chance that this difference is just utterly random, and completely disconnected from whatever the managers are doing. 
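The post does not spell out how the test was set up, so the following is a minimal sketch under stated assumptions. We assume a Pearson chi-square test without Yates' continuity correction, applied to the percentage rates as if they were counts out of 100. We also use the 0.011% year-1 rate quoted in the text. Under those assumptions it gives a p-value of about 0.92. The function name `chi_square_2x2` is ours.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction.

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Assumption: rates in percent are treated as "counts out of 100"
year1_rate, year4_rate = 0.011, 0.0006
stat, p = chi_square_2x2(year1_rate, 100 - year1_rate, year4_rate, 100 - year4_rate)
print(f"chi-square = {stat:.4f}, p = {p:.2f}")

# The same rates expressed as counts out of 1,000 instead of 100
stat_k, p_k = chi_square_2x2(
    10 * year1_rate, 1000 - 10 * year1_rate, 10 * year4_rate, 1000 - 10 * year4_rate
)
print(f"out of 1,000: chi-square = {stat_k:.4f}, p = {p_k:.2f}")
```

The chi-square statistic grows in proportion to the counts. Re-expressing the same rates per 1,000 instead of per 100 therefore lowers the p-value to about 0.76. This is the scale dependence reader David H points out in the comments below.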

It becomes clear that the reduction is highly likely due to chance. In fact, the chance that the drop is just a random variation is a whopping 92%. So any manager claiming that the drop is due to their interventions and actions is not really to be believed. Well, if you insist, you should believe the manager(s) only 8% of the time, or only 8 out of 100 times that they make the claim. Or you should believe them for just over half an hour of an 8-hour workday: the rest of the day they’re just blabbing, or spouting grandiose nonsense.

How many people or injuries do you need to claim statistical significance?

Of course, a manager would want to be believed (or to believe themselves) more than 8 out of 100 times. The typically desired level of statistical significance in cases such as these is 95% (p-value = 0.05). That would mean that the manager or board can be believed 19 out of 20 times they make the claim; once out of twenty times, the reduction in injuries would still be the result of random variations. But how many people would the manager need to employ, or how many injuries would these people need to have, to achieve that level of statistical significance (or, to put it differently, 95% confidence that the injury rate reduction is not random)? We turn to this now.

So how many injuries would you actually need to begin with (and then go down to) if you want to claim with 95% certainty that the drop is due to your interventions and actions, and not just a random variation?

Let’s ask that question again, but now in regard to the figures (a 19-fold drop) that we have already discussed:

If a manager of a company with 85 workers, who together work 170,000 hours per year, wanted to claim with 95% certainty that a 19-fold drop in injuries in his or her company is due to his or her interventions and actions, how many extra people would he or she need to employ, or how many injuries would he or she actually need in absolute numbers, from year 1 to year 4?

We find the two answers through what is called a statistical power calculation. The main purpose of a power analysis, in formal terms, is to determine the smallest sample size needed to detect an effect of a given size at the desired level of significance. In other words, if a manager wants to be 95% certain (a typical level of statistical significance) that a 19-fold drop in injuries is due to his or her actions, how many workers or injuries would he or she need to demonstrate this?

How many people do you need to employ if you have 19 injuries?

Statistical power calculations can be done in various ways. The most common use is to determine the sample size needed to be 95% certain that the effect was actually the result of the manager’s interventions or actions. In other words, we can use a statistical power calculation to find out how many people the manager would need to employ to be 95% certain that his or her injury frequency rate dropped from 0.011% to 0.0006% (as from 19 to 1 in this example).

In that case, the power calculation shows that the manager would need to employ 53,214 people rather than 85. And he or she would still be wrong to claim that the drop was due to their actions 1 out of 20 times. So only by the time the manager has employed the 53,214th person (while keeping all 53,213 others at the same time!) can he or she be 95% sure that a drop from 19 injuries to one injury is due to his or her interventions and actions. Only with a sample that large can such certainty be claimed.

Formula for sample size calculation

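The calculation uses:

α = 0.05 (the significance level)

β = 0.2 (the type II error rate, i.e. 80% power)

z = the normalized (standard normal) score

π = the proportion (injury rate), with p0 = 0.00011 and p1 = 0.000006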
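The formula itself did not survive in this copy of the post. The sketch below therefore uses one common textbook version of the two-proportion sample-size formula, with a two-sided test and no continuity correction:

n = (z₁₋α/₂ + z₁₋β)² × (p0(1 − p0) + p1(1 − p1)) / (p0 − p1)²

This is an assumption: the authors may have used a different variant, for example with a pooled variance, a one-sided test, or a continuity correction. With α = 0.05, β = 0.2 and the p0 and p1 values above, it gives roughly 84,000 per group. That is the same order of magnitude as the article's 53,214 but not the same figure, so treat it as an illustration of how the inputs drive the result rather than a reproduction. The function name is ours.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p0, p1, alpha=0.05, power=0.80):
    """Per-group sample size to distinguish proportion p0 from p1.

    Textbook normal-approximation formula (two-sided, no continuity correction):
        n = (z_{1-alpha/2} + z_{1-beta})^2 * (p0(1-p0) + p1(1-p1)) / (p0 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power (beta = 0.2)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    n = (z_alpha + z_beta) ** 2 * variance / (p0 - p1) ** 2
    return math.ceil(n)


# p0 and p1 as given in the text (injury rates as proportions)
n = sample_size_two_proportions(p0=0.00011, p1=0.000006)
print(f"n per group = {n:,}")
```

Because the difference p0 − p1 appears squared in the denominator, very small injury rates drive the required sample into the tens of thousands, whichever variant is used.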

This does mean, by the way, that a company which actually employs some 65,000 people, and which reports a drop from 19 injuries to 1 injury across all of those 65,000 people, could have enough statistical power (depending on the actual hours worked by all those people) to support the claim that the drop is due to what managers did. But not a company or a site with 85 people.

If you employ 85 people, how many injuries do they need to suffer for you to know that your interventions are actually working?

The other way in which we can use a statistical power calculation is to estimate the number of injuries that would be necessary to claim a statistically significant reduction in a sample of only 85 workers (n = 85). For this we did multiple calculations at different injury rates, keeping the 19-fold drop, and noted how many individuals each would require:

Injury rate Year 1 | Injury rate Year 4 | Number of injuries Year 1 | Number of injuries Year 4 | Number of individuals needed
1.18 | 0.062 | 1,900 | 100 | 823
0.118 | 0.006 | 20,400 | 1,020 | 85

It shows that the manager who employs 85 people would need to record a drop from 20,400 injuries in Year 1 to 1,020 injuries in Year 4. This is an injury rate drop from 11.8% to 0.62% (a 19-fold reduction). Only with numbers this large would the manager be able to claim that the reduction is not due to chance. Do note, though, that he or she can be believed only 19 out of 20 times the claim is made. In other words, in 1 out of 20 times it is still possible that the reduction is actually due to chance after all.

This also applies to comparing injury rates between companies or sites

We can use the same calculations to show that comparing injury rates between companies is largely a fool’s errand. If company A has 19 injuries in one year and company B has 1 injury in the same year (and their workforces are comparably large and work roughly the same hours), then the difference is just random variation, pure chance, unless (with such low injury rates) the companies employ some 65,000 people each.

Conclusion

With injury numbers relative to hours worked (i.e. injury rate or any other rate) as low as they are, it becomes easy to show that the requirements of statistical significance are never met. In other words, managers or boards claiming that they have seen a significant reduction in injury rate, or a significant difference between their injury rate and someone else’s, actually have no statistical basis for their claims. It’s literally mostly make-believe.


 

13 thoughts on “When does a reduction in injury numbers become statistically significant?”

  1. Great stuff, thanks. I have been frustrated by this nonsense for many years too. In an attempt to make the message more persuasive, I wrote a paper for the Society of Petroleum Engineers that used simple control charting on injury stats to try to demonstrate that most of the organizations I encounter are already at a point statistically so close to zero that it is essentially zero. My hope was that since many companies are familiar with control charting for QA/QC this would be a slam dunk. Hah! Fat chance.

    If interested, here is the reference:
    Ritchie, Norman, vPSI Group, LLC. (2013, April, 16-18). Journey to Zero: Aspiration Versus Reality. Paper SPE 179200 presented at the SPE European HSE Conference and Exhibition. London, United Kingdom.

    Drop me an email and I will assist if you struggle to track it down.

  2. If I read this correctly, you aren’t saying that efforts by the managers had no effect… just that it isn’t very likely that the managers’ efforts explain the drop. The broader point is that we shouldn’t be using LTI as a sole measure of effectiveness. Point taken and appreciated!

  3. I agree with the sentiments and justification presented here but my frustration is that you are preaching to the converted/wrong audience. I have worked in a number of companies over the past 5 years and in each company the safety professionals are absolutely clear and able to articulate the lack of validity, reliability and statistical significance of LTIFR as a measure. There are two problems though. One is the rock-solid adherence to the measure by boards and senior leadership (Safe Work Australia have attempted to influence this but again, wrong audience). The second is the failure of the safety community to come up with replacement measures. At the end of the day boards and senior executives have a financial paradigm based on numbers, so if you want to remove a measure you will need to propose and justify a valid/reliable/statistically significant alternative measure. Because we are unable to do this (for a number of very good reasons) we will continue to be required to present/report on LTIFR.

  4. Ziliak, S., and McCloskey, D. (2008) The Cult of Statistical Significance. University of Michigan Press, Ann Arbor.
    Muller, J. (2018) The Tyranny of Metrics. Princeton University Press, Princeton.
    and
    Madsbjerg, C. (2017) Sensemaking. Little, Brown, London.

    All worth a good read on the issue.

  5. Hi, what is the inference if the reverse happened, i.e. the numbers went from 19 to 37 in 4 years? Food for thought?

  6. Thanks for this. I’m not sure how to interpret the p-value here. The article arbitrarily decides to look at the rate of injury *per hour* but that’s just adding a multiplier to each value.

    Yet the p-value is very different indeed. What if we were looking at the injury rate per day (I found a p-value of 0.55 in that case)? Per second?

    I’m surprised that the statistical difference can be dramatically changed by a static multiplier.

    1. David H – The furthest I went with studying statistics was a University Stats101 Intro to Statistics course… but my gut instinct is the math in this post is all wrong, for one there is no (stated) null hypothesis that we are trying to establish the probability of being correct/incorrect and specific to the Chi Squared test there is no expected frequency established. Also the variables and values are introduced without any real explanation (and seemingly more to bamboozle the reader than for any other reason).
      But again I have only done basic statistics over 10 years ago… would be interesting for someone with more stats knowledge to comment…

  7. It’s a point well made that LTI cannot by itself be seen as a measure of safety performance… also one I think most professionals understand. In their 2017 publication on Measuring and Reporting on Work Health and Safety, SafeWork provides an overview of a suite of measures which correlate to the Reason/Hudson way of modelling safety culture. I am keen to hear what you make of this, Mr Dekker. I do see you eloquently explain the limitations of the known problems, but how do you feel about the emerging solutions presented?

  8. I share David H and Matt D’s concerns about the article… and suggest Mark Twain would have referred to ‘lies, damned lies, and statistics’ were he around to comment. Whilst I concede LTIFR reporting is of little value, and believe Norm R’s approaching-zero concept has real validity… this article is an example of working in an adversarial rather than a cooperative framework, and without a defined jury to decide who’s right, it can’t lead anywhere worthwhile.

    Statistical analysis needs to be agnostic too – and not just about direction.
