A split decision in professional boxing can turn a great night into a controversial one. Two judges award the fight to one boxer, while a third goes the other way. Cue the Twitter outrage, the commentary desk criticism, and the inevitable question: “What fight was that judge watching?”
Despite how it's treated by the TV analysts and the Twitter soldiers chiming in from their mom's basement, being the outlier doesn't automatically mean you got it wrong. Yet the judge on the short side of the decision is often seen as the one who was off, and some professionals even grade judges on how often they land in the majority.
Let's take a look at how probability and small sample sizes work, and what that means for split decisions.
Judging with a Sample Size of Three
There are three judges in a professional boxing match. In the world of statistics, that’s what we call a very small sample size.
If two people see one thing and a third sees something else, does that automatically make the third person wrong? Not necessarily. In fact, with only three observers, a single dissenting opinion is expected to happen by chance from time to time. It’s not a red flag unless it happens consistently over a large number of fights.
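To put a rough number on that intuition, here's a minimal sketch. It assumes each judge independently scores a close fight for the eventual winner with some probability; the 70% figure is purely an illustrative assumption, not anything drawn from real judging data.

```python
from math import comb

# Illustrative assumption: in a close fight, each judge independently
# scores it for the eventual winner with probability p. The 70% figure
# is made up for illustration, not taken from real judging data.
p = 0.70

# Probability of a 2-1 split in the winner's favor: C(3, 2) * p^2 * (1 - p)
split_for_winner = comb(3, 2) * p**2 * (1 - p)

# Probability of any non-unanimous verdict (2-1 either way)
any_split = 1 - p**3 - (1 - p) ** 3

print(f"2-1 split for the winner: {split_for_winner:.0%}")  # ~44%
print(f"any split decision:       {any_split:.0%}")         # ~63%
```

Even with judges who individually call the close fight "right" 70% of the time, roughly two out of three such fights in this model would not produce a unanimous card. A lone dissenter is exactly what the math predicts.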
To truly determine whether a judge is a frequent outlier in a statistically significant way, you'd need many more observations: that judge's scorecards across dozens of fights, or 20, 30, or more judges all independently scoring the same one. Then you could look at standard deviation, a statistic that tells you how much the judges' scores differ from one another. If most judges score the fight very similarly (say, everyone has it 116–112 or 115–113), the standard deviation is small, meaning strong agreement. If the scores are all over the place (118–110, 115–113, or even cards for the other fighter), the standard deviation is large, meaning wide disagreement and a fight that could have gone either way.
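Here's a small sketch of that idea, using the standard deviation of each judge's winning margin. The cards themselves are invented for illustration.

```python
from statistics import pstdev

# Each number is one judge's margin for Fighter A (A's points minus B's);
# a negative margin means that judge scored the fight for Fighter B.
# The cards themselves are made up for illustration.
tight_panel = [4, 2, 4]         # e.g. 116-112, 115-113, 116-112
scattered_panel = [8, 2, -2]    # e.g. 118-110, 115-113, 113-115

print(pstdev(tight_panel))      # ~0.94 -> strong agreement
print(pstdev(scattered_panel))  # ~4.11 -> wide disagreement
```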
But with only three judges, you don’t have that luxury.
Consensus Doesn’t Equal Correctness
Consensus feels good. We like when everyone agrees, especially on something as intense and emotional as a close fight. But agreement isn’t always truth.
Imagine 30 judges score a close fight and 16 of them give it to Fighter A while 14 give it to Fighter B. There's a slight majority, but not an overwhelming one. That tells you it was a tough call. Now shrink that panel to just three. Two pick Fighter A, one picks Fighter B. A 2–1 split is the closest a three-judge panel can come to that near-even divide, with the same level of ambiguity, yet suddenly the third judge is seen as the problem.
Statistically, that's not fair.
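One way to see how unfair it is: a quick simulation sketch, assuming the fight is genuinely close. Here that's modeled as each judge independently leaning 55/45 toward Fighter A; the 55% is an illustrative assumption, not real judging data.

```python
import random

random.seed(1)

# Illustrative assumption: the fight is genuinely close, so each judge
# independently scores it for Fighter A with probability 0.55.
P_A = 0.55
TRIALS = 100_000

splits = 0
dissenter_backed_A = 0  # splits where the lone dissenter sided with Fighter A

for _ in range(TRIALS):
    votes_for_A = sum(random.random() < P_A for _ in range(3))
    if votes_for_A in (1, 2):          # a 2-1 split either way
        splits += 1
        if votes_for_A == 1:           # panel went 2-1 for B; dissenter had A
            dissenter_backed_A += 1

print(f"share of fights ending in a split: {splits / TRIALS:.0%}")        # ~74%
print(f"splits where the dissenter sided with the slight favorite: "
      f"{dissenter_backed_A / splits:.0%}")                               # ~45%
```

In this model, split decisions are the norm rather than the exception, and in nearly half of them the lone dissenter is the judge who sided with the fighter a larger panel would probably have favored.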
Perception and Position Matter
Judges sit at three different ringside positions, each with a unique view. What looks like a clean jab from one side might be partially blocked from another. A body shot that echoes through the arena might be muffled by crowd noise depending on where you sit. All of that affects scoring.
Add to that the psychological effects: confirmation bias (expecting a certain fighter to win), crowd influence, and the rapid pace of judging in real time with no replays. Even with rigorous training and consistent application of the scoring criteria, perception varies. That's why having multiple judges is a safeguard — not a guarantee of unanimity.
What Would Statistical Significance Look Like?
To know whether there's a true consensus, and not just chance, you need to look at something called a confidence level, along with its partner, the margin of error. Statisticians often use a 95% confidence level, which means, roughly: "If I repeated this sampling over and over, my estimate would land within the margin of error about 95 times out of 100."
In boxing terms: you’d want to be sure that the majority score for a fighter wasn’t just random luck in a small sample — that it would likely hold up if you asked more judges.
In fact, you'd need about 385 judges scoring the same fight to pin down the majority view to within about five percentage points at 95% confidence. That's how many scores it would take to say with confidence: "Yes, this fighter was clearly seen as the winner by most experts, and it wasn't just chance."
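For the curious, the 385 figure falls out of the standard sample-size formula for estimating a proportion. The sketch below assumes the usual pairing of a 95% confidence level with a ±5% margin of error and a worst-case 50/50 split, since the article only quotes the final number.

```python
import math

z = 1.96    # z-score for a 95% confidence level
p = 0.5     # assumed worst-case split (50/50), which maximizes the sample size
e = 0.05    # assumed margin of error of +/- 5 percentage points

# Standard sample-size formula for estimating a proportion
n = (z**2 * p * (1 - p)) / e**2

print(math.ceil(n))   # 385
```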
Here’s an example: Say you want to know how most judges would score a close round — one where both fighters had moments.
If you asked three judges, and two scored it for Fighter A, one for Fighter B, that doesn’t tell you much — it could easily be random, especially in a close round.
But if you asked 385 judges, and 350 of them picked Fighter A, now you’d be much more confident — about 95% sure — that most experts really do see Fighter A as the winner.
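To make that "much more confident" concrete, here's a sketch of the normal-approximation confidence interval for that 350-out-of-385 result. The interval method is a standard textbook choice, not something specified in the article.

```python
import math

judges = 385
for_a = 350
z = 1.96                      # 95% confidence level

p_hat = for_a / judges        # observed share for Fighter A (~0.91)
margin = z * math.sqrt(p_hat * (1 - p_hat) / judges)

low, high = p_hat - margin, p_hat + margin
print(f"{p_hat:.1%} of judges for Fighter A "
      f"(95% CI: {low:.1%} to {high:.1%})")   # ~90.9% (88.0% to 93.8%)
```

Even the low end of that interval sits far above 50%, which is what lets you say the majority verdict is more than noise.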
Of course, boxing will never use hundreds of judges. But this underscores the point: with only three judges, even a 2–1 decision doesn’t tell you who was right — only that the fight was close.
Conclusion: Outliers Are Built into the System
The three-judge model in boxing was created to protect against bias and bad decisions — not to guarantee perfect agreement. When a decision is split, it often just reflects the complexity of what unfolded in the ring.
Being the dissenting judge is part of the job. It doesn’t mean you weren’t paying attention, or that you don’t know what you’re doing. It may simply mean that from your vantage point, with your training and your judgment, you saw it differently. And sometimes, in a close fight, that’s not only okay — it’s inevitable.