Summary: This article weighs the costs of Type 1 and Type 2 errors in UX decision-making, using the common question of whether organizations should conduct usability testing before launch as an example.
UX practitioners grapple with tough decisions on a daily basis. Over my seventeen-plus years in UX, I've seen my share of them. In this post, I'll outline a simple decision-making framework based on statistical error types.
What Are Type 1 and Type 2 Errors?
In statistics, the terms "Type 1 error" and "Type 2 error" come up whenever we talk about hypothesis testing.
Type 1 Error (False Positive): This occurs when we mistakenly reject a true null hypothesis. In simpler terms, we think something is significant when, in reality, it isn't.
Type 2 Error (False Negative): This happens when we fail to reject a false null hypothesis. Essentially, we overlook something significant, concluding that it isn't there.
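To make these definitions concrete, here's a minimal simulation sketch. It assumes a simple two-sample t-test at a 0.05 significance level with made-up effect sizes (it isn't tied to any particular UX study), and it shows how both error types surface when you run many tests:

```python
# Minimal illustration of Type 1 and Type 2 errors using simulated A/B-style
# comparisons. All parameters (effect size, sample size, alpha) are assumptions
# chosen purely for illustration.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
alpha, n_trials, n_users = 0.05, 5_000, 30

false_positives = 0  # Type 1: null is true, but we "detect" a difference
false_negatives = 0  # Type 2: a real difference exists, but we miss it

for _ in range(n_trials):
    # Null hypothesis is true: both designs perform identically.
    a = rng.normal(loc=0.0, scale=1.0, size=n_users)
    b = rng.normal(loc=0.0, scale=1.0, size=n_users)
    if ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

    # Null hypothesis is false: design B really is somewhat better.
    a = rng.normal(loc=0.0, scale=1.0, size=n_users)
    b = rng.normal(loc=0.5, scale=1.0, size=n_users)
    if ttest_ind(a, b).pvalue >= alpha:
        false_negatives += 1

print(f"Type 1 (false positive) rate: {false_positives / n_trials:.3f}")  # ~alpha, i.e., ~0.05
print(f"Type 2 (false negative) rate: {false_negatives / n_trials:.3f}")
```

The Type 1 rate hovers near the alpha we chose, while the Type 2 rate depends on the true effect size and sample size; the rest of this article is about which of the two mistakes costs more in practice.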
Applying the Concept to UX Research
To best illustrate my point, let's consider a real-world example: the decision an organization faces regarding whether or not to conduct basic usability testing before launching a new feature.
NOTE: While I'll be using this example throughout the rest of the article, the principles discussed hold true for most UX-related decisions.
Example Scenario 1
Imagine a midsized tech start-up that's eager to launch a new feature but has decided against conducting any usability testing with end users. This scenario highlights the risks associated with Type 1 and Type 2 errors.
Type 1 Error: The company pushes the feature out quickly without any user testing, banking on the idea that the new functionality will automatically improve things for users. The mindset is, "This new feature is what users have asked for, so why delay?" But consider this: typically, only about 33% of untested designs dodge major usability mistakes. What if the company falls into the more common 67% with critical UX flaws? If users struggle with the new feature, they won't get its full benefits. This can lead to many users dropping off, negative feedback, and some unforeseen financial troubles.
Type 2 Error: On the other hand, if the company takes a bit more time to test the feature and finds that it has no critical UX errors, then the main "cost" is just a little extra time. Testing might feel a tad slow, but it ensures everything works as intended and prevents big surprises after launch. It's about playing it safe and smart, even if that means a slight delay; the company keeps its users happy and dodges big financial pitfalls.
Case Studies
During my time at ARRIS, a telecommunications giant, I saw what happens when essential UX testing steps are overlooked. In one instance, this oversight resulted in a 20% decrease in user engagement within a month. Leadership approved this risk but regretted it later. The project management team estimated that fixing the issue took roughly 4 times longer than it would have if initial testing had been done. This shows the real cost of Type 1 errors.
On the other hand, my experience at Minitab, a leader in statistical software, taught me the value of testing with real users. Even when launched later than planned, products that underwent testing saw stronger user adoption and significantly fewer issues post-launch. Do we know whether the untested designs would have had similar success? No, but we were all confident that our launches would go as smoothly as possible because we erred toward Type 2 errors.
So, Which Error Type is More Costly?
By now, you might already have a hint. But let's break it down further.
Let's revisit the decision whether or not to conduct basic usability testing, but this time in a new scenario.
Example Scenario 2
Imagine you're crafting software for real estate appraisers who deal with vast amounts of data and require impeccable accuracy. A Type 1 error, like launching a feature that inaccurately computes property valuations, could lead to someone's license being taken away because of a simple math error. Such an oversight could tarnish a company's reputation and trustworthiness in the market.
Conversely, with a Type 2 error, you might spend a few more hours refining and perfecting a feature, but when you do launch, you've ensured its precision and reliability. Your users would trust your platform more, leading to increased loyalty and potentially more referrals.
While acknowledging the potential drawbacks of Type 2 errors in terms of missed opportunities, it's essential to understand that innovation without validation is inherently risky. Rapidly integrating new features without basic UX research may offer a temporary first-mover advantage, but at what cost? Poorly tested features can lead to user churn, degrade brand reputation, and cost a lot to fix after the fact. In a user-centric world, trust is paramount, and organizations typically get only a few chances at launching before that trust erodes.
A company's standing is often only as good as its last product or feature update. Therefore, ensuring a product's usability and user acceptance outweighs the fleeting advantages gained from rushing unvalidated innovations to market. Investing time in usability testing today can prevent far-reaching, often irreversible, damage tomorrow.
What's The Counterargument?
To be fair, I want to present the most compelling arguments against my recommendation to favor Type 2 errors. While I stand by this approach, it's essential to recognize that Type 2 errors carry their own risks.
Here are some of the most compelling points people have raised against my framework over the years:
1. Opportunity Costs: Time is a crucial resource, and the longer it takes to validate and act on a genuinely beneficial idea, the higher the opportunity cost. This doesn't just entail potential revenue but also includes brand positioning, market leadership, and user trust.
My Response: This is a misconception I hear a lot. Basic usability testing doesn't have to be time-consuming. In fact, using the scoring system I've developed, you can complete a full usability test in just 3 or 4 days.
2. Lost Competitive Edge: The early bird often does get the worm in the digital space. Companies that delay implementing potentially groundbreaking features due to Type 2 errors might find their competitors seizing the initiative, capturing market share, and setting industry standards. Once competitors gain that advantage and user loyalty, playing catch-up becomes an uphill task.
My Response: I've never seen this concern play out in the real world. In my career, I've witnessed the exact opposite many times. The time spent revisiting and adjusting a feature after a hasty, unvalidated launch is more likely to cause an organization to lag behind than the minimal time added for basic testing. In this case, I simply follow the math: it makes probabilistic sense to err on the side of Type 2 errors.
3. Stifling Innovation: The digital landscape is marked by swift, evolutionary changes. Organizations that habitually err on the side of caution might end up sidelining genuinely innovative ideas, inhibiting their growth potential. If every idea or feature is met with excessive skepticism or over-cautiousness, a company can stymie its own innovation pipeline.
My Response: This is a concern, but process-related solutions have typically worked for me in the past. We should avoid Type 1 errors when dealing with real end users but embrace them in the R&D space. Beta projects offer the perfect environment to test innovative ideas without the unnecessary risk of experimenting on real end users.
4. Resource Misallocation: Overemphasis on avoiding Type 1 errors could lead to disproportionate resources being allocated to over-testing and over-validating, often yielding diminishing returns. These resources, both time and money, could instead be used in areas like product development, marketing, or customer service.
My Response: The "diminishing returns" argument can be countered with a probabilistic response as well. While leaning towards Type 2 errors may add a bit more time and resources to projects, this cost is significantly less than the consequences of making a Type 1 error. Some studies suggest that around two-thirds of untested launches result in critical usability errors. Furthermore, the expense of rectifying a problematic launch can range from 4x to 10x the initial cost. The decision, in this case, is clear for anyone with an analytical mindset.
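To make the probabilistic argument explicit, here's a back-of-the-envelope sketch. The dollar figures, the two-thirds probability, and the 4x rework multiplier are rough, illustrative assumptions (not measurements from any specific project), but they show how the expected costs compare:

```python
# Back-of-the-envelope expected-cost comparison. All figures below are
# illustrative assumptions, not data from a real project.
baseline_launch_cost = 100_000   # hypothetical cost to build and ship the feature
testing_cost = 10_000            # hypothetical cost of a few days of usability testing
p_critical_if_untested = 2 / 3   # roughly two-thirds of untested launches hit critical UX errors
rework_multiplier = 4            # post-launch fixes: conservatively 4x (can run as high as 10x)

# Err toward Type 1: ship untested and pay for rework when critical errors surface.
expected_cost_untested = baseline_launch_cost + (
    p_critical_if_untested * rework_multiplier * baseline_launch_cost
)

# Err toward Type 2: always pay for testing and catch critical errors before launch.
expected_cost_tested = baseline_launch_cost + testing_cost

print(f"Expected cost, untested launch: ${expected_cost_untested:,.0f}")  # about $366,667
print(f"Expected cost, tested launch:   ${expected_cost_tested:,.0f}")    # $110,000
```

Even with the conservative 4x multiplier, the expected cost of skipping testing comes out at more than three times the cost of testing first; plug in 10x and the gap widens further.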
Conclusion
In the real world, and especially in UX research, where risk mitigation is a large part of our value to organizations, it's almost always more dangerous to commit a Type 1 error, not only in terms of monetary loss but also in terms of brand reputation and user trust.
Drawing from my experiences, I've gleaned this rule of thumb: If you must err, err on the side of caution. Or, in statistical terms, lean towards Type 2 errors. Because, in the end, a slight delay in product launch is far better than having a bad reputation or a financial fiasco. Remember, in the domain of UX research, it's always about the user. And what serves the user best ultimately serves the business.
What do you think about UX's role in organizations when it comes to risk mitigation? Comment below with any risk-focused decision-making frameworks you've had success with. Thanks for reading, and feel free to share if you find these types of articles helpful.