Tag: performance evaluation

  • Gaming Incentives: The Risks of Rewarding the Wrong Behaviour


    The Problem with Incentives: People Are Too Smart for Their Own Good

    Let’s be real — every time you design a control system, someone, somewhere, is going to figure out how to game it. It’s not a question of if but when. Whether it’s employees, suppliers, or even top management, people will optimize for whatever earns them the biggest reward with the least effort or cost.

    A core challenge for any organization is designing a performance management system in which employees’ personal incentives align with the company’s long-term value creation.

    The underlying mechanism for this alignment depends on how design choices shape employees’ perceived rewards and effort trade-offs, which ultimately influence their decisions and behaviors.

    Without even realizing it, organisations often promote the wrong behaviors:

    Maybe it’s a sales team stuffing the pipeline with junk leads to hit their targets. Maybe it’s warehouse staff shipping incomplete orders just to boost throughput numbers. Or maybe it’s a service team closing tickets too fast — without actually solving anything.

    This isn’t just frustrating; it can quietly wreck long-term performance. When bad behaviors get rewarded, good behaviors disappear. And suddenly, you’re drowning in inflated KPIs that make everything look great—on paper.

    So, how do you build a performance management system that doesn’t fall apart the moment people start looking for loopholes?

    1. Be Careful What You Measure—Because That’s What You’ll Get

    Here’s the golden rule: what you measure is what people will optimize for. Not what you want them to optimize for—what you actually put on the scoreboard.

    If you reward speed in customer service but don’t track resolution quality, don’t be surprised when agents rush through calls and leave customers frustrated. If you set procurement targets based on cost savings alone, expect suppliers to cut corners.

    The problem? Most incentive systems assume people will interpret targets in good faith. They won’t. They’ll take them literally.

    The Fix: Balance Your Metrics

    The trick isn’t to micromanage every possible scenario—it’s to build in counterbalances:

    Sales targets? Pair them with customer retention rates.

    Productivity goals? Weigh them against error rates.

    Cost-cutting incentives? Measure impact on quality.

    If employees know they have to hit multiple (sometimes conflicting) goals, they’re less likely to game one metric at the expense of everything else.

    2. Beware of Short-Term Thinking—It’s Addictive

    People love quick wins. But short-term rewards often come at the cost of long-term stability.

    Take a classic example: bonus structures that reward quarterly performance. Sure, they drive immediate results—but they also encourage things like:

    Revenue pulling (closing deals early just to hit targets)

    Cost deferrals (pushing expenses into the next quarter to make numbers look better)

    Slashing investments (ignoring R&D or training because it doesn’t help this quarter’s bonus)

    The result? KPIs that look great today but slowly crumble over time.

    The Fix: Reward Long-Term Success

    Instead of focusing only on short-term results, tie incentives to sustained performance:

    • Instead of monthly sales targets, track 12-month rolling averages.

    • Instead of quarterly cost reductions, measure total cost of ownership over several years.

    • Instead of immediate project completion, reward teams for post-implementation success (did the process actually work six months later?).

    This doesn’t mean ignoring short-term performance—it just means making sure it isn’t the only thing that matters.

    3. Don’t Let People “Sandbag” Their Targets—Or Fear Success

    Ever noticed how some teams always just hit their targets—but never exceed them? That’s sandbagging.

    It happens when employees lowball their targets so they can hit them easily. Sales reps do it. Project managers do it. Even executives do it.

    It’s understandable—if your bonus depends on meeting goals, why would you set them too high? But there’s another reason teams hold back: the ratchet effect.

    When management tightens targets every time employees exceed them, they create a system where success feels like punishment. Employees quickly learn that overperformance today just means an even harder target tomorrow. Instead of pushing themselves, they start managing expectations—deliberately capping their effort to avoid setting an unsustainable precedent.

    The Fix: Make Targets Ambitious—but Fair

    Here’s how to stop sandbagging and prevent the ratchet effect from backfiring:

    • Use historical data wisely – If a team consistently beats its targets by 20%, don’t just raise the bar without understanding why. Did they genuinely improve, or were targets too low to begin with?

    • Incorporate stretch goals – Set a baseline target (minimum acceptable) and a stretch target (challenging but achievable). This encourages ambition without making employees feel trapped in an ever-rising cycle.

    • Avoid automatic ratcheting – Instead of mechanically increasing targets every time they’re hit, factor in market conditions, workload capacity, and sustainability of performance.

    • Recognize effort, not just output – If targets are continually raised without acknowledging the effort behind them, employees will disengage. Reward progress and long-term contributions, not just short-term spikes.

    The goal isn’t just to push people to perform—it’s to make sure they don’t feel like they have to hold back just to survive.

    4. Watch Out for “Perverse Incentives” (a.k.a. Manipulation Risks)

    Not all gaming is intentional. Sometimes, poorly designed incentives lead to completely unexpected behaviors.

    A corporate example? A company that paid call center agents based on how many calls they handled. Great—until employees started hanging up on difficult customers just to take more calls.

    The Fix: Think Like a Skeptic

    Before rolling out an incentive, ask: How could this be manipulated?

    • Run small-scale pilots before full implementation.

    • Get input from different departments (finance, operations, frontline employees) to spot blind spots.

    • Create contingency plans—what happens if the incentive encourages the wrong behaviors?

    A little cynicism in planning can save a lot of chaos later.

    5. The Best Incentives? A Culture Where People Don’t Need to Game the System

    At the end of the day, no system is perfect. There will always be ways to game incentives. The real question is: why do people feel the need to do it in the first place?

    Fostering the right culture and social norms is the one approach that reduces every kind of gaming risk – with no downside attached.

    If employees don’t trust leadership… If they feel like targets are arbitrary… If they think the system is rigged against them… They’ll look for ways to manipulate it.

    But if they believe in the company’s goals? If they see their incentives as fair? If they trust that leadership rewards genuine success, not just numbers? They’ll be less inclined to game the system in the first place.

    The Fix: Build a Performance System People Actually Respect

    Make incentives transparent—No hidden rules. No moving goalposts.

    Listen to employees—If teams say a target is unreasonable, find out why.

    Focus on purpose, not just paychecks—People work harder when they believe in what they’re doing.

    The best incentive system isn’t the one with the most checks and balances—it’s the one people don’t want to cheat.

    Final Thought: Don’t Blame the Players, Fix Your Game

    If your performance management system is constantly getting gamed, the problem isn’t just the employees—it’s the system itself.

    People respond to incentives. Always have, always will. Your job isn’t to force them to behave the “right” way—it’s to design a system where the easiest way to win… is to do the right thing.

    And when that happens? You won’t need to worry about gaming incentives anymore. You’ll have built a system where success actually means something.

  • How to Know If Your Improvement Project Worked: A Data-Driven Approach


    Did the project actually work? That’s a question every manager, across all layers of the organisation, gets asked. It’s simple enough on the surface, but the answers are usually complicated and riddled with disclaimers.

    At some point, you have to wonder: Why do so many organizations assume that every project’s benefits are conveniently canceled out by some vaguely understood, unfortunately timed negative change? The project team claims success, business units avoid adjusting targets—everyone wins, except the organization itself.

    If pharmaceutical companies can measure the impact of new treatments — despite countless confounding factors — why do so many businesses claim that their environments are ‘too dynamic’ to evaluate change? The reality is, they could if they applied even a fraction of the rigor pharma uses.

    Now, most would argue – there’s no way we’re investing in the level of sophistication a biostatistics department runs with. And my argument is that, unless your products are a matter of life or death, you don’t have to. But we can be inspired by the existing scientific process and apply it to our businesses.

    Let’s run through a typical example and research question: “Has the new project reduced task processing times?” Below is a high-level overview of the sequence of steps required.

    1. Define your Hypothesis

    Before you even begin the project’s implementation, you need to have defined your hypotheses. This seems trivial, but clearly stating what you’ll be evaluating ensures all of your stakeholders are on board with the metrics you’re targeting and leaves no ambiguity about the desired effects.

    You need two hypotheses – the null and the alternative. The null hypothesis represents business as usual—no change. The alternative hypothesis is what you’re trying to prove: that the project made a difference.
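
    For the processing-time question above, the pair might read something like this (the exact wording is illustrative):

    H0 (null hypothesis): the mean task processing time after the project equals the mean before it, i.e. μ_after = μ_before.

    H1 (alternative hypothesis): the mean task processing time after the project is lower than before, i.e. μ_after < μ_before.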

    2. Design and Run Experiment

    This is the stage at which most organisations will drop the ball. However, the importance of your experimental design cannot be overstated. Your decisions here will make or break your ability to justify your project’s impact.

    First, choose your experiment’s design. Here are the two most common and simple methods:

    Experiment Design | Details
    Before and After | A simple and cost-effective approach where the same group is measured before and after implementation. However, this method is highly sensitive to external noise (e.g., seasonal trends, market changes), which can distort results.
    A/B Testing | A more robust approach where two groups, one using the new process and one continuing with the old, are tested simultaneously. This method controls for external factors but requires careful change management, as running two processes in parallel adds complexity.

    Second, choose the statistical test you plan to run. You are not limited to only one test and your choice is not set in stone – that said, it’s important to have a plan for how you’ll evaluate your project.

    The specific test to run depends on the attributes of the data you’re collecting. Most of the time you’ll be using t-tests, which assume your data are roughly normally distributed. Regression analyses are also a common and flexible choice. We won’t dive into the details of these tests here, but note that the t-test, for example, is a built-in Excel function and running one takes minutes. See the table below for a high-level summary.

    Test Name | Data Type
    Independent t-test | Normal distribution, independent samples, equal variance across groups
    Paired t-test | Normal distribution of the paired differences, paired samples
    Mann-Whitney U test | Non-normal distribution, independent samples
    Wilcoxon Signed-Rank test | Non-normal distribution, paired samples
    Linear Regression | Linear relationship, normally distributed residuals, homoscedasticity, independent samples
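
    To make the table concrete, here is a minimal Python sketch of how you might pick between two of these tests in practice. The data is simulated purely for illustration; with real measurements you would load your own before/after values instead.

```python
# Illustrative sketch only: simulated data, not real project measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
before = rng.normal(loc=30, scale=5, size=200)  # task times (minutes) before the project
after = rng.normal(loc=25, scale=5, size=200)   # task times (minutes) after the project

# Shapiro-Wilk: a p-value above 0.05 means we have no evidence against normality.
looks_normal = (stats.shapiro(before).pvalue > 0.05) and (stats.shapiro(after).pvalue > 0.05)

if looks_normal:
    result = stats.ttest_ind(before, after)      # independent t-test
    test_name = "Independent t-test"
else:
    result = stats.mannwhitneyu(before, after)   # non-parametric alternative
    test_name = "Mann-Whitney U test"

print(f"{test_name}: statistic={result.statistic:.2f}, p-value={result.pvalue:.4f}")
```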

    Third, consider your significance level – the threshold you’ll compare your p-value against. The p-value tells us how likely it is that we’d see results like ours purely by random chance if nothing had actually changed. A common threshold is 5% – if the p-value is below this, we conclude the project had a real effect. That said, don’t be overly rigid – no one will care about a p-value of 5.1% in a typical business environment.

    The final thing to consider is your optimal sample size. You don’t want to run your experiment for too long and waste resources, but you also don’t want to end it too early and find you need more data. As it’s generally expensive to restart an experiment, unless you’re really pressed on resources, I would lean towards building in a comfortable buffer. If you’re not sure how much data you need, tools like sample size calculators (or even ChatGPT) can help estimate it. If you want a more rigorous approach, a Monte Carlo simulation can model different scenarios to find the best balance between cost and accuracy – note ChatGPT can do that for you too.
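
    To make the Monte Carlo idea concrete, here is a rough Python sketch. Every number in it (baseline mean, effect size, standard deviation, group size) is an assumption you would replace with your own estimates; the point is simply to simulate the experiment many times and see how often a real improvement would be detected.

```python
# Rough Monte Carlo power check: with n observations per group, how often would a
# true 5-minute improvement be detected at alpha = 0.05? All numbers are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100              # observations per group: the quantity you are trying to size
true_effect = 5.0    # assumed real improvement, in minutes
sd = 8.0             # assumed standard deviation of task processing times
alpha = 0.05
n_simulations = 2000

hits = 0
for _ in range(n_simulations):
    before = rng.normal(30.0, sd, n)
    after = rng.normal(30.0 - true_effect, sd, n)
    if stats.ttest_ind(before, after).pvalue < alpha:
        hits += 1

print(f"Estimated power with n={n} per group: {hits / n_simulations:.0%}")
# If the power is well below ~80%, increase n and rerun; if it is near 100%, n may be larger than needed.
```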

    3. Analyze Data & Evaluate Results

    When you’ve designed your experiment and collected the data, it’s time to determine whether your project actually made an impact. This final step involves calculating your test statistic, interpreting the results, and generating conclusions that can drive decision-making. 

    Assuming the “before/after” design was used, your raw data might be as simple as two lists of task processing times: one recorded before the change and one recorded after it.

    A more powerful overview is to examine the histograms or KDEs of the two groups. In our example, this view alone gives a really strong indication of what the results will be: the mean difference between the two groups is 5 minutes, and the two distributions are visibly distinct. The next step is to have our statistical tests confirm that.
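
    If you want to reproduce this kind of view yourself, a short Python sketch along these lines will do it. The data below is simulated to mimic the example (a roughly 5-minute mean difference), not the actual project data.

```python
# Illustrative only: simulate before/after task times with a ~5-minute mean difference
# and overlay their histograms to see how far apart the two distributions sit.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
before = rng.normal(loc=30, scale=4, size=500)  # minutes per task, before the project
after = rng.normal(loc=25, scale=4, size=500)   # minutes per task, after the project

plt.hist(before, bins=30, alpha=0.5, density=True, label=f"Before (mean {before.mean():.1f} min)")
plt.hist(after, bins=30, alpha=0.5, density=True, label=f"After (mean {after.mean():.1f} min)")
plt.xlabel("Task processing time (minutes)")
plt.ylabel("Density")
plt.title("Processing times before vs. after the project")
plt.legend()
plt.show()
```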

    Using the chosen test (e.g., t-test), calculate the test statistic and the p-value. Most tools—Excel, Python, R, or even online calculators—can do this in seconds. The test statistic tells you how different your groups are, while the p-value quantifies the likelihood that these differences occurred by random chance.
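
    As a minimal illustration, here is what that step looks like in Python, using the same kind of simulated data as above; swap in your own measurements and chosen significance level.

```python
# Illustrative only: calculate the test statistic and p-value, then compare against alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
before = rng.normal(loc=30, scale=4, size=500)  # minutes per task, before
after = rng.normal(loc=25, scale=4, size=500)   # minutes per task, after

alpha = 0.05                              # significance level chosen in the design step
result = stats.ttest_ind(before, after)   # two-sided by default; alternative="greater" gives a one-sided test

print(f"t-statistic = {result.statistic:.2f}, p-value = {result.pvalue:.4g}")
if result.pvalue < alpha:
    print("Reject the null hypothesis: the change is statistically significant.")
else:
    print("Cannot reject the null hypothesis: no evidence of a real change.")
```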

    If the p-value < significance level (e.g., 0.05), we reject the null hypothesis. This means the observed change is statistically significant, and you have evidence that the project had an effect.

    In this case, the t-test confirms what was already visible in the visualization – with a p-value of approx. 0.00, it’s highly unlikely that the difference we see in means is due to random chance. We can comfortably reject the null hypothesis. You now finally have data-driven evidence that the project had a statistically significant impact.

    Summary: Measuring With Confidence

    Evaluating a project’s impact isn’t just about changing averages; it’s about determining whether that change was meaningful, measurable, and worth the investment. By applying structured hypothesis testing, organizations can move beyond vague assumptions and confidently assess whether an initiative has truly improved performance.

    Beyond statistical significance, it’s essential to consider practical implications. While the results confirm an improvement, the next step is determining whether this reduction translates to meaningful operational benefits—such as cost savings, increased throughput, or improved customer experience.

    Key Takeaways:

    1. Define Success Clearly – Establish hypotheses upfront to ensure stakeholders align on what success looks like.

    2. Design Thoughtful Experiments – Whether using before-and-after comparisons or A/B testing, the experimental design must minimize bias and account for variability.

    3. Leverage Statistical Rigor – Using appropriate statistical tests, confidence intervals, and standard error calculations ensures conclusions are based on evidence, not assumptions.

    By embedding a scientific mindset into business decisions, organizations can manage uncertainty and ensure that projects deliver real, measurable value — not just perceived success. Ultimately, what gets measured gets improved—but only if it’s measured correctly.