Safe Refactoring in .NET with Light/Dark Mode and Feature Flags


May 6, 2025 - 17:47

You’ve been forced to maintain a poorly written legacy app. Spaghetti code, no tests, and every new feature breaks two existing ones.

Team morale is at rock bottom. New features take forever to ship. Regression bugs are constant.

You gather your arguments and head to management.

You: We need time to refactor.

Management: What are you talking about?

You: Let us refactor. We’ll ship faster, with fewer bugs, and engineers won’t want to quit. I can do it.

Management: ...Okay.

Now comes the hard part.

Are you going to deliver on your promises? Or end up behind schedule and still stuck with a spaghetti codebase?

Large-scale refactoring depends on many factors, such as team alignment, technical constraints, and timing. Still, there’s one powerful technique I’ve used successfully in production to de-risk the process: Light/Dark Mode refactoring.

I first learned this technique in the book Refactoring at Scale, and I’ve since applied it many times in real systems to refactor code safely and confidently.

In this post, I’ll show you how to implement it in .NET using feature flags, and how it can help you refactor without fear.

What Is Light/Dark Mode Refactoring?

Light/Dark Mode is a technique that lets you safely refactor code by running both the existing implementation ("light") and the new refactored implementation ("dark") side by side without exposing users to any risk.

You run the light implementation as usual and return its result to the user, but you also execute the dark version in the background and compare the two outputs. If they match, you know your refactor is likely safe. If they don’t, you’ve caught a mismatch before it reaches any user.

This lets you:

  • Ship the refactor behind a feature flag
  • Gain confidence in correctness by comparing real-world results
  • Gradually roll out the new implementation without surprises

Eventually, when you're confident, you flip a flag and start returning the dark version to users.

It’s a low-risk, high-safety way to refactor logic, especially queries, where the same inputs should lead to the same outputs.

In the next section, I’ll show you how we applied this technique in a .NET application.

Let’s take an example of a service that performs a moderately complex operation like calculating the final price of a product, taking into account discounts, tax rules, currency rounding, and maybe even user-specific pricing logic.

Here’s how the legacy version might look:

public async Task<decimal> CalculateFinalPrice(Guid productId, Guid userId)
{
    var product = await _productRepo.GetById(productId);
    var basePrice = product.BasePrice;

    if (_userService.IsSpecialUser(userId))
    {
        basePrice *= 0.85m;
    }

    // Loyalty tier discount logic
    var user = await _userRepo.GetById(userId);
    var discount = 0m;

    if (user.LoyaltyTier == "Gold")
    {
        discount = basePrice * 0.10m;
    }
    else if (user.LoyaltyTier == "Silver")
    {
        discount = basePrice * 0.05m;
    }

    // Regional tax policy (duplicated logic scattered in multiple places)
    decimal tax = 0m;
    if (user.Region == "EU")
    {
        tax = (basePrice - discount) * 0.20m;
    }
    else if (user.Region == "US")
    {
        tax = 5.00m; // flat tax for some reason
    }

    // Random price adjustment based on product tags
    if (product.Tags.Contains("Clearance"))
    {
        discount += 3.00m;
    }

    // Last-minute adjustment for legacy reasons
    if (product.IsSubscription && user.Region == "EU")
    {
        tax *= 0.90m;
    }

    var finalPrice = basePrice - discount + tax;

    // Round to two decimals in a questionable way
    return Math.Round(finalPrice, 2, MidpointRounding.ToEven);
}

This is the kind of logic that:

  • Spreads business rules across services and layers

  • Has inconsistent handling (some percentage-based, some fixed)

  • Includes hardcoded exceptions and tribal knowledge

  • Is hard to test and reason about

And it isn’t nearly as complex as what you’ll find in real legacy apps.
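The rounding call at the end of the legacy method is a good example of a detail that can silently change during a rewrite. The sketch below is not from the original service: `RoundingDemo` and its method names are made up for illustration. It contrasts `MidpointRounding.ToEven`, which the legacy code uses, with `MidpointRounding.AwayFromZero`, which a rewrite might plausibly pick if nobody noticed the legacy mode.

```csharp
using System;

// Hypothetical helper, for illustration only: contrasts two rounding modes
// that a legacy method and its rewrite might disagree on.
public static class RoundingDemo
{
    // Rounding used by the legacy CalculateFinalPrice (banker's rounding).
    public static decimal LegacyRound(decimal price) =>
        Math.Round(price, 2, MidpointRounding.ToEven);

    // Rounding a rewrite might use by accident (an assumption for this example).
    public static decimal RewriteRound(decimal price) =>
        Math.Round(price, 2, MidpointRounding.AwayFromZero);
}
```

For a price of 2.125 the two methods return 2.12 and 2.13, while for 2.135 both return 2.14. The difference only shows up on exact midpoints, which is why it is easy to miss in hand-written tests and why comparing against real traffic helps.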

Now imagine you want to replace this with a cleaner, rules-driven PricingEngine that encapsulates all this behavior in a composable, maintainable way.

You can’t afford to flip the switch and hope it works.

That’s where ExecuteLightDark comes in. Let’s see how to use it.

The ExecuteLightDark Helper

Let’s start with the core idea: run both the legacy and the refactored implementations, compare their results, and return the trusted one.

Here’s the simplest version of ExecuteLightDark:

public async Task<T> ExecuteLightDark<T>(
    Func<Task<T>> light,
    Func<Task<T>> dark,
    [CallerMemberName] string callingFunc = "")
{
    var lightResult = await light();
    var darkResult = lightResult; // default in case dark fails

    try
    {
        darkResult = await dark();

        var lightSerialized = JsonConvert.SerializeObject(lightResult);
        var darkSerialized = JsonConvert.SerializeObject(darkResult);

        if (lightSerialized == darkSerialized)
        {
            logger.LogInformation("Light/dark results match for {func}", callingFunc);
        }
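        else
        {
            // Include the calling method so a mismatch can be traced back to its source.
            logger.LogError("Mismatch in light/dark results for {func}\nLight: {light}\nDark: {dark}",
                callingFunc, lightSerialized, darkSerialized);
        }
    }
    catch (Exception ex)
    {
        // A failing dark path must never affect the caller; log it and move on.
        logger.LogError(ex, "Dark execution failed for {func}", callingFunc);
    }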

    return lightResult;
}

That’s a solid start for internal testing environments, but it’s not good enough for production.

In production, we need more control.

What we really want is to first release the refactor in observation mode: we return the light result and only log the comparison. Then, without redeploying the app, we want the ability to:

  • Enable Light/Dark execution dynamically for a subset of users, whether that’s a percentage, a specific user group, or internal beta testers.
  • Gradually ramp up the dark execution as we build confidence, while still falling back to the light result.
  • Control performance impact by only executing the dark logic some percentage of the time (since running both can be expensive or slow).

To accomplish all of this, we need feature flags, for example Azure App Configuration with Feature Management if you’re on Azure.

Let’s look at how we extended our ExecuteLightDark method to support those production-grade capabilities.

Making It Production-Ready with Feature Flags

To use Light/Dark Mode safely in production, we wrapped it with feature flags. This lets us:

  • Turn Light/Dark Mode on or off without redeploying
  • Decide whether to return the light or dark result
  • Only run the dark logic some percentage of the time (to reduce performance impact)

Here’s our final ExecuteLightDark implementation:

public async Task<T> ExecuteLightDark<T>(
    Func<Task<T>> light,
    Func<Task<T>> dark,
    [CallerMemberName] string callingFunc = "")
{
    // Check if we should run both implementations
    var shouldRunDark = await featureManager.IsEnabledAsync(FeatureFlags.UseLightDarkMode);

    Task<T> lightTask = light();
    Task<T>? darkTask = null;

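    if (shouldRunDark)
    {
        logger.LogInformation("Running light/dark comparison for {func}", callingFunc);

        try
        {
            // A non-async delegate can throw before it returns a Task;
            // never let that reach the caller.
            darkTask = dark();
        }
        catch (Exception ex)
        {
            logger.LogError(ex, "Dark execution failed to start for {func}", callingFunc);
        }
    }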

    T lightResult = await lightTask;

    if (darkTask == null)
    {
        return lightResult;
    }

    T darkResult = lightResult;

    try
    {
        darkResult = await darkTask;

        var lightSerialized = JsonConvert.SerializeObject(lightResult);
        var darkSerialized = JsonConvert.SerializeObject(darkResult);

        if (lightSerialized == darkSerialized)
        {
            logger.LogInformation("Results match for {func}", callingFunc);
        }
        else
        {
            logger.LogError("Mismatch in light/dark results for {func}\nLight: {light}\nDark: {dark}",
                callingFunc, lightSerialized, darkSerialized);
        }
    }
    catch (Exception ex)
    {
        logger.LogError(ex, "Dark execution failed for {func}", callingFunc);
    }

    var returnDark = await featureManager.IsEnabledAsync(FeatureFlags.ReturnDarkResult);
    return returnDark ? darkResult : lightResult;
}

This version offloads sampling, targeting, and gradual rollout to your feature management system, which can be configured to:

  • Target internal users, specific companies, or beta testers

  • Enable for X% of users

  • Use filters like country, environment, user ID, etc.
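To make "enable for X% of users" concrete, here is a minimal sketch of stable percentage bucketing. In practice your feature management system does this for you; `RolloutBucket` is a hypothetical helper that is not part of any feature-flag library, shown only to illustrate the idea that the same user should get the same decision on every request.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not part of any feature-flag library): maps a user to a
// stable bucket in [0, 100) so the same user always gets the same decision.
public static class RolloutBucket
{
    public static int BucketFor(string flagName, string userId)
    {
        // Hash flag + user so different flags slice the user base differently.
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes($"{flagName}:{userId}"));

        // Read the first four bytes as an unsigned integer, independent of endianness.
        uint value = BinaryPrimitives.ReadUInt32BigEndian(hash);
        return (int)(value % 100);
    }

    public static bool IsEnabled(string flagName, string userId, int percentage)
    {
        if (percentage <= 0) return false;   // flag fully off
        if (percentage >= 100) return true;  // flag fully on
        return BucketFor(flagName, userId) < percentage;
    }
}
```

Because a user's bucket never changes, raising the percentage from 10 to 50 only adds users: everyone who was already in the dark path stays in it, which keeps the comparison logs consistent while you ramp up.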

Final Result: Your PricingService After Refactoring

With ExecuteLightDark in place, here’s what your PricingService might look like after adopting the pattern:

public class PricingService(
    IProductRepository productRepo,
    IDiscountService discountService,
    ITaxCalculator taxCalculator,
    IPricingEngine newPricingEngine,
    ILogger<PricingService> logger,
    IFeatureManager featureManager)
{
    public Task<decimal> CalculateFinalPrice(Guid productId, Guid userId)
    {
        return ExecuteLightDark(
            () => CalculateOldPrice(productId, userId),
            () => CalculateNewPrice(productId, userId));
    }

    private async Task<decimal> CalculateOldPrice(Guid productId, Guid userId)
    {
        var product = await productRepo.GetById(productId);
        var basePrice = product.BasePrice;

        if (userId == Guid.Parse("4b2f58d0-a738-4f96-b9a3-f3f4f0c9515d"))
            basePrice *= 0.85m;

        var user = await discountService.GetUser(userId);
        var discount = user.LoyaltyTier switch
        {
            "Gold" => basePrice * 0.10m,
            "Silver" => basePrice * 0.05m,
            _ => 0m
        };

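        // Tax is computed before the clearance adjustment, exactly as in the legacy method;
        // swapping the order would change the EU tax for clearance products.
        var tax = user.Region switch
        {
            "EU" => (basePrice - discount) * 0.20m,
            "US" => 5.00m,
            _ => 0m
        };

        if (product.Tags.Contains("Clearance"))
            discount += 3.00m;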

        if (product.IsSubscription && user.Region == "EU")
            tax *= 0.90m;

        return Math.Round(basePrice - discount + tax, 2, MidpointRounding.ToEven);
    }

    private async Task<decimal> CalculateNewPrice(Guid productId, Guid userId)
    {
        var context = new PricingContext(productId, userId);
        return await newPricingEngine.CalculateFinalPrice(context);
    }
}
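The post doesn’t show the engine behind `newPricingEngine`, so here is one possible shape for a rules-driven design, as a rough sketch only. Every type below (`PricingInput`, `PricingState`, `IPricingRule`, the rule classes, and `RulePricingEngine`) is hypothetical, and the sketch works on already-loaded data rather than on the `PricingContext` used above. The special-user discount is left out for brevity.

```csharp
using System;
using System.Collections.Generic;

// All types here are hypothetical; the real PricingEngine is not shown in the post.
public sealed record PricingInput(
    decimal BasePrice, string LoyaltyTier, string Region,
    bool IsClearance, bool IsSubscription);

// Running totals that each rule reads and updates.
public sealed record PricingState(decimal BasePrice, decimal Discount, decimal Tax);

public interface IPricingRule
{
    PricingState Apply(PricingInput input, PricingState state);
}

public sealed class LoyaltyDiscountRule : IPricingRule
{
    public PricingState Apply(PricingInput input, PricingState state)
    {
        decimal rate = input.LoyaltyTier switch { "Gold" => 0.10m, "Silver" => 0.05m, _ => 0m };
        return state with { Discount = state.Discount + state.BasePrice * rate };
    }
}

public sealed class RegionalTaxRule : IPricingRule
{
    public PricingState Apply(PricingInput input, PricingState state) =>
        state with
        {
            Tax = input.Region switch
            {
                "EU" => (state.BasePrice - state.Discount) * 0.20m,
                "US" => 5.00m, // flat tax, kept as-is from the legacy code
                _ => 0m
            }
        };
}

public sealed class ClearanceRule : IPricingRule
{
    public PricingState Apply(PricingInput input, PricingState state) =>
        input.IsClearance ? state with { Discount = state.Discount + 3.00m } : state;
}

public sealed class EuSubscriptionTaxRule : IPricingRule
{
    public PricingState Apply(PricingInput input, PricingState state) =>
        input.IsSubscription && input.Region == "EU"
            ? state with { Tax = state.Tax * 0.90m }
            : state;
}

public sealed class RulePricingEngine
{
    private readonly IReadOnlyList<IPricingRule> _rules;

    public RulePricingEngine(IReadOnlyList<IPricingRule> rules) => _rules = rules;

    // Default pipeline: rule order mirrors the legacy method (tax before clearance).
    public static RulePricingEngine CreateDefault() => new(new IPricingRule[]
    {
        new LoyaltyDiscountRule(), new RegionalTaxRule(),
        new ClearanceRule(), new EuSubscriptionTaxRule()
    });

    public decimal Calculate(PricingInput input)
    {
        var state = new PricingState(input.BasePrice, 0m, 0m);
        foreach (var rule in _rules)
            state = rule.Apply(input, state);

        return Math.Round(state.BasePrice - state.Discount + state.Tax, 2, MidpointRounding.ToEven);
    }
}
```

The order of the rules is part of the behavior: the legacy method computes tax before it adds the clearance discount, so the pipeline has to do the same. This is exactly the kind of detail a light/dark comparison tends to surface when a rewrite gets it wrong.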

Benefits and Caveats

Like any tool, Light/Dark Mode refactoring comes with trade-offs. But in the right context it’s incredibly powerful.

Benefits

  • Confidence through real data

    You’re testing the new logic against real production inputs, not just unit tests or pre-seeded environments.

  • Low-risk rollouts

    You can ship and monitor the refactor without ever returning the dark result until you’re ready.

  • Gradual, flexible rollout

    Feature flags let you enable dark execution for internal users, specific companies, or small percentages of traffic.

  • Easy reversion

    If something goes wrong, just toggle the flag off. No rollback or hotfix is needed.

Caveats

  • Performance cost

    You’re potentially doubling the work by running two implementations. This is especially relevant if queries are heavy or slow.

  • Comparison accuracy

    Serializing and comparing objects can fail due to field order, time precision, or non-deterministic values. Consider using custom comparers or domain-specific checks if needed.

  • Log noise

    If your dark implementation is still evolving, early mismatches may flood your logs. You may want to filter, throttle, or alert only on critical deltas.

  • Not great for writes

    This pattern doesn’t apply cleanly to commands or state changes, where running two implementations could mutate data differently.
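For the comparison-accuracy caveat, one option is to let callers supply their own equality check instead of always comparing serialized JSON. The following is a minimal sketch, assuming a hypothetical `ResultComparer` helper that is not part of the code shown earlier. It uses `System.Text.Json` rather than Json.NET only to stay free of package references.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: decides whether light and dark results "match".
public static class ResultComparer
{
    // Default: compare serialized JSON, similar to what ExecuteLightDark does
    // (here with System.Text.Json instead of Json.NET).
    public static bool JsonEquals<T>(T light, T dark) =>
        JsonSerializer.Serialize(light) == JsonSerializer.Serialize(dark);

    // Domain-specific: treat prices as equal when they differ by at most `tolerance`.
    public static Func<decimal, decimal, bool> WithinTolerance(decimal tolerance) =>
        (light, dark) => Math.Abs(light - dark) <= tolerance;

    // Domain-specific: ignore sub-second differences between timestamps.
    public static bool SameSecond(DateTimeOffset light, DateTimeOffset dark) =>
        light.ToUnixTimeSeconds() == dark.ToUnixTimeSeconds();
}
```

One way to wire this in would be an optional `Func<T, T, bool>` parameter on `ExecuteLightDark` that falls back to the JSON comparison when the caller passes nothing. That keeps the simple case simple while letting noisy call sites define what "equal" means for their data.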

TL;DR

Use Light/Dark Mode for complex refactors of query logic, especially when the stakes are high and confidence is low. It gives you a way to ship changes safely, with real-time validation and full control over risk.

This post was originally published on my blog; visit it for more technical posts about .NET.