PWC 321 Every Average Tells a Story, Don't It?

Monday morning at 3:30am, I woke up feeling like someone was watching me. My dog was sitting by the bed, burping slightly, and staring at me. This is his way of saying, "I feel the need to spend a moment outside. Or, if you'd rather not get up, I could leave something on the hallway rug for you to discover later." Quickly evaluating which scenario would start my week on a high note, I got up to let him out, leaving me thoroughly awake much too early. Fortunately, here at six time zones to the left of London, Monday morning means the Weekly Challenge is ready, so I had at it. Let's see what we have for week 321. Task 1 Distinct Average The description You are given an array of numbers with even length. Write a script to return the count of distinct average. The average is calculated by removing the minimum and the maximum, then average of the two. Example 1 Input: @nums = (1, 2, 4, 3, 5, 6) Output: 1 Step 1: Min = 1, Max = 6, Avg = 3.5 Step 2: Min = 2, Max = 5, Avg = 3.5 Step 3: Min = 3, Max = 4, Avg = 3.5 The count of distinct average is 1. Example 2 Input: @nums = (0, 2, 4, 8, 3, 5) Output: 2 Step 1: Min = 0, Max = 8, Avg = 4 Step 2: Min = 2, Max = 5, Avg = 3.5 Step 3: Min = 3, Max = 4, Avg = 3.5 the count of distinct average is 2. Example 3 Input: @nums = (7, 3, 1, 0, 5, 9) Output: 2 Step 1: Min = 0, Max = 9, Avg = 4.5 Step 2: Min = 1, Max = 7, Avg = 4 Step 3: Min = 3, Max = 5, Avg = 4 The count of distinct average is 2. The [lack of] thinking Musical theme: way too early for this kind of noise, but how about Every Picture Tells a Story, Rod Stewart, 1971. I read this too fast and jumped to the conclusion that the problem was to calculate the averages after the minimum and maximum were discarded. So, I got a bonus answer this week, but not for the question that was asked. It was dark and the coffee wasn't ready, OK? Throwing out extreme values from a data set is a useful statistical technique. I have done it to suppress transient extremes from moving averages, or to clean up suspiciously out-of-range data from noisy measurements. Arguably, that might have been a more practical challenge problem. Anyway, repeatedly looking for minimum and maximum suggests that we should sort the list, so that the minimum is always at one end and the maximum at the other. Reducing a list to distinct values is easily done by using the values as keys in a hash. The fun part sub distAvg(@ints) { my %average; @ints = sort { $a $b } @ints; while ( @ints ) { my $min = shift @ints; # Remove minimum; my $max = pop @ints; # Remove maximum; my $avg = ($max + $min) / 2; $average{ $avg } = true; } return scalar keys %average; } This can be smaller. The variables for mimimum, maximum, and average aren't really necessary; we can in-line them in the hash access. Nor for that matter, is the division by 2. We're only looking for distinct values, and the constant factor is not relevant. Eliminating division is old-time thinking in C. I have worked on embedded systems where was no division instruction, so avoiding division operations was a useful optimization. Not so much here, but I'm going to leave it out for nostalgia's sake. I'm being somewhat redundant on my return statement, explicitly naming the keys, and forcing a scalar value. Long ago, taking the scalar value of a hash used to return a funky string that gave information about the efficiency of the hash table, so it had to be done this way. About eight years ago (so still a "new feature" by Perl longevity standards) sanity prevailed and it was changed so that the scalar value of a hash is the number of distinct keys, so I could just say scalar(%average). Final answer: sub distAvg(@ints) { my %average; @ints = sort { $a $b } @ints; while ( @ints ) { $average{ (shift @ints)+(pop @ints) } = true; } return scalar %average; } Task 2 Backspace Compare Description You are given two strings containing zero or more #. Write a script to return true if the two given strings are same by treating # as backspace. Example 1 Input: $str1 = "ab#c" $str2 = "ad#c" Output: true For first string, we remove "b" as it is followed by "#". For second string, we remove "d" as it is followed by "#".In the end both strings became the same. Example 2 Input: $str1 = "ab##" $str2 = "a#b#" Output: true Example 3 Input: $str1 = "a#b" $str2 = "c" Output: false The thinking part Cleaning up backspaces in typed text is a very human thing to do. Let's make our musical theme Like Humans Do by David Byrne (2001). Two approaches: (1) regular expression replacements for two-character pairs where the second is #, or (2) sequential processing with a look-back every time we see a '#. The complication wi

May 17, 2025 - 14:50

PWC 321 Every Average Tells a Story, Don't It?

Monday morning at 3:30am, I woke up feeling like someone was watching me. My dog was sitting by the bed, burping slightly, and staring at me. This is his way of saying, "I feel the need to spend a moment outside. Or, if you'd rather not get up, I could leave something on the hallway rug for you to discover later." Quickly evaluating which scenario would start my week on a high note, I got up to let him out, leaving me thoroughly awake much too early. Fortunately, here at six time zones to the left of London, Monday morning means the Weekly Challenge is ready, so I had at it. Let's see what we have for week 321.

Task 1 Distinct Average

The description

You are given an array of numbers with even length. Write a script to return the count of distinct average. The average is calculated by removing the minimum and the maximum, then average of the two.

Example 1

Input: @nums = (1, 2, 4, 3, 5, 6)
Output: 1
- Step 1: Min = 1, Max = 6, Avg = 3.5
- Step 2: Min = 2, Max = 5, Avg = 3.5
- Step 3: Min = 3, Max = 4, Avg = 3.5
- The count of distinct average is 1.

Example 2

Input: @nums = (0, 2, 4, 8, 3, 5)
Output: 2
- Step 1: Min = 0, Max = 8, Avg = 4
- Step 2: Min = 2, Max = 5, Avg = 3.5
- Step 3: Min = 3, Max = 4, Avg = 3.5
- the count of distinct average is 2.

Example 3

Input: @nums = (7, 3, 1, 0, 5, 9)
Output: 2
- Step 1: Min = 0, Max = 9, Avg = 4.5
- Step 2: Min = 1, Max = 7, Avg = 4
- Step 3: Min = 3, Max = 5, Avg = 4
- The count of distinct average is 2.

The [lack of] thinking

Musical theme: way too early for this kind of noise, but how about Every Picture Tells a Story, Rod Stewart, 1971.

I read this too fast and jumped to the conclusion that the problem was to calculate the averages after the minimum and maximum were discarded. So, I got a bonus answer this week, but not for the question that was asked. It was dark and the coffee wasn't ready, OK?

Throwing out extreme values from a data set is a useful statistical technique. I have done it to suppress transient extremes from moving averages, or to clean up suspiciously out-of-range data from noisy measurements. Arguably, that might have been a more practical challenge problem.

Anyway, repeatedly looking for minimum and maximum suggests that we should sort the list, so that the minimum is always at one end and the maximum at the other.

Reducing a list to distinct values is easily done by using the values as keys in a hash.

The fun part

sub distAvg(@ints)
{
    my %average;
    @ints = sort { $a <=> $b } @ints;
    while ( @ints )
    {
        my $min = shift @ints;    # Remove minimum;
        my $max = pop @ints;      # Remove maximum;
        my $avg = ($max + $min) / 2;
        $average{ $avg } = true;
    }
    return scalar keys %average;
}

This can be smaller. The variables for mimimum, maximum, and average aren't really necessary; we can in-line them in the hash access.

Nor for that matter, is the division by 2. We're only looking for distinct values, and the constant factor is not relevant. Eliminating division is old-time thinking in C. I have worked on embedded systems where was no division instruction, so avoiding division operations was a useful optimization. Not so much here, but I'm going to leave it out for nostalgia's sake.

I'm being somewhat redundant on my return statement, explicitly naming the keys, and forcing a scalar value. Long ago, taking the scalar value of a hash used to return a funky string that gave information about the efficiency of the hash table, so it had to be done this way. About eight years ago (so still a "new feature" by Perl longevity standards) sanity prevailed and it was changed so that the scalar value of a hash is the number of distinct keys, so I could just say scalar(%average). Final answer:

sub distAvg(@ints)
{
    my %average;
    @ints = sort { $a <=> $b } @ints;
    while ( @ints )
    {
        $average{ (shift @ints)+(pop @ints) } = true;
    }
    return scalar %average;
}

Task 2 Backspace Compare

Description

You are given two strings containing zero or more #.
Write a script to return true if the two given strings are same by treating # as backspace.

Example 1

Input: $str1 = "ab#c" $str2 = "ad#c"
Output: true
- For first string, we remove "b" as it is followed by "#". For second string, we remove "d" as it is followed by "#".In the end both strings became the same.

Example 2

Input: $str1 = "ab##" $str2 = "a#b#"
Output: true

Example 3

Input: $str1 = "a#b" $str2 = "c"
Output: false

The thinking part

Cleaning up backspaces in typed text is a very human thing to do. Let's make our musical theme Like Humans Do by David Byrne (2001).

Two approaches: (1) regular expression replacements for two-character pairs where the second is #, or (2) sequential processing with a look-back every time we see a '#.

The complication will be if there are multiple backspaces in a row, possibly more backspaces than there are valid characters.

The "obvious" regular expression solution

We can take care of well-formed pairs with a simple substitution: $str =~ s/[^#]#//g. Any character that is not a # ([^#]) followed by a # can be deleted. If that leaves us with more backspaces remaining, we can repeat until all the backspace pairs are done.

However, what if the string starts with a sequence of #? Or what if the user smashed the backspace key until auto-repeat kicked in (like humans do) and there are dozens of extra # characters? There won't be any non-# characters to pair with. We'll have to do a cleanup for that.

sub bspRE($str)
{
    while ( $str =~ s/[^#]#//g ) { };
    return $str =~ s/#+//gr;
}

The sequential solution

We could process one character at a time. If it's not a #, move it to the output. If it is, pop the last item off the output (assuming there is any output).

{
    my @c = split(//, $str);
    my @out;
    for ( @c )
    {
        if ( $_ eq '#' )
        {
            pop @out;
        }
        else
        {
            push @out, $_;
        }
    }
    return join("", @out);
}

This could also be done with substr, to operate on strings instead of arrays, which might be more efficient, at least because we can avoid a split and a join.

sub bspSTR($str)
{
    my $out = '';
    while ( (my $c = substr($str, 0, 1, '')) ne '' )
    {
        if ( $c eq '#' )
        {
            substr($out, -1, 1, '');
        }
        else
        {
            $out .= $c;
        }
    }
    return $out;
}

Small programming note on the while loop: each execution of the substr will return the first character of $str, and replace it with an empty string. If the string contains a 0, Perl will treat that as a false value, so we have to explicitly check for an empty return to stop at the end of $str and not before.

Performance compare

Which is faster? There's a fair amount of under-the-hood processing in regular expression matching and substitution, so maybe the sequential processing is more efficient. String operations are probably faster than array operations. Let's find out.

sub runBenchmark($repeat)
{
    use Benchmark qw/cmpthese/;

    my $str = 'abcdefghijklmnopqrstuvwxyz' x 5;
    for ( 1 .. 15 ) { substr($str, int(rand(length($str))), 1, '#') }

    cmpthese($repeat, {
            array  => sub { bsp($str) },
            string => sub { bspSTR($str) },
            regex  => sub { bspRE($str) }
        });
}

I'm making a long-ish string with five copies of the alphabet, and then randomly replacing 15 of the characters. Each of my possible solutions is then run some large number of times. Here's what I get on my five-year-old MacBook with M1 processor:

           Rate  array string  regex
array   61728/s     --   -15%   -89%
string  72289/s    17%     --   -87%
regex  555556/s   800%   669%     --

String processing beats array processing by about 15%, but regular expression substitution blows both of them out of the water. Why is that? Probably because regex is highly-optimized C code, and all of the backspace pairs are removed in one pass of s///g, while the other solutions are executing Perl operations on every character. I suspect that if I implemented this in C, the direct manipulation of character arrays would win, because of the under-the-hood machinery of the regex library. (Narrator: he did not implement it in C.)