Dev.to

SPO600 Project Stage 3: Wrapping it Up

Introduction Welcome back! These are the goals for the final installment of the SPO600 Project: The pass must identify 2+ versions of a cloned function. Create a test case that provides a “PRUNE” or “NO PRUNE” recommendation. Refactor and clean up the Stage 2 implementation. Note: here’s the GitHub repo containing the test cases for and stage 3 implementation. Finding Multiple Function Versions of a Cloned Function The current implementation has access to all variants via call graph but only records the “fingerprint” of a single variant. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc ... // Compare variable reference maps auto orig_map_iter = gimple_var_map_1.begin(); auto clone_map_iter = gimple_var_map_2.begin(); while (orig_map_iter != gimple_var_map_1.end()) { if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second) { return false; } ++orig_map_iter; ++clone_map_iter; } Creating a Function “Fingerprint” A fingerprint is unique to each person and can be used to identify a person. It can work in a similar way for functions, where the number of basic blocks, statements, and operations can identify the structure of a variant. This approach must be approached with caution, as there is an increased risk of false positives for both “PRUNE” and “NO PRUNE” cases (Spoiler: I ran into that issue during refactoring). // ~/gccsrc/gcc/gcc/tree-prune-clones.cc namespace { struct FunctionFingerPrint { int bb_count = 0; int stmt_count = 0; std::vector gimple_ops; std::vector ops_count; std::vector op_types; std::map mapped_ids; }; ... Recording a Function “Fingerprint” Multiple variants can be stored in a map data structure where the variant name is the key and the function “Fingerprint” is the corresponding value. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc namespace { ... class pass_prune_clones : public gimple_opt_pass { public: pass_prune_clones(gcc::context ctxt) : gimple_opt_pass({GIMPLE_PASS, "prune_clones", OPTGROUP_NONE, TV_NONE, PROP_cfg, 0, 0, 0, 0}, ctxt) {} bool gate(function ) override { return true; } unsigned int execute(function func) override; private: static bool get_base; static std::map fingerprints; }; ... Recording the function stays the same from stage 2, but refactored into a GimpleVarMapper class. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc namespace { ... class GimpleVarMapper { public: int getGimpleVars(std::map &resolver_map, int index, gimple stmt, std::map &compare_map) { std::vector gimple_vars; switch (gimple_code(stmt)) { case GIMPLE_ASSIGN: { tree lhs = gimple_assign_lhs(stmt); gimple_vars.push_back(lhs); int rhs_ops = gimple_num_ops(stmt) - 1; for (int i = 1; i

Apr 17, 2025 - 20:02

Introduction

Welcome back! These are the goals for the final installment of the SPO600 Project:

The pass must identify 2+ versions of a cloned function.
Create a test case that provides a “PRUNE” or “NO PRUNE” recommendation.
Refactor and clean up the Stage 2 implementation.

Note: here’s the GitHub repo containing the test cases for and stage 3 implementation.

Finding Multiple Function Versions of a Cloned Function

The current implementation has access to all variants via call graph but only records the “fingerprint” of a single variant.

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc

...

// Compare variable reference maps
**auto orig_map_iter = gimple_var_map_1.begin();
auto clone_map_iter = gimple_var_map_2.begin();**

while (orig_map_iter != gimple_var_map_1.end())
{
    if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second)
    {
        return false;
    }
    ++orig_map_iter;
    ++clone_map_iter;
}

Creating a Function “Fingerprint”

A fingerprint is unique to each person and can be used to identify a person. It can work in a similar way for functions, where the number of basic blocks, statements, and operations can identify the structure of a variant.

This approach must be approached with caution, as there is an increased risk of false positives for both “PRUNE” and “NO PRUNE” cases (Spoiler: I ran into that issue during refactoring).

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace
{

 struct FunctionFingerPrint
    {
        int bb_count = 0;
        int stmt_count = 0;
        std::vector<enum gimple_code> gimple_ops;
        std::vector<unsigned> ops_count;
        std::vector<enum tree_code> op_types;
        std::map<int, int> mapped_ids;
    };

...

Recording a Function “Fingerprint”

Multiple variants can be stored in a map data structure where the variant name is the key and the function “Fingerprint” is the corresponding value.

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace
{

...

 class pass_prune_clones : public gimple_opt_pass
    {
    public:
        pass_prune_clones(gcc::context *ctxt)
            : gimple_opt_pass({GIMPLE_PASS,
                               "prune_clones",
                               OPTGROUP_NONE,
                               TV_NONE,
                               PROP_cfg,
                               0, 0, 0, 0},
                              ctxt) {}

        bool gate(function *) override { return true; }
        unsigned int execute(function *func) override;

    private:
        static bool get_base;
        **static std::map<std::string, FunctionFingerPrint> fingerprints;**
    };

...

Recording the function stays the same from stage 2, but refactored into a GimpleVarMapper class.

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace 
{

...

    class GimpleVarMapper
    {
    public:
        int getGimpleVars(std::map<tree, int> &resolver_map, int index, gimple *stmt, std::map<int, int> &compare_map)
        {
            std::vector<tree> gimple_vars;

            switch (gimple_code(stmt))
            {
            case GIMPLE_ASSIGN:
            {
                tree lhs = gimple_assign_lhs(stmt);
                gimple_vars.push_back(lhs);
                int rhs_ops = gimple_num_ops(stmt) - 1;
                for (int i = 1; i <= rhs_ops; ++i)
                    gimple_vars.push_back(gimple_op(stmt, i));
                break;
            }
            case GIMPLE_DEBUG_BIND:
            {
                if (tree var = gimple_debug_bind_get_var(stmt))
                    gimple_vars.push_back(var);
                if (tree val = gimple_debug_bind_get_value(stmt))
                    gimple_vars.push_back(val);
                break;
            }
            case GIMPLE_COND:
            {
                if (tree lhs = gimple_cond_lhs(stmt))
                    gimple_vars.push_back(lhs);
                if (tree rhs = gimple_cond_rhs(stmt))
                    gimple_vars.push_back(rhs);
                break;
            }
            case GIMPLE_CALL:
            {
                if (tree lhs = gimple_call_lhs(stmt))
                    gimple_vars.push_back(lhs);
                break;
            }
            default:
                break;
            }

            for (tree var : gimple_vars)
            {
                if (resolver_map.find(var) != resolver_map.end())
                {
                    compare_map[index] = resolver_map[var];
                }
                else
                {
                    resolver_map[var] = index;
                    compare_map[index] = index;
                }
                ++index;
            }

            return index;
        }
    };

    ...

Once the base function is recorded, the variants are fingerprinted and compared to the base function’s fingerprint. A boolean value is returned, corresponding to a “PRUNE” or “NO PRUNE” recommendation made in pass_prune_clones::execute.

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace 
{

...

    class CloneComparator
    {
    public:
        bool compare(const FunctionFingerPrint &resolver_map, function *func)
        {
            FunctionFingerPrint curr_fingerprint;
            std::map<tree, int> curr_resolver_map;
            std::map<int, int> curr_ids;
            int index = 0;

            basic_block bb;
            FOR_EACH_BB_FN(bb, func)
            {
                curr_fingerprint.bb_count++;
                for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
                {
                    curr_fingerprint.stmt_count++;
                }
            }

            if (curr_fingerprint.bb_count != resolver_map.bb_count || curr_fingerprint.stmt_count != resolver_map.stmt_count)
                return false;

            auto gop = resolver_map.gimple_ops.begin();
            auto opc = resolver_map.ops_count.begin();
            auto ot = resolver_map.op_types.begin();

            GimpleVarMapper varMapper;

            FOR_EACH_BB_FN(bb, func)
            {
                for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
                {
                    gimple *stmt = gsi_stmt(gsi);
                    index = varMapper.getGimpleVars(curr_resolver_map, index, stmt, curr_ids);

                    if (gimple_code(stmt) != *gop || gimple_num_ops(stmt) != *opc)
                        return false;

                    if (*gop == GIMPLE_ASSIGN && *(ot++) != gimple_assign_rhs_code(stmt))
                        return false;

                    ++gop;
                    ++opc;
                }
            }

            return resolver_map.mapped_ids == curr_ids;
        }
    };

...

 class pass_prune_clones : public gimple_opt_pass
    {
        unsigned int pass_prune_clones::execute(function *func)
        {
                 **CloneComparator comparator;
                    if (comparator.compare(resolver_map, func))
                    {
                        fprintf(dump_file, "[ PRUNE ]: %s\n", name.c_str());
                    }
                    else
                    {
                        fprintf(dump_file, "[ NO PRUNE ]: %s\n", name.c_str());
                    }

...**

Ta-da! The pass now has the ability to make a “PRUNE” or “NO PRUNE” recommendation on each variant. We’ll confirm the functionality in the “Results” section.

Test Cases

One of the new tests was inspired by the original test case provided by Chris Tyler - where it uses a “for” loop to scale an int16 value and converts it into an int32 value then back to int16.

//~/spo600/examples/test-clone/clone-test-core.c

// Original
CLONE_ATTRIBUTE
void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
    for (int x = 0; x < cnt; x++) {
        out[x] = ((((int32_t) in[x]) * ((int32_t)(32767 * volume / 100) << 1)) >> 16);
    }
}

Unrolling 4 Units

This test was inspired by one of the 6502 optimization techniques - loop unrolling. This test combines both vectorization and arithmetic test cases.

// Slightly modified - unrolling 4 units
CLONE_ATTRIBUTE
void scale_samples_alt(int16_t *in, int16_t *out, int cnt, int volume) {
    int32_t factor = ((int32_t)(32767 * volume / 100) << 1);
    for (int x = 0; x < cnt; x += 4) {
        for (int j = 0; j < 4 && x + j < cnt; j++) {
            out[x + j] = (((int32_t) in[x + j]) * factor) >> 16;
        }
    }
}

The original test case used a “for” loop, processing a single unit at a time. With unrolling, it processes a “packet” of 4 units at a time. It accomplishes the same task but as seen in 6502, unrolling reduces the number of branches and loop counters.

This structural difference makes the function more focused on vectorization, and it’s likely to be preserved by clone detection as a unique variant, rather than pruned.

Arithmetic Operation

This test case is rather simple, where 2 int values are added via pointers.

// Scaler operation
CLONE_ATTRIBUTE
void scalar_op(int *a, int *b) {
    *a += *b;
}

This test can be considered a baseline to verify whether the pruning pass can distinguish between simple scalar operations and more complex, vectorizable variants like the previous case.

Performing the Tests

The directory with the original test was cloned:

cd ~/spo600/examples
cp -r ./test-clone ./test-clone-2

Insert the test cases in clone-test-core.c

// ~/spo600/examples/test-clone-2/clone-test-core.c

#include 
#include 
#include 
#include 
#include "vol.h"

int sum_sample(int16_t *buff, size_t samples) {
    int sum = 0;
    for (size_t i = 0; i < samples; i++) {
        sum += buff[i];
    }
    return sum;
}

// Original
CLONE_ATTRIBUTE
void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
    for (int x = 0; x < cnt; x++) {
        out[x] = ((((int32_t) in[x]) * ((int32_t)(32767 * volume / 100) << 1)) >> 16);
    }
}

**// Slightly modified - unrolling 4 units
CLONE_ATTRIBUTE
void scale_samples_alt(int16_t *in, int16_t *out, int cnt, int volume) {
    int32_t factor = ((int32_t)(32767 * volume / 100) << 1);
    for (int x = 0; x < cnt; x += 4) {
        for (int j = 0; j < 4 && x + j < cnt; j++) {
            out[x + j] = (((int32_t) in[x + j]) * factor) >> 16;
        }
    }
}

// Scaler operation
CLONE_ATTRIBUTE
void scalar_op(int *a, int *b) {
    *a += *b;
}**

int main(int argc, char **argv) {
    int16_t *in = (int16_t *)calloc(SAMPLES, sizeof(int16_t));
    int16_t *out = (int16_t *)calloc(SAMPLES, sizeof(int16_t));
    int a = 1;
    int b = 2;

    vol_createsample(in, SAMPLES);

    if (argc > 1 && argv[1][0] == 'a') {
        scale_samples_alt(in, out, SAMPLES, VOLUME);
    scalar_op(&a, &b);
    } else {
        scale_samples(in, out, SAMPLES, VOLUME);
    scalar_op(&a, &b);
    }

    int ttl = sum_sample(out, SAMPLES);
    printf("Result: %d\n", ttl);

    free(in);
    free(out);
    return 0;
}

Uncomment DUMP_ALL = 1
Use the built gcc with the prune_clones pass
1. PATH="$HOME/gcc-test-002/bin:$PATH"
  1. Note: use your gcc directory
Run command make all to generate the dump files
1. Contains the “PRUNE” or “NO PRUNE” log.

Results

And the results!

// x86

japablo@x86-001:~/spo600/examples/test-clone-2$ grep -i "PRUNE" clone-test-x86-*
**clone-test-x86-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.arch_x86_64_v3
clone-test-x86-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.arch_x86_64_v3**
clone-test-x86-noprune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples.arch_x86_64_v3
clone-test-x86-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.popcnt
clone-test-x86-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.popcnt
clone-test-x86-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples.popcnt

// aarch64
[japablo@aarch64-002 test-clone]$ grep -i "PRUNE" clone-test-aarch64-*
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.sve2
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.fp
**clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples_alt.sve2**
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.fp
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples.sve2
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples.fp
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.sve2
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.fp
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples_alt.sve2
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.fp
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples.sve2
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples.fp

Note: I added sve2 and fp when compiling in aarch64.

Discussion

Looking at the results, there are some false positives, where it recommends “PRUNE” when it should not, and false negatives, where it recommends “NO PRUNE” when it should. This can be attributed to the level of specificity in how a function is fingerprinted.

When refactoring, I tried normalizing SSA names and expressions, but the implementation ended up being too aggressive—treating variables with different origins, like parameters, constants, or intermediate results, as the same. That led to a higher false positive rate.

Limitations

1.Assumption of Same Control Flow for Comparison

The fingerprint comparison assumes that the control flow structure (basic blocks, statements) of the functions will be identical if they are clones. However, optimizations like reordering of basic blocks or merging of identical blocks could cause differences in control flow that don't impact the semantic behavior of the program. This could lead to false negatives where a valid clone is not pruned.

2. Hard-Coded Assumptions in Function Name Processing

The code assumes that the function name always contains a ".resolver" suffix to indicate whether a function is a resolver. This is a somewhat brittle approach, as the naming conventions could change, leading to incorrect behavior. Additionally, the method for extracting the base function name from the full name (name.substr(0, name.find("."))) may fail in edge cases, such as when function names multiple dots.

3. Potential for Incorrect Handling of Debugging Information

The GIMPLE_DEBUG_BIND operation is considered during SSA variable resolution, but debugging information may not always be included in actual program behavior, leading to the inclusion of irrelevant variables during the comparison. This could affect the accuracy of the pruning process.

Conclusion

Overall, this project was challenging—not just in theory, but especially in practice. Actually coding, testing each idea, and going through countless GCC builds taught me way more than just reading about it ever could. While the implementation isn’t perfect, it’s something I never would’ve attempted—or even understood—just four months ago. I had zero knowledge of this back then, so I’m really proud I was able to get it done!