SPO600 Project Stage 3: Wrapping it Up
Introduction Welcome back! These are the goals for the final installment of the SPO600 Project: The pass must identify 2+ versions of a cloned function. Create a test case that provides a “PRUNE” or “NO PRUNE” recommendation. Refactor and clean up the Stage 2 implementation. Note: here’s the GitHub repo containing the test cases for and stage 3 implementation. Finding Multiple Function Versions of a Cloned Function The current implementation has access to all variants via call graph but only records the “fingerprint” of a single variant. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc ... // Compare variable reference maps **auto orig_map_iter = gimple_var_map_1.begin(); auto clone_map_iter = gimple_var_map_2.begin();** while (orig_map_iter != gimple_var_map_1.end()) { if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second) { return false; } ++orig_map_iter; ++clone_map_iter; } Creating a Function “Fingerprint” A fingerprint is unique to each person and can be used to identify a person. It can work in a similar way for functions, where the number of basic blocks, statements, and operations can identify the structure of a variant. This approach must be approached with caution, as there is an increased risk of false positives for both “PRUNE” and “NO PRUNE” cases (Spoiler: I ran into that issue during refactoring). // ~/gccsrc/gcc/gcc/tree-prune-clones.cc namespace { struct FunctionFingerPrint { int bb_count = 0; int stmt_count = 0; std::vector gimple_ops; std::vector ops_count; std::vector op_types; std::map mapped_ids; }; ... Recording a Function “Fingerprint” Multiple variants can be stored in a map data structure where the variant name is the key and the function “Fingerprint” is the corresponding value. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc namespace { ... class pass_prune_clones : public gimple_opt_pass { public: pass_prune_clones(gcc::context *ctxt) : gimple_opt_pass({GIMPLE_PASS, "prune_clones", OPTGROUP_NONE, TV_NONE, PROP_cfg, 0, 0, 0, 0}, ctxt) {} bool gate(function *) override { return true; } unsigned int execute(function *func) override; private: static bool get_base; **static std::map fingerprints;** }; ... Recording the function stays the same from stage 2, but refactored into a GimpleVarMapper class. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc namespace { ... class GimpleVarMapper { public: int getGimpleVars(std::map &resolver_map, int index, gimple *stmt, std::map &compare_map) { std::vector gimple_vars; switch (gimple_code(stmt)) { case GIMPLE_ASSIGN: { tree lhs = gimple_assign_lhs(stmt); gimple_vars.push_back(lhs); int rhs_ops = gimple_num_ops(stmt) - 1; for (int i = 1; i

Introduction
Welcome back! These are the goals for the final installment of the SPO600 Project:
- The pass must identify 2+ versions of a cloned function.
- Create a test case that provides a “PRUNE” or “NO PRUNE” recommendation.
- Refactor and clean up the Stage 2 implementation.
Note: here’s the GitHub repo containing the test cases for and stage 3 implementation.
Finding Multiple Function Versions of a Cloned Function
The current implementation has access to all variants via call graph but only records the “fingerprint” of a single variant.
// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
...
// Compare variable reference maps
**auto orig_map_iter = gimple_var_map_1.begin();
auto clone_map_iter = gimple_var_map_2.begin();**
while (orig_map_iter != gimple_var_map_1.end())
{
if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second)
{
return false;
}
++orig_map_iter;
++clone_map_iter;
}
Creating a Function “Fingerprint”
A fingerprint is unique to each person and can be used to identify a person. It can work in a similar way for functions, where the number of basic blocks, statements, and operations can identify the structure of a variant.
This approach must be approached with caution, as there is an increased risk of false positives for both “PRUNE” and “NO PRUNE” cases (Spoiler: I ran into that issue during refactoring).
// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace
{
struct FunctionFingerPrint
{
int bb_count = 0;
int stmt_count = 0;
std::vector<enum gimple_code> gimple_ops;
std::vector<unsigned> ops_count;
std::vector<enum tree_code> op_types;
std::map<int, int> mapped_ids;
};
...
Recording a Function “Fingerprint”
Multiple variants can be stored in a map
data structure where the variant name is the key and the function “Fingerprint” is the corresponding value.
// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace
{
...
class pass_prune_clones : public gimple_opt_pass
{
public:
pass_prune_clones(gcc::context *ctxt)
: gimple_opt_pass({GIMPLE_PASS,
"prune_clones",
OPTGROUP_NONE,
TV_NONE,
PROP_cfg,
0, 0, 0, 0},
ctxt) {}
bool gate(function *) override { return true; }
unsigned int execute(function *func) override;
private:
static bool get_base;
**static std::map<std::string, FunctionFingerPrint> fingerprints;**
};
...
Recording the function stays the same from stage 2, but refactored into a GimpleVarMapper
class.
// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace
{
...
class GimpleVarMapper
{
public:
int getGimpleVars(std::map<tree, int> &resolver_map, int index, gimple *stmt, std::map<int, int> &compare_map)
{
std::vector<tree> gimple_vars;
switch (gimple_code(stmt))
{
case GIMPLE_ASSIGN:
{
tree lhs = gimple_assign_lhs(stmt);
gimple_vars.push_back(lhs);
int rhs_ops = gimple_num_ops(stmt) - 1;
for (int i = 1; i <= rhs_ops; ++i)
gimple_vars.push_back(gimple_op(stmt, i));
break;
}
case GIMPLE_DEBUG_BIND:
{
if (tree var = gimple_debug_bind_get_var(stmt))
gimple_vars.push_back(var);
if (tree val = gimple_debug_bind_get_value(stmt))
gimple_vars.push_back(val);
break;
}
case GIMPLE_COND:
{
if (tree lhs = gimple_cond_lhs(stmt))
gimple_vars.push_back(lhs);
if (tree rhs = gimple_cond_rhs(stmt))
gimple_vars.push_back(rhs);
break;
}
case GIMPLE_CALL:
{
if (tree lhs = gimple_call_lhs(stmt))
gimple_vars.push_back(lhs);
break;
}
default:
break;
}
for (tree var : gimple_vars)
{
if (resolver_map.find(var) != resolver_map.end())
{
compare_map[index] = resolver_map[var];
}
else
{
resolver_map[var] = index;
compare_map[index] = index;
}
++index;
}
return index;
}
};
...
Once the base function is recorded, the variants are fingerprinted and compared to the base function’s fingerprint. A boolean value is returned, corresponding to a “PRUNE” or “NO PRUNE” recommendation made in pass_prune_clones::execute
.
// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
namespace
{
...
class CloneComparator
{
public:
bool compare(const FunctionFingerPrint &resolver_map, function *func)
{
FunctionFingerPrint curr_fingerprint;
std::map<tree, int> curr_resolver_map;
std::map<int, int> curr_ids;
int index = 0;
basic_block bb;
FOR_EACH_BB_FN(bb, func)
{
curr_fingerprint.bb_count++;
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
{
curr_fingerprint.stmt_count++;
}
}
if (curr_fingerprint.bb_count != resolver_map.bb_count || curr_fingerprint.stmt_count != resolver_map.stmt_count)
return false;
auto gop = resolver_map.gimple_ops.begin();
auto opc = resolver_map.ops_count.begin();
auto ot = resolver_map.op_types.begin();
GimpleVarMapper varMapper;
FOR_EACH_BB_FN(bb, func)
{
for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
{
gimple *stmt = gsi_stmt(gsi);
index = varMapper.getGimpleVars(curr_resolver_map, index, stmt, curr_ids);
if (gimple_code(stmt) != *gop || gimple_num_ops(stmt) != *opc)
return false;
if (*gop == GIMPLE_ASSIGN && *(ot++) != gimple_assign_rhs_code(stmt))
return false;
++gop;
++opc;
}
}
return resolver_map.mapped_ids == curr_ids;
}
};
...
class pass_prune_clones : public gimple_opt_pass
{
unsigned int pass_prune_clones::execute(function *func)
{
**CloneComparator comparator;
if (comparator.compare(resolver_map, func))
{
fprintf(dump_file, "[ PRUNE ]: %s\n", name.c_str());
}
else
{
fprintf(dump_file, "[ NO PRUNE ]: %s\n", name.c_str());
}
...**
Ta-da! The pass now has the ability to make a “PRUNE” or “NO PRUNE” recommendation on each variant. We’ll confirm the functionality in the “Results” section.
Test Cases
One of the new tests was inspired by the original test case provided by Chris Tyler - where it uses a “for” loop to scale an int16
value and converts it into an int32
value then back to int16
.
//~/spo600/examples/test-clone/clone-test-core.c
// Original
CLONE_ATTRIBUTE
void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) {
for (int x = 0; x < cnt; x++) {
out[x] = ((((int32_t) in[x]) * ((int32_t)(32767 * volume / 100) << 1)) >> 16);
}
}
Unrolling 4 Units
This test was inspired by one of the 6502 optimization techniques - loop unrolling. This test combines both vectorization and arithmetic test cases.
// Slightly modified - unrolling 4 units
CLONE_ATTRIBUTE
void scale_samples_alt(int16_t *in, int16_t *out, int cnt, int volume) {
int32_t factor = ((int32_t)(32767 * volume / 100) << 1);
for (int x = 0; x < cnt; x += 4) {
for (int j = 0; j < 4 && x + j < cnt; j++) {
out[x + j] = (((int32_t) in[x + j]) * factor) >> 16;
}
}
}
The original test case used a “for” loop, processing a single unit at a time. With unrolling, it processes a “packet” of 4 units at a time. It accomplishes the same task but as seen in 6502, unrolling reduces the number of branches and loop counters.
This structural difference makes the function more focused on vectorization, and it’s likely to be preserved by clone detection as a unique variant, rather than pruned.
Arithmetic Operation
This test case is rather simple, where 2 int
values are added via pointers.
// Scaler operation
CLONE_ATTRIBUTE
void scalar_op(int *a, int *b) {
*a += *b;
}
This test can be considered a baseline to verify whether the pruning pass can distinguish between simple scalar operations and more complex, vectorizable variants like the previous case.
Performing the Tests
The directory with the original test was cloned:
cd ~/spo600/examples
cp -r ./test-clone ./test-clone-2
-
Insert the test cases in
clone-test-core.c
// ~/spo600/examples/test-clone-2/clone-test-core.c #include
#include #include #include #include "vol.h" int sum_sample(int16_t *buff, size_t samples) { int sum = 0; for (size_t i = 0; i < samples; i++) { sum += buff[i]; } return sum; } // Original CLONE_ATTRIBUTE void scale_samples(int16_t *in, int16_t *out, int cnt, int volume) { for (int x = 0; x < cnt; x++) { out[x] = ((((int32_t) in[x]) * ((int32_t)(32767 * volume / 100) << 1)) >> 16); } } **// Slightly modified - unrolling 4 units CLONE_ATTRIBUTE void scale_samples_alt(int16_t *in, int16_t *out, int cnt, int volume) { int32_t factor = ((int32_t)(32767 * volume / 100) << 1); for (int x = 0; x < cnt; x += 4) { for (int j = 0; j < 4 && x + j < cnt; j++) { out[x + j] = (((int32_t) in[x + j]) * factor) >> 16; } } } // Scaler operation CLONE_ATTRIBUTE void scalar_op(int *a, int *b) { *a += *b; }** int main(int argc, char **argv) { int16_t *in = (int16_t *)calloc(SAMPLES, sizeof(int16_t)); int16_t *out = (int16_t *)calloc(SAMPLES, sizeof(int16_t)); int a = 1; int b = 2; vol_createsample(in, SAMPLES); if (argc > 1 && argv[1][0] == 'a') { scale_samples_alt(in, out, SAMPLES, VOLUME); scalar_op(&a, &b); } else { scale_samples(in, out, SAMPLES, VOLUME); scalar_op(&a, &b); } int ttl = sum_sample(out, SAMPLES); printf("Result: %d\n", ttl); free(in); free(out); return 0; } Uncomment
DUMP_ALL = 1
-
Use the built gcc with the
prune_clones
pass-
PATH="$HOME/gcc-test-002/bin:$PATH"
- Note: use your gcc directory
-
-
Run command
make all
to generate the dump files- Contains the “PRUNE” or “NO PRUNE” log.
Results
And the results!
// x86
japablo@x86-001:~/spo600/examples/test-clone-2$ grep -i "PRUNE" clone-test-x86-*
**clone-test-x86-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.arch_x86_64_v3
clone-test-x86-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.arch_x86_64_v3**
clone-test-x86-noprune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples.arch_x86_64_v3
clone-test-x86-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.popcnt
clone-test-x86-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.popcnt
clone-test-x86-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples.popcnt
// aarch64
[japablo@aarch64-002 test-clone]$ grep -i "PRUNE" clone-test-aarch64-*
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.sve2
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.fp
**clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples_alt.sve2**
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.fp
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples.sve2
clone-test-aarch64-noprune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples.fp
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.sve2
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scalar_op.fp
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples_alt.sve2
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples_alt.fp
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ NO PRUNE ]: scale_samples.sve2
clone-test-aarch64-prune-clone-test-core.c.264t.prune_clones:[ PRUNE ]: scale_samples.fp
Note: I added sve2
and fp
when compiling in aarch64
.
Discussion
Looking at the results, there are some false positives, where it recommends “PRUNE” when it should not, and false negatives, where it recommends “NO PRUNE” when it should. This can be attributed to the level of specificity in how a function is fingerprinted.
When refactoring, I tried normalizing SSA names and expressions, but the implementation ended up being too aggressive—treating variables with different origins, like parameters, constants, or intermediate results, as the same. That led to a higher false positive rate.
Limitations
1.Assumption of Same Control Flow for Comparison
The fingerprint comparison assumes that the control flow structure (basic blocks, statements) of the functions will be identical if they are clones. However, optimizations like reordering of basic blocks or merging of identical blocks could cause differences in control flow that don't impact the semantic behavior of the program. This could lead to false negatives where a valid clone is not pruned.
2. Hard-Coded Assumptions in Function Name Processing
The code assumes that the function name always contains a ".resolver" suffix to indicate whether a function is a resolver. This is a somewhat brittle approach, as the naming conventions could change, leading to incorrect behavior. Additionally, the method for extracting the base function name from the full name (name.substr(0, name.find("."))
) may fail in edge cases, such as when function names multiple dots.
3. Potential for Incorrect Handling of Debugging Information
The GIMPLE_DEBUG_BIND
operation is considered during SSA variable resolution, but debugging information may not always be included in actual program behavior, leading to the inclusion of irrelevant variables during the comparison. This could affect the accuracy of the pruning process.
Conclusion
Overall, this project was challenging—not just in theory, but especially in practice. Actually coding, testing each idea, and going through countless GCC builds taught me way more than just reading about it ever could. While the implementation isn’t perfect, it’s something I never would’ve attempted—or even understood—just four months ago. I had zero knowledge of this back then, so I’m really proud I was able to get it done!