Project Stage 2, Part 3: “Prune Away!”

Introduction Welcome back for the final part of stage 2, printing a “prune” suggestion in the dump file. Each function can be identified by its “fingerprint”, all we need to do is compare them and print a suggestion to prune or not. Let’s get started! Comparing Function Fingerprints // ~/gccsrc/gcc/gcc/tree-prune-clones.cc std::string base_function; int gimple_bb_count_1 = 0; int gimple_stmt_count_1 = 0; std::vector gimple_sequence_1; std::vector gimple_ops_count_1; std::vector gimple_op_types_1; std::map gimple_var_map_1; ... bool pass_prune_clones::compare_function_info(function *func) const { int bb_count = 0; int stmt_count = 0; basic_block bb; int i = 0; std::map first_occurrence_map; std::map gimple_var_map_2; FOR_EACH_BB_FN(bb, func) { bb_count++; for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) { stmt_count++; i = get_gimple_vars(first_occurrence_map, i, gsi_stmt(gsi), gimple_var_map_2); } } if (bb_count != gimple_bb_count_1) { fprintf(dump_file, "[DIFF] BB count: %d vs %d\n", bb_count, gimple_bb_count_1); return false; } if (stmt_count != gimple_stmt_count_1) { fprintf(dump_file, "[DIFF] Statement count: %d vs %d\n", stmt_count, gimple_stmt_count_1); return false; } if (bb_count != gimple_bb_count_1 || stmt_count != gimple_stmt_count_1) { return false; } auto it = gimple_sequence_1.begin(); auto count_it = gimple_ops_count_1.begin(); auto optype_it = gimple_op_types_1.begin(); FOR_EACH_BB_FN(bb, func) { for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi)) { if (gimple_code(gsi_stmt(gsi)) != *it || gimple_num_ops(gsi_stmt(gsi)) != *count_it) { return false; } if (*it == GIMPLE_ASSIGN && *(optype_it++) != gimple_assign_rhs_code(gsi_stmt(gsi))) { return false; } ++it; ++count_it; } } // Compare variable reference maps auto orig_map_iter = gimple_var_map_1.begin(); auto clone_map_iter = gimple_var_map_2.begin(); while (orig_map_iter != gimple_var_map_1.end()) { if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second) { return false; } ++orig_map_iter; ++clone_map_iter; } return true; } 1. traverse every basic block in the function using FOR_EACH_BB_FN, and count: The number of basic blocks (bb_count) The number of statements (stmt_count) We also regenerate a variable usage map (gimple_var_map_2) by calling get_gimple_vars(...) on each statement. This tracks how and where each variable is used, without caring about their actual names. 2. Compare Block & Statement Counts if (bb_count != gimple_bb_count_1 || stmt_count != gimple_stmt_count_1) Before diving into deep comparison, we check if the raw shape of the functions match: Same number of basic blocks? Same number of GIMPLE statements? If these are different, we conclude early — the functions are structured differently. 3. Compare GIMPLE Statement Patterns We iterate through each basic block and each GIMPLE statement, comparing the overall structure: The type of GIMPLE statement (e.g., GIMPLE_ASSIGN, GIMPLE_CALL, etc.) The number of operands in the statement And if it’s an assignment, we go deeper: We compare the operation type used on the RHS (right-hand side). These checks ensure that both functions are doing the same operations in the same order, even if the variable names (LHS/RHS identifiers) are different. // ~/gccsrc/gcc/gcc/tree-prune-clones.cc ... gimple_sequence_2.push_back(gimple_code(stmt)); gimple_ops_count_2.push_back(gimple_num_ops(stmt)); if (is_gimple_assign(stmt)) { gimple_op_types_2.push_back(gimple_assign_rhs_code(stmt)); } gimple_code(stmt) tells you what type of statement it is (GIMPLE_ASSIGN, GIMPLE_CALL, etc.) gimple_num_ops(stmt) tells you how many operands it uses gimple_assign_rhs_code(stmt) is the key one — it gives you the specific opcode on the RHS 4. Compare Variable Usage Roles The final and most subtle comparison: // ~/gccsrc/gcc/gcc/tree-prune-clones.cc if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second) We compare the normalized variable usage maps (gimple_var_map_1 vs. gimple_var_map_2). These maps tell us things like: “The second statement uses the same variable as the one first used in statement X.” If the maps are identical, that means the variables play the same roles in both functions, even if their names are completely different. Putting it all

Apr 6, 2025 - 04:42
 0
Project Stage 2, Part 3: “Prune Away!”

Introduction

Welcome back for the final part of stage 2, printing a “prune” suggestion in the dump file. Each function can be identified by its “fingerprint”, all we need to do is compare them and print a suggestion to prune or not. Let’s get started!

Comparing Function Fingerprints

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc

std::string base_function;
        int gimple_bb_count_1 = 0;
        int gimple_stmt_count_1 = 0;
        std::vector<enum gimple_code> gimple_sequence_1;
        std::vector<unsigned> gimple_ops_count_1;
        std::vector<enum tree_code> gimple_op_types_1;
        std::map<int, int> gimple_var_map_1;
...

bool pass_prune_clones::compare_function_info(function *func) const
{
    int bb_count = 0;
    int stmt_count = 0;
    basic_block bb;
    int i = 0;
    std::map<tree, int> first_occurrence_map;
    std::map<int, int> gimple_var_map_2;

    FOR_EACH_BB_FN(bb, func)
    {
        bb_count++;
        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
        {
            stmt_count++;
            i = get_gimple_vars(first_occurrence_map, i, gsi_stmt(gsi), gimple_var_map_2);
        }
    }

    if (bb_count != gimple_bb_count_1)
    {
        fprintf(dump_file, "[DIFF] BB count: %d vs %d\n",
                bb_count, gimple_bb_count_1);
        return false;
    }

    if (stmt_count != gimple_stmt_count_1)
    {
        fprintf(dump_file, "[DIFF] Statement count: %d vs %d\n",
                stmt_count, gimple_stmt_count_1);
        return false;
    }

    if (bb_count != gimple_bb_count_1 || stmt_count != gimple_stmt_count_1)
    {
        return false;
    }

    auto it = gimple_sequence_1.begin();
    auto count_it = gimple_ops_count_1.begin();
    auto optype_it = gimple_op_types_1.begin();

    FOR_EACH_BB_FN(bb, func)
    {
        for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
        {
            if (gimple_code(gsi_stmt(gsi)) != *it || gimple_num_ops(gsi_stmt(gsi)) != *count_it)
            {
                return false;
            }
            if (*it == GIMPLE_ASSIGN && *(optype_it++) != gimple_assign_rhs_code(gsi_stmt(gsi)))
            {
                return false;
            }
            ++it;
            ++count_it;
        }
    }

    // Compare variable reference maps
    auto orig_map_iter = gimple_var_map_1.begin();
    auto clone_map_iter = gimple_var_map_2.begin();
    while (orig_map_iter != gimple_var_map_1.end())
    {
        if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second)
        {
            return false;
        }
        ++orig_map_iter;
        ++clone_map_iter;
    }

    return true;
}

1. traverse every basic block in the function using FOR_EACH_BB_FN, and count:

  • The number of basic blocks (bb_count)
  • The number of statements (stmt_count)

We also regenerate a variable usage map (gimple_var_map_2) by calling get_gimple_vars(...) on each statement. This tracks how and where each variable is used, without caring about their actual names.

2. Compare Block & Statement Counts


if (bb_count != gimple_bb_count_1 || stmt_count != gimple_stmt_count_1)

Before diving into deep comparison, we check if the raw shape of the functions match:

  • Same number of basic blocks?
  • Same number of GIMPLE statements?

If these are different, we conclude early — the functions are structured differently.

3. Compare GIMPLE Statement Patterns

We iterate through each basic block and each GIMPLE statement, comparing the overall structure:

  • The type of GIMPLE statement (e.g., GIMPLE_ASSIGN, GIMPLE_CALL, etc.)
  • The number of operands in the statement
  • And if it’s an assignment, we go deeper:
    • We compare the operation type used on the RHS (right-hand side).

These checks ensure that both functions are doing the same operations in the same order, even if the variable names (LHS/RHS identifiers) are different.

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
...
gimple_sequence_2.push_back(gimple_code(stmt));
gimple_ops_count_2.push_back(gimple_num_ops(stmt));

if (is_gimple_assign(stmt))
{
    gimple_op_types_2.push_back(gimple_assign_rhs_code(stmt));
}
  • gimple_code(stmt) tells you what type of statement it is (GIMPLE_ASSIGN, GIMPLE_CALL, etc.)
  • gimple_num_ops(stmt) tells you how many operands it uses
  • gimple_assign_rhs_code(stmt) is the key one — it gives you the specific opcode on the RHS

4. Compare Variable Usage Roles

The final and most subtle comparison:

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second)

We compare the normalized variable usage maps (gimple_var_map_1 vs. gimple_var_map_2). These maps tell us things like:

“The second statement uses the same variable as the one first used in statement X.”

If the maps are identical, that means the variables play the same roles in both functions, even if their names are completely different.

Putting it all Together

// ~/gccsrc/gcc/gcc/tree-prune-clones.cc
#include 
#include 
#include 
#include 

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"
#include "gimple.h"
#include "tree-pass.h"
#include "ssa.h"
#include "tree-pretty-print.h"
#include "gimple-iterator.h"
#include "gimple-walk.h"
#include "internal-fn.h"
#include "gimple-pretty-print.h"
#include "cgraph.h"
#include "gimple-ssa.h"
#include "attribs.h"
#include "pretty-print.h"
#include "tree-inline.h"
#include "intl.h"
#include "function.h"
#include "basic-block.h"

namespace
{
    const pass_data prune_clones_pass_data = {
        GIMPLE_PASS,    /* type */
        "prune_clones", /* name */
        OPTGROUP_NONE,  /* optinfo_flags */
        TV_NONE,        /* tv_id */
        PROP_cfg,
        0,
        0,
        0,
        0,
    };

    class pass_prune_clones : public gimple_opt_pass
    {
    public:
        pass_prune_clones(gcc::context *ctxt)
            : gimple_opt_pass(prune_clones_pass_data, ctxt) {}

        bool gate(function *) final override { return true; }
        void set_pass_param(unsigned int, bool) override {}
        unsigned int execute(function *) final override;

    private:
        int get_gimple_vars(std::map<tree, int> &first_occurrence_map, int i, gimple *stmt, std::map<int, int> &out_map) const;
        bool is_base_function(const std::string &name) const;
        void record_function_info(function *func);
        bool compare_function_info(function *func) const;

        std::string base_function;
        int gimple_bb_count_1 = 0;
        int gimple_stmt_count_1 = 0;
        std::vector<enum gimple_code> gimple_sequence_1;
        std::vector<unsigned> gimple_ops_count_1;
        std::vector<enum tree_code> gimple_op_types_1;
        std::map<int, int> gimple_var_map_1;
    };

    int pass_prune_clones::get_gimple_vars(std::map<tree, int> &first_occurrence_map, int i, gimple *stmt, std::map<int, int> &out_map) const
    {
        std::vector<tree> vars;

        switch (gimple_code(stmt))
        {

        case GIMPLE_COND:
        {
            tree left_op = gimple_cond_lhs(stmt);
            tree right_op = gimple_cond_rhs(stmt);
            if (left_op)
                vars.push_back(left_op);
            if (right_op)
                vars.push_back(right_op);
            break;
        }

        case GIMPLE_CALL:
        {
            tree left_op = gimple_call_lhs(stmt);
            if (left_op)
                vars.push_back(left_op);
            break;
        }

        case GIMPLE_ASSIGN:
        {
            tree left_op = gimple_assign_lhs(stmt);
            vars.push_back(left_op);

            int rhs_ops = gimple_num_ops(stmt) - 1;
            for (int i = 1; i <= rhs_ops; ++i)
            {
                tree op = gimple_op(stmt, i);
                vars.push_back(op);
            }
            break;
        }

        case GIMPLE_DEBUG_BIND:
        {
            tree debug_var = gimple_debug_bind_get_var(stmt);
            tree debug_value = gimple_debug_bind_get_value(stmt);
            if (debug_var)
                vars.push_back(debug_var);
            if (debug_value)
                vars.push_back(debug_value);
            break;
        }

        default:
            break;
        }

        for (tree debug_var : vars)
        {
            std::map<tree, int>::iterator it = first_occurrence_map.find(debug_var);

            if (it != first_occurrence_map.end())
            {
                out_map[i] = it->second;
            }
            else
            {
                first_occurrence_map[debug_var] = i;
                out_map[i] = i;
            }

            ++i;
        }

        return i;
    }

    bool pass_prune_clones::is_base_function(const std::string &name) const
    {
        return name.find(base_function) == 0 && name.find(".resolver") == std::string::npos;
    }

    void pass_prune_clones::record_function_info(function *func)
    {
        basic_block bb;
        int i = 0;
        std::map<tree, int> first_occurrence_map;

        FOR_EACH_BB_FN(bb, func)
        {
            gimple_bb_count_1++;
            for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
            {
                gimple_stmt_count_1++;
                i = get_gimple_vars(first_occurrence_map, i, gsi_stmt(gsi), gimple_var_map_1);
                gimple_sequence_1.push_back(gimple_code(gsi_stmt(gsi)));
                gimple_ops_count_1.push_back(gimple_num_ops(gsi_stmt(gsi)));
                if (is_gimple_assign(gsi_stmt(gsi)))
                {
                    gimple_op_types_1.push_back(gimple_assign_rhs_code(gsi_stmt(gsi)));
                }
            }
        }
    }

    bool pass_prune_clones::compare_function_info(function *func) const
    {
        int bb_count = 0;
        int stmt_count = 0;
        basic_block bb;
        int i = 0;
        std::map<tree, int> first_occurrence_map;
        std::map<int, int> gimple_var_map_2;

        FOR_EACH_BB_FN(bb, func)
        {
            bb_count++;
            for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
            {
                stmt_count++;
                i = get_gimple_vars(first_occurrence_map, i, gsi_stmt(gsi), gimple_var_map_2);
            }
        }

        if (bb_count != gimple_bb_count_1)
        {
            fprintf(dump_file, "[DIFF] BB count: %d vs %d\n",
                    bb_count, gimple_bb_count_1);
            return false;
        }

        if (stmt_count != gimple_stmt_count_1)
        {
            fprintf(dump_file, "[DIFF] Statement count: %d vs %d\n",
                    stmt_count, gimple_stmt_count_1);
            return false;
        }

        if (bb_count != gimple_bb_count_1 || stmt_count != gimple_stmt_count_1)
        {
            return false;
        }

        auto it = gimple_sequence_1.begin();
        auto count_it = gimple_ops_count_1.begin();
        auto optype_it = gimple_op_types_1.begin();

        FOR_EACH_BB_FN(bb, func)
        {
            for (gimple_stmt_iterator gsi = gsi_start_bb(bb); !gsi_end_p(gsi); gsi_next(&gsi))
            {
                if (gimple_code(gsi_stmt(gsi)) != *it || gimple_num_ops(gsi_stmt(gsi)) != *count_it)
                {
                    return false;
                }
                if (*it == GIMPLE_ASSIGN && *(optype_it++) != gimple_assign_rhs_code(gsi_stmt(gsi)))
                {
                    return false;
                }
                ++it;
                ++count_it;
            }
        }

        // Compare variable reference maps
        auto orig_map_iter = gimple_var_map_1.begin();
        auto clone_map_iter = gimple_var_map_2.begin();
        while (orig_map_iter != gimple_var_map_1.end())
        {
            if (orig_map_iter->first != clone_map_iter->first || orig_map_iter->second != clone_map_iter->second)
            {
                return false;
            }
            ++orig_map_iter;
            ++clone_map_iter;
        }

        return true;
    }

    unsigned int pass_prune_clones::execute(function *func)
    {
        if (!dump_file || !func || !func->cfg)
            return 0;

        struct cgraph_node *node = cgraph_node::get(func->decl);
        if (!node)
        {
            fprintf(dump_file, "ERROR: cgraph_node::get(func->decl)\n");
            return 0;
        }

        const std::string name(node->name());
        if (base_function.empty() && name.find(".resolver") != std::string::npos)
        {
            base_function = name.substr(0, name.find(".resolver"));
            fprintf(dump_file, "[BASE] Found: %s\n", base_function.c_str());
        }

        if (!is_base_function(name))
            return 0;

        // Record info for the first matching function
        if (gimple_bb_count_1 == 0 && gimple_stmt_count_1 == 0)
        {
            record_function_info(func);
        }
        **// Compare info for cloned functions
        else if (compare_function_info(func))
        {
            fprintf(dump_file, "[PRUNE]: %s\n", base_function.c_str());
        }
        else
        {
            fprintf(dump_file, "[NO PRUNE]: %s\n", base_function.c_str());
        }**

        return 0;
    }

} // anonymous namespace

gimple_opt_pass *make_pass_prune_clones(gcc::context *ctxt)
{
    return new pass_prune_clones(ctxt);
}

The pruning suggestion is bolded in unsigned int pass_prune_clones::execute.

Results

Setup

Setup instructions can be found here

Test Results

japablo@x86-001:~/spo600/examples/test-clone$ cat clone-test-x86-noprune-clone-test-core.c.228t.prune_clones
...
;; Function scale_samples.arch_x86_64_v3 (scale_samples.arch_x86_64_v3, funcdef_no=25, decl_uid=3985, cgraph_uid=30, symbol_order=28)

[DIFF] BB count: 40 vs 31
[NO PRUNE]: 

japablo@x86-001:~/spo600/examples/test-clone$ cat clone-test-x86-prune-clone-test-core.c.228t.prune_clones
...
;; Function scale_samples.popcnt (scale_samples.popcnt, funcdef_no=25, decl_uid=3985, cgraph_uid=30, symbol_order=28)

[PRUNE]: 
__attribute__((target ("popcnt"), target_clones ("default", "popcnt")))
void scale_samples.popcnt (int16_t * in, int16_t * out, int cnt, int volume)

The respective tests show which function can be safely pruned and which ones should not be pruned!

Reflection

The bulk of the work for this stage was done in part 2. This was easier since we have the maps that “fingerprint” each function variant.

Conclusion

Time to rest my mind for a bit. Talk soon!