SPO600 Project Stage 2, Part 1: “To Prune, or not to Prune”

Introduction Welcome back! This blog is a continuation from SPO600 Project Stage 1 - where a pass was created that counts the number of basic blocks and GIMPLE statements for a given function. Project Stage 2 will be the “meat and potatoes” of the project — where variations of a function are compared to determine whether they can be "pruned" or not. Compiling with gcc is not a one-step process. Instead, there are multiple optimization stages between the source code and the final executable. In each stage, the code can translated into a different form or cloned while remaining functionally the same — which raises the question: can these forms be safely “pruned”? Gameplan Background This problem is similar to LeetCode question 100. Same Tree where the tree can be traversed to identify similarities in their tree structure. However, GIMPLE representations differ in a number of ways: The control flow graphs contain n-ary tree structures rather than binary trees, where each basic block could have multiple statements and each statement containing different number of operands. Nodes represent a mix of GIMPLE operations (ASSIGN, CALL, COND, etc.). Luckily, GCC has APIs for traversing and analyzing these structures, to name few: gimple.h gimple-iterator.h gimple-walk.h gimple-pretty-print.h Steps “Pruning” a function can be determined by: Identifying all function variations created Compare each function variations via their GIMPLE representations Suggest to “prune” if the GIMPLE representations are similar Note: to be considered similar: Both variations have identical control flow graphs Both variations have identical number of operands Both variations have identical operands Conclusion Stay tuned! Next blog will explore step 1 of the plan: Identifying all function variations created.

Apr 6, 2025 - 00:14
 0
SPO600 Project Stage 2, Part 1: “To Prune, or not to Prune”

Introduction

Welcome back! This blog is a continuation from SPO600 Project Stage 1 - where a pass was created that counts the number of basic blocks and GIMPLE statements for a given function.

Project Stage 2 will be the “meat and potatoes” of the project — where variations of a function are compared to determine whether they can be "pruned" or not.

Compiling with gcc is not a one-step process. Instead, there are multiple optimization stages between the source code and the final executable. In each stage, the code can translated into a different form or cloned while remaining functionally the same — which raises the question: can these forms be safely “pruned”?

Gameplan

Background

This problem is similar to LeetCode question 100. Same Tree where the tree can be traversed to identify similarities in their tree structure. However, GIMPLE representations differ in a number of ways:

  1. The control flow graphs contain n-ary tree structures rather than binary trees, where each basic block could have multiple statements and each statement containing different number of operands.
  2. Nodes represent a mix of GIMPLE operations (ASSIGN, CALL, COND, etc.).

Luckily, GCC has APIs for traversing and analyzing these structures, to name few:

  • gimple.h
  • gimple-iterator.h
  • gimple-walk.h
  • gimple-pretty-print.h

Steps

“Pruning” a function can be determined by:

  1. Identifying all function variations created
  2. Compare each function variations via their GIMPLE representations
  3. Suggest to “prune” if the GIMPLE representations are similar
    1. Note: to be considered similar:
      1. Both variations have identical control flow graphs
      2. Both variations have identical number of operands
      3. Both variations have identical operands

Conclusion

Stay tuned! Next blog will explore step 1 of the plan: Identifying all function variations created.