The dump-generic-nodes facility

The Need to Print the GENERIC Graph

Elsewhere we describe how we came to realize that we needed some way of dumping GCC’s GENERIC/GENERIC graph from inside a GCC compilation. That was our only hope for understanding – or, perhaps more accurately, reverse-engineering – how existing front ends generated those GENERIC graphs. The goal, of course, was learning how to create our own GENERIC graphs from parsed COBOL. More complete understanding about building complex graphs would come in time, we hoped, but we needed something to get started.

By trapping through GCC while it compiled a simple C program we came to realize that gimplify_function_tree in gimplify.cc seemed to be a good place to look at the generated GENERIC graph that was about to be reduced and optimized. But there didn’t seem to be a way of visualizing the graph in a useful way.

There was some code for printing out information about individual nodes in print-tree.cc. But we needed something more comprehensive.

How We Did It.

It’s simple enough, in the sense that shuffling the four aces to the top of the deck is “simple” for somebody who spends years learning how to manipulate a deck of cards.

Starting with the hints given in print-tree.cc, we created the dump_generic_nodes() routine in the new source code module gcc/dump-generic-nodes.cc. We chose to have that routine launched via a new GCC command-line option -fdump-generic-nodes.

So, to incorporate the print_generic_nodes() routine, we

What it does

With all that in place and compiled into GCC, using the new switch is what you’ expect: gcc -fdump-generic-nodes hello.c

The resulting output, in addition to the usual a.out executable, consists of two files. Each file’s basename is the name of the function the specified GENERIC graph implemented.

It is believed that the minimally-sized C program that can possibly be compiled is

void main(){}

The two files generated with -fdump-generic-nodes are

main.nodes
main.nodes.html

The information in the two files is identical. The first is a pure ASCII text file; the second has the GENERIC nodes in a hyperlinked format so that each reference to another node is an in-document link to that node.

The fact that the main(){} program, which contains nothing and does nothing, nonetheless generates a function_decl node with fifty-five “children” gives a glimpse into the level of complexity involved in GENERIC graphs.

Note: The nomenclature is sloppy. A GENERIC or GENERIC graph – we use the terms interchangeably – is made up of GENERIC nodes. Each node is a pointer called a tree in the GCC source code. After a while it all blurs together, with the GENERIC graph represented as a tree of trees. Each tree is made up of a bunch of attribute/value pairs, and many of the values are themselves trees. You get used to it.

The Simplest Example

We put main(){} into the file minimal.c and compiled it with -fdump-generic-nodes. Looking at the resulting main.nodes text file shows us the first two nodes as generated by dump_generic_nodes() routine:

***********************************This is NodeNumber0
(0x7fc0b8616800) NodeNumber0
tree_code: function_decl
tree_code_class: tcc_declaration
base_flags: static public
type: NodeNumber1 function_type
name: NodeNumber48 identifier_node "main"
context: NodeNumber49 translation_unit_decl "minimal.c"
source_location: minimal.c:3:6
uid: 2738
initial(bindings): NodeNumber52 block
machine_mode: QI(15)
align: 8
warn_if_not_align: 0
pt_uid: 2738
raw_assembler_name: NodeNumber48 identifier_node "main"
visibility: default
result: NodeNumber53 result_decl
function(pointer): 0x7fc0b863c000
saved_tree(function_body): NodeNumber54 bind_expr
function_code: 0
function_flags: public no_instrument_function_entry_exit
***********************************This is NodeNumber1
(0x7fc0b861d348) NodeNumber1
tree_code: function_type
tree_code_class: tcc_type
machine_mode: QI(15)
type: NodeNumber2 void_type
address_space:0
size(in bits): NodeNumber23 uint128 8
size_unit(in bytes): NodeNumber12 uint64 1
uid: 973
precision: 0
contains_placeholder: 0
align: 8
warn_if_not_align: 0
alias_set_type: -1
canonical: NodeNumber45 function_type
main_variant: NodeNumber45 function_type
lang_1: NodeNumber47 tree_list

How does that happen?

The dump_generic_nodes() starts with the function_decl node and walks the graph doing a depth-first search. It marks each node as “seen” as it is encountered; the downward search terminates whenever it encounters a node that is already flagged as “seen”. The output of the search is a simple list of nodes in the order they were encountered.

Once that linear list is created it is scanned in order. The information in each node is formatted as seen above and placed in the output files. In the case of the .html file, each node has assigned to it an in-document target HTML tag; each reference to NodeNumberNNN has a hyperlink to the associated NodeNumberNNN tag.

This gives us a complete listing of all of the decoded GENERIC tags, although they are in no particular order. The development of the hyperlinked version was spurred by our getting tired of searching for “NodeNumberXYZ” in the original text file.

It’s that simple, although there is, as with swans and ducks, a lot of frantic paddling going on under the apparently placid surface.

A More Complex Sample

Just to give us more to look at in the nodes files, we have the bigger.c source code:

#include <stdio.h>
#include <stdlib.h>

int value111;
int value222;

int main(int argc, char **argv)
  {
  value111 = atoi(argv[1]);
  value222 = atoi(argv[2]);
  if(value111 > value222)
. printf("First is bigger\n");
  else
. printf("First is smaller\n");
  return 0;
  }

That program generates 7,183 nodes, most of which are due to the inclusion of stdio.h and stdlib.h.

The main.nodes.html shows all of those nodes in the hyperlinked output.

Bob Dubner was the author of both -fdump-generic-nodes and this discussion. Questions and comments can be directed to him at rdubner@symas.com.