An overview of GCC internals

Background

When Symas/COBOLworx decided to build a COBOL front end for the GCC Gnu Compiler Collection, Jim Lowden and Bob Dubner broke the effort up into two pieces. Jim would figure out how to parse the COBOL source code (e.g., ADD A TO B GIVING C) and Bob would figure out how to tell the GCC downstream processing that it needed to add A to B and put the result in C.

Bob’s job was complicated by not knowing, in the slightest, how to do that.

This document is a high-level overview of how we learned how to do that.

An Extreme Statement of the Problem

GCC is practically a universe. The oldest entry in the GIT repository is dated 1988, and the comment indicates that the initial entries were converted from older ones in another source control system. So, diving into it cold means that it is all deep end. It’s not obvious where to start.

GCC has extensive documentation on GCC internals. But if you’ll forgive the analogy, starting with the compiler internals document is a bit like being given the maintenance manual for an F-15 fighter plane, and being told to build and fly one.

Happily, in the case of COBOL for GCC, both Jim and Bob are long-time users of GCC. Thus, for us, this was more like opening up the access panels of an aircraft after piloting it for a long time, and finally learning what was going on in there.

Bob got started by quickly scanning through the GCC Internals documentation. Freshly infused with almost total confusion, he figured out how to build a -O0 -ggdb build of the compiler, and then started trapping through it to figure out what it was doing when compiling a simple C program.

GCC From Ten Thousand Meters

We learned that when GCC compiles a program from source code to produce an executable:

The directed cyclic graph is an abstract syntax tree known variously as a GENERIC tree or a GIMPLE tree. (And, yes, the word “tree” is seriously overused. You have to cope with it.)

Restatement of the Problem

So, it reduced down to the easier-said-than done “Generate the GENERIC tree”.

The One Bright Spot

The one bright spot for Dubner as he launched that effort was that he had the compiler and all of the source code.

He was able to figure out that inside the middle end processing, the routine that seemed to accept the GENERIC/GIMPLE tree for further processing was the gimplify_function_tree(tree fndecl) in gcc/gimplify.cc

It became clear that what he wanted was a complete dump of the GENERIC tree, so that he could see what GCC was doing when it compiled C or FORTRAN. There didn’t seem to be any such facility in GCC. So, the -dump-generic-nodes utility was developed. Refer there for the next part of the tale.