Thursday, 30 October 2014

A walk through a cool code generation trick from Clang

In case you don't know, Clang is an open source C, C++, Objective-C and Objective-C++ front-end for the LLVM compiler. Clang/LLVM contributors are probably some of the brightest software engineers you'll come across, so there is always something to learn from their code. Take a look at LLVM Weekly to see the kinds of things people are working on. Just in the last week:

  • Improved MSVC/Golang/GCC support/compatibility
  • New intrinsics for the LLVM IR
  • A disassembler for the Hexagon backend

If you want to dig into the architecture of Clang/LLVM, you should read the LLVM chapter from "The Architecture of Open Source Applications". In fact, both volumes of the book should be mandatory reading for anyone interested in software design and architecture.

The Clang Philosophy

Figure 1: The Clang Philosophy, summed up by Xzibit

When building Clang itself, you are actually using three code generators:
  • The C++ compiler you are using to compile Clang
  • The C++ preprocessor
  • A DSL-to-C++ code generator called tblgen (yes, there is literally a custom compiler used to build Clang!)

tblgen is run during the build process to generate (usually) C++ code. I'm not aware of it generating anything else, but it could conceivably be used to do so. Its purpose is to help the developers maintain lists of domain-specific information, for example the allowable attributes on a function or the types of nodes in the AST.

Here is the example we're going to look at: Attr.td. In particular, we'll follow the lines below into the final C++:

class MSInheritanceAttr : InheritableAttr;

def SingleInheritance : MSInheritanceAttr {
  let Spellings = [Keyword<"__single_inheritance">];
}

During the build, tblgen is given Attr.td, which it parses into an AST and processes:

OS << "#ifndef LAST_MS_INHERITABLE_ATTR\n";
OS << "#define LAST_MS_INHERITABLE_ATTR(NAME)"
" MS_INHERITABLE_ATTR(NAME)\n";
OS << "#endif\n\n";

Record *InhClass = Records.getClass("InheritableAttr");
Record *InhParamClass = Records.getClass("InheritableParamAttr");
Record *MSInheritanceClass = Records.getClass("MSInheritanceAttr");
std::vector<Record*> Attrs = Records.getAllDerivedDefinitions("Attr"),
      NonInhAttrs, InhAttrs, InhParamAttrs, MSInhAttrs;
for (std::vector<Record*>::iterator i = Attrs.begin(), e = Attrs.end();
     i != e; ++i) {
  if (!(*i)->getValueAsBit("ASTNode"))
    continue;

  if ((*i)->isSubClassOf(InhParamClass))
    InhParamAttrs.push_back(*i);
  else if ((*i)->isSubClassOf(MSInheritanceClass))
    MSInhAttrs.push_back(*i);
  else if ((*i)->isSubClassOf(InhClass))
    InhAttrs.push_back(*i);
  else
    NonInhAttrs.push_back(*i);
}

EmitAttrList(OS, "INHERITABLE_PARAM_ATTR", InhParamAttrs);
EmitAttrList(OS, "MS_INHERITABLE_ATTR", MSInhAttrs);
EmitAttrList(OS, "INHERITABLE_ATTR", InhAttrs);
EmitAttrList(OS, "ATTR", NonInhAttrs);

Note how easy the tblgen source is to read and understand. In the Clang code base, this is the rule rather than the exception. In any case, the final result of tblgen (in this case) is a header file, AttrList.inc, part of which is reproduced here:

MS_INHERITABLE_ATTR(MultipleInheritance)
MS_INHERITABLE_ATTR(SingleInheritance)
MS_INHERITABLE_ATTR(UnspecifiedInheritance)
LAST_MS_INHERITABLE_ATTR(VirtualInheritance)

I know, it's not very impressive. But check out what I can do now (this probably doesn't actually work):

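#include <iostream>

// Each MS entry in AttrList.inc expands to a string literal; the LAST_
// variant omits the trailing comma. The same file also carries the
// INHERITABLE_PARAM_ATTR, INHERITABLE_ATTR and ATTR lists emitted above,
// so those macros would need (empty) definitions as well.
static const char *MsAttrs[] = {
#  define MS_INHERITABLE_ATTR(Name) #Name,
#  define LAST_MS_INHERITABLE_ATTR(Name) #Name
#    include "clang/Basic/AttrList.inc"
#  undef MS_INHERITABLE_ATTR
#  undef LAST_MS_INHERITABLE_ATTR
};

int main() {
  std::cout << "All MS attrs" << std::endl;
  for (const char *attr : MsAttrs)
    std::cout << attr << std::endl;
}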

By listing a bunch of domain-specific "records" in a form the preprocessor can consume, I can now do whatever I like with those records! I've used this to great effect in my own code (from my JIRA time tracker). Here is a snippet of JiraTaskAttrs.inc, which contains a list of attributes for a task:

JIRA_TASK_ATTR(Key)
JIRA_TASK_ATTR(Summary)
JIRA_TASK_ATTR(Description)
JIRA_TASK_ATTR(Type)
JIRA_TASK_ATTR(TypeIconUrl)
JIRA_TASK_ATTR(ProjectKey)
JIRA_TASK_ATTR(ProjectName)
JIRA_TASK_ATTR(ProjectImageUrl)
JIRA_TASK_ATTR(Priority)
JIRA_TASK_ATTR(PriorityIconUrl)
JIRA_TASK_ATTR(Status)
JIRA_TASK_ATTR(StatusIconUrl)
JIRA_TASK_ATTR(Resolution)
JIRA_TASK_ATTR(Security)
JIRA_TASK_ATTR(Assignee)
JIRA_TASK_ATTR(Reporter)
JIRA_TASK_ATTR(Labels)
JIRA_TASK_ATTR(AffectsVersions)
JIRA_TASK_ATTR(FixVersions)
JIRA_TASK_ATTR(Components)
JIRA_TASK_ATTR(TimeEstimated)
JIRA_TASK_ATTR(TimeSpent)
JIRA_TASK_ATTR(TimeRemaining)

Using this header file, I can generate enumerations:

enum TaskAttr{
# define JIRA_TASK_ATTR(Name,...) TaskAttr_##Name,
#   include "JiraTaskAttrs.inc"
# undef JIRA_TASK_ATTR
  NumTaskAttrs
};

Or function definitions:

#define IMPL(Enum)                                                      \
  QString                                                               \
  KJiraRepositoryType::TaskAttr_ ## Enum() const {                      \
    return toQString(model::Task::getAttrColumnName(JiraRepositoryType::TaskAttr_ ## Enum)); \
  }

#define JIRA_TASK_ATTR(Name,...) IMPL(Name)
#include "common/connections/jira/JiraTaskAttrs.inc"
#undef JIRA_TASK_ATTR
#undef IMPL

Here is another example from the same project of how I use this trick to enumerate configuration values: Source, C++. I've used this in numerous other places, but this should give you an idea of the types of things that are possible using just the preprocessor and a little forethought.

Conclusion

A good way to become a better craftsman is to learn from others. Software developers are lucky because we have a long tradition of sharing the tools of our trade. The open source Clang compiler offers the combined experience of many intelligent software engineers, and as a result its source is full of tips that can make you better at your own job. So start reading today!