Come to the Dark Side, Luke -- it's the only way 1

This page is devoted to the deepest, darkest, blackest magic in C++. Reading this page could be dangerous to your psyche -- you might well end up with your head turned around backwards. You might end up with an irresistible desire to expel large quantities of split pea soup.

I know I did.

Macros and Templates and Scripts -- Oh My! 2

For purposes of example, this page will show the implementation of a simplistic form of serialization (program data persistence) implemented using macros, templates, and static initialization. This particular implementation is not likely to be actually useful for reasons that I'll discuss below, but the techniques used in its implementation can be generally applicable whenever you need to do something really sneaky.

Here's a little trick our mother never taught you 3

From earlier discussions, we know that we have used scripts to create text of the following form:
PERSISTENT_MEMBER(int, member_)
We have also said that the declaration of PERSISTENT_MEMBER results in an expansion something like this:
Persistable<int> member_;
Ok, so how do we modify the Persistable template so that it can know the name of the member which is being passed in as well as the offset to the member within the owning class?

It turns out that the answers to these seemingly similar questions are quite different. In the case of a class member variable, the offset from the beginning of the owning class can be computed when the member variable is constructed -- assuming that you somehow know where the owning object begins -- all you have to do is to subtract the "this" pointer of the owning object from the "this" pointer of the class member.
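The subtraction itself is nothing exotic. Here is a minimal sketch, assuming we already hold both addresses; the Owner struct and the offsetWithin() helper are made up for this illustration and are not part of the serialization code:

```cpp
#include <cassert>
#include <cstddef>   // std::ptrdiff_t, offsetof

// Hypothetical owner type, used only for this illustration.
struct Owner
{
    double first;
    int    member_;
};

// Offset of a member = member address minus owner address, both viewed as bytes.
inline std::ptrdiff_t offsetWithin(void const *owner, void const *member)
{
    return static_cast<char const *>(member) - static_cast<char const *>(owner);
}

// Computes the offset of Owner::member_ from a live object.
inline std::ptrdiff_t demoOffset()
{
    Owner o = Owner();
    std::ptrdiff_t offset = offsetWithin(&o, &o.member_);
    assert(offset >= 0);   // a member never lies before the start of its owner
    return offset;
}
```

For a simple struct like this one, the result agrees with what offsetof(Owner, member_) reports. The hard part, discussed later, is getting hold of the owner's address from inside the member's constructor.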

Getting the class member name is not so easy. How would the constructor of your template class object, Persistable<int> member_, know the class member name? Or what can we change about the definition of the Persistable object so that it can easily know the member name?

Without going into a long discussion of alternatives, here is the easy way:

A class can have static methods.

Classes can have nested classes within them.

Nested classes can define inline static methods.

Since we are using the macro PERSISTABLE_MEMBER(type,name) to declare our class members, we can have it do two things instead of one:

  1. we can have it declare the Persistable<T> object that pretends to be our data member but also builds other tables as needed
  2. we can have PERSISTABLE_MEMBER declare, for each member variable name, a corresponding struct which can then be passed as a template parameter to Persistable<T> -- and that struct can have a static member function that can be used to obtain the member variable name within the owning class

During the first construction of any Persistable<T> member, the corresponding tables can be built; they can later be used to read or write the object to a stream.

Note that any object that is never constructed doesn't really _need_ tables, now does it?

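To summarize, we need the PERSISTENT_MEMBER(int,memberName_) macro to define two things:

  1. a nested struct whose static member function returns the member's name
  2. the Persistable<T> member itself, with that struct passed as a template parameter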

And here is a macro that defines a struct with a static inline method that returns a character string which is a macro parameter:

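// The macro parameter is spelled "member" rather than "name" so that the
// preprocessor does not rewrite the static function name() during expansion.
#define PERSISTABLE_MEMBER(type, member) \
struct memberName_##member \
{ \
    static char const *name() \
    { \
        return #member; \
    } \
}; \
Persistable<type, memberName_##member> member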
Let's say, for the sake of argument, that the Persistable template is defined like this:
template<class Type, class Namer>
struct Persistable
{
char const *memberName()
{
  return Namer::name();
}
};
This partial definition of Persistable at least shows how the PERSISTABLE_MEMBER macro can make the name of the class member available to the Persistable template.
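To see the pieces working together, here is a small sketch that can be compiled as-is. It assumes a stored value member inside Persistable and a second member, ratio_, neither of which appears in the text above. The macro parameter is spelled member here so that the preprocessor leaves the static name() function alone:

```cpp
#include <cassert>
#include <cstring>

template<class Type, class Namer>
struct Persistable
{
    Type value;   // assumed storage; the text above shows only memberName()

    char const *memberName() const
    {
        return Namer::name();   // the name travels in through the Namer type
    }
};

// Declares a naming struct plus the wrapped member in one go.
#define PERSISTABLE_MEMBER(type, member)           \
    struct memberName_##member                     \
    {                                              \
        static char const *name()                  \
        {                                          \
            return #member;                        \
        }                                          \
    };                                             \
    Persistable<type, memberName_##member> member

struct MyClass
{
    PERSISTABLE_MEMBER(int, member_);
    PERSISTABLE_MEMBER(double, ratio_);   // hypothetical second member
};

// Returns true when each member reports the name it was declared with.
inline bool demoNames()
{
    MyClass c = MyClass();
    bool ok = std::strcmp(c.member_.memberName(), "member_") == 0
           && std::strcmp(c.ratio_.memberName(), "ratio_") == 0;
    assert(ok);
    return ok;
}
```

Each use of the macro creates a distinct Namer type, so every member gets its own instantiation of Persistable even when two members share the same data type.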

Collecting Information About Class Members

The whole point of this discussion -- wrapper classes, and passing data to templates through static methods of classes that are template parameters -- is to let us automatically collect information about the members of classes which support serialization.

The class definitions in our imaginary program have been modified to look like this:

struct MyClass : public Persistent<MyClass>
{
    struct memberName_member_
    {
        static char const *name() { return "member_"; }
    };
    Persistable<int, memberName_member_> member_;
public:
    MyClass(int v) : member_(v) { }
};
Of course, in practice, most of this complexity is hidden in macros.

Connecting the member to the class in the persistence tables

At this point, though, we really don't have any connectivity between the Persistable class and the MyClass object. That is, Persistable doesn't know the name "MyClass" with which to associate the information about member_.

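To solve this problem, we must pass MyClass as a template argument to Persistable. Then, using typeid on that template argument, we can get the name, "MyClass", and we can add a description of member_ to that class' serialization tables. (The exact string returned by typeid(...).name() is implementation-defined, and on some compilers it is a mangled form of the name.)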

It would, however, be unfortunate if we had to pass MyClass as an explicit template argument because it would mean transforming our original source text like this:

PERSISTENT_MEMBER(int, member_, MyClass)
Luckily, there is a shortcut for this. We absolutely must have the Persistable class accept the name of the class as a parameter, but there is a way that we can trick the compiler into giving it to us without having to pass it to the PERSISTENT_MEMBER() macro.

Since MyClass is going to be derived from Persistent, we can make Persistent typedef MyClass to a well-known name, Self, that can be used by PERSISTENT_MEMBER() without having to know which Self we are talking about. For example:

template<class T>
struct Persistent
{
    typedef T Self;
    void serialize(Archive &archive) { ... }
};
And we modify Persistable so that it takes the owning class, MyClass, as a template parameter and uses it to get the actual text of the class name:
template<class T, class Namer, class Owner>
struct Persistable
{
    static void describeMe()
    {
        string myTypeName = typeid(T).name();
        string ownerTypeName = typeid(Owner).name();
        cout << "Persistent " << myTypeName << " " << ownerTypeName << "::" << Namer::name() << endl;
    }
};
Note that this works correctly even if the Owner class passed into the Persistable is
Persistent::Self
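Here is one way the Self trick might look once the macro is in the picture. This is a sketch under assumptions: describeMe() returns a string instead of printing, and the macro parameter is spelled member so that name() survives expansion. Because the text produced by typeid is implementation-defined, the check compares against typeid(MyClass).name() rather than the literal "MyClass":

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

template<class T>
struct Persistent
{
    typedef T Self;   // well-known alias for the derived class
};

template<class T, class Namer, class Owner>
struct Persistable
{
    // Builds "<owner>::<member>" from whatever typeid reports for Owner.
    static std::string describeMe()
    {
        return std::string(typeid(Owner).name()) + "::" + Namer::name();
    }
};

// The macro never mentions the owning class; it just says Self, which is
// found through the Persistent<...> base class of whoever uses the macro.
#define PERSISTABLE_MEMBER(type, member)                                           \
    struct memberName_##member { static char const *name() { return #member; } }; \
    Persistable<type, memberName_##member, Self> member

struct MyClass : public Persistent<MyClass>
{
    PERSISTABLE_MEMBER(int, member_);
};

// Returns true when the member is described as belonging to MyClass.
inline bool demoOwnerMatches()
{
    MyClass c;
    std::string expected = std::string(typeid(MyClass).name()) + "::member_";
    bool ok = c.member_.describeMe() == expected;
    assert(ok);
    return ok;
}
```

The source text keeps its two-argument form, PERSISTABLE_MEMBER(int, member_), and the owner comes along for free.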

Of course, the Persistable class can do more than just print things out -- it can store the class name, member name, and the name of the member's data type in global variables associated with the class name. That is, a class' serialization tables could be a list of the members -- and we collect this information at static initialization time.
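One possible shape for such a table is sketched below. MemberInfo, memberTable(), and the nested Registrar are hypothetical names invented for this illustration; the idea is simply that each Persistable instantiation owns a static object whose constructor adds one row to a global list:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical table row; the field names are assumptions.
struct MemberInfo
{
    std::string ownerType;
    std::string memberType;
    std::string memberName;
};

// A function-local static sidesteps initialization-order problems.
inline std::vector<MemberInfo> &memberTable()
{
    static std::vector<MemberInfo> table;
    return table;
}

template<class T> struct Persistent { typedef T Self; };

template<class T, class Namer, class Owner>
struct Persistable
{
    struct Registrar
    {
        Registrar()   // runs once per instantiation, during static initialization
        {
            MemberInfo info;
            info.ownerType  = typeid(Owner).name();
            info.memberType = typeid(T).name();
            info.memberName = Namer::name();
            memberTable().push_back(info);
        }
    };
    static Registrar registrar_;

    T value_;
    Persistable() : value_() { (void)&registrar_; }   // odr-use forces instantiation
};

template<class T, class Namer, class Owner>
typename Persistable<T, Namer, Owner>::Registrar Persistable<T, Namer, Owner>::registrar_;

#define PERSISTABLE_MEMBER(type, member)                                           \
    struct memberName_##member { static char const *name() { return #member; } }; \
    Persistable<type, memberName_##member, Self> member

struct MyClass : public Persistent<MyClass>
{
    PERSISTABLE_MEMBER(int, member_);
};

// Constructs two objects and reports how many rows the table holds.
std::size_t demoRegisteredCount()
{
    MyClass a, b;
    (void)a; (void)b;
    return memberTable().size();
}
```

Constructing two objects still leaves a single row for member_, because registration belongs to the type rather than to each object. Note that the static is only instantiated because some code, somewhere, constructs a MyClass.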

Getting the offset within the class to the member

We are now down to the final piece of wizardry -- the final piece of the serialization puzzle. That is, how does the Persistent<T>::serialize() method know where within MyClass the member_ lies? Unfortunately, the compiler can't be tricked into giving us the address of the member at compile time. We'll have to defer the collection of this information until the first time that the class members are constructed.

But runtime is a tricky thing -- a class might never be constructed, so we might never be able to collect this all-important information. On the other hand, if we never construct objects of MyClass, then we really don't need the information -- do we?

But even if we do hook into the constructor for member_ (i.e., put code in Persistable::Persistable()) we can only know the address of member_ -- not the address of the MyClass object of which this member is a part.

That's where Persistent<MyClass>::Persistent() comes in -- it does know the address of the current MyClass object, and because base classes are constructed before data members, it runs before the Persistable<int> object is constructed. So, we can subtract the address of the MyClass object from the address of the Persistable object (member_). We just need to have Persistent::Persistent store its this pointer in a static variable which is available to Persistable::Persistable.

Of course, we'll only want to do all this during the construction of the very first MyClass object (and its members, of course).
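The following sketch shows one way the hand-off could work. ConstructionContext, offset_, and the -1 "not yet known" marker are all assumptions made for this illustration. It also ignores threads and nested construction of another object of the same class, both of which a real implementation would have to think about:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-owner slot holding the address of the object being built.
template<class Owner>
struct ConstructionContext
{
    static void const *current;
};
template<class Owner> void const *ConstructionContext<Owner>::current = nullptr;

template<class T>
struct Persistent
{
    typedef T Self;
    Persistent()
    {
        // Base constructors run before member constructors, so every
        // Persistable member of T will see this address.
        ConstructionContext<T>::current = static_cast<T *>(this);
    }
};

template<class T, class Namer, class Owner>
struct Persistable
{
    static std::ptrdiff_t offset_;   // -1 until the first construction
    T value_;

    Persistable(T v = T()) : value_(v)
    {
        if (offset_ < 0)             // only the very first object does the work
        {
            offset_ = reinterpret_cast<char const *>(this)
                    - static_cast<char const *>(ConstructionContext<Owner>::current);
        }
    }
    static std::ptrdiff_t offset() { return offset_; }
};
template<class T, class Namer, class Owner>
std::ptrdiff_t Persistable<T, Namer, Owner>::offset_ = -1;

#define PERSISTABLE_MEMBER(type, member)                                           \
    struct memberName_##member { static char const *name() { return #member; } }; \
    Persistable<type, memberName_##member, Self> member

struct MyClass : public Persistent<MyClass>
{
    PERSISTABLE_MEMBER(double, ratio_);   // hypothetical extra member
    PERSISTABLE_MEMBER(int, member_);
    MyClass(int v) : member_(v) { }
};

// Compares the recorded offsets with offsets measured on a live object.
bool demoOffsetsMatch()
{
    MyClass obj(42);
    char const *base = reinterpret_cast<char const *>(&obj);
    bool ok = obj.member_.offset() == reinterpret_cast<char const *>(&obj.member_) - base
           && obj.ratio_.offset()  == reinterpret_cast<char const *>(&obj.ratio_) - base;
    assert(ok);
    return ok;
}
```

Once the first MyClass has been built, the offsets are available from the type alone, which is what a serialize() routine needs in order to walk any later object.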

The End is Near!

Let's review: We've been challenged to change 1,000,000 lines of code as quickly as possible such that we add serialization -- a completely new feature for this code base that could theoretically involve hand editing thousands of classes.

Being both lazy and devious, we decide to summon our daemon (as opposed to summoning a demon).

Our magical friend has us do the following:

  1. Modify the class definitions in our code base using scripts to simplify most of the changes.
  2. Leave macros in the code which defines our program's classes and their members so that we can easily add new features without subsequent edit cycles.
  3. The changes that we make will have our program automatically collect information needed to perform serialization.
  4. We won't force the use of virtual methods on classes whose objects might well exist in the millions; instead, we'll use template base classes which are parameterized on our derived classes.
  5. We'll change the data type of our class members so that they pretend to be the current types, but the new types will use static initialization to collect the information needed to read and write all classes in our product to and from the serialization stream.
  6. We'll use template specialization of the classes we invent for this purpose so that builtin types like "int" and "string" can be serialized seamlessly using the same tools that we use on our high level application objects.
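Items 4 and 6 can be sketched together. Everything named here is an assumption for the sake of illustration: Serializer, writeMembers(), the Point class, and the use of std::ostream with a made-up text format in place of a real Archive type:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Primary template: application classes are assumed to offer serialize().
template<class T>
struct Serializer
{
    static void write(std::ostream &out, T const &value) { value.serialize(out); }
};

// Specialization for int: written as decimal text followed by a space.
template<>
struct Serializer<int>
{
    static void write(std::ostream &out, int const &value) { out << value << ' '; }
};

// Specialization for std::string: length prefix, colon, then the characters.
template<>
struct Serializer<std::string>
{
    static void write(std::ostream &out, std::string const &value)
    {
        out << value.size() << ':' << value << ' ';
    }
};

// Template base class parameterized on the derived class: no virtual
// functions, so objects carry no vtable pointer.
template<class Derived>
struct Persistent
{
    void serialize(std::ostream &out) const
    {
        static_cast<Derived const *>(this)->writeMembers(out);
    }
};

// Hypothetical application class.
struct Point : public Persistent<Point>
{
    int x;
    std::string label;
    Point(int xValue, std::string const &text) : x(xValue), label(text) { }

    void writeMembers(std::ostream &out) const
    {
        Serializer<int>::write(out, x);
        Serializer<std::string>::write(out, label);
    }
};

// Serializes one Point through the same entry point used for builtin types.
std::string demoSerialize()
{
    std::ostringstream out;
    Serializer<Point>::write(out, Point(7, "origin"));
    assert(!out.str().empty());
    return out.str();
}
```

In the scheme described on this page, writeMembers() would not be written by hand; it would be driven by the member tables collected earlier. The point of the sketch is only that one Serializer<T>::write call covers builtin types and application classes alike.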

I think that this hard working genie deserves its freedom -- don't you?

Notes

1. Darth Programmer: Come to the dark side, Luke, it is the only way.

2. Dorothy the programmer: Macros and templates and scripts! Oh My!

3. Avatar the programmer: Here's a little trick our mother never taught you!

4. Ron W the programmer: There wasn't a programmer what went bad that didn't use macros!
5. Harry Marco, the programmer:

Wing feather, bat leather, hollow bone
gift of Icarus and Oberon
Dream of the Earthbound
Spin and Flow
Fledge and furl
Fold and GO!