Come to the Dark Side, Luke -- it's the only way [1]
This page is devoted to the deepest, darkest, blackest magic in C++. Reading
this page could be dangerous to your psyche -- you might well end up
with your head turned around backwards. You might end up with an irresistible desire to
expel large quantities of split pea soup.
I know I did.
Macros and Templates and Scripts -- Oh My! [2]
For purposes of example, this page will show the implementation
of a simplistic form of serialization (program data persistence) implemented
using macros, templates, and static initialization. This
particular implementation is not likely to be actually useful for reasons
that I'll discuss below, but the techniques used in its implementation can
be generally applicable whenever you need to do something really sneaky.
Here's a little trick our mother never taught you [3]
Here are the big items discussed in the following sections:
- using scripts to edit source code
- using template base classes which are parameterized on the current class (this is usually called the "Curiously Recurring Template Pattern")
- using static initialization in template classes to collect information about classes in your program
- using wrapper classes -- classes whose primary purpose is to pretend to be some other class -- but with a few extra features.
- using macros to declare class member variables
- passing data to templates without actually using variables or function parameters
- using partial template specialization to select just the right algorithm for just the right kind of data.
Why would I want to do any of that?
These techniques will be presented in the context of the following programming challenge:
Imagine, if you will, that you are a lazy programmer working on a project with about 1,000,000 lines of code. One day, your customer demands that your program be able to save its current state to a file for later retrieval. This customer just wants to be able to snapshot the current state of your program's execution because your product runs through a sequence of steps, sometimes running for hours to accomplish each step, and the customer knows that the input data to some steps doesn't change between runs -- and doesn't want to pay the penalty for recomputing stuff he knows hasn't changed. That is, the requirement is to save the current program state to a file and to restore it from the file at some later time.
Your boss says, "make it happen -- but don't change anything!" It's the "don't change anything" part that gets you, because there are plenty of approaches for saving the current state of the program. For example, you are quite familiar with MFC archives, but they require major hand editing of the source code. Actually, your boss accepts that you do in fact have to change the source code, but he wants it kept to a minimum. Neither you nor the boss is in the mood to spend 6 months debugging a bunch of hand-written code.
Before discussing the sneaky solution, here's the normal way you pull this kind of thing off.
How Serialization Normally Works
Normally, to add the serialization feature to a class, that is, the ability to be read and written to a file, you would add a "serialize()" method to the class and have that method do all the saving and restoring. For example, suppose you have the following class:
class SomeClass
{
    int member_;
public:
    SomeClass() {}
};
To make this class serializable, you'd add a serialize method:
class SomeClass
: public SERIALIZABLE
{
    int member_;
public:
    SomeClass() {}
    void serialize(ArchiveStream &s);
};
And you'd implement the serialize() method in some .CPP file like this:
void SomeClass::serialize(ArchiveStream &s)
{
    SERIALIZE(member_);
}
Because of schema migration issues, this is probably the best approach to serialization. However, your boss isn't asking for the best approach -- you are being asked to get this done ASAP with a minimal likelihood of mistake. And it will take a lot of hand editing to make this work -- although a good script guru will get you very close. But what about subsequent editing of the source code? Every time you add a new class member, you'll need to remember to edit the serialize() method and add it there.
But being lazy, you wonder if there is a way that requires less hand maintenance.
Editing with scripts
Ok, so you have access to a Perl or sed guru, and she's pretty sure that she can use the etags program to find the definitions of all classes and member variables in your header files and can use that information to transform the "class Name" definitions and the "type memberName" definitions into something -- but what?
Of course, you realize that 80% accuracy is about all you can expect from a script, so you prepare yourself for editing some fraction of 200,000 lines instead of all 1,000,000.
Without explaining why, at this point, let us just say that using scripts, we will change our normal class definitions, which look like this:
class SomeClass
{
    int member_;
public:
    SomeClass(int x)
    : member_(x)
    {}
};
Into this [4]:
class SomeClass
: public PERSISTENT_CLASS(SomeClass)
{
    PERSISTENT_MEMBER(int,member_);
public:
    SomeClass(int x)
    : member_(x)
    {}
};
Using template base classes instead of virtual methods
A standard OOP technique, and one available in C++, is to derive your own classes from pre-existing base classes that give your new classes standardized behavior. Of course in this case, we are inventing new functionality, but we still want all our "serializable" classes to have standard features, so we want to create a new "serializable" base class to add these new features.
We could of course use a virtual method of the base class to implement the serializable feature. On the other hand, this increases the size of all serializable objects by the size of a pointer. Most of the time this doesn't matter much, but in our program we are dealing with massive numbers of objects -- and every 8 bytes counts. So we are going to use an approach that doesn't require virtual methods.
Note that in addition to the size savings, the template base class method allows us to add methods to the class which are specific to its type -- unlike in the case of virtual methods, where only the base class type is actually known.
Concretely, we are going to modify our existing code to make our program's classes derive from class persistent<T> -- where T is the class of interest:
class MyClass
: public persistent<MyClass>
{
...
};
Given that Persistent<MyClass> is a base class of MyClass, and that Persistent<MyClass> knows about the type of its descendant class, MyClass, it can have a member function, serialize(Archive &), that knows how to read and write MyClass objects to the stream.
How, you ask? How does this template base class know how to write a MyClass object without any virtuals? Because it will have access to tables describing the MyClass object. That is, the persistent<T>::serialize(Archive&) method will know how to find the table describing the members of MyClass. The tables will contain their names, their types, and their offsets within the MyClass objects.
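As a rough illustration of the idea so far, here is a minimal sketch of a template base class that is parameterized on its own derived class. It is not the implementation this page builds up to: the className() and writeMembers() hooks are hypothetical stand-ins for the member tables described above, and std::ostream stands in for the Archive type.

```cpp
#include <sstream>
#include <string>

// Base class parameterized on the class that derives from it.
// No virtual functions are needed: the derived type T is known statically.
template<class T>
struct persistent
{
    void serialize(std::ostream &out) const
    {
        // Safe because T derives from persistent<T>.
        const T &self = static_cast<const T &>(*this);
        out << T::className() << "{";
        self.writeMembers(out);   // hypothetical hook; the article uses tables instead
        out << "}";
    }
};

class MyClass
: public persistent<MyClass>
{
    int member_;
public:
    explicit MyClass(int x) : member_(x) {}

    // Hypothetical hooks that the base class calls through T.
    static const char *className() { return "MyClass"; }
    void writeMembers(std::ostream &out) const { out << "member_=" << member_; }
};

// Small usage helper: serializes a MyClass into a string.
inline std::string demoSerialize()
{
    std::ostringstream out;
    MyClass(5).serialize(out);
    return out.str();   // "MyClass{member_=5}"
}
```

Because persistent<MyClass> has no data members and no virtual functions, compilers typically give MyClass the same size as a bare int, which is the size saving the text is after.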
These tables will be created automatically because of the way that the members of MyClass are declared, as the following sections explain.
Statically Creating Data Structures Describing Program Classes
In C++, static variables can have constructors -- and these constructors can do things besides initialize the variable. Generally this is a bad idea, but sometimes there is no other way to pull off something sneaky.
Template classes and functions can also have static members. Using the typeid feature of C++, you can get the names of classes from template parameters. This means that if a template has a parameter of a type of interest, you can collect information about that type -- such as its name, sizeof, etc.
Back to our programming problem: if we could easily change the declaration of the class members in our program in such a way that their mere data type caused them to automatically create the tables used by our template base class, persistent, then we would be almost done -- but how?
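The combination of a static member in a template and typeid can be sketched roughly as follows. All names here (typeRegistry, Registrar, Collector, Widget) are hypothetical and only illustrate the mechanism. Note that typeid(T).name() returns an implementation-defined string, which on some compilers is a mangled name rather than the source-level class name.

```cpp
#include <algorithm>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical global table of type names. A function-local static is used
// so the table exists before any registrar tries to add to it.
inline std::vector<std::string> &typeRegistry()
{
    static std::vector<std::string> names;
    return names;
}

// A static object whose constructor does more than initialize itself:
// it records a type name in the table.
struct Registrar
{
    explicit Registrar(const char *name) { typeRegistry().push_back(name); }
};

template<class T>
struct Collector
{
    // Taking the address forces the compiler to instantiate registrar_
    // for every T whose Collector<T> constructor is used in the program.
    Collector() { (void)&registrar_; }
    static Registrar registrar_;
};

// One registrar per T, constructed during static initialization.
template<class T>
Registrar Collector<T>::registrar_(typeid(T).name());

// Returns true if the given type has been recorded.
inline bool isRegistered(const std::type_info &type)
{
    const std::vector<std::string> &names = typeRegistry();
    return std::find(names.begin(), names.end(), std::string(type.name())) != names.end();
}

struct Widget {};

inline bool demoRegistration()
{
    Collector<Widget> collector;   // merely using Collector<Widget> registers Widget
    (void)collector;
    return isRegistered(typeid(Widget));
}
```

With common compilers the registrars run before main() starts, so the table is already filled in by the time the program proper begins.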
Wrapper Classes with Static Initialization Behavior
At this point, we are looking for a simple transformation of our source text that will convert a normal class member into some other data type. This other data type must pretend to be exactly like the original data type. The member name must remain the same. The new data type will also do us the favor of collecting data about the type of the member and its offset within the class that owns it. This data will be stored in the tables that Persistent<>::serialize() will use to write our class object to the stream.
So basically, we want to change this:
int member_;
Into this:
Persistable<int> member_;
But for the sake of minimizing the amount of re-editing with scripts that we fear we might have to do, we'll use this instead [4]:
PERSISTABLE_MEMBER(int, member_)
where PERSISTABLE_MEMBER is a macro that implements "Persistable<int> member_".
On the other hand, it's going to end up doing a lot more than that, as we shall see.
Template Wrapper Classes
Before continuing, let's take a better look at wrapper classes.
The general idea is to make a "derived" object that pretends to be its base class in every way, but which has some extra features. This would seem to be easy -- you just derive from the desired class. However, C++ throws up a couple of important roadblocks:
- you can't derive from atomic data objects: int, short, char, double, etc.
- you can't inherit constructors or assignment operators.
Bummer, dude! This could be a real problem. Luckily, template member functions let you approximate this well enough to be useful for non-atomic objects. For example:
template<class T>
struct Wrapper
: public T // Wrapper<T> IS a T
{
    Wrapper(): T() {}
    template<class U> Wrapper(U u): T(u) {}
    template<class U, class V> Wrapper(U u, V v): T(u, v) {}
    template<class U, class V, class W> Wrapper(U u, V v, W w): T(u, v, w) {}
    ...
    // about 10 parameters ought to be enough for anybody
    template<class U> T &operator= (U u) { return ((T&)(*this)) = u; }
    //
    // Now, put any EXTRA features you need from the wrapper
    //
};
While this is not a perfect solution -- and no such thing exists -- this solution will handle most cases.
Template Specialization
The definition of a wrapper class is fine for normal classes, but doesn't work for atomic types -- such as int, float, char, short, etc. It doesn't work because you can't inherit from an atomic data type. Plus, they've got lots of operators that have to be simulated by your wrapper.
To work around these problems you will have to use template specialization for the builtin types. Instead of inheriting from the builtin types, the wrapper must be re-implemented to contain the atomic data type as a data member. It must also implement all the operators that you expect int, double, etc. to have.
And here is the code for atomic classes -- at least for int:
template<>
struct Wrapper<int>
// Wrapper<int> CONTAINS an int and PRETENDS to be an int
{
    int t_;
    operator int&() { return t_; }
    operator const int&() const { return t_; }
    int* operator&() { return &t_; }
    int const* operator&() const { return &t_; }
    Wrapper(): t_() {}
    template<class U> Wrapper(U u): t_(u) {}
    template<class U> int &operator= (U u) { return t_ = u; }
    template<class U> int operator+(U const &u) { return t_ + u; }
    template<class U> int& operator+=(U const &u) { return t_ += u; }
    // duplicate for all operators
    //
    // Now, put any EXTRA features you need from the wrapper
    //
};
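To see how the general wrapper and its int specialization behave side by side, here is a cut-down, compilable sketch. It keeps only a handful of the members shown above (one forwarding constructor, assignment, the conversion operators and +=), and the demoWrapper function is a hypothetical usage example rather than part of the technique.

```cpp
#include <cassert>
#include <string>

// General case: the wrapper inherits from T, so it IS a T.
template<class T>
struct Wrapper
: public T
{
    Wrapper(): T() {}
    template<class U> Wrapper(U u): T(u) {}
    template<class U> T &operator=(U u) { return static_cast<T &>(*this) = u; }
};

// Specialization for int: the wrapper CONTAINS an int and converts to one.
template<>
struct Wrapper<int>
{
    int t_;
    Wrapper(): t_() {}
    template<class U> Wrapper(U u): t_(u) {}
    operator int&() { return t_; }
    operator const int&() const { return t_; }
    template<class U> int &operator=(U u) { return t_ = u; }
    template<class U> int &operator+=(U const &u) { return t_ += u; }
};

inline void demoWrapper()
{
    Wrapper<std::string> text("hello");   // forwards to std::string's constructor
    text = "world";                       // forwards to std::string's assignment
    assert(text.size() == 5);             // inherited member function

    Wrapper<int> counter(40);
    counter += 2;                         // simulated operator
    int plain = counter;                  // conversion back to a plain int
    assert(plain == 42);
}
```

The calling code reads the same for both kinds of wrapper, which is the point: the script-edited source does not need to care whether a member was a class type or a builtin.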
Using template parameters to pass values
The following paragraphs discuss reasons for needing a template's parameter list to provide constant values to the algorithms in the template, and techniques for passing said data. The basic idea, though, is that a template parameter can be the name of a class and the class can have a static method that returns the value of interest. Alternatively, a class can have an enumeration type that defines an integer constant value. But in the case of a non-integer value, consider the following method:
struct SomeClassType
{
    static string someText() { return "some text"; }
};
This class can be used as a parameter to a template like this:
template<class PolicyClass>
struct UsingClass
{
    void function()
    {
        cout << "constant value is " << PolicyClass::someText() << endl;
    }
};
At this point in our imaginary design walkthrough, our classes which are going to support serialization are derived from a template base class which adds important members -- such as a function which can write our class to the serialization stream. But how? That class must either have a function to call or tables on which to operate in order to send our class to the stream.
As mentioned earlier, the standard approach to serialization is to have a function which is written specifically for the class of interest perform the reads and writes of the class to the stream. However, in this example, we are trying to automatically collect information that can be used to write our classes to the stream. That is, we are specifically trying to avoid having to hand-code the serialize() method.
To avoid this, we are making the assumptions that
- we can safely write all the class members of every class to the stream
- we can use a script to edit the source code to convert normal class member declarations into macro references which evaluate into the declarations of template class objects which both
    - pretend to be the data types of interest
    - populate any tables that we might need
For each member variable in a class, we need an algorithm for reading and writing that data to the serialization stream. We also need to know the offset within the class where the variable lives. If we make the assumption that our serialization stream is XML, then we also need to know the name of the class member.
We are thus left with two problems:
- How do we declare a class member such that it will automatically create table entries for a class?
- How do we populate the table entry with all the right data?
Here's what we are stuck with:
- We have 1 line of code or so describing each class member variable:
    PERSISTENT_MEMBER(int, member_)
- the constructors for the owning class need to look like this:
    : memberName_(value)
- for each member variable, we need the algorithm for reading and writing the variable
- the offset within the owning class
- the name of the member variable
We have also said that the declaration of PERSISTENT_MEMBER results in an expansion something like this:
Persistable<int> member_;
Ok, so how do we modify the Persistable template so that it can know the name of the member which is being passed in as well as the offset to the member within the owning class?
It turns out that the answers to these seemingly similar questions are quite different. In the case of the class member variables, the offset from the beginning of the owning class can be computed when the member variable is constructed -- assuming that you somehow know the beginning of the structure. All you have to do is subtract the "this" pointer of the owning class object from the "this" pointer of the class member.
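The pointer subtraction can be sketched as follows. This is one possible arrangement under stated assumptions, not necessarily the one the page has in mind: the base class constructor stashes the owner's address in a static slot, and each member's constructor subtracts it from its own address. The Tag parameter, current(), offset() and byteDistance() are hypothetical names, and the arithmetic relies on ordinary object layout in the same way offsetof does rather than on anything the language strictly promises for arbitrary classes.

```cpp
#include <cstddef>

// Hypothetical base class: remembers the address of the object whose
// construction has just begun.
template<class T>
struct Persistent
{
    Persistent() { current() = static_cast<T *>(this); }
    static const void *&current()
    {
        static const void *owner = 0;
        return owner;
    }
};

// Hypothetical member wrapper. Tag exists only to make every member a
// distinct type, so that each member gets its own static offset slot.
template<class Type, class Owner, class Tag>
struct Persistable
{
    Type value_;
    Persistable(): value_()
    {
        const char *owner = static_cast<const char *>(Persistent<Owner>::current());
        const char *self = reinterpret_cast<const char *>(this);
        offset() = self - owner;   // member address minus owner address
    }
    static std::ptrdiff_t &offset()
    {
        static std::ptrdiff_t value = -1;   // -1 means "not recorded yet"
        return value;
    }
};

struct FirstTag {};
struct SecondTag {};

struct MyClass
: public Persistent<MyClass>
{
    Persistable<int, MyClass, FirstTag> first_;
    Persistable<double, MyClass, SecondTag> second_;
};

// Helper for checking the recorded offsets against real addresses.
inline std::ptrdiff_t byteDistance(const void *from, const void *to)
{
    return static_cast<const char *>(to) - static_cast<const char *>(from);
}
```

Base classes are constructed before members, so by the time first_ and second_ are constructed the owner's address is already available. This sketch recomputes the offset on every construction; recording it only once would be a small refinement.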
Getting the class member name is not so easy, however. How would the constructor of your template class object, Persistable<int> member_, know the class member name? Or what can we change about the definition of the Persistable object so that it can easily know the member name?
Without going into a long discussion of alternatives, here is the easy way:
- A class can have static methods.
- Classes can have nested classes within them.
- Nested classes can define inline static methods.
Since we are using the macro PERSISTABLE_MEMBER(type,name) to declare our class members, we can have it do two things instead of one:
- we can have it declare the Persistable<T> object that pretends to be our data member but also builds other tables as needed
- we can have PERSISTABLE_MEMBER declare, for each member variable name, a corresponding struct which can then be passed as a template parameter to Persistable<T> -- and that struct can have a static member function that can be used to obtain the member variable name within the owning class
During the first construction of any Persistable<T> members, the corresponding data can be built that can later be used to read/write the object to a stream.
Note that any object that is never constructed doesn't really _need_ tables, now does it?
To summarize, we need the PERSISTENT_MEMBER(int,memberName_) macro to define two things:
- a structure nested within the outer class whose only purpose is to provide a static inline method which will give you the name of the member
- a Persistable<T, namingClass> that defines memberName_ to be an int but which also collects streaming table info about memberName_.
And here is a macro that defines a struct with a static inline method that returns a character string which is a macro parameter:
// The second parameter is called "member" rather than "name" so that the
// preprocessor leaves the static method name() alone.
#define PERSISTABLE_MEMBER(type, member) \
struct memberName_##member \
{ \
    static char const *name() \
    { \
        return #member; \
    } \
}; \
Persistable<type, memberName_##member> member
Let's say, for the sake of argument, that the Persistable template is defined like this:
template<class Type, class Namer>
struct Persistable
{
    char const *memberName()
    {
        return Namer::name();
    }
};
This partial definition of Persistable at least shows how the PERSISTABLE_MEMBER macro can make the name of the class member available to the Persistable template.
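Putting the macro and the template together, a compilable sketch might look like the following. The value_ member, the constructors and conversion operators on Persistable, and the accessors on SomeClass are assumptions added so that the example can be exercised. The macro's second parameter is named member here so that the preprocessor does not also rewrite the static method name().

```cpp
#include <cstring>

// Wrapper that holds the value and can report the member's name.
template<class Type, class Namer>
struct Persistable
{
    Type value_;
    Persistable(): value_() {}
    Persistable(Type v): value_(v) {}
    operator Type &() { return value_; }
    operator Type const &() const { return value_; }

    // The name arrives through the Namer template parameter, not through
    // a variable or a function parameter.
    static char const *memberName() { return Namer::name(); }
};

// Declares a nested naming struct plus the wrapped member itself.
#define PERSISTABLE_MEMBER(type, member)                   \
    struct memberName_##member                             \
    {                                                      \
        static char const *name() { return #member; }      \
    };                                                     \
    Persistable<type, memberName_##member> member

class SomeClass
{
    PERSISTABLE_MEMBER(int, count_);
public:
    explicit SomeClass(int x): count_(x) {}
    int count() const { return count_; }                        // behaves like an int
    char const *countName() const { return count_.memberName(); }
};
```

The constructor initializer list is unchanged from the original class, which is what keeps the script edits small.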
Collecting Information About Class Members
The whole point of this discussion of wrapper classes and the passing of data to templates using static methods of classes which are parameters to the template is to let us automatically collect information about the members of classes which support serialization.
The class definitions in our imaginary program have been modified to use the macro-declared members shown earlier.
Connecting the member to the class in the persistence tables
At this point, though, we really don't have any connectivity between the Persistable class and the MyClass object. That is, Persistable doesn't know the name "MyClass" by which to associate the information about member_ to it.
To solve this problem, we must pass MyClass as a template argument to Persistable. Then, using typeid on that template argument, we can get the name, "MyClass", and we can add a description of member_ to that class' serialization tables.
It would, however, be unfortunate if we had to pass MyClass as an explicit template argument because it would mean transforming our original source text like this:
PERSISTENT_MEMBER(int, member_, MyClass)
Luckily, there is a shortcut for this. We absolutely must have the Persistable class accept the name of the class as a parameter, but there is a way that we can trick the compiler into giving it to us without having to pass it to the PERSISTENT_MEMBER() macro.
Since MyClass is going to be derived from Persistent
Of course, the Persistable class can do more than just print things out -- it can store
the class name, member name, and the name of the member's data type in global variables
associated with the class name. That is, a class' serialization tables could be a list of
the members -- and we collect this information at static initialization time.
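A per-class table of member descriptions, filled in during static initialization, could be sketched like this. The owning class is passed to Persistable as an explicit template argument here, so this sketch does not reproduce the shortcut mentioned above. MemberInfo, classTables, MemberRegistrar, membersOf and MemberTag are hypothetical names, and the strings returned by typeid(...).name() are implementation-defined.

```cpp
#include <map>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical description of one class member.
struct MemberInfo
{
    std::string name;       // member name, supplied by the naming struct
    std::string typeName;   // implementation-defined name of the member's type
};

typedef std::map<std::string, std::vector<MemberInfo> > ClassTables;

// Tables keyed by class name; a function-local static avoids
// initialization-order problems between translation units.
inline ClassTables &classTables()
{
    static ClassTables tables;
    return tables;
}

// Constructed once per (Owner, Type, Namer) during static initialization.
template<class Owner, class Type, class Namer>
struct MemberRegistrar
{
    MemberRegistrar()
    {
        MemberInfo info;
        info.name = Namer::name();
        info.typeName = typeid(Type).name();
        classTables()[typeid(Owner).name()].push_back(info);
    }
};

template<class Type, class Owner, class Namer>
struct Persistable
{
    Type value_;
    // Taking the address forces registrar_ to be instantiated.
    Persistable(Type v = Type()): value_(v) { (void)&registrar_; }
    static MemberRegistrar<Owner, Type, Namer> registrar_;
};

template<class Type, class Owner, class Namer>
MemberRegistrar<Owner, Type, Namer> Persistable<Type, Owner, Namer>::registrar_;

struct MemberTag { static const char *name() { return "member_"; } };

struct MyClass
{
    Persistable<int, MyClass, MemberTag> member_;
    MyClass(): member_(3) {}
};

// Looks up the recorded members of a class.
inline const std::vector<MemberInfo> &membersOf(const std::type_info &owner)
{
    return classTables()[owner.name()];
}
```

A serialize() routine could then walk membersOf(typeid(MyClass)) to find out what to write, once offsets are added to each entry.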
Getting the offset within the class to the member
We are now down to the final piece of wizardry -- the final piece of the serialization puzzle. That is, how does the Persistent<T>::serialize() method know where within MyClass the member_ lies? Unfortunately, the compiler can't be tricked into giving us the address of the member at compile time. We'll have to defer the collection of this information until the first time that the class members are constructed.
But runtime is a tricky thing -- a class might never be constructed, so we might never be able to collect this all-important information. On the other hand, if we never construct objects of MyClass, then we really don't need the information -- do we?
But even if we do hook into the constructor for member_ (i.e., put code in Persistable::Persistable()), we can only know the address of member_ -- not the address of the MyClass object of which this member is a part.
That's where Persistent<MyClass>::Persistent() comes in -- it does know the address of the current MyClass object at the time the Persistable<int> object is being constructed. So, we can subtract the address of the MyClass object from the address of the Persistable object (member_). We just need to have Persistent::Persistent store its this pointer in a static variable which is available to Persistable::Persistable.
Of course, we'll only want to do all this during the construction of the very first MyClass object (and its members, of course).
The End is Near!
Let's review:
We've been challenged to change 1,000,000 lines of code as quickly as possible such that we add serialization -- a completely new feature for this code base that could theoretically involve hand editing thousands of classes.
Being both lazy and devious, we decide to summon our daemon (as opposed to summoning a demon).
Our magical friend has us do the following:
I think that this hard working genie deserves its freedom -- don't you?
Notes
1. Darth Programmer:
Come to the dark side, Luke, it is the only way.
2. Dorothy the programmer:
Macros and templates and scripts! Oh My!
3. Avatar the programmer:
Here's a little trick our mother never taught you!
4. Ron W the programmer:
There wasn't a programmer what went bad that didn't use macros!
5. Harry Marco, the programmer:
Wing feather, bat leather, hollow bone
gift of Icarus and Oberon
Dream of the Earthbound
Spin and Flow
Fledge and furl
Fold and GO!