BOOST_FOREACH Implementation Techniques
This web page documents some of the C++ programming tricks needed to implement the BOOST_FOREACH macro. The examples shown here are not always exactly what you see in the BOOST header because this page attempts to explain why things are done. The actual boost code often encapsulates the solution to more than one problem at a time which makes it hard to understand why the code works the way it does.This page brings the totality of these problems into view and the actual details can be found directly in the boost headers.
After I wrote this page, I discovered that most or all of this material was originally described in the C/C++ User's Journal. Here is a link to the article written by the principle author of the BOOST_FOREACH macro:
Eric Niebler's article
BOOST_FOREACH is an amazingly powerful and complicated macro that provides a seemingly simple service: It lets you iterate over the array or container and process all the items therein. Here are some example uses:
The macro accomplishes the above while also providing extremely good potential for compiler optimization. Despite having to declare loop variables and perform some fairly complicated function calls and cast operations, the g++ compiler, at least, is able to completely eliminate empty BOOST_FOREACH loops.
Difficulties Making This Work
To do all the above with a single macro is fairly complicated. It might seem simple since many other languages provide this as a builtin behavior and also because it is so easy to imagine writing such a macro, quite trivially, yourself.However, to accomplish all the things that BOOST_FOREACH can actually do, several temporary variables have to be created and these variables are of profoundly different types and syntaxes. For example, to iterate over an array, you might declare variables like this:
To iteratate over an stl std::vector, you could write code something like this:
- Iterators and pointers are of different types and sizes, and there is different syntax for getting the beginning and ending pointers for arrays and stl containers.
- In C++, to declare a variable, you must know its type (at least until the next release of the language becomes widely available and we can use the new features of the auto keyword). However, the type of the return value from a container's begin() function is not so easy to know in advance in general. Note that the BOOST_FOREACH macro does not require that the iterator type be passed in as a parameter.
- Loop variable declarations could easily foul up the namespace of the function you are in.
- The std::set and possibly other containers do not have normal iterators that return modifiable references to the containers contents. Alorithms that rely the normal behavior won't compile for sets.
- The BOOST_FOREACH macro does not allow you to violate the constness of contained objects. This adds complexity to the helper methods that the macro invokes.
- It is generally preferable for loop variables to be references to objects in the container rather than be copies of the object. Speed is the obvious reason. However, a normal for-loop won't let you do this because the loop variable is only initialized once, at the beginning of the loop, but we need the reference variable to refer to each element member in sequence.
How boost solves these problems
Eliminating Namespace Polution
The normal way that programmers avoid the pollution of namespaces is to use curly braces to limit the scope of variables:This is exactly what BOOST_FOREACH does -- it defines a template wrapper type, AnyType<T>, which will be used for these loop control variables. AnyType variables always evaluate to false in if statements because its operator bool() is overloaded to do so.
So the macro invocation BOOST_FOREACH(int x, array) expands to something very roughly like this:
Further, the functions named above, deref, notDone, and iterate, are all trivial inlines.
Declaring Variables Without Knowing The Type
The authors of BOOST_FOREACH worked very hard to avoid requiring that the iterator type be passed in as a parameter. Instead, a fairly sophisticated use of const reference variables is employed to avoid it. Luckily, the compiler is able to completely eliminate most of the reference variable arithmetic so that BOOST_FOREACH generates amazingly small assembly language when the -O2 optimization level is used.You can't currently declare variables without knowing the exact type thereof. In a future version of C++, this limitation will be removed -- it is ontrack for inclusion in the language but roll out through the industry will take a decade.
Because of this lack, the macro can't just declare iterators for the array or container. You'd like to be able to write this code:
Note: the value of begin is only valid inside the if-statement tree, you don't know for sure when the value will be destructed if you don't guarantee its scope by putting it in the if statement like this. Compilers very in their destruction logic except for this case.
The function deref, a template, is given the typeless reference to the iterator and the full type of the vector that the iterator points into (ie a reference to vec). This parameter provides the type information needed to cast the typeless reference to an iterator to a proper vector<int>::iterator reference. Here's a rough mocked up implementation of deref():
In the above, function getBegin() returns an object of type
IteratorWrapper< vector< int > >
Eliminating Unnecessary Function Calls
This is fine for arrays and vectors, but suppose the array being iterated over by the macro invocation is actually a function call with side effects. You wouldn't want to be calling that function several times during the body. Instead, a trick envolved in using the ternary operator is used to get the type of the array expression but not actually invoke it:- if array is declared like this: int array[2], then the above expression has the type int (*)[2].
- if the declaration is vector<int> array, then the expression evaluates to vector<int>*
Array Syntax Versus Container Syntax
To get a pointer to the first and last elements of a stl container, the following syntax is used:- deref
- getBegin()
- getEnd()
- notDone()
- iterate()
Allowing Reference Variables As Iterators
The macro allows you to write code like this:Handling iterators returning const references
The deref() function example, shown above, has an important flaw: its return type. As you recall from above:templateIn the above example, the deref() method, which lets you fetch the current iterator variable value, won't compile for std::set because, while std::set<T>::value_type is the proper way to refer to the data type in the set (just like it is for all other stl containers), the member type std::set<T>::iterator is really just a synonym for it's const_iterator.// container inline typename Cont::value_type & deref(AnyTypeBase const &iterBase, Cont &) { ... return *ref.value_; // won't compile for std::set }
This is important because operator*() on the set returns a const value_type &, not a mutable reference like most iterators. The deref function above won't compile because the statement, *ref.value_ is attempting to return a mutable &.
The solution to that problem is to make the return value from deref() as well as some of its temporary variables be either value_type & or const value_type & based on the return type of Cont::iterator::operator()
One way to figure out the constness of the return value from operator* is to use a trick based on the rule that template parameter substituion failure is not an error. Consider the following example:
Here's how the above template works:
- You need compile time constants in order to specialize other templates.
- Enums work fine for that
- The single enumeration value, value is defined to be either 0 or 1 based on the size of an expression. If the size is one byte, then value is 1, otherwise it is 0.
- The sizeof() expression gives you compile time constant values which indicate the size its parameter expression. Normally, we use it on structs, to get their size in bytes, but it also works on any old expression, including function calls -- in which case, the compiler returns the size of the object returned by the function -- without actually calling the function.
- The function being evaluated for return size is is_const_tester() which is a static member function inside the class. Note that this function does not actually exist anywhere -- we are interested in the return size it might have if it actually existed, but since we never really call it, then we don't need it to exist.
- is_const_tester() is passed a value which is computed by dereferencing a static iterator variable, as defined by the Container class. This variable is never instantiated either.
- Note that there are two definitions of is_const_tester, one takes a const reference parameter and the other takes a non-const reference. Based on the actual behavior of the iterator, the compiler will select either the one or the other. The two functions return different kinds of return values -- one returns a char, and the other returns an int. Based on what sizeof() tells us about the return value of Containter::iterator::operator*(), we choose a value for value.