Specializing C++ Template Functions for Arrays, Containers, and Strings

When designing general purpose template functions that are meant to apply to arrays, strings, and class objects which are containers, there is considerably more effort involved than is generally assumed.

The following language constructs have commonly understood meanings, uses, and expected behaviors that make it difficult to write generalized template functions:

The Nature of the Beast

There are actually several different kinds of problems which template function writers must solve. They are divided into the following two major categories: Example function definitions that overcome many of the problems can be found here.

The above difficulties are discussed in the following sections.

Language limitations

The C++ language has several basic limitations with regards to the use of templates that programmers must work around:

Template function call resolution limitations

When invoking a function with a parameter, such as in the following example: SomeType t; function(t); C++ uses fairly complex rules for selecting among the alternative definitions of function() that might possibly apply. In C there would be at most one definition of function(), but in C++, with function name overloading there could be many.

In order to make templates more generally applicable, many of the normal lookup rules are relaxed. While usually helpful, this increases the chance that multiple template functions will apply equally to a given function call. This results in an ambiguity, and thus compile errors. The programmer is responsible for tweaking the overloaded function definitions to make it work -- or by adjusting the parameters to the function to resolve the ambiguity.

lvalues and rvalues can't be distinguished

Technically, since there is no portable work around for it, the worst problem is that the C++ language does not allow template function specialization to tell the diference between an object of a given type (an rvalue) and a reference to an object of that same type type (an lvalue) -- though it can tell the the difference between a const & and a normal object.

This means that you will get compile errors if you define and try to use the following templates:

// // bad example, won't compile: causes ambiguities // template<class T> void function( T t) { } template<class T> void function( T &t) { }

On the other hand, calling this the "worst" problem is a bit excessive. It is often the case that you do not in fact need to know the difference. Typically, if you need to deal with rvalues, that is function return values or compiler temporaries, a template that accepts a const& parameter will suffice. These two function basic types of template functions handle most of the non-string, non-array data types:

template<class T> void function( T &t) { } // mutable variables template<class T> void function( T const &t) { } // const's and rvalues As a side note, the BOOST_FOREACH() macro implementation does have a way of figuring out if its container parameter is a function return value. It uses a technique called an "rvalue probe". Unfortunately, its implementation differs from compiler to compiler. See the Eric Niebler's article on the subject for details.

Arrays and strings get confused

The following templates apply to arrays of any type: template<class T, int N> void function( T (&t)[N] ) { } template<class T, int N> void function( T const (&t)[N] ) { } In absence of any other overloads of "function()", the above two signatures would apply to this code: char x[40]; function(x); function("stuff"); However, it treating character constants as a special case is fairly common. To do that, you att the following to the mix: void function( char const * ) { } // rvalues and lvalues However, this definition catches all constant character arrays as a C-style string: char const bill[2]="23"; function(bill); // bill used to be an array, but now is a pointer // with this new definition. Unfortunately, adding a & to the definition doesn't fix the problem: void function( char const *& ) { } // lvalues ONLY This definition won't accept character constants -- it will only get const character variables.

There is no good solution to this problem. Either:

Helping the compiler with StringWrapper and ArrayWrapper
This is how BOOST_FOREACH solves the problem. To use the wrapper, you'd have to change the calling functions like this: char const yc[2]="ab"; function( ArrayWrapper(yc) ); function( ArrayWrapper("text") ); function( StringWrapper(yc ); function( StringWrapper("something") );

Note that the ArrayWrapper class and the StringWrapper are very trivial -- they exist only to superseed the type of the parameter in such a way that function specialization will will not get confused. For example, the wrappers and the specializations that deal with them might be implemented something like this:

struct ArrayWrapper { char const *address_; size_t length_; template<int N> ArrayWrapper(char const (&array)[N]) { address = &array[0]; length_ = N; } }; struct StringWrapper { char const *address_; size_t length_; ArrayWrapper(char const *p) { address_ = p; } }; void function(ArrayWrapper const &array) { /* use array.address_ and array.length_ */ } void function(StringWrapper const &string) { /* use only array.address */ }

Automatic Data Type Promotions Don't Apply to Templates

The compiler does not use automatic promotions when instantiating templates like it does when performing normal function calls. This takes some getting used to when you start writing template functions.

For example, if you have a function defined like this:

   int function( int x )
   {
     return x;
   }
This function will convert its parameter to an int from whatever it actually is. For example:

   long y = 14;

   int xvalue = function(y);  
                // xvalue is 14 after this statement, but it is an INT 14!

But, this will NOT happen if function() were written as a template. such as the following: template<class T> T function(T x) { return x; } In this case, function() would not convert the actual function parameter to some predefined data type, but would, rather, attempt to create the specialization:
   long func(long x)
   {
      return x;
   }
Which might work just fine, but there are numerous cases where this failure to normalize to a standard type fails badly. For example, suppose you write your template like this:
   template< class T>
   T function(T &t)
   {
      // is T a const or mutable type here?

      return t;
   }
And further suppose that you invoke it the following two ways;

   // with a mutable parameter

   char z = 8;
   int  y = function(z);


   // with a const parameter

   const char q = 9;
   int r = function(q);
Here, there are big differences in the two functions():

The fact that type T is const in some situations can cause trouble for you if you write your template by declaring variables of type T. For example:

   template <class T>
   T function(T &t)
   {
      T temporary = t;

      ++temporary;      // won't compile if T is const!

      return temporary;
   }
Here, the function will compile only for mutable T. If the function is on variable, q, as above where it is constant, then the function won't compile.

You can't reasonably argue that this is an obscure annoyance that you will rarely have to deal with. In any large program, your co-workers will have seen fit to use many different combinations of data types in many different ways, many of which are guaranteed to cause problems for a naively written template function.

Solutions to the const parameter problem
There are a couple of obvious solutions to the const parameter problem:

Different Data Type Interfaces

Different kinds of data naturally have different interfaces. The most obvious problem in writing generic code is that arrays and strings have a profoundly different programmatic interface than most class type containers -- even though one can iterate over the elements of all these constructs. For example: To make matters worse, arrays of char or wchar_t are not always meant as null terminated strings.

To deal with these differences, use a plethora for overloaded functions that accept the different data types as a parameters. See this section for examples.

Taking the annoyances into to account

To reiterate, if you write a template function that handles arrays, strings, and containers, you will have to deal with a variety of special cases that don't immediately pop into the minds of beginning template writers.

The following example program points out the major annoyances you will have to deal with, but the function bodies shown below don't mean that in fact you must deal with them exactly as shown. Think of the following as a list of the problems, not necessarily the right solutions.

Also, this list is a good reference for function signatures for the various special cases you will be eventually forced to deal with -- they aren't all obvious:

// function overloads needed for containers, arrays, and the // special cases of character and wide character nul terminated // strings. // // NOTE: there's a nasty gotcha with this list, see the next // paragraph. template<class T> something<T> function(T const &) // the TRULY generic function with // constant parameters { ... // some implementation } template<class T> something<T> function(T &) // the TRULY generic function with // mutable parameters { ... // some implementation } template<class T, unsigned N> void function(T (&)[N]) // specialization for an array of // mutable T of size N. // // Does not apply // to strings, see non-template overloads // of the function below. { ... // some implementation } template<class T, unsigned N> void function(T const (&)[N]) // specialization for an array of // const T of size N // // Does not apply // to strings, see non-template overloads // of the function below. { ... // some implementation } template<class T, unsigned N> void function(T (*)[N]) // specialization for pointer to an array of // mutable T of size N { ... // some implementation } template<class T, unsigned N> void function(T const (*)[N]) // specialization for pointer to an array of // const T of size N { ... // some implementation } void function(char const *p) // a non-template dealing character string // literals as well as const character // pointer variables, function return // values and char const arrays. { ... } void function(wchar_t const *p) // a non-template dealing wide character string // literals as well as const wide character // pointer variables, function return // values and wide char const arrays. { ... } Limitation with the above:

If we want to treat character string literals as a special case, then we are out of luck for treating const arrays of characters as arrays using the same function name ("function"), without some kind of wrapper interface, see above.