Specializing C++ Template Functions for Arrays, Containers, and Strings
When designing general purpose template functions that are meant to apply to arrays, strings, and class objects which are containers, there is considerably more effort involved than is generally assumed.The following language constructs have commonly understood meanings, uses, and expected behaviors that make it difficult to write generalized template functions:
- const-ness
- rvalues (function return values)
- character string literals
- character arrays
- wide character literal constants
- wide character string arrays
- arrays in general
The Nature of the Beast
There are actually several different kinds of problems which template function writers must solve. They are divided into the following two major categories: Example function definitions that overcome many of the problems can be found here.
The above difficulties are discussed in the following sections.
Language limitations
The C++ language has several basic limitations with regards to
the use of templates that programmers must work around:
- Function Call Lookup Rules
- lvalues and rvalues can't be distinguished
- Arrays and strings get confused
- Automatic Type Promotion limitations
Template function call resolution limitations
When invoking a function with a parameter, such as in the following example:function()
that might possibly apply. In C
there would be at most one definition of function()
, but
in C++, with function name overloading there could be many.
In order to make templates more generally applicable, many of the
normal lookup rules are relaxed. While usually helpful, this increases the
chance that multiple template functions will apply equally to a given
function call. This results in an ambiguity, and thus compile errors.
The programmer is responsible for tweaking the overloaded function
definitions to make it work -- or by adjusting the parameters to the
function to resolve the ambiguity.
lvalues and rvalues can't be distinguished
Technically, since there is no portable work around for it, the worst
problem is that the C++ language does not allow template function
specialization to tell the diference between an object of a given type
(an rvalue) and a reference to an object of that same type type (an lvalue)
-- though it can tell the the difference between a const &
and a normal object.
This means that you will get compile errors if you define and try to use the following templates:
On the other hand, calling this the "worst" problem is a bit excessive. It
is often the case that you do not in fact need to know the difference.
Typically, if you need to deal with rvalues, that is function return
values or compiler temporaries, a template that accepts a const&
parameter will suffice. These two function basic types of template functions
handle most of the non-string, non-array data types:
Arrays and strings get confused
The following templates apply to arrays of any type:There is no good solution to this problem. Either:
- accept character arrays being treated like they were strings
- or accept strings being treated as arrays (by leaving out the new definition).
- Or, add some kind of wrapper that must be inserted in the code to tell the compiler which function to use.
Helping the compiler with StringWrapper and ArrayWrapper
This is how BOOST_FOREACH solves the problem. To use the wrapper, you'd have to change the calling functions like this:
Note that the ArrayWrapper
class and the
StringWrapper
are very trivial -- they exist only
to superseed the type of the parameter in such a way that function
specialization will will not get confused. For example, the
wrappers and the specializations that deal with them might be
implemented something like this:
Automatic Data Type Promotions Don't Apply to Templates
The compiler does not use automatic promotions when instantiating templates like it does when performing normal function calls. This takes some getting used to when you start writing template functions.
For example, if you have a function defined like this:
int function( int x ) { return x; }This function will convert its parameter to an
int
from whatever it
actually is. For example:
long y = 14; int xvalue = function(y); // xvalue is 14 after this statement, but it is an INT 14!But, this will NOT happen if
function()
were written as a template.
such as the following:
function()
would not convert the actual function parameter to
some predefined data type,
but would, rather, attempt to create the specialization:
long func(long x) { return x; }Which might work just fine, but there are numerous cases where this failure to normalize to a standard type fails badly. For example, suppose you write your template like this:
template< class T> T function(T &t) { // is T a const or mutable type here? return t; }And further suppose that you invoke it the following two ways;
// with a mutable parameter char z = 8; int y = function(z); // with a const parameter const char q = 9; int r = function(q);Here, there are big differences in the two
functions()
:
- In the call to
function(z)
, the class T is mutable - But in the call to
function(q)
, the class T is const
The fact that type T is const in some situations can cause trouble for you if you write your template by declaring variables of type T. For example:
template <class T> T function(T &t) { T temporary = t; ++temporary; // won't compile if T is const! return temporary; }Here, the function will compile only for mutable T. If the function is on variable, q, as above where it is constant, then the function won't compile.
You can't reasonably argue that this is an obscure annoyance that you will rarely have to deal with. In any large program, your co-workers will have seen fit to use many different combinations of data types in many different ways, many of which are guaranteed to cause problems for a naively written template function.
Solutions to the const parameter problem
There are a couple of obvious solutions to the const parameter problem:- Write two functions -- one that takes only const parameters, and
the other that takes only mutable parms. For example:
template<class T>
void function(T const &t)
{
// const form of the function
....
}
template<class T>
void function(T &t)
{
// mutable form
....
}
- Use a template meta-programming trick to find the true type of the
parameters, without the constness, and declare any local variables using
this "base" type. The best way to do this to include the boost
metaprogramming header files -- or some other "class traits" detecting
template library, and write code that looks like this:
template<class T>
void function(T &t)
{
typedef typename removeCV<T>::type BaseType;
BaseType tmpVar;
...
++tmpVar;
....
}removeCV<T>
, defines a member type,type
, which is the same as T, except that it is neitherconst
, norvolatile
, even if T is.
Different Data Type Interfaces
Different kinds of data naturally have different interfaces. The most obvious problem in writing generic code is that arrays and strings have a profoundly different programmatic interface than most class type containers -- even though one can iterate over the elements of all these constructs. For example:- Arrays have a start address and a number of elements
- Containers have a
begin()
and anend()
- C style strings have a start address and are null terminated
- wide character strings are similar to C style strings but have a different number of bytes per character
- MFC containers have a
getLength()
method and STL containers have asize()
method.
char
or wchar_t
are not always meant as null terminated strings.
To deal with these differences, use a plethora for overloaded functions that
accept the different data types as a parameters. See this section
for examples.
Taking the annoyances into to account
To reiterate, if you write a template function that handles arrays, strings,
and containers, you will have to deal with a variety of special cases that
don't immediately pop into the minds of beginning template writers.
The following example program points out the major annoyances you will have to deal with, but the function bodies shown below don't mean that in fact you must deal with them exactly as shown. Think of the following as a list of the problems, not necessarily the right solutions.
Also, this list is a good reference for function signatures for the various special cases you will be eventually forced to deal with -- they aren't all obvious:
// function overloads needed for containers, arrays, and the // special cases of character and wide character nul terminated // strings. // // NOTE: there's a nasty gotcha with this list, see the next // paragraph. template Limitation with the above:something function(T const &) // the TRULY generic function with // constant parameters { ... // some implementation } template something function(T &) // the TRULY generic function with // mutable parameters { ... // some implementation } template void function(T (&)[N]) // specialization for an array of // mutable T of size N. // // Does not apply // to strings, see non-template overloads // of the function below. { ... // some implementation } template void function(T const (&)[N]) // specialization for an array of // const T of size N // // Does not apply // to strings, see non-template overloads // of the function below. { ... // some implementation } template void function(T (*)[N]) // specialization for pointer to an array of // mutable T of size N { ... // some implementation } template void function(T const (*)[N]) // specialization for pointer to an array of // const T of size N { ... // some implementation } void function(char const *p) // a non-template dealing character string // literals as well as const character // pointer variables, function return // values and char const arrays. { ... } void function(wchar_t const *p) // a non-template dealing wide character string // literals as well as const wide character // pointer variables, function return // values and wide char const arrays. { ... } If we want to treat character string literals as a special case, then we are out of luck for treating const arrays of characters as arrays using the same function name ("function"), without some kind of wrapper interface, see above.