Named Array Elements

The Goal

Have you ever wanted to have your cake and eat it too? Here's a way to have an array appear to be both an array and a collection of named variables -- without having to use the array operator or function call operators.

That is, this page describes a technique that lets you give a class both of the following interfaces:

array index operator
named class members that you don't have to call like functions

For example, your class could be accessed like this:

Class variable;

variable[4] = 30;

variable.X = 9;

Where variable.X and variable[4] refer to the same memory location.

To be clear, then, this page describes a technique whose sole purpose is the improvement of the aesthetics of a class' interface without any increase in functionality -- but with little or no performance cost.

One might ask, "Why would I want such a thing?" The answer typically runs like this: Suppose that you originally wrote some code using named variables and later decide that you could perform some operations more effectively if these variables were part of an array. In this case, you'd be faced with editing a lot of existing code, changing references of the form "object.VariableName" to "object[VariableIndex]". While this works fine, some people object on aesthetic grounds. For example, if you were dealing with Cartesian coordinates -- X, Y, Z -- you might object to writing "object[X_index]" instead of "object.X".

Alternatives and their issues

Before describing the technique, let's look at some of the less aesthetically appealing ways to approximate the naming of array locations:

First, you might try this:

#define array index names and use the array operator

For example:

class C
{
    int array[3];
public:
    int &operator[] (int i) { return array[i]; }
    // Macros ignore class scope, so these names are visible everywhere
    #define X 0
    #define Y 1
    #define Z 2
};
...
C variable;
for (int i = 0; i < 3; ++i) variable[i] = 9;
cout << variable[X] << endl;
cout << variable[Y] << endl;
cout << variable[Z] << endl;

This works fine, of course, but has a couple of negatives:

#define's pollute the global name space
The array index operators are ugly when used a lot in expressions. Suppose you're naming the array locations so that you can write equations like this:
Q = variable[X] + variable[Y] * variable[Z] + sqrt(variable[X]);
Wouldn't it be preferable to write this instead?
Q = variable.X + variable.Y * variable.Z + sqrt(variable.X);

Alternatively, you could do this:

Use class scope constants and array indices

For example:

class C
{
    int array[3];
public:
    int &operator[] (int i) { return array[i]; }
    enum constants { X, Y, Z };
};
...
C variable;
for (int i = 0; i < 3; ++i) variable[i] = 9;
cout << variable[C::X] << endl;
cout << variable[C::Y] << endl;
cout << variable[C::Z] << endl;

This works fine, of course, and you no longer have the problem of polluting the global namespace, but you still have the following negatives:

You have to use rather bulky syntax to specify the desired locations
You're still having to use the array index operators to get to the desired locations

So, you might try this:

Use member functions instead of the array index operators

A common solution looks like this:

class C
{
    int array[3];
public:
    int &operator[] (int i) { return array[i]; }
    int &X() { return array[0]; }
    int &Y() { return array[1]; }
    int &Z() { return array[2]; }
};
...
C variable;
for (int i = 0; i < 3; ++i) variable[i] = 9;
cout << variable.X() << endl;
cout << variable.Y() << endl;
cout << variable.Z() << endl;

This works fine, of course, and most of the problems are gone -- you are just stuck with using () after the member names.

So you might think of doing this:

Eliminating ()'s using reference members

A way to get rid of the ()'s is to add extra reference members to the class:

class C
{
    int array[3];
public:
    int &operator[] (int i) { return array[i]; }
    int &X;
    int &Y;
    int &Z;
    C() : X(array[0]), Y(array[1]), Z(array[2]) {}
};
...
C variable;
for (int i = 0; i < 3; ++i) variable[i] = 9;
cout << variable.X << endl;
cout << variable.Y << endl;
cout << variable.Z << endl;

And while this does work, it at least doubles the size of the class, because references are typically implemented as pointers under the covers. The reference members also disable the compiler-generated assignment operator, and a compiler-generated copy would leave the new object's references bound to the original object's array.
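The size cost can be checked directly. The sketch below is not part of the original code: PlainArray, WithReferences, and the two helper functions are hypothetical names, and it assumes a typical implementation in which each reference member occupies pointer-sized storage (the standard does not require this).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical baseline: just the array.
struct PlainArray {
    int array[3];
};

// Hypothetical variant: the same array plus one reference member per element.
struct WithReferences {
    int array[3];
    int &X;
    int &Y;
    int &Z;
    WithReferences() : X(array[0]), Y(array[1]), Z(array[2]) {}
};

// Returns how many bytes the three reference members add on this platform.
std::size_t referenceOverhead() {
    return sizeof(WithReferences) - sizeof(PlainArray);
}

// Checks that the named references and the array really alias each other.
void checkReferencesAliasArray() {
    WithReferences w;
    w.array[1] = 42;   // write through the array, read through the name
    w.X = 7;           // write through the name, read through the array
    assert(w.Y == 42);
    assert(w.array[0] == 7);
}
```

With 4-byte ints and pointer-sized references, the overhead returned by referenceOverhead() is typically at least as large as the array itself.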

So this might seem preferable:

Overload the array index operator with if statements

Another way to use a collection of named variables as if they were an array is to overload the class's array index operator so that each index maps to a member, using if statements on the index:

class C
{
public:
    int X;
    int Y;
    int Z;
    int &operator[] (int i)
    {
        if (i == 0) return X;
        else if (i == 1) return Y;
        else if (i == 2) return Z;
        throw InvalidArrayIndex(__FILE__, __LINE__); // exception class defined elsewhere
    }
};
...
C variable;
for (int i = 0; i < 3; ++i) variable[i] = 9;
cout << variable.X << endl;
cout << variable.Y << endl;
cout << variable.Z << endl;

While this technique does work for some uses of the array index operator, it does not work if you have an existing library of routines that assume the elements of an array are really contiguous. That is, such a library assumes that the array index operator refers to a real array.

You really can't trust the above code to work consistently across platforms if you write code like this:

extern void function(int *first, int *last);

C object;

function(&object[0], &object[0]+3);

In the definition of class C, above, there's really no guarantee that variables X, Y, and Z are exactly 1 integer apart in memory. Likely they are, but there is no guarantee.
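One way to find out what a particular compiler does is to test the layout explicitly. The following is only a sketch: Point3 and membersAreContiguous are hypothetical names, and a passing check describes one compiler and platform, not the language.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the class above, reduced to its data members.
struct Point3 {
    int X;
    int Y;
    int Z;
};

// Returns true when X, Y and Z happen to be laid out like int[3]
// on this compiler; the language itself does not promise this.
bool membersAreContiguous() {
    return offsetof(Point3, X) == 0
        && offsetof(Point3, Y) == sizeof(int)
        && offsetof(Point3, Z) == 2 * sizeof(int)
        && sizeof(Point3) == 3 * sizeof(int);
}
```

Even where the check passes, stepping a pointer from &X to reach Y treats separate members as one array, which the standard does not sanction, so the code remains fragile.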

Now if you are using the Microsoft compiler, there actually is a pretty good but non-standard solution:

A Microsoft only alternative

The Microsoft Visual C++ compiler has a language extension that can be used to accomplish the target interface -- including the contiguity of memory locations.

The standard C++ language allows for unnamed unions. VC++ also allows for unnamed structs as an extension. Members of an unnamed struct or union can be addressed directly, as if they were members of the enclosing class. For example:

class C
{
public:
    union // unnamed
    {
        int array[3];
        struct // unnamed struct: NONSTANDARD!
        {
            int X;
            int Y;
            int Z;
        };
    };
    int &operator[] (int i) { return array[i]; }
};

Because the union and the struct are unnamed (and this relies on a non-standard compiler extension) you can write code like this:

C c;
c[1] = 1;
cout << c.Y; // will print 1

And there's really nothing wrong with this -- except that it isn't portable. For that, you'll have to try something like this:

A Portable Solution

The portable solution to this problem eliminates both the wasted space and the inelegant interfaces and can be used on all C++ compilers. It involves unnamed unions and templates.

But How?

The basic idea is this: objects in a union are guaranteed by the language standard to all begin at the same memory address -- even if they are of different types. Unnamed unions do not add any intervening scopes, so the names of all the union members appear at the outer scope level -- even though they are of different types. Consider this class definition:

struct Owner
{
    union
    {
        int array[5];
        float Bob;
    };
};

In this case, struct Owner has two members -- both beginning at the same address in memory. Their names are:

Owner::array
Owner::Bob
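The shared starting address is easy to confirm. This is a small sketch built on the Owner struct above; membersShareAddress is a hypothetical helper name added for illustration.

```cpp
#include <cassert>

struct Owner {
    union {
        int array[5];
        float Bob;
    };
};

// Returns true when both union members start at the address of the object.
bool membersShareAddress() {
    Owner o;
    const void *pArray = static_cast<const void*>(&o.array);
    const void *pBob   = static_cast<const void*>(&o.Bob);
    const void *pOwner = static_cast<const void*>(&o);
    return pArray == pBob && pBob == pOwner;
}
```

Only addresses are compared here; no inactive union member is read.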

Most types, and in particular the built-in types int and float, have no knowledge of where their objects lie in memory. However, it is possible to write a class that assumes its 'this' pointer lies at the exact same address as some other object -- and takes advantage of that fact. Such a class is, of course, very easy to misuse -- but that is what comments are for.

So then, the trick to naming array elements is to create class objects which are assumed to align with the beginning of the array and to make these objects act as if they were references to some particular index into that array. For example:

struct Owner
{
    union
    {
        double array[3];
        Loc0Ref X; // Same as array[0]
        Loc1Ref Y; // Same as array[1]
        Loc2Ref Z; // Same as array[2]
    };
};

Assuming that there are in fact valid implementations of types Loc0Ref, Loc1Ref, and Loc2Ref, you could write code like this:

Owner obj;
for (int i = 0; i < 3; ++i) obj.array[i] = i;
cout << obj.X << endl; // prints 0
cout << obj.Y << endl; // prints 1
cout << obj.Z << endl; // prints 2

Implementing the Location References classes

Here are the key features of a location reference class:

An object of the class pretends to be something else -- a double in the example above.
The object assumes that its this pointer is aligned with an array of doubles (in this example).

To make a class object that pretends to be some other type, there are several operators that must be implemented:

operator double() lets the object be converted to the desired type
operator = () lets the object be assigned from the desired type
operator & lets you get a pointer to the object as if it were the desired type
the comparison operators, the ! operator, and a conversion to bool

Note that constructors are neither necessary nor helpful: reference class objects are never constructed directly, because they always appear in a union -- and (before C++11 relaxed the rule) objects with constructors cannot appear in unions!

Here is an example implementation of a location reference class that pretends to be a double stored at location 4 in an array which begins at the same location as the this pointer for the class object:

struct DoubleLocation4Ref
  // Pretend to be a double stored at location 4
  // in an array which begins at 'this'
{
    char trash; // needed to force minimal size of this object

    operator double()
      // Lets us write: double y = object.Member;
    {
        // let a location reference be convertible to a double
        double *array = reinterpret_cast<double*>(this);
        return array[4];
    }

    double &operator= (double const &rhs)
      // Lets us write: object.Member = 4.0;
    {
        // allow assignment to location 4 from a double value.
        double *variable = reinterpret_cast<double*>(this) + 4;
        *variable = rhs;
        return *variable;
    }

    double *operator&()
      // Lets us write: double *p = &object.Member;
    {
        return reinterpret_cast<double*>(this) + 4;
    }

    bool operator< (const double &rhs) const
      // Lets us write: if(object.Member < 1.0)
    {
        // 'this' is const here, so the cast must keep the const
        const double *array = reinterpret_cast<const double*>(this);
        double lhs = array[4];
        return lhs < rhs;
    }

    ...

};

Note that this class has only one data member -- and it is never actually used. The member is included only to give the class an explicit, minimal size. C++ never gives a class a size of 0: even an empty class occupies at least one byte, so the char member costs nothing extra on typical compilers.
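The sizes involved can be checked with a few assertions. This is a sketch with hypothetical names (Empty, OneByteRef, Holder, checkSizes); it assumes the usual case where a struct holding a single char has size 1.

```cpp
#include <cassert>

struct Empty {};                     // no data members at all

struct OneByteRef { char trash; };   // same shape as the reference class above

struct Holder {
    union {
        double array[5];
        OneByteRef X;                // overlays the array, adds no storage
        OneByteRef Y;
    };
};

void checkSizes() {
    assert(sizeof(Empty) >= 1);                   // never zero
    assert(sizeof(OneByteRef) == 1);              // typical, not guaranteed
    assert(sizeof(Holder) == sizeof(double[5]));  // names cost no extra space
}
```

The last assertion is the point of the whole technique: the union is as large as its largest member, so the named overlays add no space to the array.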

Using templates to prevent code duplication

Obviously, we don't want to have to write a version of this class for every location and data type combination. So, we use templates -- specifically, we make both the index into the array and the data type being emulated by the Location Reference object template parameters. Here's the template implementation of the location class:

template<size_t Index, class T>
struct NamedArrayElement
{
    char trash;

    operator T&() // allows: double d = object.Member;
    {
        return ((T*)(this))[Index];
    }

    T &operator=(T const &rhs) // allows: object.member = 1.0;
    {
        T &me = ((T*)(this))[Index];
        me = rhs;
        return me;
    }

    T* operator&() // allows: double *p = &object.Member;
    {
        return &((T*)(this))[Index];
    }

    bool operator<(T const &rhs) // allows: if(object.Member < 1.0)
    {
        return ((T*)(this))[Index] < rhs;
    }

    ...
};

And of course, here's an example use. Notice that the NamedArrayElement objects appear in the same unnamed union as the array that they are overlaying.

struct ExampleClass
{
    union
    {
        double m_array[5];
        NamedArrayElement<0, double> fred;
        NamedArrayElement<1, double> bill; // order doesn't matter
        NamedArrayElement<2, double> susan;
    };

    double &operator[] (int i) { return m_array[i]; }

    ExampleClass()
    {
        m_array[0]=10; // just some default values
        m_array[1]=11;
        m_array[2]=12;
    }
};
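Putting the pieces together, the following sketch exercises both interfaces on the same storage. It is a cut-down illustration, not the full class: the template keeps only three operators, the constructor fills all five slots, and checkNamedElements is a hypothetical helper. It also relies on the same assumption as the rest of this page, namely that casting 'this' to T* reaches the overlaid array, which mainstream compilers honor but the standard does not strictly promise.

```cpp
#include <cassert>
#include <cstddef>

// Minimal version of the template above; only the operators used below.
template<std::size_t Index, class T>
struct NamedArrayElement {
    char trash;
    operator T&() { return reinterpret_cast<T*>(this)[Index]; }
    T &operator=(T const &rhs) {
        T &me = reinterpret_cast<T*>(this)[Index];
        me = rhs;
        return me;
    }
    T *operator&() { return reinterpret_cast<T*>(this) + Index; }
};

struct ExampleClass {
    union {
        double m_array[5];
        NamedArrayElement<0, double> fred;
        NamedArrayElement<1, double> bill;
        NamedArrayElement<2, double> susan;
    };
    double &operator[](int i) { return m_array[i]; }
    ExampleClass() { for (int i = 0; i < 5; ++i) m_array[i] = 10 + i; }
};

void checkNamedElements() {
    ExampleClass e;
    double viaName = e.bill;      // reads m_array[1] through the name
    assert(viaName == 11);
    e.susan = 2.5;                // writes m_array[2] through the name
    assert(e[2] == 2.5);
    e[0] = 7.0;                   // writes through the index...
    double fredNow = e.fred;      // ...and reads it back through the name
    assert(fredNow == 7.0);
    assert(&e.fred == &e[0]);     // overloaded & yields a double*
    assert(&e.susan == &e[2]);
}
```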

One final note of confusion

Class objects with constructors can't be in unions

In C++ (before C++11 relaxed the rule), class objects with constructors can't appear in unions. It's OK to put class objects without constructors in them, though -- as we've seen above. This restriction means that this trick won't work with an array of strings -- or any other class object with a constructor.

Using pointers as a workaround

However, if you are willing to implement your array of strings as a pointer to an array of strings, you can modify the above code to make that work. A class using this technique on strings would look something like this:

struct ExampleClass
{
    union
    {
        std::string *stringArray;
        NamedIndirectElement<0, std::string> title;
        NamedIndirectElement<1, std::string> description;
        NamedIndirectElement<2, std::string> body;
    };

    std::string &operator[] (int i) { return stringArray[i]; }

    ExampleClass() { stringArray = new std::string[3]; }
    ~ExampleClass() { delete[] stringArray; }
};

Here, class NamedIndirectElement would not be assuming that its this pointer would be aligned with an array of strings, but rather with a pointer to an array of strings. Here's an example of such an implementation:

template<size_t Index, class T>
struct NamedIndirectElement
{
    operator T&()
    {
        T** me = reinterpret_cast<T**>(this);
        return ((*me))[Index];
    }

    ...
};
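As a rough end-to-end sketch of the indirect variant, the code below fills in one extra operator and exercises the string members. The assignment operator, the deleted copy operations (C++11 syntax), and checkIndirectElements are additions made for illustration and are not part of the code above; the same caveat about casting 'this' applies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal version of the template above, with an assignment operator added.
template<std::size_t Index, class T>
struct NamedIndirectElement {
    operator T&() {
        T **me = reinterpret_cast<T**>(this);   // 'this' overlays the pointer
        return (*me)[Index];
    }
    T &operator=(T const &rhs) {
        T &element = (*reinterpret_cast<T**>(this))[Index];
        element = rhs;
        return element;
    }
};

struct ExampleClass {
    union {
        std::string *stringArray;
        NamedIndirectElement<0, std::string> title;
        NamedIndirectElement<1, std::string> description;
        NamedIndirectElement<2, std::string> body;
    };
    std::string &operator[](int i) { return stringArray[i]; }
    ExampleClass() { stringArray = new std::string[3]; }
    ~ExampleClass() { delete[] stringArray; }
    // Copying would delete the same array twice, so it is disabled here.
    ExampleClass(const ExampleClass &) = delete;
    ExampleClass &operator=(const ExampleClass &) = delete;
};

void checkIndirectElements() {
    ExampleClass e;
    e.title = std::string("Named Array Elements");  // write through the name
    e[1] = "a short description";                   // write through the index
    std::string viaName = e.description;            // read through the name
    assert(e[0] == "Named Array Elements");
    assert(viaName == "a short description");
}
```

Because the class owns a heap array, copying needs either to be disabled, as here, or implemented as a deep copy.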