Writing Copy Constructors and Assignment Operators

We laudable zeal for reducing duplicate maintenance, some developers make the mistake of writing their class' copy constructors using the assignment operator instead of the other way around. This results in very subtle program bugs that often seem to appear out of no where. Sometimes the program throws an exception, and sometimes it just works wrong without any obvious reason why it should.

Bad Example

Consider the following code:


struct UtilityClass
{
    int *data_;        // initialize during construction

    UtilityClass()
    : data_(new int(0))
    {
    }

    ~UtilityClass()
    {
	delete data_;
    }


    UtilityClass(UtilityClass const &rhs)
    {
      operator=(rhs);  //BAD CODE -- dont' use operator= 
		       //            in the copy constructor!
    }

    UtilityClass &operator=(UtilityClass const &rhs)
    {
      if(!data_)  
	throw "What the heck!  Where's the data_ pointer?";

      *data_ = *rhs.data_;

      return *this;
    }

    void set(int v)   { *data_ = v; }
    void get() const  { return *data_; }


};

This is a bad example because the copy constructor is written in terms of operator= instead, perhaps, of the other way around. (And, importantly, because neither of the two methods properly constructs class.)

Of course it is not necessary to write one in terms of the other but it generally reduces maintenance if you do.

But wait! I've never seen this problem before!

Like many bugs, they only occur in certain circumstances. In most circumstances, the bug that the above code causes will not actually appear. Especially if older C++ coding style is used. If you use the modern, and highly recommended style, the problem is more likely to pop up.

Lets consider two classes that have members of type UtilityClass. One class does not demonstrate the bug and the other does:


  struct NonFailingExample
  {
      UtilityClass m_;  

      // old constructor syntax

      NonFailingExample(NonFailingExample const &rhs)
      {
	  m_.set(rhs.m_.get()); 

	  // relies on default construction
	  // of m_ -- which will not call
	  // the copy constructor and 
	  // will thus will initialize m_.data_
	  // properly before the set method is
	  // called.
      }

  };

  struct FailingExample
  {
      UtilityClass m_;  

      // new constructor syntax

      FailingExample(FailingExample const &rhs)
      : m_(rhs.m_)
      {
	  // CRASHES
	  //
	  // Here, the UtilityClass' copy constructor
	  // is invoked, not its default constructor
	  // and since data_ is uninitialized in the
	  // above example, the assignment operator
	  // causes the crash.

      }

  };

Steps towards a fix

An ugly technical fix for this problem would just make the copy constructor ensure construction before calling operator=, like this:


  UtilityClass(UtilityClass const &rhs)
  {
    data_ = new int;   // functional but ugly fix

    operator=(rhs);
  }

Interestingly, you can't make that fix in operator=. The reason is straightforward: assignment can only occur after construction and the assignment operator is allowed to assume that construction is complete.

Suppose you tried this instead:


  UtilityClass &operator=(UtilityClass const &rhs)
  {
    data_ = new int;   // BAD CODE, leaks horribly!

    *data_ = *rhs.data_;

    return *this;

  }

Every time you perform an assignment, with this bad code, you will get a new heap packet allocated for data_ -- thus leaking memory like a sieve.

But why can't you test the pointer for null and only allocate the memory if data_ is null? Well, uninitialized memory is unlikely to be null -- so you are unlikely to actually allocate the needed memory.

A cleaner solution

Here's a cleaner solution that both ensures programatic integrity and minimizes duplicate code:


struct UtilityClass
{
    ...

    UtilityClass(UtilityClass const &rhs)
    : data_(new int(*rhs_.data_))
    {
      // nothing left to do here
    }

    UtilityClass &operator=(UtilityClass const &rhs)
    {
	//
	// Leaves all the work to the copy constructor.
	//

	if(this != &rhs)
	{
	    // deconstruct myself

	    this->UtilityClass::~UtilityClass();  // NOTE:  this idea is wrongly disparaged by Herb Sutter
						  // See next section.
	    // reconstruct myself by copying
	    // from the right hand side.

	    new(this) UtilityClass(rhs);
	}

	return *this;
    }

    ...

};

Incorrect arguments against the above implementation

The complaint that it fundamentally doesn't work

Herb Sutter, and other writers complain bitterly against the implementation mentioned in the previous paragraph. The problem with their line of argument is that they give a slightly different example of code. Instead of destructing only the base class, using the following syntax:

this->UtilityClass::~UtilityClass(); // first implementation

They instead give an example that really is bad -- without considering the simple fix to their buggy code. Here's the buggy code they actually give:

this->~UtilityClass(); // second, BAD implementation

In the first place, this code won't even compile on many older compilers. Secondly, the difference between the two implementations is profound.

The first (and good) implementation destructs only the UtilityClass and the second (REALLY BAD) example, destructs the derived class parts of data structure -- assuming that the UtilityClass has a virtual destructor. But, again, the first implementation does not -- not even if the UtilityClass destructor is virtual.

Consider this BAD EXAMPLE;


struct BadUtilityClass
{
    ...

    BadUtilityClass(UtilityClass const &rhs)
    : data_(new int(*rhs_.data_))
    {
      // nothing left to do here
    }

    virtual ~BadUtilityClass()
    {
      delete data_;

      data_ = 0; // hopefully this is done but it
		 // still doesn't fix the problem
    }

    BadUtilityClass &operator=(UtilityClass const &rhs)
    {
	//
	// Leaves all the work to the copy constructor.
	//

	if(this != &rhs)
	{
	    // deconstruct myself

	    this->~BadUtilityClass();  
	      // BAD: destructs derived class objects 
				       
	    new(this) BadUtilityClass(rhs);
	      // BAD:: reconstructs derived class as
	      // ONLY the base class parts!
	}

	return *this;
    }

    ...

};

struct Derived: 
public BadUtilityClass
{
  int* myMember_;

  Derived()
  : myMember_( new int(4) )
  {
  }

  ~Derived()
  {
    delete myMember_;
    myMember = 0;
  }

  Derived &operator=(Derived const &rhs)
  {
    this->BadUtilityClass::operator=(rhs);

    *myMember_ = *rhs.myMember_;  // myMember is null here!
  }

}

Hopefully the comments, above explain, but here is a quick summary:

The BAD example of using destruct/reconstruct logic to implement operator new goes too far -- it destructs not only the base class parts but also the derived class parts -- which invalidates the logic of any derived class operator= that does NOT use this logic.
Derived classes could use this operator= logic themselves and not have this problem, but if they don't, they are hosed. For example, if derived class's operator= looked like this, there would be no problem:
```
  Derived &operator=(Derived const &rhs)
  {
    this->Derived::~Derived(); // destructs base and drived parsts

    new(this) Derived(rhs); // reconstructs base and derived parts
  }
```

The complaint that operator= is not the same as destruct/reconstruct

This argument goes like this: Look, assignment is ABSOLUTELY NOT the same as destroy followed by reconstruct!

Most of the time it is. In fact the containers often use something very similar in their resize logic. Look at how your vector class is implemented when it comes to resizing a smaller vector to make it larger. They use the copy constructor to populate the uninitialized memory in the new heap packet -- they don't first default construct the newly allocated memory then call operator=, they copy construct it into place.

If your object can fit into an STL container, then you CAN use this logic (assuming that you destruct only the base class parts in your operator=).

A reasonable (if not compelling) argument against doing this!

Ok, not all their arguments are bogus. When you destruct an object then try to reconstruct it, you run the risk that it will throw an exception. If you writing high reliability code, you might want to seriously consider not using this approach -- most people aren't writing such code, but those who do might think twice -- despite the maintenance cost advantages.

That having been said, there are probably work arounds: After you destruct self, initialze self to all zeros -- that is likely to mean that if the copy constructor you use to reconstruct from the right hand side throws an exception, the pointers *this are safe to destruct even if they aren't initialized (because they are all zeros). You can then to catch and ignore the exception thrown in the copy constructor -- or rethrow it and let the caller deal.

Another, possible show stopper for using this algorithm is if your derived class does something stupid. For example, if your derived class uses pointers into the base class data structures like this:


struct BaseClass
{
  std::set data_;

  BaseClass() {}

  BaseClass &operator=(BaseClass const &rhs) { above good implementation }

  std::string const *find(std::string const &key)
  {
    return &*data_->find(key); // skip error checks for this example
  }
};

struct Derived
: public BaseClass
{
  std::string *specialValue;

  int otherData_;


  Derived()
  {
    specialValue = find("someValue"); 
      // BAD derived data dependent on base class layout  

    otherData_ = 10;
  }

  // Bad Operator=!

  Derived &operator=(Derived const &rhs)
  {
    this->BaseClass::operator=(rhs); // destructs base class and SPECIAL VALUE

    otherData_ = rhs.otherData_;  // this part works ok

  }

}

This example absolutely won't work because calling operator= destructs the base class part and thus invalidated the "specialData" pointer.

But ask yourself: what was supposed to happen? If you keep a pointer into base class data, what guarantee that operator= on the base class won't delete that data? Its a mutating member function after all. Any non-const member of a class can invalidate its internal layout. Despite the fact that people write this code all the time, it is still dangerous -- and not a worthy candidate for a complaint against a programming technique.

Performance Considerations

The above design is good for maintenance because it eliminates the maintance of operator= completely. Almost the exact same code can be used for every class -- only the name of the destructor changes in the operator= implementation.

Performance wise, this is an OK but not great solution. In this approach, operator= is guaranteed to destroy self and reconstruct from scratch. For objects with array members, this likely means 2 or more passes over the assigned data. A hand written operator= might reduce this to a single pass.

However, for most classes, this approach reduces maintenance and performs reasonably well.