Speeding up C++ string conversions

How to get sprintf speeds out of C++ operator<<
and use some sneaky C++ magic while doing it
By Lowell Boggs


In plain old C, string conversions are pretty fast and are accomplished in a small number of ways, such as sprintf. C++ provides all of these and adds a far more flexible mechanism -- operator<< as applied to iostream objects.

iostream conversion operators have the advantage of flexibility and extensibility, but they are not the fastest mechanisms in the world -- sometimes being 5 times slower than sprintf at the same job.

Using stringstream to perform string conversions

A common way to perform in-memory string conversions (and to do so in the spirit of C++ instead of just calling one of the faster C routines mentioned above) is to use a stringstream as a buffer like this:
  #include <sstream>            
  #include <iomanip>            
				      
  using namespace std;                
				      
  ...                                 
				      
  stringstream buffer;                
				      
  int i = 20;                         
  buffer << setw(4) << i; 
				      
  string v = buffer.str();            
  //                                  
  // v now contains "  20"            
  //

This code is used in a lot of places -- such as Boost's lexical_cast interfaces. It works fine but is 3 to 5 times slower than if you had done this:

  #include <stdio.h>
  
  ...
  
  int i = 20;
  char buffer[40];
  sprintf(buffer, "%4d", i);

The reason, of course, that C++ programmers prefer the first, slower, syntax is that C++ provides a wonderful mechanism for developers to overload operator<< for classes of their own. You then only need to learn one syntax for all kinds of string conversions, instead of one for builtins and another for user-defined data structures.
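As a minimal sketch of that "one syntax" point, the hypothetical Point type below (invented for this illustration, not taken from the article's download) gets its own operator<< and is then converted with the same stringstream idiom that was used for the int:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type, invented for this illustration.
struct Point
{
    int x;
    int y;
};

// Teach ostreams how to print a Point.
std::ostream &operator<<(std::ostream &out, Point const &p)
{
    return out << '(' << p.x << ',' << p.y << ')';
}

// Same stringstream idiom as for a builtin type.
std::string pointToString(Point const &p)
{
    std::stringstream buffer;
    buffer << p;
    return buffer.str();
}
```

With this in place, pointToString applied to a Point holding 3 and 4 yields "(3,4)", and no second conversion syntax had to be learned.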

A faster way to use stringstream

Luckily, however, it is possible to have the best of both worlds in some limited but useful situations. The underlying mechanism is to use a static variable of type stringstream and, before you perform a string conversion, reset that variable to be empty. The reason that a static variable is preferable is simple: the constructor for a stringstream object is relatively slow due to the need to get locale information properly initialized. Static variables only get initialized once during a program run, so this penalty must only be paid once.
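If you want to check the speed difference on your own system, a rough timing harness along the following lines may help. It is only a sketch: the function names are made up for this illustration, it assumes a C++11 compiler for <chrono>, and the ratios you see will depend on your compiler, library and optimization settings.

```cpp
#include <chrono>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Constructs a new stringstream on every call.
std::string formatFresh(int i)
{
    std::stringstream buffer;
    buffer << std::setw(4) << i;
    return buffer.str();
}

// Reuses one stringstream that is constructed only once.
std::string formatStatic(int i)
{
    static std::stringstream buffer;
    buffer.clear();                     // drop any error state
    buffer.seekp(0, std::ios::beg);     // rewind the write position
    buffer << std::setw(4) << i << '\0';
    return buffer.str().c_str();        // copy up to the nul only
}

// The plain C baseline.
std::string formatSprintf(int i)
{
    char buffer[40];
    std::snprintf(buffer, sizeof buffer, "%4d", i);
    return buffer;
}

// Milliseconds taken by 'calls' invocations of 'format'.
template <class F>
double millisFor(F format, int calls)
{
    std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    std::string::size_type total = 0;
    for (int n = 0; n < calls; ++n)
        total += format(n % 1000).size();   // use the result
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return total > 0 ? elapsed.count() : 0.0;
}
```

Calling millisFor(formatFresh, 1000000), millisFor(formatStatic, 1000000) and millisFor(formatSprintf, 1000000) and printing the three numbers gives a first impression; all three functions return the same text for the same input.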

The following code runs almost as fast as simply using sprintf:


  #include <sstream>            
  #include <iomanip>            
				      
  using namespace std;                
  static stringstream buffer; // moved up here 
				      
  ...                                 
				      
  // first use                        
  int i = 20;                         
  buffer << setw(4) << i << '\0'; 
				      
  string v = buffer.str().c_str();    
  //                                  
  // v now contains "  20"            
  //                                  
				      
  // second use                       
  string tom("tom");                  
  buffer.seekp(0, ios::beg);          
  buffer << setw(10) << tom << '\0'; 
				      
  v = buffer.str().c_str();           
  //                                  
  // v now contains "       tom"      
  //

Basically, all you have to do is to seek the static buffer back to the beginning before performing a new conversion.

Note that we call the .c_str() method on the string returned by buffer.str() because the seekp function does not reduce the total size of the string kept by the stringstream. We put a nul character in the buffer after each conversion and let the conversion from char const* to string copy only the desired characters out to string 'v'.
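The effect is easy to see in isolation. The following sketch (function names invented for this illustration) writes a long value, rewinds, and writes a shorter one, first without and then with the trailing nul:

```cpp
#include <sstream>
#include <string>

// Rewind and overwrite with a shorter value, then take str() as is.
std::string rewindWithoutNul()
{
    std::stringstream buffer;
    buffer << "hello world";
    buffer.seekp(0, std::ios::beg);
    buffer << "hi";
    return buffer.str();            // the old tail is still there
}

// Same, but terminate the new value and copy through c_str().
std::string rewindWithNul()
{
    std::stringstream buffer;
    buffer << "hello world";
    buffer.seekp(0, std::ios::beg);
    buffer << "hi" << '\0';
    return buffer.str().c_str();    // the copy stops at the nul
}
```

The first function returns "hillo world", because only the first two characters were overwritten; the second returns "hi".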

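However, in practice it is a little more complicated. Among other things, formatting state left on the static stream by one conversion (hex, for example) is still in effect for the next one, and the buffer never shrinks by itself.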
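One complication that is easy to demonstrate is formatting state that lingers on a reused stream. The sketch below uses a hypothetical helper, resetFormat, which is an assumption of this example and not part of the original code; it puts the flags, fill character, precision and width back to the values a freshly constructed stream has.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper: put a reused stream back into the state a
// freshly constructed one would have.
void resetFormat(std::ios &stream)
{
    stream.clear();                                   // good() again
    stream.flags(std::ios::skipws | std::ios::dec);   // default flags
    stream.fill(' ');
    stream.precision(6);
    stream.width(0);
}

// Converts 'value' twice through one stream: once in hex, and once
// more after optionally resetting the format state.
std::string hexThenPlain(int value, bool resetBetween)
{
    std::stringstream buffer;
    buffer << std::hex << value << ' ';
    if (resetBetween)
        resetFormat(buffer);
    buffer << value;
    return buffer.str();
}
```

Without the reset, hexThenPlain(255, false) gives "ff ff" because hex is still in effect for the second conversion; with it, hexThenPlain(255, true) gives "ff 255".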

A cleaner interface

Because of these complexities, and because using multiple statements to perform a string conversion is undesirable, a function that encapsulates all this is desirable. For example:

  string tmp = Sformat(expression);

On the other hand, having the function take only 1 parameter seems like a bad idea because it would make it impossible to set the string format. So, what we really need is a family of function overloads that let you do things like this:

  string tmp;
  tmp = Sformat(expression);
  tmp = Sformat(hex, expression);
  tmp = Sformat("label", hex, expression);
  tmp = Sformat("Error ", hex, expression, ", at line 12");

Of course, Sformat needs to be a template because we don't know in advance what its parameters are going to be. One possible implementation is as follows:

  template<class T, class U>            
  string Sformat(T const &t, U const &u)      
  {                                           
    buffer.seekp(0, ios::beg);
    // set the ios state back to defaults here
    buffer << t << u << '\0';
    return buffer.str().c_str();
  }

Of course, each overload of the Sformat signature would have a different number of parameters and the implementation would only vary by the actual conversion:

  buffer << t;
  buffer << t  << u;
  buffer << t  << u  << v;
  ...
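On compilers that support C++11 variadic templates (an assumption; the overload family above does not need them), the whole family could be collapsed into a single function. The following is only a sketch of that idea: the namespace, the sharedBuffer and append helpers, and the reset steps are choices made for this illustration.

```cpp
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {

// One stream for the whole program, constructed on first use.
inline std::stringstream &sharedBuffer()
{
    static std::stringstream buffer;
    return buffer;
}

// End of the recursion: nothing left to write.
inline void append(std::ostream &) {}

// Write the first argument, then recurse on the remaining ones.
template <class T, class... Rest>
void append(std::ostream &out, T const &first, Rest const &...rest)
{
    out << first;
    append(out, rest...);
}

// Accepts any mix of values and parameterless manipulators.
template <class... Args>
std::string Sformat(Args const &...args)
{
    std::stringstream &buffer = sharedBuffer();
    buffer.clear();                                   // drop error state
    buffer.seekp(0, std::ios::beg);                   // rewind
    buffer.flags(std::ios::skipws | std::ios::dec);   // default format
    buffer.fill(' ');
    buffer.precision(6);
    append(buffer, args...);
    buffer << '\0';                                   // end of new text
    return buffer.str().c_str();
}

} // namespace sketch
```

A call such as sketch::Sformat("label ", std::hex, 255) then produces "label ff", and a following sketch::Sformat(20) produces "20" because the format flags are reset at the start of every call.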

A More Complete Solution

A more complete solution might theoretically have a variety of global variables and helper functions that let you do things like clean up memory associated with the string buffer in the event it got too large. For example, suppose some database's operator<< was used to convert the entire database into a string. We'd need to have some mechanism to occasionally check for excessive buffer size and restore it back to the defaults.
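One way such a size check could look is sketched below. The helper name shrinkIfHuge and the idea of swapping in a fresh stream are assumptions of this example (the swap member requires C++11); the construction cost is paid again, but only on the rare occasions when the buffer has grown past the limit.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper, meant to be called right after a conversion:
// if more than 'limit' characters were written, replace the stream
// with a fresh one so the oversized internal buffer is released.
// Returns true if the stream was replaced.
bool shrinkIfHuge(std::stringstream &buffer, std::streamoff limit)
{
    std::streamoff const written = buffer.tellp();
    if (written <= limit)
        return false;

    std::stringstream fresh;
    buffer.swap(fresh);     // 'fresh' takes the big buffer away with it
    return true;
}
```

Note that the result of the conversion has to be copied out of the stream before this helper is called, since the swap discards the stream's contents.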

In this case, a class which encapsulates all the parts discussed above would be preferable. However, it would not be desirable to have to use syntax like this:

  string tmp = Sformat::format(i);

where "format" would be the class member that does the conversion.

Luckily there is an easy solution: classes can define automatic conversion operators and it isn't necessary to declare a named variable to use them. For example, consider this implementation of class Sformat:


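  class Sformat
  {
  public:

      // buffer_.str() returns a temporary string, so keep a copy
      // alive; the pointer is valid until the next conversion to
      // char const *.
      operator char const *() const
      {
        static std::string result;
        result = buffer_.str().c_str();
        return result.c_str();
      }
      operator std::string () const { return buffer_.str().c_str(); }

      template<class T>
      Sformat(T const &t)
      {
        reset();
        buffer_ << t << '\0';
      }

      template<class T, class U>
      Sformat(T const &t, U const &u)
      {
        reset();
        buffer_ << t << u << '\0';
      }
  private:
      static std::stringstream buffer_;  // shared by all Sformat objects
      static bool firstTime_;

      void reset()
      {
        // nothing to rewind before the first conversion
        if(!firstTime_) buffer_.seekp(0, std::ios::beg);

        firstTime_ = false;
      }
  };

  bool Sformat::firstTime_ = true;
  std::stringstream Sformat::buffer_;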

This class can be used exactly like the Sformat function described earlier because C++ lets you have unnamed variables. Consider this code:

  string tmp = Sformat(i);

This looks like you are calling function Sformat, giving it the parameter i and having it return a string. However, what is really going on is this:
  1. an unnamed variable of type Sformat is being constructed given parameter i.
  2. the compiler notices that you are initializing a string object from an Sformat object. It finds that your unnamed Sformat object has a conversion operator for this and calls that function.

To clarify: the constructors for the Sformat class perform all needed conversions given their parameters and write the result into the static member buffer_, so the constructed object itself is truly empty. The Sformat object exists only as a convenient way to call the conversion operator, whose job is to return the value of the static buffer.
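The "unnamed variable plus conversion operator" mechanism can be seen on its own in the following sketch. Quoted is a hypothetical stand-in invented for this illustration; unlike Sformat it keeps its text in an ordinary member rather than in a static stream, which keeps the example short.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for Sformat: it formats in the constructor
// and hands the text over through a conversion operator.
class Quoted
{
public:
    template <class T>
    Quoted(T const &value)
    {
        std::ostringstream out;
        out << '"' << value << '"';
        text_ = out.str();
    }

    operator std::string() const { return text_; }

private:
    std::string text_;
};

// Reads like a function call, but constructs an unnamed Quoted
// object and then converts it to std::string.
std::string quoteNumber(int i)
{
    std::string tmp = Quoted(i);
    return tmp;
}
```

Here quoteNumber(42) returns the four characters "42" including the quotation marks, even though no function named Quoted exists anywhere.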

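Here it is important to note that a single C++ statement should not call Sformat more than once unless you wrap each result in a string, because every Sformat object writes into the same static buffer and a later call overwrites the result of an earlier one:

  string tmp = string(Sformat(i)) + string(Sformat(j)) + string(Sformat(k));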
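Why the copies matter can be shown with a small stand-in class. SharedText below is hypothetical and uses a static character array instead of a static stringstream, but it shares the relevant property: all objects hand out the same storage.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for Sformat: every object writes into the
// same static storage.
class SharedText
{
public:
    explicit SharedText(int value)
    {
        std::snprintf(storage(), kSize, "%d", value);
    }

    operator char const *() const { return storage(); }

private:
    enum { kSize = 32 };

    static char *storage()
    {
        static char text[kSize];
        return text;
    }
};

// Both pointers refer to the same storage, so the first result is lost.
std::string unsafeJoin(int a, int b)
{
    char const *first = SharedText(a);
    char const *second = SharedText(b);   // overwrites what 'first' sees
    return std::string(first) + "," + second;
}

// Copying each result into its own string before the next call keeps it.
std::string safeJoin(int a, int b)
{
    std::string first = std::string(SharedText(a));
    std::string second = std::string(SharedText(b));
    return first + "," + second;
}
```

With these definitions unsafeJoin(1, 2) returns "2,2", while safeJoin(1, 2) returns "1,2".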

Eliminating the need for an object module

As described above, we have a class with static data members. Normally this requires that we have a .cpp file which can be compiled into an object module and linked with a program.

However, a slight change to the above class can eliminate the necessity of the object module -- giving us a single header file for the complete implementation.

The change is to convert the Sformat class definition into a template then let the linker automatically instantiate the static data members for us.

But, we don't really need the class to be a template -- nor would that help the interface. Yes, the constructors in the class need to be templates, but making the class as a whole a template won't help.

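Luckily, a typedef combined with a dummy template parameter can give us the desired results with no obvious user impact (other than possibly more confusing error messages -- sigh, but that is the nature of templates). Note that the class has to remain a real template for this to work: the static members of an explicit specialization such as Sformatter<void> are ordinary definitions, and placing them in a header produces duplicate symbols as soon as two source files include it. Consider the following definition:


  template<class Dummy>
  class Sformatter
  {
    // same body as class Sformat described
    // above except that the name is
    // Sformatter not Sformat
  };

  // static members of a class template may be defined in a header;
  // the linker merges the copies from each object module
  template<class Dummy> bool Sformatter<Dummy>::firstTime_ = true;
  template<class Dummy> std::stringstream Sformatter<Dummy>::buffer_;

  typedef Sformatter<void> Sformat;
    // here's the interface we'll actually use.
    // Sformat is just a synonym for our new
    // Sformatter class -- but it doesn't feel
    // like a template when you use it...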
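The header-only trick can be tried out on something much smaller than Sformatter. The sketch below uses a hypothetical counter class invented for this illustration, and keeps the class a primary template with an unused parameter, because static data members of a class template (unlike those of an ordinary class or of an explicit specialization) may be defined in a header that is included from several source files.

```cpp
// Hypothetical header-only counter, invented for this illustration.
template <class Dummy>
class CallCounterT
{
public:
    static int next() { return ++count_; }

private:
    static int count_;
};

// Legal in a header: the definitions from each translation unit are
// merged into one by the linker.
template <class Dummy>
int CallCounterT<Dummy>::count_ = 0;

// Users see an ordinary-looking class name.
typedef CallCounterT<void> CallCounter;
```

Every source file that includes this header and calls CallCounter::next() shares the same count_, and no separate .cpp file is needed to define it.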

Getting it to compile with the Microsoft Compiler

The above logic works fine with the GNU compiler, and it should work with all compilers, but there is a bug in the Microsoft VC++ 7.0 compiler that prevents the Sformat() templatized constructors from accepting a small but important class of parameters: the io manipulators hex, dec, left, right, etc.

Unfortunately, the 7.0 compiler simply can't figure out what you mean when you pass a parameter list like this:

  string tmp = Sformat(std::hex, 40);

This works fine with the GNU compiler and should work with the MS compiler, but instead it gives you an error telling you that it does not know how to select the proper signature for the std::hex object. This same problem occurs for all the parameterless io-manipulators. The parameterized manipulators, like setw() and setprecision(), work fine.

The reason for the difference is that the parameterized manipulators return a data structure holding the parameter. This data structure is a nice simple data type. However, std::hex is not -- it is declared in file <ios> something like this:

  ios_base &hex( ios_base & );

Who knows why this confuses Microsoft. However, it presents a significant problem for the relatively attractive interface described above:

Basically, we want to be able to freely intersperse io manipulators with convertible values in a call to Sformatter's constructor. As described above, the Sformatter constructors look basically like this:


  template<class T, class U, class V, class W>
  Sformatter(T const &t, U const &u, V const &v, W const &w)
  {
    reset();
    buffer_ << t << u << v << w << '\0';
  }
							     
  // remember: the actual conversion to string occurs in the nested    
  // cast operator!                                          

It turns out that even though the automatic specializer in the compiler gets confused over io-manipulators, the compiler doesn't get confused if we manually specify an overload for Sformatter() that contains explicit references to the io-manipulator data type:

  template<class T, class U, class V, class W>
  Sformatter(T const &t, U const &u, V const &v, W const &w)
  {
    reset();
    buffer_ << t << u << v << w << '\0';
  }

  // the type of a parameterless io-manipulator such as hex
  typedef ios_base &(*Manipulator)(ios_base &);

  template<class U, class V, class X>
  Sformatter(Manipulator m1, U const &u, V const &v, Manipulator m2, X const &x)
  {
    reset();
    buffer_ << m1 << u << v << m2 << x << '\0';
  }

This makes the code compile but requires a significantly larger number of function signatures for the Sformatter constructor. For every possible combination of values and formatters that might go into a Sformat() call, you must have this combination specifically listed as an overload for the Sformatter constructor. Just typing them in becomes unwieldy for more than about 5 parameters to the constructor -- but at least it works.
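The explicit-manipulator idea can be tried in isolation with a free function. In the sketch below, formatWith is a hypothetical name and the function uses a local ostringstream rather than the static buffer; the point is only that the manipulator parameter has a spelled-out function pointer type, so nothing has to be deduced for it.

```cpp
#include <ios>
#include <sstream>
#include <string>

// The type of parameterless manipulators such as std::hex and std::left.
typedef std::ios_base &(*Manipulator)(std::ios_base &);

// Hypothetical two-argument formatter that names the manipulator type
// explicitly instead of asking the compiler to deduce it.
template <class T>
std::string formatWith(Manipulator m, T const &value)
{
    std::ostringstream out;
    out << m << value;
    return out.str();
}
```

A call such as formatWith(std::hex, 255) returns "ff"; only the type of the second argument is left for the compiler to work out.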

Done!

And we are done -- just #include sformat.h in your program and use Sformat(stuff) as if it were strtol, but with the advantage that its parameters can be any C++ class for which you have defined an operator<<.

See the top of this web page to download a small project directory containing the Sformat.h file and an example calling program -- see junk.cpp.