Compose (Variadic Template Function)

2015-02-06

 

I have used a string composition function for quite a while. It is a reimplemented and simpler version of the String Composition Library.

std::cout << compose("Wrote %0 objects into %1.", objects.size(), filename) << std::endl;

"Wrote 548 objects into level1.jsn."

The problem with C++'s stream library is that it is almost impossible to localize strings that are built with the stream insertion operators. The core of the problem is that they chop the sentence up into fragments, such as:

std::cout << "Wrote " << objects.size() << " objects into " << filename << "." << std::endl;

When you use something like gettext, the strings offered for translation are:

"Wrote "
" objects into "
"."

Not only is it really difficult for translators to understand what is meant (they see these fragments among hundreds of other strings), it also does not allow them to reorder the sentence, which may be necessary. For example, in German the verb moves to the end of the sentence:

"%0 Objekte wurden in %1 geschrieben."

So it makes sense to keep the string together. This is one of the reasons why the printf family of functions is still so popular when it comes to localisation.

But C++ has user-defined types, and extending the stream printing mechanism to them is quite simple: just implement the stream insertion operator. For example:

struct Vector3
{
    float x, y, z;
};

std::ostream& operator << (std::ostream& os, const Vector3& v3)
{
    os << "(" << v3.x << ", " << v3.y << ", " << v3.z << ")";
    return os; // return the stream so that insertions can be chained
}

It is reasonable to want to print something like,

compose("Killed %0 at %1.", object.name, object.location);

"Killed Troll3 at (254.125, 0.025, 1.36)."

The compose function merges the extensibility of the C++ stream API with the elegance of printf.

C++11 allowed me to simplify the compose function. By using a variadic template function I was able to remove the hack that enabled the function to take an unspecified number of arguments.

The actual compose function is as follows:

template <typename ... Args>
std::string compose(const std::string& format, Args ... args)
{
    std::vector<std::string> sargs;
    unpack(sargs, args...);

    std::string result;

    for (unsigned int i = 0; i < format.size(); i++)
    {
        if (format[i] == '%')
        {
            if (i + 1 < format.size())
            {
                int idx = char_to_int(format[i + 1]);
                result += sargs.at(idx);
                i++;
            }
            else
            {
                throw std::logic_error("% at end of string.");
            }
        }
        else
        {
            result.push_back(format[i]);
        }
    }

    return result;
}

The unpack function writes the arguments as strings into the sargs vector. The rest of the function copies the format string, character by character, into the result. When it encounters a % sign, it converts the next character into a number, takes the argument at that index and appends it to the result. An index without a matching argument makes sargs.at throw std::out_of_range.

The unpack function implements the common idiom for accessing the individual elements of a variadic template parameter pack.

template <typename Arg>
void unpack(std::vector<std::string>& args, Arg value)
{
    args.push_back(to_string(value));
}

template <typename Arg, typename ... Args>
void unpack(std::vector<std::string>& args, Arg value, Args ... remainder)
{
    args.push_back(to_string(value));
    unpack(args, remainder...);
}

That is, it implements a Lisp-like (car / cdr) list iteration, where one function takes the first argument separately and the remainder as variadic arguments. Once the work for the first argument is done, it calls itself with the remainder. The recursion is terminated by an overload that takes only one argument.

This may sound like many function calls, but since the number of arguments is known at compile time, the compiler will have no trouble inlining the code.
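For comparison, the recursion can also be avoided entirely by expanding the parameter pack inside a braced initializer list. The following is only a sketch of that alternative, not what the compose function uses; the names stringify and unpack_flat are made up for this example, and stringify stands in for the to_string helper.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the article's to_string helper (hypothetical name).
template <typename T>
std::string stringify(const T& value)
{
    std::ostringstream buff;
    buff << value;
    return buff.str();
}

// Hypothetical non-recursive variant of unpack.
template <typename ... Args>
void unpack_flat(std::vector<std::string>& out, const Args& ... values)
{
    // The elements of a braced initializer list are evaluated left to right,
    // so the push_back calls happen in argument order. The leading 0 keeps
    // the array non-empty when no arguments are passed.
    int expand[] = {0, (out.push_back(stringify(values)), 0)...};
    (void)expand; // the array only exists to drive the pack expansion
}
```

A side effect of this form is that it also accepts zero arguments. In C++17 the same expansion can be written more directly as a fold expression over the comma operator.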

The to_string function is a small helper that converts an arbitrary value to a string, using a string stream.

template <typename T>
std::string to_string(T value)
{
    std::stringstream buff;
    buff << value;
    return buff.str();
}

I have implemented two special cases, booleans and strings. Boolean values should be printed as 'true' and 'false', and strings need not be run through a string stream.

template <>
inline std::string to_string(bool value)
{
    return value ? "true" : "false";
}

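// A plain overload rather than an explicit specialization: for a std::string
// argument the primary template deduces T = std::string, so a specialization
// for const std::string& would never be selected.
inline std::string to_string(const std::string& value)
{
    return value;
}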

The char_to_int function is a simple conversion function:

int char_to_int(char c)
{
    switch (c)
    {
        case '0':
            return 0;
        case '1':
            return 1;
        case '2':
            return 2;
        case '3':
            return 3;
        case '4':
            return 4;
        case '5':
            return 5;
        case '6':
            return 6;
        case '7':
            return 7;
        case '8':
            return 8;
        case '9':
            return 9;
        default:
            throw std::logic_error("not a number");
    }
}

The full implementation can be found in this gist.
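For orientation, here is one way the snippets above could be assembled into a single file. The order matters: each helper has to be declared before the template that calls it. This is a condensed sketch and not necessarily identical to the gist; it assumes the special cases are written as plain overloads, and it shortens char_to_int to a range check.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Generic conversion through a string stream.
template <typename T>
std::string to_string(T value)
{
    std::stringstream buff;
    buff << value;
    return buff.str();
}

// Special cases, written as plain overloads (an assumption of this sketch).
inline std::string to_string(bool value)
{
    return value ? "true" : "false";
}

inline std::string to_string(const std::string& value)
{
    return value;
}

// Condensed form of char_to_int; the digits '0' to '9' are contiguous.
inline int char_to_int(char c)
{
    if (c < '0' || c > '9')
    {
        throw std::logic_error("not a number");
    }
    return c - '0';
}

template <typename Arg>
void unpack(std::vector<std::string>& args, Arg value)
{
    args.push_back(to_string(value));
}

template <typename Arg, typename ... Args>
void unpack(std::vector<std::string>& args, Arg value, Args ... remainder)
{
    args.push_back(to_string(value));
    unpack(args, remainder...);
}

template <typename ... Args>
std::string compose(const std::string& format, Args ... args)
{
    std::vector<std::string> sargs;
    unpack(sargs, args...);

    std::string result;
    for (std::string::size_type i = 0; i < format.size(); i++)
    {
        if (format[i] != '%')
        {
            result.push_back(format[i]);
        }
        else if (i + 1 < format.size())
        {
            result += sargs.at(char_to_int(format[i + 1]));
            i++; // skip the digit that was just consumed
        }
        else
        {
            throw std::logic_error("% at end of string.");
        }
    }
    return result;
}
```

With this in place, compose("Wrote %0 objects into %1.", 548, "level1.jsn") yields "Wrote 548 objects into level1.jsn.". A format string such as "%3" with only one argument throws std::out_of_range, and a lone % at the end throws std::logic_error. Note that, as in the original, compose needs at least one argument after the format string.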

Considerations

The current character-wise push_back into the result string is quite inefficient. It may be a good idea to search for the next % symbol and copy the string chunk-wise into a string stream. But so far the compose function has never been the piece of code that dominated the runtime, so I never got around to optimizing it. (The string stream may actually be slower, since it has a higher setup cost.)
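A chunk-wise variant could look roughly like the sketch below. It is untested against real workloads and makes a few assumptions: the helper name substitute is made up, it takes the arguments already converted to strings, and it appends to a std::string with reserved capacity instead of using a string stream.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: replaces %N placeholders, copying the literal text
// between placeholders in whole chunks instead of one character at a time.
inline std::string substitute(const std::string& format,
                              const std::vector<std::string>& sargs)
{
    std::string result;
    result.reserve(format.size());

    std::string::size_type pos = 0;
    while (true)
    {
        const std::string::size_type percent = format.find('%', pos);
        if (percent == std::string::npos)
        {
            // No further placeholder: copy the trailing literal text.
            result.append(format, pos, std::string::npos);
            break;
        }

        // Copy the literal chunk in front of the placeholder.
        result.append(format, pos, percent - pos);

        if (percent + 1 >= format.size())
        {
            throw std::logic_error("% at end of string.");
        }

        const char digit = format[percent + 1];
        if (digit < '0' || digit > '9')
        {
            throw std::logic_error("not a number");
        }

        result += sargs.at(digit - '0');
        pos = percent + 2; // continue after the digit
    }

    return result;
}
```

Whether this is measurably faster than the character-wise loop would have to be profiled; for short format strings the difference is likely small.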

Furthermore, the function only takes ten arguments. With the new implementation the limit comes from the placeholder syntax: an index is a single character and there are only ten digit characters. It could easily be extended to sixteen by including the characters A to F, but I have never needed more than ten arguments. (The most I ever used was probably 6.) It could also be extended to read several characters as one number, but that just complicates everything without any real benefit.
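The sixteen-argument extension mentioned above would only touch the character conversion. A minimal sketch, with a made-up function name and assuming upper-case letters only:

```cpp
#include <stdexcept>

// Hypothetical variant of char_to_int that also maps 'A' to 'F'
// onto the indices 10 to 15.
inline int hex_char_to_int(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    if (c >= 'A' && c <= 'F')
    {
        return 10 + (c - 'A');
    }
    throw std::logic_error("not a number");
}
```

The price is that format strings such as "%A" are then no longer errors, so a literal % followed by one of these letters would need some form of escaping.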