Volker Simonis1
Wilhelm-Schickard-Institut für Informatik
Universität Tübingen, 72076 Tübingen, Germany
E-mail: simonis@informatik.uni-tuebingen.de
29 November 1999
The motivation for this article arose when I tried to implement Java Serialization in C++[Ser-Spec]. Because in C++ there's no runtime system which, like the Java Virtual Machine, can create new objects and set or query their data fields, this functionality has to be provided by the C++ classes themselves.
Therefore, in my library, every C++ class which wants to be serialized has to be derived from the abstract base class Serializable. Serializable declares a set of pure virtual functions for setting and querying the object's data fields. These functions must be implemented by all derived classes individually.
For a class List defined as follows:
class List : public Serializable { List *next; // ... }; |
void List::setValue(const string& var, Serializable *val) { if (var == "next") { next = dynamic_cast<List*>(val); } } |
This worked fine as long as a class contained only members that were also derived from Serializable. But what if it had primitive data members, or owned members of third party classes for which we had no source code? Since the function setValue() is virtual, we can't write something like:
template <class T> virtual void setValue(const string&, T*) = 0; |
virtual void setValue(const string&, Serializable*) = 0; virtual void setValue(const string&, int*) = 0; virtual void setValue(const string&, double*) = 0; ... |
The second solution is to use only a void* as data type, and cast the pointer according to the desired result. This would be the typical C approach, but now we can't use dynamic_cast<> anymore, and therefore we have no guarantee of the correctness of the cast2.
To solve the problems mentioned above, we created a new class called Value. An arbitrary variable v of type T can be assigned to an instance of Value, and thereafter the Value object itself can be assigned to any instance of type T, just as if it were v itself. If the caller tries to assign a Value object to a variable of a type other than T, it will throw an Incompatible_Type_Exception.
We first look at an example:
Value v1(999.999); Value v2; v2 = (string)"hello"; string s = v2; try { int i = v1; } catch (Incompatible_Type_Exception) {} |
With this in mind, we can now rewrite our setValue() function from section 1 in the following way:
virtual void setValue(const string& s, Value val) = 0; |
First of all, we define the exception class. Its only purpose is to be thrown when a Value object is assigned to an object of incompatible type. In this case, errorType will hold the type name of the object we were trying to assign to.
class Incompatible_Type_Exception { private: string errorType; public: Incompatible_Type_Exception(const string& s) { errorType = s; } string getError() const { return errorType; } }; |
class Value { private: enum Action { SET, GET }; template <class T> T& value(T t = T(), Action action = GET) throw (Incompatible_Type_Exception&); public: Value() {} // Default constructor template <class T> Value(const T&) { value(t, SET); } // Generic constructor template <class T> operator T() const throw (Incompatible_Type_Exception&) { return const_cast<Value*>(this)->template value<T>();// const_cast is safe } template <class T> T& operator=(const T &t) { return value(t, SET); } }; |
Furthermore, we can see that all public methods of the class contain only a call to the private method value() which is itself a template function. Also notice that Value has no data members; i.e. a Value object uses no memory. Unfortunately, since objects are known to the compiler only by address, even stateless objects such as these are given an address in memory. That's why even objects like this use at least one byte of memory (try sizeof()).
Now let's look at value(), since it seems to contain all the magic of our class:
template <class T> T& Value::value(T t, Action action) throw (Incompatible_Type_Exception&) { static map<Value*, T > values; switch(action) { case SET : { values[this] = t; return t; } case GET : { if (values.count(this)) return values[this]; else throw Incompatible_Type_Exception(typeid(T).name()); } } } |
Since we declared the conversion operator operator T() as a template function, we can assign a Value object to another object of any arbitrary type. The compiler will generate the right conversion function for us under the hood. The same applies to the parameterized assignment operator operator=(). It allows assignments of arbitrary data types to our value object. These two operator functions just call the values() function of the appropriate type, which queries the map of the right type. Notice that we need values() as an auxiliary function, since we must have only one map for each type. Since the Value class itself is not parameterized, the only place we can store this map is in a static variable in a parameterized function.
I will try to explain this in more detail now. Recall the code example from the beginning of section 2. After compiling and starting it, we will have a memory layout similar to the one shown in figure 1 (a). Notice that the compiler has created three static maps of the types map<Value*, double>, map<Value*, string> and map<Value*, int> respectively. The first one was created because v1 is initialized with the double value 999.999. This means the parametrized constructor of the Value class will be called with a double argument and will call the value() method for the type double. When the new value() function is instantiated, the static variable values, which in this case is of type map<Value*, double>, will also be instantiated.
The same description applies to the <Value*, string> map, since in line 3 of the code example we assign a string object to the Value variable v2. The line "int i = v1" finally leads to the creation of the third map, since it will call the int() operator of the Value class, which in turn will call the value() method with an int as the template argument.
Figure 1(b) shows the situation after entering the try block. The <Value*, double> map holds an entry which maps the address of v1 to the double value 999.999 and the <Value*, string> map holds an entry which associates v2 with the string object ``hello''. As mentioned earlier, the objects v1 and v2 haven't changed at all (in fact, they cannot change because they have no data fields).
All this shows that the assignment of any type T to a Value object will always succeed, since in fact it only executes :
case SET : { values[this] = t; return t; } |
case GET : { if (values.count(this)) return values[this]; else throw Incompatible_Type_Exception(typeid(T).name()); } |
The maps are declared as static local variables in the value() method because this way the compiler will generate all the necessary maps for us in the same way that it creates instantiations of all the necessary parameterized methods. Also, we need exactly one map of every type, which is why we cannot use different methods to set and query one Value object's value. This is why the value() method has to use a dispatching technique to simulate two (later in this paper we will expand it to three) different functions.
Logically, these maps don't inherently belong inside the Value class. They could just as appropriately be declared as global variables4, but then the programmer must make sure that all the necessary global variables have been defined. Even worse, it's not possible to have objects of different type but the same name in one name space. We can only overload function names, not variable names.
Now we have a working system, but there are still two major problems we have to solve. First, we must correctly handle assignment from one Value object to another, and second, we must somehow handle object destruction.
So we must think of something else. The solution is to store in each Value object a pointer to a member template function that it calls to delete its value. But once again, since the object itself is not parameterized, the signature of this function must be independent of the template argument. The idea behind this is to enable the Value destructor to call a function at destruction time which knows internally what type the calling object actually refers to, but can itself be called through a ``generic'' pointer to member function. Thus we use a function like the following:
template <class T> inline void Value::deleteValue() { static T t; // Used only as type selector. value(t, DELETE); } |
typedef void(Value::*FuncPointer)(void); FuncPointer fp_DELETE; |
Now we can add a destructor to the class definition for Value:
~Value() { if (fp_DELETE != NULL) (this->*fp_DELETE)(); } |
template <class T> T Value::value(T t, Action action) throw (Incompatible_Type_Exception&) { static map<Value*, T > values; switch(action) { case SET : { values[this] = t; if (fp_DELETE == NULL) fp_DELETE = &deleteValue<T> else if (fp_DELETE != (FuncPointer)&deleteValue<T>) { (this->*fp_DELETE)(); // Delete old value of type != T for this obj fp_DELETE = &deleteValue<T> // Remember delete function of right type } return t; } case DELETE : { // only called by destructor if (values.count(this)) values.erase(this); return t; } case GET : { if (values.count(this)) return values[this]; else throw Incompatible_Type_Exception(typeid(T).name()); } } } |
If value() is called with the new action DELETE, it simply removes the entry for the calling object from the internal map. Furthermore, the SET action is changed to assign the right function to the class variable fp_DELETE. If fp_DELETE points to a function of another type, that function is called first to remove the old value of the object.
With this solution, a Value object can hold no value at all, or exactly one value of arbitrary type. Before these changes, it could actually hold one value for every distinct data type. If we want to preserve this behavior now, we must keep not just one pointer to a cleanup function, but a list of pointers, one for every type actually held by the object. This is possible, but would result in an assignment time linear to the number of values the object holds, since this list would have to be adjusted every time.
Now we can implement the assignment of one Value object to another. As with the destructor in the previous section, a Value object doesn't know which of the existing maps contains a reference to it, if indeed any of them do. But again, this is the information we need if we want to achieve a behavior like the following:
Value v1(1234567);
Value v2(999.999);
v2 = v1;
int i = v2;
|
The solution is to explicitly define the copy assignment operator. Because it must not be a template function ([ANSI-CPP] 12.8), a specialization of our template assignment operator is not enough. So we add the following declaration to Value:
Value& operator=(const Value&); |
Value& Value::operator=(const Value &val) { if (this != &val && val.fp_CLONE != NULL) { (val.*val.fp_CLONE)(this); return *this; } } |
The pointer to member operators .* and ->* are binary operators, which take two arguments. The left argument is a pointer or a reference to an object (since every member function needs an implicit this pointer), while the right argument is a pointer to member ([ANSI-CPP] 5.5).
Finally, the definition of fp_CLONE:
typedef void(Value::*ClonePointer)(Value*) const; ClonePointer fp_CLONE; |
template <class T> void Value::cloneValue(Value *val) const { val->template value<T>(const_cast<Value*>(this)->template value<T>(), SET); } |
Finally, we have to change the SET action so that it sets fp_CLONE to point to a cloneValue() function of the right ``internal'' type. We can accomplish this as follows:
fp_CLONE = &cloneValue<T> |
After the implementation of the copy assignment operator with the help of the cloneValue() method, the realization of the copy constructor imposes no new problems. Again, because a template constructor can never be a copy constructor ([ANSI-CPP] 12.8), a specialization of our generic constructor is not enough. Instead we have to define it in the usual way:
inline Value::Value(const Value &val) : fp_DELETE(NULL) { if (val.fp_CLONE != NULL) { (val.*val.fp_CLONE)(this); } else fp_CLONE = NULL; } |
inline Value::Value() : fp_DELETE(NULL), fp_CLONE(NULL) {}
|
For polymorphic objects, C++ offers the possibility to query the exact dynamic type of an object at runtime. This can be done by using the builtin typeid() operator.
It would be convenient if this also would be possible for chameleon objects, since they also have the ability to dynamically change their internal type. Unfortunately it is not possible to overload the typeid() operator so we have to come up with an own solution.
According to the typeid() operator we define the following method typeId():
const type_info& Value::typeId() const throw (bad_typeid&) { if (myType) return *myType; else throw bad_typeid(); } |
const type_info* myType;
|
Because type_info has only a private copy constructor and assignment operator ([ANSI-CPP] 18.5.1), it is not possible to copy type_info objects. Therefor, only pointers to such objects can be stored.
Another missing feature for chameleon objects is the inability to print them. As stated already, this is not straight forward because the Value class is not aware of the type of the object it stores. Again, the only solution is to store a pointer to a parameterized method in the object, which implicitly knows how to print the wrapped data.
typedef ostream&(Value::*PrintPointer)(ostream&) const; PrintPointer fp_PRINT; |
ostream& Value::printValue(ostream& os) const { return os << const_cast<Value*>(this)->template value<T>(); } |
ostream& operator<<(ostream& os, const Value& e) { if (e.fp_PRINT) return (e.*e.fp_PRINT)(os); else return (os << "nil"); } |
At this point we have a fully working type safe wrapper class. Of course this functionality was bought with an increasing amount of memory and runtime overhead. The time overhead is caused mainly by the additional cost which result from querying the static data containers of the appropriate type. On the other hand we have a space overhead resulting from the introduced pointers to members. These overheads are especially severe, if the Value class will be used to wrap small data types like for example the builtin types int, double or char. On the other hand, wrapping an object of type string which itself already has a size of some hundred bytes, results only in a modest space overhead of some percent.
Nevertheless, there exists a possibility to improve both, space requirements and performance of the Value class. It was implemented in [Nseq], where also some performance tests can be found.
The idea behind this is to replace the static containers which hold the objects of the appropriate types. Instead, in every Value object a private void pointer will point to the corresponding data object.
Because all the conversions are still done in the parameterized operators of the Value class, type safety is still guaranteed. The solution is of course not as general as the one presented before, since we loose now the possibility of ``storing'' objects of different types in one Value object. On the other hand, we can omit a pointer to member function for deleting the object, since the builtin delete operator will free the space occupied by the object. The objects destructor however will not be called in this case, so this will work for classes with trivial constructors ([ANSI-CPP] 12.4:3) only. In general it is not wise to delete class objects through void pointers ([ANSI-CPP] 5.3.5:3) and it may still be useful to retain the fp_DELETE pointer.
The implementation of this version of the Value class can be found in the file Value_void.hpp at [SOURCE].
Another problem of the implementation presented so far is the fact that a Value object grows with every new pointer to member added to it. But from a logical point of view, these pointers don't belong to every class object, but should be shared by all objects which hold a value of the same type. Exactly this behavior can be modeled by introducing a structure which holds all the desired pointers and members necessary for a Value object. In turn, the only member required by a Value object is a pointer to the corresponding structure.
This technique is equivalent to the one used for virtual function dispatch. In that case, every class object which contains at least one virtual function, will contain a pointer to a virtual function table. At runtime, all polymorphic objects of the same dynamic type will point at the same unique virtual function table. In our case, all Value objects which store a value of the same type will have a pointer to a unique structure which contains the appropriately typed methods for manipulating this value.
Because the Value class is not parameterized, this pointer must have a fixed type. So we declare a struct VTable as follows:
struct VTable { const type_info* myType; typedef void(Value::*FuncPointer)(void); FuncPointer fp_DELETE; typedef void(Value::*ClonePointer)(Value*) const; ClonePointer fp_CLONE; typedef ostream&(Value::*PrintPointer)(ostream&) const; PrintPointer fp_PRINT; VTable(const type_info* ti, FuncPointer fP, ClonePointer cP, PrintPointer pP) : myType(ti), fp_DELETE(fP), fp_CLONE(cP), fp_PRINT(pP) {} }; |
template<class T> struct Spec_VTable : public VTable { Spec_VTable() : VTable(&typeid(T), &(Value::template deleteValue<T>), &(Value::template cloneValue<T>), &(Value::template printValue<T>)) {} }; |
At the time of the writing in June 1998, the code in this article could be compiled and tested only with version 2.38 of the EDG compiler front end for Linux [EDG], using egcs-1.0.3 and a patched version of libstdc++.2.8.1 compiled with the EDG frontend and egcs-1.0.3. The following compiler switches were used : ``-x'' to enable exception handling, ``-B'' to enable implicit inclusion of template definition files (not needed by the examples, but by the STL classes used) and ``-tlocal'' for local template instantiation.
In September 1998, version 1.1 of the egcs [EGCS] compiler was released, and it also was able to compile this code. The only change necessary was to replace the line
val->value<T>(const_cast<Value*>(this)->value<T>(), SET); |
T t = T(); val->value(const_cast<Value*>(this)->value(t, GET), SET); |
val->template value<T>(const_cast<Value*>(this)->template value<T>(), SET); |
Also notice that all section numbers in references to [ANSI-CPP] referred to in this article may be slightly different in other editions of the standard. The source code presented in this paper is available online from [SOURCE].
In this article we proposed a technique for building generic, but unparameterized, classes, which are able to store arbitrary typed objects and still maintain type safety. In addition to being useful in daily programming tasks, this example shows how an ordinary unparameterized class can change its ``internal type'' (i.e., the type of member data it holds) at runtime. We achieve this goal first by using generic pointers to template member functions, which only internally use their type information but externally preserve a constant signature, and second by using only one parametrized method which supplies the functionality of three basic functions through a dispatching mechanism and a static local variable. Furthermore, using operator overloading, these so-called Chameleon objects can be made transparent to the programmer.
The Value object is not aware of its internal type at any time, but it always has enough information to call the right functions at run time to properly access its data. During compilation, the compiler will detect and instantiate these functions for all types that the program stores in a Value object.
Thus we have something similar to the virtual function table, which is built up at compile time, but consulted for every function call at runtime. The difference in our case is that the data type that controls the dispatching can change during program execution.
This gives us a kind of ``dynamic runtime polymorphism'' not for function calls, as provided by inheritance and the virtual function mechanism, but for the data type of an object. This is achieved by combining the template mechanism, which is widely known as ``static polymorhism'', with a self managed, dynamic data structure.
First of all I want to thank the people at Edison Design Group for their compiler front end [EDG] , since it was the first- and for a long time the only- compiler that translated my code. Neither Microsoft VC++ nor g++ work at present time (October 1998). Furthermore I want to thank Prof. Küchlin (University of Tübingen) for supervising my master's thesis, Prof. Loos (University of Tübingen) and Prof. Musser (Rensselaer Polytechnic Institute) for supplying me with a copy of the new C++ standard, and Dr. Suworow, V. Kalinenko and J. Höfler from debis Systemhaus for their support. And last but not least, many thanks to my friend Roland Weiss for many discussions (not only about computer science) and for reviewing these pages.
This document was generated using the
LaTeX2HTML
translator Version 99.2beta5 (1.37)
and ProgDOC the Program Documentation System
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -html_version 4.0 -split 0 -show_section_numbers -no_navigation -no_footnode -image_type gif -numbered_footnotes value.tex
The translation was initiated by on
[1]