As written, it "smells," but that might just be the examples you gave. Storing data in generic object containers, then casting it to get access to the data is not automatically code smell. You will see it used in many situations. However, when you use it, you should be aware of what you're doing, how you're doing it, and why. When I look at the example, the use of string based comparisons to tell me what object is what is the thing that trips my personal smell meter. It suggests you're not entirely sure what you're doing here (which is fine, since you had the wisdom to come here to programmers.SE and say "hey, I don't think I like what I'm doing, help me out!").
The fundamental issue with the pattern of casting data from generic containers like this is that the producer of the data and the consumer of the data must work together, but it may not be obvious that they do so at first glance. In every example of this pattern, smelly or not smelly, this is the fundamental issue. It is very possible for the next developer to be completely unaware that you are doing this pattern and break it by accident, so if you use this pattern you must take care to help the next developer out. You have to make it easier for him to not break the code unintentionally due to some detail he may not know existed.
For example, what if I wanted to copy a player? If I just look at the contents of the player object, it looks pretty easy. I just have to copy the attack
, defense
, and tools
variables. Easy as pie! Well, I'll find out quickly that your use of pointers makes it a little harder (at some point, it's worth looking at smart pointers, but that's another topic). That is easily resolved. I'll just create new copies of each tool, and put those in my new tools
list. After all, Tool
is a really simple class with only one member. So I create a bunch of copies, including a copy of the Sword
, but I didn't know it was a sword, so I only copied the name
. Later, the attack()
function looks at the name, sees that it is a "sword", casts it, and bad things happen!
We can compare this case to another case in socket programming, which uses the same pattern. I can set up a UNIX socket function like this:
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
sockaddr_in serv_addr;
serv_addr.sin_family = AF_INET;
serv_addr.sin_port = htons(portno);
serv_addr.sin_addr.s_addr = INADDR_ANY;
bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));
Why is this the same pattern? Because bind
doesn't accept a sockaddr_in*
, it accepts a more generic sockaddr*
. If you look at the definitions for those classes, we see that sockaddr
has only one member the family that we assigned to sin_family
*. The family says which subtype you should cast the sockaddr
to. AF_INET
tells you that the address struct is actually a sockaddr_in
. If it was AF_INET6
, the address would be a sockaddr_in6
, which has larger fields to support the larger IPv6 addresses.
This is identical to your Tool
example, except it uses an integer to specify which family rather than a std::string
. However, I'm going to claim it doesn't smell, and try to do so for reasons other than "its a standard way to do sockets, so it shouldn't 'smell.'" Obviously its the same pattern, which is why I claim that storing data in generic objects and casting it is not automatically code smell, but there's some differences in how they do it which make it safer.
When using this pattern, the most important information is capturing the conveyance of information about the subclass from producer to consumer. This is what you're doing with the name
field and UNIX sockets do with their sin_family
field. That field is the information the consumer needs to understand what the producer had actually created. In all cases of this pattern, it should be an enumeration (or at the very least, an integer acting like an enumeration). Why? Think about what your consumer is going to do with the information. They're going to need to have written out some big if
statement or a switch
statement, as you did, where they determine the correct subtype, cast it, and use the data. By definition, there can only be a small number of these types. You can store it in a string, as you did, but that has numerous disadvantages:
- Slow -
std::string
typically has to do some dynamic memory to keep the string. You also have to do a full text comparison to match the name every time you want to figure out what subclass you have.
- Too versatile - There's something to be said for putting constraints on yourself when you're doing something exceedingly dangerous. I've had systems like this which looked for a substring to tell it what type of object it was looking at. This worked great until the name of an object accidentally contained that substring, and created a terribly cryptic error. Since, as we stated above, we only need a small number of cases, there's no reason to use a massively overpowered tool like strings. This leads to...
- Error prone - Let's just say that you will want to go on a murderous rampage trying to debug why things aren't working when one consumer accidentally sets the name of a magic cloth to
MagicC1oth
. Seriously, bugs like that can take days of head-scratching before you realized what happened.
An enumeration works much better. It's fast, cheap, and much less error prone:
class Tool {
public:
enum TypeE {
kSword,
kShield,
kMagicCloth
};
TypeE type;
std::string typeName() const {
switch(type) {
case kSword: return "Sword";
case kSheild: return "Sheild";
case kMagicCloth: return "Magic Cloth";
default:
throw std::runtime_error("Invalid enum!");
}
}
};
This example also shows off a switch
statement involving the enums, with the single most important part of this pattern: a default
case that throws. You should never get in that situation if you do things perfectly. However, if someone adds an new tool type, and you forget to update your code to support it, you'll want something to catch the error. In fact, I recommend them so much that you should add them even if you don't need them.
The other huge advantage of the enum
is that it gives the next developer a complete list of valid tool types, right up front. There's no need to go wading through the code to find Bob's specialized Flute class that he uses in his epic boss battle.
void damageWargear(Tool* tool)
{
switch(tool->type)
{
case Tool::kSword:
static_cast<Sword*>(tool)->damageSword();
break;
case Tool::kShield:
static_cast<Sword*>(tool)->damageShield();
break;
default:
break; // Ignore all other objects
}
}
Yes, I put in an "empty" default statement, just to make it explicit to the next developer what I expect to happen if some new unexpected type comes my way.
If you do this, the pattern will smell less. However, to be smell-free the last thing you need to do is consider the other options. These casts are some of the more powerful and dangerous tools you have in the C++ repertoire. You shouldn't use them unless you have a good reason.
One very popular alternative is what I call a "union struct" or "union class." For your example, this would actually be a very good fit. To make one of these, you create a Tool
class, with an enumeration like before, but instead of subclassing Tool
, we just put all the fields from every subtype on it.
class Tool {
public:
enum TypeE {
kSword,
kShield,
kMagicCloth
};
TypeE type;
int attack;
int defense;
};
Now you don't need subclasses at all. You just have to look at the type
field to see which other fields are actually valid. This is much safer and easier to understand. However, it has drawbacks. There are times you don't want to use this:
- When the objects are too dissimilar - You can end up with a laundry list of fields, and it can be unclear which ones apply to each object type.
- When operating in a memory-critical situation - If you need to make 10 tools, you can be lazy with memory. When you need to make 500 million tools, you're going to start caring about bits and bytes. Union structs are always bigger than they need to be.
This solution is not used by UNIX sockets because of the dissimilarity issue compounded by the open endedness of the API. The intent with UNIX sockets was to create something which every flavor of UNIX could work with. Each flavor could define the list of families they support, like AF_INET
, and there would be a short list for each. However, if a new protocol comes along, like AF_INET6
did, you might need to add new fields. If you did this with a union struct, you'd end up effectively creating a new version of the struct with the same name, creating endless incompatibility issues. This is why the UNIX sockets chose to use the casting pattern rather than a union struct. I'm sure they considered it, and the fact that they thought about it is part of why it doesn't smell when they use it.
You could also use a union for real. Unions save memory, by only being as larger as the largest member, but they come with their own set of issues. This probably is not an option for your code, but its always an option you should consider.
Another interesting solution is boost::variant
. Boost is a great library full of reusable cross-platform solutions. Its probably some of the best C++ code ever written. Boost.Variant is basically the C++ version of unions. It is a container which can contain many different types, but only one at a time. You could make your Sword
, Shield
, and MagicCloth
classes, then make tool be a boost::variant<Sword, Shield, MagicCloth>
, meaning it contains one of those three types. This still suffers from the same issue with future compatibility that prevents UNIX sockets from using it (not to mention UNIX sockets are C, predating boost
by quite a bit!), but this pattern can be incredibly useful. Variant is often used, for example, in parse trees, which take a string of text and break it up using a grammar for rules.
The final solution which I'd recommend looking at before taking a plunge and using the generic object casting approach is the Visitor design pattern. Visitor is a powerful design pattern that takes advantage of the observation that calling a virtual function effectively does the casting you need, and it does it for you. Because the compiler does it, it can never be wrong. Thus, instead of storing an enum, Visitor uses an abstract base class, which has a vtable that knows what type the object is. We then create a neat little double-indirection call which does the work:
class Tool;
class Sword;
class Shield;
class MagicCloth;
class ToolVisitor {
public:
virtual void visit(Sword* sword) = 0;
virtual void visit(Shield* shield) = 0;
virtual void visit(MagicCloth* cloth) = 0;
};
class Tool {
public:
virtual void accept(ToolVisitor& visitor) = 0;
};
lass Sword : public Tool{
public:
virtual void accept(ToolVisitor& visitor) { visitor.visit(*this); }
int attack;
};
class Shield : public Tool{
public:
virtual void accept(ToolVisitor& visitor) { visitor.visit(*this); }
int defense;
};
class MagicCloth : public Tool{
public:
virtual void accept(ToolVisitor& visitor) { visitor.visit(*this); }
int attack;
int defense;
};
So what's this god aweful pattern? Well, Tool
has a virtual function, accept
. If you pass it a visitor, it is expected to turn around and call the correct visit
function on that visitor for the type. This is what the visitor.visit(*this);
does on each subtype. Complicated, but we can show this with your example above:
class AttackVisitor : public ToolVisitor
{
public:
int& currentAttack;
int& currentDefense;
AttackVisitor(int& currentAttack_, int& currentDefense_)
: currentAttack(currentAttack_)
, currentDefense(currentDefense_)
{ }
virtual void visit(Sword* sword)
{
currentAttack += sword->attack;
}
virtual void visit(Shield* shield)
{
currentDefense += shield->defense;
}
virtual void visit(MagicCloth* cloth)
{
currentAttack += cloth->attack;
currentDefense += cloth->defense;
}
};
void Player::attack()
{
int currentAttack = this->attack;
int currentDefense = this->defense;
AttackVisitor v(currentAttack, currentDefense);
for (Tool* t: tools) {
t->accept(v);
}
//some other functions to start attack
}
So what happens here? We create a visitor which will do some work for us, once it knows what type of object it's visiting. We then iterate over the list of tools. For argument's sake, let's say the first object is a Shield
, but our code doesn't know that yet. It calls t->accept(v)
, a virtual function. Because the first object is a shield, it ends up calling void Shield::accept(ToolVisitor& visitor)
, which calls visitor.visit(*this);
. Now, when we're looking up which visit
to call, we already know that we have a Shield (because this function got called), so we will end up calling void ToolVisitor::visit(Shield* shield)
on our AttackVisitor
. This now runs the correct code to update our defense.
Visitor is bulky. It's so clunky that I almost think it has a smell of its own. It's very easy to write bad visitor patterns. However, it has one huge advantage none of the others have. If we add a new tool type, we have to add a new ToolVisitor::visit
function for it. The instant we do this, every ToolVisitor
in the program will refuse to compile because it's missing a virtual function. This makes it very easy to catch all cases where we missed something. It's much harder to guarantee that if you use if
or switch
statements to do the work. These advantages are good enough that Visitor has found a nice little niche in 3d graphics scene generators. They happen to need exactly the kind of behavior Visitor offers so it works great!
In all, remember that these patterns make it hard on the next developer. Spend time making it easier for them, and the code won't smell!
* Technically, if you look at the spec, sockaddr has one member named sa_family
. There's some tricky being done here at the C level that doesn't matter for us. You're welcome to look at the actual implementation, but for this answer I'm going to use sa_family
sin_family
and others completely interchangeably, using whichever one is most intuitive for the prose, trusting that that C trickery takes care of the unimportant details.