Is it better to encapsulate dictionaries in library code?

Question

I work on (big) codebase that consists out of multiple modules that are all built on top of shared library. In this shared library, is it better to encapsulate dictionaries as return values, instead of returning dictionaries directly?

I do not mean dictionaries that are used internally in one function, or even as a private member in a class, but when it is passed between multiple classes / modules.

I sometimes see an overuse of dictionaries, which makes code hard to read, maintain and bug prone in my opinion. I've seen dictionaries like this being passed around: Dictionary<string, Dictionary<string, Dictionary<string, string>>>. Obviously, this is incomprehensible of what data is actually being represented.

Some examples where we use dictionaries (but there are many more):

A lot of our objects have an IDictionary<CultureInfo, string> that represents labels in multiple languages/cultures.
When we do queries to check if multiple objects exist by id (there are two types, a number and a string id), a IDictionary<long, bool> or IDictionary<string, bool> is returned.
In some parts a 'context' is used which usually is something like IDictionary<string, string> or IDictionary<string, object>.

Some functions have one or more dictionaries as a parameter, to keep track of what a dictionary represents and to avoid confusion, we need to properly name those variables. In my opinion it would be better to encapsulate these dictionaries in specific new types. I like to have code (very) strongly typed, and I think that other languages like Haskell encourage this much more.

It would have the following advantages:

It makes the code more readable and maintainable.
It allows us to extend these in the future with operations that could be useful.
It allows us to document and test these types.
Dictionaries with the same key and value type could overwrite each others variables, while they represent totally different data.
When a dictionary with a string as key is used, you can never be sure up front if it is case sensitive or not, but with a specific type it would be.
It allows us to switch the internal container, or the type of key, while giving other modules the time to migrate by still supporting the old operations.

What I would like to do is e.g. replace the first example by a Labels or LabelCollection class. This way we could also support extra functionality like GetDefaultLabel, and we could be less reliant on .NET specifics such as InvariantCulture or even CultureInfo itself in the future.

However, I think it will be hard to convince everyone about this idea. It will take more time to implement these types, than just passing dictionaries directly, and there will not be a netto benefit instantly. Also, a lot of of the advantages are not required right now. There will be no new operations or changes to the container at this moment.

Am I over-engineering this?

You have to talk with your team. It doesn't help you to convince random people on the internet of your viewpoint. Talk with your team about your suggestions, and if they have any concerns or misgivings, you have to understand and address them. — JacquesB, Sep 27 '19 at 09:33
seems like a really weird way to do localization for starters — jk., Sep 27 '19 at 11:34
Related, but not an *exact* duplicate: [Is it good practice to inherit from generic types?](https://softwareengineering.stackexchange.com/questions/266672/is-it-good-practice-to-inherit-from-generic-types) — John Wu, Sep 28 '19 at 00:14

Flater · Answer 1 · 2019-09-27T10:17:06.013

My feedback on your arguments, in no particular order

What I would like to do is e.g. replace the first example by a Labels or LabelCollection class.

So rather than using a pre-existing generic collection, you want to custom define a class which removes the generic (and thus requires you to copy/paste it for every type you want to use it with) and then just make it a shell around an already existing collection interface?

That's a lot of work for no good reason. That's not to say it might not be warranted in your case, but some contextual information is lacking here. In general, your approach isn't the best, but there may be other factors which make this case an exception to an otherwise general rule.

This way we could also support extra functionality like GetDefaultLabel, and we could be less reliant on .NET specifics such as InvariantCulture or even CultureInfo itself in the future.

Does this mean you're arguing to roll your own collection implementation? I ask this because:

If your class still uses .Net's collections internally, nothing has changed about the .Net specifics you'll rely on.
If your class does not use .Net's collections internally, that means you're rolling your own collection implementation.

Either you're making an empty wrapper, or you're trying to reinvent the wheel. Neither is particularly good.

It makes the code more readable and maintainable.

I agree up to a point. There is a reasonable argument for class wrapping values that become too cumbersome to type out. For example, I would much rather create a LatLon class instead of having to pass float lat, float lon everywhere.
If it's used only once or twice, it doesn't matter. If half the codebase uses latlon pairs, it very much starts to matter.

But the benefit must outweigh the effort of implementing said class and the impact on readability. While a properly named class name might be readable, it is still adding an extra cog to the machine. Even if all classes are perfectly named, the more classes you have, the more complex your codebase becomes.

There is an additional argument of introducing additional type safety. In our projects, for example, we don't pass around entity ids by their bare int or Guid type, but rather wrap those in distinct FooKey, BarKey, ... structs. Even though all these structs have the exact same definition other than their name, we use this specifically to enforce key type safety so that we e.g. don't mistakenly pass a person's ID into a method which requires a user's ID.

But we established through long debugging experiences that the pros very much outweigh the cons. I can't confirm whether that same evaluation applies to your case.

It allows us to extend these in the future with operations that could be useful.

Don't write method signatures based on what you might have tomorrow, you aren't going to need it. Write them based on what you have today.

I find the expectation that you will need to extend dictionaries a bit curious. You may need more input parameters in the future, but should that second input parameter really be lumped together with the dictionary, as opposed to being a second parameter by its own right? I don't think you can even make that judgment call before that second parameter exists.

I also find your extension argument counterintuitive. Using an IDictionary<T,U> keeps things more extensible compared to using MyCustomDictionaryClass, not less. It allows you to pass anything that satisfies the interface without requiring it to inherit from one specific custom built class.

It allows us to document and test these types.

The nice thing about using framework components such as a dictionary is that you don't need to test it at all. The documentation is also already available (via MSDN in this case).

Dictionaries with the same key and value type could overwrite each others variables, while they represent totally different data.

You can't stop developers handling their data badly or making a mistake (e.g. assigning a value to the wrong dictionary), and trying to prevent it is going to cost a lot of effort. In doing so, you're going to be creating arbitrary roadblocks and in the end you can never really fix it anyway.

The same effort spent on unit tests would be a lot more valuable, and such a unit test would catch this issue as well.

When a dictionary with a string as key is used, you can never be sure up front if it is case sensitive or not, but with a specific type it would be.

Whether the keys are case sensitive or not is up to the dictionary to decide. If it's really that important to you, simple documentation on the method in question would already fix this issue.

Besides, even if you made your own class, you would still need to document its behavior (case sensitivity) anyway.

It allows us to switch the internal container, or the type of key, while giving other modules the time to migrate by still supporting the old operations.

This doesn't make sense. If you change the type of key, everyone who uses that dictionary cannot meaningfully interact with it anymore (other than blindly passing it along). You can't get or set values without using a key.

If you have to support different versions of your API/contracts/..., do not start wrapping one version in another! Put them next to each other and use IOC/DI to swap behaviors based on the required version.

It will take more time to implement these types, than just passing dictionaries directly, and there will not be a netto benefit instantly. Also, a lot of of the advantages are not required right now.

There seems to be nothing but drawbacks here. The only option you've indirectly left open is that there might be a long term benefit.

But I highly contest that notion. Your classes are going to add to the bulk of your codebase, making it a lot heavier to handle. There's more moving bits to the machine, developers have to know/use more classes, and your rigid implementations are going to end up decreasing extensibility in the future.

Am I over-engineering this?

In my opinion, yes.

I think you're distrusting future development work to such a degree that you're trying to lock things down now "so the guys in the future can't start doing it wrong".
But you're not really considering that maybe the guys in the future who want to change their approach are doing so because of things you (in the past) didn't realize back then.

If you write your codebase the way you're suggesting now, future reworks of the codebase are going to be massively impacted in a negative way.

I sometimes see an overuse of dictionaries, which makes code hard to read, maintain and bug prone in my opinion.

Overuse of dictionaries can be indicative of an issue. Because of their key-value-pair structure, dictionaries can start acting as a poor man's DTO. In that sense, a class may be more warranted than a dictionary.

But if that is the case, the solution is not to wrap it in a class which still claims to be a collection. Rather, the solution is to create a class which has specific properties (instead of dictionary keys)

Of course, this only applies to cases where the dictionary contains a fixed/expected set of keys, as these keys can be converted to be the properties of a class.

If the dictionary is of an actually variable length and can be dynamically created with no predetermined keys in mind, then this advice does not apply.

A lot of our objects have an IDictionary<CultureInfo, string> that represents labels in multiple languages/cultures.

If your supported languages are set in stone (at compile time), and there is a general expectation that every label has been translated for every supported language, it may be warranted to wrap these values in a class with specific language properties and forgo the whole collection system as far as your domain is concerned. For data storage, I would of course still suggest a collection approach.

If your supported languages are not set in stone, it may still be beneficial to create a wrapper around the dictionary which automatically fetches the current language (e.g. from session data, or wherever you store it) The wrapper then externally acts as if it's a single value, but internally it translates a list of translated values to the correct value based on the currently selected language.

When we do queries to check if multiple objects exist by id (there are two types, a number and a string id), a IDictionary or IDictionary is returned.

This just seems like a very weird use case. My instinct tells me that the existence and usage of this method needs to be reevaluated.

In some parts a 'context' is used which usually is something like IDictionary or IDictionary.

"Context" is very vague. If this is being used as e.g. an entity cache, it can be a valid use case for dictionaries, wrapping it isn't quite necessary.

However, your question here seems to imply that this context isn't just an internal state tracker, but is actively being passed around. It's not quite clear what this context is, what it's used for, and whether better alternatives exist.

I virtually disagree with every point you named against replacing a dictionary with a concrete type. You would certianly advise against primitive obsession so why do you support it when collections are involved? It's the same thing, just different types. — t3chb0t, Aug 13 '23 at 14:12
@t3chb0t: I don't understand your point. I would argue that dictionaries are more of a primitive obsession than a concrete type (which might contain a dictionary) is. — Flater, Aug 13 '23 at 22:09

Doc Brown · Answer 2 · 2019-09-27T19:17:17.140

Using of lots of dictionaries is not a problem per se (library or not).

Overuse of dictionaries is a problem, especially when they are used for passing and emulating data structures in an anonymous manner. When the alternative would be to create a new struct or DTO class, with a proper name for the class and its attributes, some software engineers tend to get lazy and try to circumvent the problem of finding a good name. Then a dictionary may seem to be a quick-and-dirty solution to pass a whole bunch of data without investing too much effort in inventing names.

Moreover, this saves a little bit boilerplate code. It is just like declaring a local array string a[5] in a method for 5 unrelated strings, because you need 5 string variables but are too lazy to find 5 self-describing names for those variables (don't laugh, I had to deal with such code in the past, and when the variables were used and reused for different purposes within a 200 LOC method, trying to improve things gets really uncomfortable).

Honestly, such practices will sooner or later haunt back on your team. Especially in large code bases, I would heavily recommend to make your data structures self-describing, use explicit names whereever it makes sense and use dictionaries only if this does not obfuscate the input/output parameters of the code.

Said that, it is debatable and sometimes just a matter of taste if using a IDictionary<CultureInfo, string> for a set of labels indexed by certain a culture is ok. Instead of introducing whole new collection class LabelCollection, maybe a IDictionary<CultureInfo, MyLabel> would be sufficient. In the end, you will have to ask you colleagues what they prefer and where they draw the line between readable and umcomprehensible code.

Is it better to encapsulate dictionaries in library code?

2 Answers2

My feedback on your arguments, in no particular order