39

I'm not sure what to do with the following:

We take data from an external tool into our own tool. This data is in Dutch, but we write our Java code in English. Should we translate the Dutch terms into English, or keep them in Dutch? For example, we have 2 departments: Bouw (Construction in English) and Onderhoud (Maintenance in English).

Would it then be logical to create:

public enum Department { BOUW, ONDERHOUD }

or:

public enum Department { CONSTRUCTION, MAINTENANCE }

or even:

public enum Afdeling { BOUW, ONDERHOUD }

(afdeling is Department in Dutch)

Martijn
Jelle
  • Possible duplicate of [Non-English Naming Conventions](http://softwareengineering.stackexchange.com/questions/152891/non-english-naming-conventions) – gnat Nov 28 '16 at 11:40
  • I guess it's not a duplicate because I'm talking about external data and not our own application data, which is named in English. – Jelle Nov 28 '16 at 11:42
  • If you use non-English data objects or source code in general, it helps to have a translation reference table giving the rough English equivalent of each function and data object. This is especially relevant for function and object names made of multiple words, which are fairly common in some languages. I've had to fix bugs in code written in a language I didn't speak at all, but thanks to a translation dictionary for that program, it was trivial. Typically, programmatic translation libraries are only included in projects that have properly localized their software. – kayleeFrye_onDeck Nov 28 '16 at 17:21
  • Does the rest of your program (apart from standard libraries) use English or Dutch identifiers? – user253751 Nov 28 '16 at 20:45
  • For now, we've used English only, but the Departments are currently the only hard-coded user data, because the Department of a certain project plays a big role in our application. Other Dutch values are saved within our database, so they're not hardcoded. – Jelle Nov 29 '16 at 10:12
  • if you have many `if (project.getDepartment().equals(Department.XYZ))` statements, there may actually exist other issues with the application design. Look at Template Method, Strategy, and other patterns to get rid of them. – Bernhard Hiller Nov 29 '16 at 10:32
  • You could also use aliases... in Python it's as easy as: `class Depts(Enum): bouw = 1; construction = 1; onderhoud = 2; maintenance = 2 `. Don't Java enums have some way of specifying two different names for the same enum value? – Bakuriu Nov 29 '16 at 11:03
  • In Java I believe you can create souped up enums; you could have a method that returns the English name for each value. – marczellm Nov 30 '16 at 09:13
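Java has no built-in alias syntax like the Python example above, but as a rough sketch (assuming the Dutch constants stay canonical), an enum may declare static final fields of its own type that point at existing constants. The English names are then just extra references to the same objects:

```java
/** Departments as named in the external (Dutch) data, plus English aliases. */
enum Department {
    BOUW,       // construction
    ONDERHOUD;  // maintenance

    // Aliases: ordinary static fields referring to the same enum objects.
    // They are not enum constants, so values() and valueOf() ignore them
    // and they cannot be used as case labels in a switch.
    public static final Department CONSTRUCTION = BOUW;
    public static final Department MAINTENANCE = ONDERHOUD;
}
```

So `Department.CONSTRUCTION == Department.BOUW` holds, but `Department.CONSTRUCTION.name()` still returns "BOUW"; the alias exists only at the source level.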

7 Answers

60

English is a lingua franca/lowest common denominator for a reason. Even if the reason is conceptually as weak as "Everybody does it", that's still a rather important reason.

Going against common practice means that you have to understand Dutch to make sense of the data structures in your software. There's nothing wrong with Dutch, but the probability that any given engineer who'll have to interact with the code base speaks it is still lower than that for English.

Therefore, unless you're a Dutch-only shop and don't plan ever to expand internationally, it's almost always a good idea to keep your codebase monolingual and to use the most widely used language for code: English.

Note: This advice applies to program code only. User data should definitely not be translated, but processed "as is". Even if you have a customer "Goldstein", clearly you should not store their name as "golden stone".

The trouble is that there is a continuum of terms between "user-supplied, don't touch" and "code fragment, use English at all times". Customer names are very near the former end of the spectrum, Java variables near the latter end. Constants for enum values are slightly farther away from the latter end, particularly if they denote well-known, unique external entities (like your departments). If everyone in your organisation uses the Dutch terms for the departments, you don't plan on confronting anyone with the code base who doesn't, and the set of existing departments changes rarely, then using the accepted names of the departments may make more sense for enum constants than for local variables. I still wouldn't do it, though.

Robert Harvey
Kilian Foth
  • +1, using English in that case gives you code readability and reusability, as this answer explains, while making it Dutch breaks them in some way. – Mikhail Churbanov Nov 28 '16 at 11:49
  • Thanks for your reply. But what about the fact that you're making up terms that do not exist in the first place? Within our company, there is no such thing as a 'Build' department. This is the reason why I asked the question. – Jelle Nov 28 '16 at 11:55
  • At Siemens (long ago) they used to translate everything to German. It was a nightmare. – Nov 28 '16 at 12:14
  • @Jelle Does the name have a semantic significance to the code? If yes, translate it - you need a translation of the concept anyway. If not, why do you have an `enum` for it? That might just be a sign that you're trying to model data in code, which may be a bad idea. – Luaan Nov 28 '16 at 17:47
  • I strongly disagree with this idea of generally translating domain-specific terminology. In some domains, for instance the railway industry, the glossaries of different languages or even territories differ so much that any attempt to translate even a single term will warp the meaning so much that you prevent anyone from understanding it. Unless you're absolutely sure that the application domain allows for lossless translation, **don't translate domain terminology**. – Rhymoid Nov 28 '16 at 22:30
  • Definite points for pointing out not only the extremes, but the continuum between. So many rules are more natural when you realize they're not cut and dried! – Cort Ammon Nov 28 '16 at 23:34
  • I also heard from my project leader that on another project, developers were translating some domain objects from Dutch to English, and later in the project it became unclear what these objects meant because of these custom translations. – Jelle Nov 29 '16 at 10:20
  • Read the comments by @Rhymoid and Jelle again. Never make your own translations of domain terminology! If you decide to use English terms for Dutch-named entities, make sure to use an official translation, not your own. – Guran Nov 29 '16 at 10:37
  • If we're talking about domain specific problems, this is making some pretty strong claims that are unlikely to be true. No, if your program deals with the Dutch railway system, chances are that there are more programmers comfortable with reading the Dutch spec than English speakers who can perfectly (and good luck if there are synonyms) translate the Dutch spec into English to correlate it to the program code. – Voo Nov 29 '16 at 12:41
34

In this scenario, I would leave the enum values in Dutch:

public enum Department { BOUW, ONDERHOUD }

Because the logic using these constants will be matching against data that is also in Dutch. For example, if the input is "bouw", the comparison code might look like:

if (Department.BOUW.name().equals(input.toUpperCase()))

I find it easier to debug when the values match (even if I don't know what the values mean). Translation just adds a mental hoop I, as a developer, should not have to jump through to prove correctness.

Nevertheless, you can just comment the code if it helps others understand the context of the data:

public enum Department { 
    BOUW, /* build */
    ONDERHOUD /* maintenance */
}
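If the constants mirror the external data, parsing can stay close to this answer's idea while avoiding the locale pitfall raised in the comments below. A minimal sketch, assuming the incoming values are single words such as "bouw" or "Onderhoud" (DepartmentParser and parseDepartment are hypothetical names): uppercase with Locale.ROOT so the JVM's default locale cannot change the result, and report unknown values explicitly instead of letting valueOf throw.

```java
import java.util.Locale;
import java.util.Optional;

enum Department {
    BOUW,      // construction
    ONDERHOUD  // maintenance
}

final class DepartmentParser {
    private DepartmentParser() {}

    /**
     * Hypothetical helper: maps a raw value from the external data onto a constant.
     * Returns Optional.empty() for null or unknown values instead of throwing.
     */
    static Optional<Department> parseDepartment(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // Locale.ROOT keeps uppercasing independent of the default locale
        // (e.g. the Turkish dotted/dotless i).
        String key = raw.trim().toUpperCase(Locale.ROOT);
        for (Department d : Department.values()) {
            if (d.name().equals(key)) {
                return Optional.of(d);
            }
        }
        return Optional.empty();
    }
}
```

After parsing, the rest of the code compares enum values with `==`, which is safe for enums, rather than comparing strings.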
bishop
  • Thanks, this is indeed also a good reason to pick the language of the external data, so it's not necessary to write a lot of if/else logic for translating. – Jelle Nov 28 '16 at 14:04
  • @Jelle Of course, if you ever end up expanding internationally, you might be glad of the translation logic anyway. YMMV on YAGNI. – Williham Totland Nov 28 '16 at 15:20
  • You should not be comparing your enums to strings directly anyway. What if you have to compare a string with several words with an enum value? – Jørgen Fogh Nov 28 '16 at 15:40
  • Note that your documentation should go **before** the value, not after. Right now the `build` comment applies to the `ONDERHOUD` value. – Reinstate Monica Nov 28 '16 at 15:41
  • I would **never ever** compare strings using .toUpper() and ==, especially when working with user-input and localized strings. School-book example of this is the Turkish "i" character. – Adriano Repetti Nov 28 '16 at 16:53
  • @ABoschman End of line comments universally refer to the line they're on. I've seen this comment type for simple descriptions of list items hundreds of times… – Weaver Nov 29 '16 at 00:15
  • In our shop, we do the opposite of what's suggested here: the enum/const names are in English (or what passes for English), and the comments are "localized". Which is good. Otherwise we'd have all these consts with names like `PAAMAYIM_NEKUDOTAYIM`. – sq33G Nov 29 '16 at 07:48
  • @StarWeaver My bad, I mistook them for Javadoc (`/** ... */`). – Reinstate Monica Nov 29 '16 at 08:05
  • Echoing the sentiments of @Rhymoid above, you should not translate domain-specific terms. From personal experience, it's unbelievably exasperating to see translations where they shouldn't apply. I'm a native English speaker and I personally would leave it in Dutch, working with it just fine. It doesn't help readability; in fact it obscures it. – Stephen O'Flynn Nov 29 '16 at 13:52
  • @sq33G what about English words with double meanings like `Pool`, `Nails`, etc? – Jelle Nov 29 '16 at 16:01
  • @AdrianoRepetti, but this is Dutch. I think we should only worry about the quirks of the Turkish language when programming a system with Turkish input. – Arturo Torres Sánchez Nov 29 '16 at 16:55
  • I think we can ignore the quirks of any _foreign_ culture only if we program a system that will never be used outside one country and only with input guaranteed to be in that language. However we all know requirements aren't set in stone, right? What if that company opens a department in Turkey? Not to mention that in Dutch a word may start with an apostrophe. Globalization is never easy. There aren't shortcuts. – Adriano Repetti Nov 29 '16 at 17:19
  • @Jelle Not so much because of double meanings, which are usually clear enough from context, but because of complex domain concepts, const and enum names (and variable names for that matter) often get longer names than `nail` - more like `cancelledNailRequest` or `pendingTransactionNail`. – sq33G Nov 30 '16 at 08:36
  • @sq33G It's worth pointing out that `cancelledPoolRequest` would do nothing to resolve the ambiguity, since both meanings I know of are the same part of speech --- nouns. With nail, you get lucky, as one is a noun and the other a verb. – jpaugh Nov 09 '18 at 19:57
15

Avoid translation where possible, because every translation is additional effort and may introduce bugs.

The key contribution of "Domain Driven Design" to modern software engineering is the concept of a Ubiquitous Language, which is a single language used by all stakeholders of a project. According to DDD, translation should not occur within a team (which includes domain experts, even if present only by proxy of a specification document), but only between teams (further reading: "Domain Driven Design" by Eric Evans, in particular the chapters about Ubiquitous Language and strategic design).

That is, if your business experts (or your specification document) speak Dutch, use their (Dutch) terminology when expressing business concerns in source code. Do not needlessly translate into English, because doing so creates an artificial impediment for communication between business experts and programmers, which takes time and can (through ambiguous or bad translation) cause bugs.

If, in contrast, your business experts can talk about their business in both English and Dutch, you are in the fortunate situation of being able to pick the project's ubiquitous language, and there are valid reasons for preferring English (such as "internationally understandable and more likely to be used by standards"), but doing so does not mean that coders should translate what the business people are talking about. Instead, the business people should switch languages.

Having a ubiquitous language is particularly important if requirements are complex and must be implemented precisely; if you're just doing CRUD, the language you use internally matters less.

Personal anecdote: I was in a project where we exposed some business services as a SOAP endpoint. The business was entirely specified in German, and unlikely to be reused as is in English, because it was about legal matters specific to a particular jurisdiction. Nevertheless, some ivory tower architects mandated that the SOAP interface be in English to promote future reuse. This translation occurred ad hoc, with little coordination among developers, let alone a shared glossary, resulting in the same business term having several names in the web service contract, and several different business terms sharing the same name in the web service contract. Oh, and of course some names were used on both sides of the divide, but with different meanings!

If you choose to translate anyway, please standardize the translation in a glossary, add compliance with that glossary to your definition of done, and check it in your reviews. Don't be as careless as we have been.
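One way to make such a glossary enforceable rather than aspirational is to keep it in code and check it mechanically. This is only an illustrative sketch: the class name Glossary is hypothetical, and a real project might instead load the agreed terms from a shared document. It rejects the two failures from the anecdote above: one business term acquiring several English names, and one English name being reused for several business terms.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical glossary enforcing a one-to-one mapping between business terms and English names. */
final class Glossary {
    private final Map<String, String> toEnglish = new HashMap<>();
    private final Map<String, String> toSource = new HashMap<>();

    /** Registers an agreed translation; fails if either side is already bound to something else. */
    Glossary add(String term, String english) {
        String existingEnglish = toEnglish.get(term);
        if (existingEnglish != null && !existingEnglish.equals(english)) {
            throw new IllegalArgumentException(
                "'" + term + "' is already translated as '" + existingEnglish + "'");
        }
        String existingTerm = toSource.get(english);
        if (existingTerm != null && !existingTerm.equals(term)) {
            throw new IllegalArgumentException(
                "'" + english + "' is already used for '" + existingTerm + "'");
        }
        toEnglish.put(term, english);
        toSource.put(english, term);
        return this;
    }

    /** Returns the agreed English name, or fails if no translation was ever agreed on. */
    String english(String term) {
        String english = toEnglish.get(term);
        if (english == null) {
            throw new IllegalArgumentException("No agreed translation for '" + term + "'");
        }
        return english;
    }
}
```

A unit test or review checklist can then look names up with english("Onderhoud") instead of leaving each developer to improvise a translation.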

meriton
  • The business experts will speak English. English proficiency amongst the educated Dutch in the workforce is 100%. – MSalters Nov 29 '16 at 08:22
  • Speaking English is one thing. Being able to make quality translations of Dutch domain terminology into English is quite another. – Guran Nov 29 '16 at 11:19
  • @MSalters: Proficient at what level? On the project I talked about, everyone was able to speak English, but they were nowhere as proficient as in German. For instance, there was a method `getAdminRoll` that checked the admin role ... (the German word is "Rolle", and they dropped the wrong letter :-) – meriton Nov 29 '16 at 17:54
  • @Guran: Actually, that's usually the other way around: your domain expert may botch the English grammar and have challenges with small talk, but they will know their domain terminology in English. The programmers may be the bigger problem: their domain is software, which means they know _that_ vocabulary, but not necessarily the business vocabulary. – MSalters Dec 01 '16 at 01:55
  • @meriton: That's actually not such a weird error, considering that "roll" is an English suffix, e.g. _payroll_, from French _rolle_. On average, English fluency in the Netherlands is significantly higher than in Germany. For instance, I wouldn't yet expect German universities to switch to English as the spoken language. And submitting a thesis in German is still considered normal, I think? – MSalters Dec 01 '16 at 02:05
  • @MSalters I guess that depends on the domain and the individuals. I've seen too many business experts using made-up English terms. (And probably done so myself, not being a native English speaker.) But I digress... – Guran Dec 01 '16 at 06:48
9

The correct solution is to not hard-code the departments at all:

ArrayList<String> departments = (... load them from a configuration file ...)

Or, if you absolutely need a department type:

class Department { String name; Department(String name) { this.name = name; } ... }
HashMap<String, Department> departmentsByName = (... generate from configuration file ...)

If you find the need to test against specific departments in your code, you have to ask more generically what is special about that department, and accept configuring that department as having that property. For example, if one department has weekly payroll, and that's what the code cares about, there should be a WEEKLY_PAYROLL property that can be attached to any department by the configuration.
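A rough sketch of that idea, assuming a java.util.Properties configuration; the keys "departments" and "<name>.flags" and the class DepartmentRegistry are made up for illustration. Business code only asks whether a department has a property, and looking up an unknown name fails fast, so a misspelled department name surfaces at runtime instead of silently matching nothing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/** A department as configured at runtime; no department is hard-coded. */
final class Department {
    private final String name;        // spelled as in the external data, e.g. "Bouw"
    private final Set<String> flags;  // configurable traits, e.g. "WEEKLY_PAYROLL"

    Department(String name, Set<String> flags) {
        this.name = name;
        this.flags = Collections.unmodifiableSet(new HashSet<>(flags));
    }

    String getName() { return name; }

    /** Business code asks about a trait, never about a specific department. */
    boolean has(String flag) { return flags.contains(flag); }
}

final class DepartmentRegistry {
    private final Map<String, Department> byName = new HashMap<>();

    /** Reads the hypothetical keys "departments" (comma-separated) and "<name>.flags". */
    static DepartmentRegistry load(Properties config) {
        DepartmentRegistry registry = new DepartmentRegistry();
        for (String entry : config.getProperty("departments", "").split(",")) {
            String name = entry.trim();
            if (name.isEmpty()) {
                continue;
            }
            Set<String> flags = new HashSet<>();
            for (String flag : config.getProperty(name + ".flags", "").split(",")) {
                if (!flag.trim().isEmpty()) {
                    flags.add(flag.trim());
                }
            }
            registry.byName.put(name, new Department(name, flags));
        }
        return registry;
    }

    /** Unknown names (for example typos) fail fast instead of silently matching nothing. */
    Department get(String name) {
        Department department = byName.get(name);
        if (department == null) {
            throw new IllegalArgumentException("Unknown department: " + name);
        }
        return department;
    }
}
```

With departments=Bouw,Onderhoud and Onderhoud.flags=WEEKLY_PAYROLL, payroll code would check department.has("WEEKLY_PAYROLL"), and adding or renaming a department becomes a configuration change rather than a new build.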

DepressedDaniel
  • This. What happens when a department gets split up or combined or a new one forms? This code will adapt more or less automatically; turning it into an enum means you need a new build or it will explode. – jpmc26 Nov 29 '16 at 04:39
  • This would be a solution if the departments didn't play such a big role in our application. We have a lot of `if (project.getDepartment().equals(Department.XYZ))` statements. – Jelle Nov 29 '16 at 10:13
  • @Jelle how about a `project.isForDepartment("XYZ")`, which in turn uses Daniel's hashmap (which is injected in Project, or something) – SáT Nov 29 '16 at 10:37
  • @SáT, That is just asking for typos, honestly... – Jelle Nov 29 '16 at 10:43
  • @Jelle Yes, but they can be caught at runtime. Tests could catch them at build time too. (Though I understand where you're coming from, and I do sort of agree.) – SáT Nov 29 '16 at 10:55
  • This really is the best answer, separating code from data, and separating code that knows about the data (business logic) from code that doesn't need to. – Jared Smith Nov 29 '16 at 19:13
3

For anyone wondering: we've chosen the first option, mainly because we think you should not make up terms for the sake of translation. In case an international developer ever works on the project, we've added some documentation explaining this:

/** The possible departments of a project, given in the Dutch language. */
public enum Department { BOUW, ONDERHOUD }
Jelle
  • Glad you found a satisfactory approach. The accepted answer seems to differ from your chosen approach, though. Please consider changing the accepted answer to one of the others that matches with your chosen approach. – bishop Nov 29 '16 at 14:45
  • I changed the accepted answer. However, given the range in upvotes, I think it's also a personal choice, and I chose this approach. – Jelle Nov 29 '16 at 15:08
2

If you are worried about having a string representation to show the user or something, just define a descriptions array inside your enum and expose a method.
For example, Department.BUILD.getDescription() returns "BOUW".

public enum Department { 
    BUILD,
    MAINTENANCE;

    // Indexed by ordinal(): descriptions must stay in the same order as the constants.
    private static final String[] DESCRIPTIONS = {
        "BOUW",
        "ONDERHOUD"
    };

    public String getDescription() {
        return DESCRIPTIONS[ordinal()];
    }
}

I know you chose otherwise, but just in case the Google vortex throws people here by accident.

EDIT: As noted by Pokechu22, you can use enum constructors and private fields like this:

public enum Department {
    BUILD("BOUW"),
    MAINTENANCE("ONDERHOUD");

    private final String description;

    private Department(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}

which will also achieve that effect.

SparK
  • You don't need an array. In java, enums can have (private) constructors and fields. – Pokechu22 Nov 29 '16 at 15:26
  • @Pokechu22, but is the value or ordinal available in the constructor to be able to match it to the description? I mean, you would still need an array inside the constructor to get the right description, right? – SparK Nov 29 '16 at 16:33
  • Nope, you can do it like this: `public enum Department { BUILD("BOUW"), MAINTENANCE("ONDERHOUD"); private final String description; private Department(String description) { this.description = description; } public String getDescription() { return description; } }` – Pokechu22 Nov 29 '16 at 16:49
  • @Pokechu22 Added to the answer. I also noticed in case the array increases, my implementation might break and increase by 2 lines each time, while yours will increase 1 line and won't break references. – SparK Nov 29 '16 at 20:17
0

Certain invariants of your code are expected to hold. One of those invariants is that a program will not behave differently when an identifier is renamed. In this case in particular, when you have an enum, and you rename any member of that enum, and update all the uses of that member, you would not expect your code to start functioning differently.

Parsing is the process of reading data and deriving data structures from it. When you take the external data, read it, and create instances of your enum, you are parsing the data. That parsing process is the only part of your program responsible for maintaining the relation between the data representation as you receive it and the shape and naming of the members of your datatypes.

As such, it shouldn't matter what names you assign to the members of the enum. That they happen to coincide with strings used in the data you read is coincidental.

When you design your code to model the domain, names of members shouldn't be related to the serialization format of the data. They should neither be the Dutch terms, nor should they be translations of the Dutch terms, but they should be what you decide fits the domain model best.

The parser then translates between the data format and your domain model. Beyond the parser, the data format should have no influence on your code.
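A minimal sketch of that separation, assuming the external tool delivers each department as a plain string such as "Bouw" (ExternalDepartmentParser and parse are hypothetical names, and CONSTRUCTION/MAINTENANCE merely stand in for whatever names fit your model). The wire values appear only as string literals inside the parser, so renaming an enum constant and updating its uses cannot change what gets parsed.

```java
import java.util.Locale;

/** Domain model: constant names are chosen for the model, not copied from the data. */
enum Department {
    CONSTRUCTION,
    MAINTENANCE
}

/** The only place that knows how the external tool spells departments. */
final class ExternalDepartmentParser {
    private ExternalDepartmentParser() {}

    static Department parse(String raw) {
        // Wire values are matched as literals; the enum's names play no role here.
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "bouw":
                return Department.CONSTRUCTION;
            case "onderhoud":
                return Department.MAINTENANCE;
            default:
                throw new IllegalArgumentException("Unknown department in input: " + raw);
        }
    }
}
```

If the external format ever changes, only the case labels in the parser need updating; the rest of the code keeps working with Department values.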

Martijn