Colons in internationalized UI

Question

When using gettext or a similar framework, should the translation strings include ':' (like "Label:") for generating labels for UI or should I add ':' in UI code itself?

I _think_ you may have a decent conceptual programming question here, but it could really use some additional details. Please edit your question to provide more specifics and what you've tried so the community can better answer your question. — , Apr 02 '14 at 15:40
Is the colon in common usage in all the languages you intend to translate to? If NOT, then you may need to include the colon or its equivalent in the resource strings for each language. So that should be your first question. — paulkayuk, Apr 02 '14 at 15:56
@paulkayuk If you write code which depends on the answer to that question, then it means you're taking on the internationalization, rather than leaving it to the translators. — Kaz, Apr 04 '14 at 17:29
...and it prevents extending the target locales to those that don't use colons. — JensG, Apr 04 '14 at 20:20

score 11 · Accepted Answer · answered Apr 04 '14 at 17:53

The colon can be regarded as a punctuation symbol. It is a convention which is part of the text, just like a period or question mark.

We wouldn't leave out the question mark from "Are you sure you wish to exit?" and then write code to add the question mark to the translated string in a language-dependent manner. Doing so would mean that the UI code unnecessarily takes on the responsibility of knowing how to punctuate sentences in various languages: a responsibility that can be handled in the message substitution.

There is an intermediate possibility. It is likely that, just like all labels have the colon feature in English, in other languages labels also have some lexical element in common. That element, if present, is probably some kind of prefix or suffix, or both.

You could represent the labels without the adornment, and then have an additional translatable string which provides a schema for adding the label adornment. As an example, suppose that C sprintf-style formatting is available in the UI application. The English version of the label-generating format string would be "%s:", and that could be the default as well, since it works in some other languages. The UI translators can replace this "%s:" with whatever they see fit. One of the possibilities is that they can replace it with just "%s" (do nothing) and then specify the full, adorned representation of each label in the translated string table. So this approach even handles some strange possibilities where the lexical marker which denotes a label has to be inserted into the middle.
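A minimal Python sketch of this schema, assuming Python's standard gettext module as the framework. DictTranslations is a hypothetical in-memory stand-in for a compiled .mo catalog, make_label is an illustrative helper, and the French strings are example data rather than a real translation file.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        # Missing entries fall back to the English source string.
        return self._messages.get(message, message)


# The adornment schema is itself a translatable message; "%s:" is the
# English default that each translation may replace.
LABEL_FORMAT = "%s:"


def make_label(t, name):
    """Translate the bare field name, then apply the translated schema."""
    return t.gettext(LABEL_FORMAT) % t.gettext(name)


english = gettext.NullTranslations()
# French puts a space before the colon.
french = DictTranslations({"User ID": "Identifiant", "%s:": "%s :"})
# A translator may instead turn the schema into a no-op ("%s") and write
# every label fully adorned in the string table.
verbatim = DictTranslations({"User ID": "Identifiant :", "%s:": "%s"})

print(make_label(english, "User ID"))  # User ID:
print(make_label(french, "User ID"))   # Identifiant :
```

In a real project the "%s:" literal would be written as _("%s:") at the call site so that xgettext can extract it; routing it through a module constant, as here, keeps the sketch short but hides it from extraction tools.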

This approach doesn't seem worthwhile, if all it achieves is a slight compression in the representation of label strings: the removal of one colon character. If you have to write 100 characters of extra code for this, you have to remove colons from 100 labels just to break even: and that doesn't even take into consideration justifying the time spent.

There has to be some payoff for this: namely that the application uses the strings for purposes other than just generating labels, such as generating sentences which refer to the UI fields by name. Say that a dialog box has a "User ID:" label for entering text. If you have generic logic which produces the message "You have entered an invalid user ID." by combining a sentence boilerplate text with the name of a UI element, then it's necessary to have the unadorned string "user ID", and pass it through a label-making function to generate "User ID:".
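A sketch of the reuse described above, again assuming Python's gettext module; make_label and invalid_field_message are hypothetical helpers, and NullTranslations simply echoes the English source strings where a real catalog would translate them.

```python
import gettext

# NullTranslations echoes the English source; a real catalog would translate.
t = gettext.NullTranslations()


def make_label(name):
    # "user ID" -> "User ID" -> "User ID:" via the translatable "%s:" schema.
    text = t.gettext(name)
    return t.gettext("%s:") % (text[:1].upper() + text[1:])


def invalid_field_message(name):
    # Generic sentence boilerplate with the unadorned field name inserted.
    return t.gettext("You have entered an invalid %s.") % t.gettext(name)


print(make_label("user ID"))             # User ID:
print(invalid_field_message("user ID"))  # You have entered an invalid user ID.
```

As the comment below points out, this only holds while grammar cooperates: capitalisation, case or gender agreement may differ between the label and the sentence in other languages.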

You've already had to modify the string "User ID" to fit into the English sentence by changing the U to lower case. I imagine that in other languages grammar rules mean it might need modifying in other ways to fit the context, or the context might have to change; e.g. perhaps the form of the verb 'entered' depends on the gender of 'user ID'. So I think it's generally a bad idea to concatenate strings to make a sentence in code if you need i18n to work. — bdsl, Nov 12 '20 at 09:16

score 10 · Answer 2 · answered Apr 02 '14 at 16:27

For many languages, there is no one-to-one translation of English words and phrases, but multiple translations that are context-sensitive.

To make the life of translators easier, you should provide as much context for the strings as possible. That includes colons in labels and contextual information where those labels are being used.
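One way to supply that context with gettext is message context (msgctxt), which Python exposes as pgettext (Python 3.8+). In this sketch ContextTranslations is a hypothetical in-memory catalog keyed by (context, msgid); a real project would load a compiled .mo file instead, and the context names are illustrative.

```python
import gettext


class ContextTranslations(gettext.NullTranslations):
    """Hypothetical catalog keyed by (msgctxt, msgid) pairs."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def pgettext(self, context, message):
        # Fall back to the source string when no entry exists.
        return self._messages.get((context, message), message)


french = ContextTranslations({
    ("menu action", "Open"): "Ouvrir",  # verb: open a file
    ("file status", "Open"): "Ouvert",  # adjective: the file is open
    ("form label", "Name:"): "Nom :",   # colon stays in the msgid
})

print(french.pgettext("menu action", "Open"))  # Ouvrir
print(french.pgettext("file status", "Open"))  # Ouvert
```

The same English word gets two different French translations, which a bare "Open" without context could never express.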

As ground rules, in an internationalized UI you should

not modify translated strings, except to fill in parameters with their actual values. So, don't add the colons after the fact.
not cut strings into parts around parameters. Especially if there are multiple parameters to be filled in, you can be sure that there will be at least one language where it would be more natural to have the parameters the other way around.
be really careful with singular/plural forms. There is no common pattern for creating plurals from singulars, or even for how many plural forms there are.
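A sketch of these rules with Python's gettext module, assuming NullTranslations as a stand-in for a real catalog (it returns the English msgids and applies English plural logic); copy_status and the message texts are illustrative. The whole sentence stays one message, parameters are named so a translator can reorder them, and ngettext handles the plural.

```python
import gettext

# Stand-in for a real catalog: echoes English and uses English plural rules.
t = gettext.NullTranslations()


def copy_status(n, folder):
    # ngettext picks the singular or plural msgid; a compiled catalog can map
    # them to however many plural forms the target language needs.
    template = t.ngettext(
        "Copied %(count)d file to %(folder)s.",
        "Copied %(count)d files to %(folder)s.",
        n,
    )
    # Named placeholders let a translation put the parameters in any order.
    return template % {"count": n, "folder": folder}


print(copy_status(1, "Backup"))  # Copied 1 file to Backup.
print(copy_status(3, "Backup"))  # Copied 3 files to Backup.

# A (hypothetical) translated template may reverse the parameter order:
reordered = "%(folder)s: %(count)d files copied."
print(reordered % {"count": 3, "folder": "Backup"})  # Backup: 3 files copied.
```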

Filling in parameters is one heck of a task. Not from the programming side, mind you. Russian may have some complex rules on plurals, but they're still better than the requirements written by a typical manager. Translators just don't understand parameters. — MSalters, Apr 04 '14 at 16:02

score 8 · Answer 3 · answered Apr 04 '14 at 17:39


I've finally decided to use entire strings (strings with colons in this case) in my i18n files.

The reason for this is that in French there should be a space before a colon. So the best way to encode French is to put colons (with spaces before them) in the translation strings.

No, we do not translate to French. But this is an example of a general rule: put colons in translation strings, not in UI code.
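A sketch of that rule, assuming Python's gettext module: the colon is part of each msgid, and each translation decides how it is punctuated. DictTranslations and the catalog entries are hypothetical stand-ins for a compiled French .mo file.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        return self._messages.get(message, message)


# Each msgid is the complete label, colon included; the translation
# carries its own punctuation (French adds a space before the colon).
french = DictTranslations({"Name:": "Nom :", "Address:": "Adresse :"})
english = gettext.NullTranslations()


def render_label(t, msgid):
    # UI code passes the whole string through and adds nothing.
    return t.gettext(msgid)


print(render_label(english, "Name:"))  # Name:
print(render_label(french, "Name:"))   # Nom :
```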

porton

For full-flow prose, I think this is the best solution, as it avoids having to work out a whole lot of program rules and keeps everything much simpler to maintain. There is enough workflow around internationalisation without programmers having to liaise with translators over the minutiae of each language. For situations where strings are built, such as error messages, separate symbol strings for each scenario may be better. – Patanjali, Nov 12 '20 at 08:08

Sharky · Answer 4 · answered Sep 08 '16 at 09:11

An application language file is not just a dummy translation of words. It is a process where you translate the words and their punctuational "presentation" in the correct, meaningful context.

"Hello?" in Spanish is "¿Hola?", and in Arabic it is "مرحبا؟". As you can see, you can't just store Hello or Hola or مرحبا and then in the UI do the_hello_text + "?". It will not produce the correct output. Punctuation clearly needs to be taken care of in the language file, which means it is not the GUI's concern to "add" a question mark or a colon at the end of a string.
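The difference can be shown with plain dictionaries standing in for language files; BARE, WHOLE and the two functions are hypothetical names. Note that the Arabic entry ends in the Arabic question mark ؟ (U+061F), not the ASCII one.

```python
# Hypothetical language files: bare words versus complete, punctuated strings.
BARE = {"en": "Hello", "es": "Hola", "ar": "مرحبا"}
WHOLE = {"en": "Hello?", "es": "¿Hola?", "ar": "مرحبا؟"}


def naive(lang):
    # The UI appends an ASCII "?" itself: right for English only.
    the_hello_text = BARE[lang]
    return the_hello_text + "?"


def from_language_file(lang):
    # The language file already holds the correctly punctuated string.
    return WHOLE[lang]


for lang in ("en", "es", "ar"):
    print(lang, naive(lang) == from_language_file(lang))
# en True
# es False
# ar False
```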

Punctuation and everything must be inside the internationalization file, ready to be outputted to the UI.

The only thing the UI should be concerned about is the correct presentation of this ready-to-output text, such as right-aligning it if it is in an RTL language. But that's another story and has nothing to do with plain-text internationalization language files per se.

Patanjali · Answer 5 · answered Nov 12 '20 at 08:12

The optimal approach is to embed the characters in the string for each locale, as that typically ensures that the context is correct, assuming you have done your research as to what your target audience expects. It is also simpler to manage.

For program-built strings, such as error messages, it may be better to put the symbols in a separate internationalised string, as different languages use different symbols for the same grammatical scenario. For example, Armenian uses the colon as its full stop, so one would have a 'sentence terminator' string. Another is a 'word separator' string, which would be blank for many languages.
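A sketch of such symbol strings, assuming a plain dictionary per locale; the SYMBOLS table, its keys and build_sentence are hypothetical. The Armenian entry uses the Armenian full stop ։ (U+0589), the colon-like mark the answer refers to, and the Japanese entry uses the ideographic full stop 。 (U+3002) with an empty word separator, since Japanese is written without spaces between words.

```python
# Hypothetical per-locale symbol strings for program-built messages.
SYMBOLS = {
    "en": {"sentence_terminator": ".", "word_separator": " "},
    "hy": {"sentence_terminator": "\u0589", "word_separator": " "},  # Armenian full stop
    "ja": {"sentence_terminator": "\u3002", "word_separator": ""},   # ideographic full stop
}


def build_sentence(locale, words):
    # Join already-translated words, then terminate the sentence.
    symbols = SYMBOLS[locale]
    return symbols["word_separator"].join(words) + symbols["sentence_terminator"]


print(build_sentence("en", ["File", "not", "found"]))            # File not found.
print(build_sentence("ja", ["ファイル", "が", "見つかりません"]))  # ファイルが見つかりません。
```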

Each country typically provides a style guide, which dictates all the 'correct' places for punctuation. So, when I started to work out how I would handle quotes for different languages for some web design tools, I first looked at such style guides.

However, while written publications tended to follow style guides, the online world is quite different! Typically, a large number of non-English European and South American sites use US-style quotes, as opposed to the guillemets (« ») of their style guides. This just shows how much the US domination of the early web permeates online language usage around the world.

The MIT guide "Foreign Language News and Newspapers" has links to hundreds of online sites. Looking at these helped me find the best approach to my dilemma, which was to let the site owner select one of the 19 most popular combinations of quotes and embedded quotes appropriate for their target audience.

Chrome tries to automatically use a country's style guide, but fortunately it can be overridden by specifying a locale in the lang attribute of the q tag. This highlights the problem with automatic approaches that don't take into account the real world, but rely upon theory for their implementations.

To the OP: research those online newspaper sites to see which conventions are actually used in various countries, so that you can see which approach will give the more consistent results.

While some languages have traditionally used a different character in place of the English colon, online usage may target audiences used to that colon. Also, different locales may have different usage, so language strings may need to be specified per full locale rather than just per language.
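Per-locale strings are usually looked up with a fallback chain from the full locale to the bare language. A minimal sketch, assuming a hypothetical QUOTES table and quote helper: Swiss German (de_CH) conventionally uses guillemets, while German as used in Germany uses „…“. Python's gettext.find performs a similar expansion (for example from de_CH down to de) when it searches for .mo files.

```python
# Hypothetical quote styles keyed by full locale or bare language.
QUOTES = {
    "de": ("\u201e", "\u201c"),     # „…“ as used in Germany
    "de_CH": ("\u00ab", "\u00bb"),  # «…» as used in Switzerland
    "en": ("\u201c", "\u201d"),     # “…”
}
DEFAULT = "en"


def quote(locale, text):
    # Try the full locale (de_CH), then the language (de), then the default.
    candidates = (locale, locale.split("_")[0], DEFAULT)
    key = next(k for k in candidates if k in QUOTES)
    open_q, close_q = QUOTES[key]
    return f"{open_q}{text}{close_q}"


print(quote("de_CH", "Hallo"))  # «Hallo»
print(quote("de_AT", "Hallo"))  # „Hallo“ (falls back to de)
```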

score 2 · Answer 6 · edited Apr 04 '14 at 15:21

When I was doing my own internationalisation a few years ago, it worked something like this:

Messages in the source code were written in English
Messages were translated at runtime by applying a translation (which appeared in the source as translate("string") or more usually /"string"). There was expected to be a pre-built dictionary of messages and translations.
When translating a message, white space at each end was trimmed, and trailing punctuation was removed, as was any capitalisation. After translating what was left, these were put back.
To provide more context, I sometimes added a comment to the string, which was part of the translation process to help find the best match, but the comment was then discarded.

So, with a message such as " Disk: ", the string "disk" was translated into, say, "disquette", and then recomposed as " Disquette: ". This reduced the number of very similar messages.
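The scheme described in the list above can be sketched in Python; DICTIONARY and translate are hypothetical stand-ins for the pre-built dictionary and the translate("string") call, and the comment-based disambiguation is left out.

```python
import string

# Hypothetical dictionary: keys are lower-case English with no surrounding
# whitespace or trailing punctuation.
DICTIONARY = {"disk": "disquette", "green": "vert"}


def translate(message, dictionary=DICTIONARY):
    core = message.strip()
    if not core:
        return message  # whitespace only: nothing to translate
    # Remember the surrounding whitespace so it can be put back.
    leading = message[: len(message) - len(message.lstrip())]
    trailing = message[len(message.rstrip()):]
    # Split off trailing punctuation such as ":" or "?".
    word = core.rstrip(string.punctuation)
    punct = core[len(word):]
    # Look up the lower-cased remainder; unknown messages pass through.
    result = dictionary.get(word.lower(), word)
    # Restore an initial capital if the source had one.
    if word[:1].isupper():
        result = result[:1].upper() + result[1:]
    return leading + result + punct + trailing


print(repr(translate(" Disk: ")))  # ' Disquette: '
```

Naive first-letter capitalisation is also where this breaks down: translate("Ice", {"ice": "ijs"}) returns "Ijs", not the correct Dutch "IJs", which is the limitation raised in the comments below.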

I only did this for a small number of Western European languages; there would probably be problems with more exotic ones. However, I was using a scripting-type language, so string processing could be used for whatever problems came up: when I needed to translate "G" (short for Green), it appeared in the source as left(/"Green"), was translated to something like "vert" and reduced to "V".

However I'm not familiar with current frameworks and how they might work; don't they provide any guidelines for dealing with these types of issues?

Just a note where your translation/capitalization scheme would likely break: In Dutch there is a digraph of i and j that should be treated as one character. For example, when translating "Ice" to Dutch, the proper translation is "IJs", not "Ijs". In other languages, similar issues with capitalization exist. — Bart van Ingen Schenau, Apr 05 '14 at 05:57
There’s a difference between upper casing and capitalising. That ij is capitalised to a letter Ij and uppercase to IJ. — gnasher729, Nov 12 '20 at 13:48

Konrad Morawski · Answer 7 · answered Apr 05 '14 at 11:54

It may depend on the localization system you use, but all other things being equal, I would personally avoid adding any punctuation (except within a phrase, of course, where its use is dictated by grammar), because I feel it is part of the presentation, like font size, and not really part of the content. So this approach mixes different things.

After all, the same words and phrases may be needed both with punctuation and without. E.g., you can have an "Enter subject:" caption next to a textbox, but also "Enter subject" as a window title.

Does it make sense to have both of them translated separately?

If you later decide that colons actually look bad and redundant in the UI, you'll have to retranslate all language versions, which is a bit silly.
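A sketch of this approach, assuming Python's gettext module with NullTranslations standing in for the active catalog; SHOW_LABEL_COLONS, caption and window_title are hypothetical. The same translated string serves both the caption and the window title, and punctuation is applied in one place.

```python
import gettext

t = gettext.NullTranslations()  # stands in for the active catalog

# Hypothetical presentation setting: punctuation lives in the UI layer,
# so dropping colons later is a one-line change, not a retranslation.
SHOW_LABEL_COLONS = True


def caption(msgid, show_colon=SHOW_LABEL_COLONS):
    text = t.gettext(msgid)
    return text + ":" if show_colon else text


def window_title(msgid):
    # The same translated string is reused without punctuation.
    return t.gettext(msgid)


print(caption("Enter subject"))       # Enter subject:
print(window_title("Enter subject"))  # Enter subject
```

This hard-codes an ASCII colon for every language, which is the uniform-punctuation assumption stated in the PPS below; it would not produce the space-before-colon French convention mentioned in another answer.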

PS. The "ground rules" given by @bartc are valid either way - whether or not you include punctuation marks in translated strings.

PPS. @paulkayuk also raises a good point in his comment: cultural specifics should be taken into consideration as well. If you've got things like inverted question marks in Spanish, include them in your translation, of course. My answer assumes uniform, language-agnostic punctuation, because that seems to be the debatable bit.

When talking about punctuation, usage of particular symbols can differ between languages (Armenian uses the colon as a full stop), so it is best to have a separate internationalised string for the symbols in each grammatical usage scenario, perhaps with options for alternates, as I described for quote symbols in my answer, because actual usage does not necessarily 'obey' long-held grammatical 'rules'. — Patanjali, Nov 12 '20 at 07:41
