Is it bad to use Unicode characters in variable names?

Question

I recently tried to implement a ranking algorithm, AllegSkill, in Python 3.

Here's what the maths looks like:

(No, really.)

This is then what I wrote:

t = (µw-µl)/c  # those are used in
e = ε/c        # multiple places.
σw_new = (σw**2 * (1 - (σw**2)/(c**2)*Wwin(t, e)) + γ**2)**.5

I actually thought it unfortunate that Python 3 does not accept √ or ² as variable names.

>>> √ = lambda x: x**.5
  File "<stdin>", line 1
    √ = lambda x: x**.5
      ^
SyntaxError: invalid character in identifier

Am I out of my mind? Should I have resorted to an ASCII-only version? Why? Wouldn't an ASCII-only version of the above be harder to validate for equivalence with the formulas?

(Mind you, I understand that some Unicode glyphs look very much like each other, and some, like ▄ (or is that ▗▖ ) or ╦, just can't make any sense in written code. However, this is hardly the case for maths or arrow glyphs.)

Per request, the ASCII only version would be something along the lines of:

winner_sigma_new = ( winner_sigma ** 2 *
                    ( 1 -
                     ( winner_sigma ** 2 /
                       general_uncertainty ** 2
                     ) * Wwin(t,e)
                    ) + dynamics ** 2
                   )**.5

...per each step of the algorithm.

@Dominic You should have seen [the paper](http://research.microsoft.com/apps/pubs/default.aspx?id=67956). It's just eight pages... — badp, Nov 01 '10 at 10:56
Talking about unicode... http://www.codinghorror.com/blog/2008/03/i-entity-unicode.html — Sandeep Kumar M, Nov 01 '10 at 11:03
If the lines with μ are valid, then the problem isn't unicode in your code, the problem is that you're using a math symbol (which happens to be from unicode) as a variable name. — C. Ross, Nov 01 '10 at 12:09
Good question, but bad title - I've edited it to something more useful/descriptive, but if anyone thinks it could be improved further... — Peter Boughton, Nov 01 '10 at 13:26
@badp I mentioned this in my answer, but it was a bit off topic so I'll reiterate it here: you should check out Haskell. It allows you to define your own operators and use basically any symbol for a function name and it has a REPL so you can program interactively like in Python. Although functional programming requires a shift in mindset, I think you'd find it very accommodating to math problems like the one you posted. — CodexArcanum, Nov 01 '10 at 15:16
@DominicMcDonnell - It is not unreadable or unspeakable at all, read the math out loud and then the code. It actually reads out mostly the same. — Bjarke Freund-Hansen, Nov 01 '10 at 17:16
I find it a very good thing that Python doesn't accept arithmetic operations as variables. A square root sign should denote the operation of taking a square root, and should not be a variable. — David Thornley, Nov 01 '10 at 21:31
@David, there's no such distinction in Python. Indeed, `sqrt = lambda x: x**.5` gets me a _function_ (more precisely, a callable): [`sqrt(2) => 1.41421356237`](http://codepad.org/QafnDHQv). — badp, Nov 01 '10 at 21:34
Just for the record, I'd like it to be clear that the first code sample (with non-ASCII characters) is perfectly valid Python3 code. Python 3 does support σ and μ and γ and many other characters in variable names, but it doesn't support √. I think the list of allowed identifier characters is described by [this](http://www.python.org/dev/peps/pep-3131/) and is [this](http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html). — ShreevatsaR, Oct 21 '13 at 02:58
Reminds me of an article by [Poul-Henning Kamp](http://en.wikipedia.org/wiki/Poul-Henning_Kamp) titled [**"To move forward with programming languages we need to break free from the tyranny of ASCII."**](http://queue.acm.org/detail.cfm?id=1871406), where he discusses ASCII vs. Unicode in programming languages. — Vetle, Nov 01 '10 at 12:09
Python is right to not permit `²` in any variable name. When I see `x²` I think `x**2`. Anything else would be mightily confusing. — gerrit, Aug 08 '16 at 19:55
@gerrit that's an argument to have `²` be a built-in, then, rather than a thing you can't have (never mind that `**` can be overridden) — badp, Aug 10 '16 at 07:01
I'm internally debating whether to name a function `_sRGB1_to_Jʹaʹbʹ` (vs something like `_sRGB1_to_J_a_b_` or `_sRGB1_to_Jpapbp`)... — endolith, Aug 15 '18 at 14:24
uber cool, it's only a matter of time until it catches on: https://rosettacode.org/wiki/Unicode_variable_names — Felipe G. Nievinski, Jul 01 '20 at 18:53
As someone who is having to code up some maths now, I love it! — Steve3p0, Jul 15 '20 at 06:17
In languages other than English there may be many identifiers not covered by ASCII, so Unicode will come in handy there. For example RGB = Rot, Grün, Blau. — gnasher729, Apr 10 '23 at 22:13

Konrad Rudolph · Accepted Answer · 2023-04-10T15:37:56.327

59

I feel that just replacing σ with s or sigma doesn’t make sense and is counter-productive.

What’s the potential gain of such a replacement?

Does it improve readability? Nope, not in the slightest. If that were so, the original formula would have undoubtedly used Latin letters also.
Does it improve writability? At first glance, yes. But not really: because this formula is never going to change (well, “never”). There will normally be no need to change the code, nor to extend it using these variables. So writability is really not an issue.

But programming languages have one advantage over conventional mathematical notation: you can use meaningful, expressive identifiers. In mathematics, this isn’t normally the case, so we resort to one-letter variables, occasionally making them Greek.

But Greek isn’t the problem. Non-descriptive, one-letter identifiers are.

So either keep the original notation … after all, if the programming language does support Unicode in identifiers, there’s no technical barrier. Or use meaningful identifiers. Don’t just replace Greek glyphs with Latin glyphs. Or Arabic ones, or Hindi ones.

edited Apr 10 '23 at 15:37

answered Nov 01 '10 at 16:53


3

Some tools cannot read unicode characters, even though the programming language supports their use. I would not call it a brain-dead decision to use non-unicode variable names, and this still holds true 2.5 years after your post. – Gary S. Weaver May 10 '13 at 18:47
52

@Gary “Some tools cannot read Unicode” – so change the tools, they’re crap. Sorry, it’s 2013 and I have zero sympathy and even less patience for such tools. Incessantly catering to defective tools prevents progress. – Konrad Rudolph May 11 '13 at 14:54
What about e.g. Java Properties files which are required to be ISO-8859-1? Sure it is not source code nor do they have variable names, but they are key/value pairs of data. Unicode *must* be escaped. This is done deliberately to avoid interchange problems where one system uses one encoding, a different system uses another. Everyone has to use the same basic encoding which happens to be one of the most common 8-bit encodings in use. – Mar 04 '14 at 16:00
@John Well in that case the question doesn’t pose itself, you’ll surely agree: if a given encoding is required, you cannot use symbols from another encoding. – Konrad Rudolph Mar 04 '14 at 16:06
3

@KonradRudolph My point is that some tools do not and cannot support Unicode for whatever reason, so "change the tools" is not always the right answer. I agree that Unicode is good and tools should understand it, but that is not always an option. – Mar 04 '14 at 16:11
3

@John I maintain that “change the tools” is an appropriate answer. Your example in particular illustrates such a case: Java `.properties` files are trivial to parse. If you really happened to work with a tool chain which, backed by `.properties` files, didn’t support Unicode, it’s *entirely* reasonable to drop said tool chain (and either replace it yourself, find an alternative, or, in the worst case, commission one). Of course this doesn’t apply to legacy systems. But for legacy systems none of the considerations for best practices ever apply. – Konrad Rudolph Mar 04 '14 at 16:16
11

These "interchange" problems you speak of seem to be primarily the problem of Java and Windows developers. Most of the Linux world standardized on UTF-8 over a decade ago. It's definitely a toolchain problem. Stop using bad tools. – rich remer Jun 15 '14 at 03:30
@richremer Most *nix systems, including Linux, still run on ASCII, not Unicode. All devices (character, file, etc.) carry byte-based data (meaning ASCII), not multibyte data. So in the Linux world nothing is standardized on UTF-8; it is still ASCII. – obayhan Mar 02 '17 at 22:00
@obayhan No, that's completely wrong. Most Unix systems, including Linux, default to UTF-8 locales nowadays. – Konrad Rudolph Mar 02 '17 at 22:01
@KonradRudolph i am talking about something different. – obayhan Mar 03 '17 at 05:57
@obayhan What *are* you talking about, then? – Konrad Rudolph Mar 03 '17 at 07:56
You are talking about applications running on the OS, but I am talking about the OS itself. It is nice to see UTF-8 support in userspace applications, but for the OS and its kernel, terminals, and communication mechanisms every character takes 1 byte, not multiple bytes. What I am saying is that UTF-8 is like a dream, but in reality it is just translated stuff. So @richremer's "Most of the Linux world standardized on UTF-8 over a decade ago." claim is wrong. Maybe userspace applications did, but not the Linux world. – obayhan Mar 03 '17 at 08:27
@obayhan No, that's still wrong. Most parts of OS simply don't need to care what data they handle, they're encoding agnostic and just pass buffers through. But those services that do, handle the locale. Maybe you're confused because these APIs consume and produce `char*`. But that type is a badly named synonym for `byte*`, it doesn't imply that the API uses ASCII. – Konrad Rudolph Mar 03 '17 at 09:02
@KonradRudolph You skipped some parts. It is not true that most parts of the OS simply don't care what data they handle. For example, naming in kernel modules still uses ASCII because 1 char is 1 byte. And this is the same for the rest of the core OS, including modules, drivers, APIs, etc. There is still no UTF-8 in the core system. And when you take away the core system, the rest is userspace applications. That is also what I said. – obayhan Mar 03 '17 at 12:28
1

@obayhan Your fundamental misunderstanding is the "1 char is 1 byte" part. In reality, while it's true that 1 `char` (the C type) = 1 byte, a character isn't limited to a `char` (the C type). Linux uses `char*` to represent *all* textual data, in particular also UTF-8. I am not aware of a single part in the Linux core that's tripped up by UTF-8 user input. Please feel free to provide counter-examples. – Konrad Rudolph Mar 03 '17 at 12:35
I am not discussing with you what it should be; I am discussing what it is. If you have any example of UTF-8 usage in the core OS, for example in module naming, I will be glad to learn about it. Otherwise there is no point in my feeding you example after example. – obayhan Mar 03 '17 at 12:44
@obayhan LOL no. *You* made the claim that Linux is tripped up by UTF-8. If you want to convince people, *you* need to provide evidence. If you don't care about convincing people, fine. But then why did you comment? — Furthermore, I can hardly provide negative evidence: I'm not claiming that kernel modules need to specifically handle UTF-8 (they generally don't), just that they don't fail with it. – Konrad Rudolph Mar 03 '17 at 12:53
@obayhan But you generally seem to confuse representation (storage) and interpretation (encoding) of data in all your comments so you need to understand that difference before further discussion makes sense. The simple fact remains, no part of the Linux core is tripped up by the presence of UTF-8 data, unlike many Windows applications. That's all we were discussing here before you arrived – Konrad Rudolph Mar 03 '17 at 12:56
lol kidding? In which part did I claim that Linux is tripped up by UTF-8? I think you must read them all from the beginning. OK, I'll tell you simply what I said. Get a coffee, relax and read slowly. " I'm not claiming that kernel modules need to specifically handle UTF-8 (they generally don't)." These are your words, and I am saying the same thing after richremer's sentence "Most of the Linux world standardized on UTF-8 over a decade ago." (: From the beginning of our simple discussion I really didn't understand what you are trying to prove. – obayhan Mar 03 '17 at 13:02
@obayhan Then what the heck are you disputing here? The whole discussion is about whether using Unicode poses problems for tools due to lack of support. And **nothing else**. – Konrad Rudolph Mar 03 '17 at 14:55
@KonradRudolph I don't know how to make you understand, but I disputed richremer's wrong sentence by addressing him with "@richremer", and then you jumped in. Dude, I really don't care about your patriotism for UTF-8, and I am really tired of trying to make you understand something. My first sentence has really nothing to do with you or your thoughts, and we don't have to discuss it. Sorry, I was trying to be kind, but I am tired. – obayhan Mar 04 '17 at 20:35
@obayhan There's *still* nothing wrong in that comment. – Konrad Rudolph Mar 04 '17 at 20:52
@KonradRudolph goodnight. – obayhan Mar 04 '17 at 22:09

score 36 · Answer 2 · answered Nov 01 '10 at 11:21

Personally, I would hate to see code where I have to bring up the character map to type it again. Even though the Unicode closely matches what's in the algorithm, it really hurts readability and the ability to edit. Some editors might not even have a font that supports that character.

What about an alternative: put //µ = u up top and write everything in ASCII?

TheLQ


Yeah, I used copy paste a bit for those sigmas. By the way, I just checked on the "dumb terminals" in Ubuntu, all glyphs except `γ` but including others not pictured like `µ` or `π` work. – badp Nov 01 '10 at 11:24
15

By the way, don't assume all keyboards expose standard coding keys comfortably. My keyboard layout needs _three_ keys to type `{` and `}` (which fails in ttys btw) and completely lacks `\`` and `~`... how would any Bash script not require me to use a character map, if I wasn't using a custom keymap? :) – badp Nov 01 '10 at 11:31
4

I installed a greek keyboard alongside my native one, and can switch between those with a one keystroke. This is useful when talking about math on IM/email... and I already thought of using it in python scripts. – liori Nov 01 '10 at 12:20
19

Ugh. Just replacing the greek letters by plain ones? No gain whatsoever. Use meaningful variable names, or stick with the names from the paper. No reason to get creative. – Konrad Rudolph Nov 01 '10 at 16:46
12

Just don't mix up µ and μ... – endolith May 26 '11 at 21:38
4

Reasonable editors have reasonable input methods for Unicode which make it easy to edit code like this. For example, Emacs supports (among other things) the `TeX` and `rfc1345` input methods. `TeX` is just what it sounds like; it lets you type `\sigma` for `σ` and `\to` for `→`. `rfc1345` gives you some combinations like `&s*` for `σ` and `&->` for `→`. As a rule of thumb, I do not worry about accommodating programmers using editors less capable than Emacs. – Tikhon Jelvis Dec 03 '12 at 06:33
4

Also, I think the Unicode makes mathematically oriented code *more* readable. It lets you get the meaning of the code at a glance, just like the formula it comes from. The letters already have well-known meanings from context. So if you're already familiar with the given formula or the general area, you can read the code without having to parse the identifiers. If you're not familiar with the formula, you should probably look it up even with long variable names. And once you've looked up and understood the formula, the Unicode version is again easier to read. – Tikhon Jelvis Dec 03 '12 at 06:38
@badp You could replace ` with the appropriate `$(` or `$)` which is more convenient anyway. And `~` is overrated, just use `$(if [ $UID -eq 0 ]; then echo /root; else echo /home/$USER; fi)`... (Or `$HOME` if you're not feeling like it) – Tobias Kienzler Feb 12 '13 at 07:11
6

If you're going to transcribe to Latin, at least have the decency to use *m* for µ, not *u*. – TRiG Jun 25 '14 at 11:12
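On the µ versus μ remark in the comments above: PEP 3131 specifies that Python converts identifiers to NFKC normal form while parsing, so the micro sign (U+00B5) and the Greek letter mu (U+03BC) end up naming the same variable. A minimal sketch, assuming CPython 3; `namespace` is just a scratch dictionary made up for the demonstration. Other tools (grep, diff, editors' search) do not normalize, so the two characters can still cause confusion outside the interpreter.

```python
import unicodedata

MICRO_SIGN = "\u00b5"   # µ, the character on many keyboards
GREEK_MU = "\u03bc"     # μ, the letter from the Greek block

# The two characters are distinct code points...
assert MICRO_SIGN != GREEK_MU

# ...but NFKC normalization maps the micro sign onto Greek mu.
normalized = unicodedata.normalize("NFKC", MICRO_SIGN)

# Assign through both spellings. The parser normalizes identifiers,
# so both assignments target a single variable.
namespace = {}
exec(f"{MICRO_SIGN} = 1\n{GREEK_MU} = 2", namespace)
names = [key for key in namespace if key != "__builtins__"]

print(normalized == GREEK_MU, names, namespace[GREEK_MU])
```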

Lie Ryan · Answer 3 · 2010-11-01T15:23:34.207

This argument assumes you have no problem typing Unicode or reading Greek letters.

Here's the argument: would you like pi or circular_ratio?

In this case, I'd prefer pi to circular_ratio because I've known about pi since grade school, and I can expect the definition of pi to be well ingrained in every programmer worth his salt. Therefore I wouldn't mind typing π to mean circular_ratio.

However, what about

winner_sigma_new = ( winner_sigma ** 2 *
                    ( 1 -
                     ( winner_sigma ** 2 /
                       general_uncertainty ** 2
                     ) * Wwin(t,e)
                    ) + dynamics ** 2
                   )**.5

or

σw_new = (σw**2 * (1 - (σw**2)/(c**2)*Wwin(t, e)) + γ**2)**.5

To me, both versions are equally opaque, just like pi or π is, except I didn't learn this formula in grade school. winner_sigma and Wwin mean nothing to me, or to anyone else reading the code, and using σw instead doesn't make it any better.

So, using descriptive names, e.g. total_score, winning_ratio, etc. would improve readability much more than using ASCII names that merely spell out Greek letters. The problem isn't that I can't read Greek letters, but that I can't associate the characters (Greek or not) with a "meaning" of the variable.

You certainly understood the problem yourself when you commented: "You should have seen the paper. It's just eight pages..." The problem is that if you base your variable naming on a paper, which chooses single-letter names for conciseness rather than readability (irrespective of whether they're Greek), then people have to read the paper to be able to associate the letters with a "meaning"; this means you're putting up an artificial barrier to understanding your code, and that's always a bad thing.

Even when you live in an ASCII-only world, both a * b / 2 and alpha * beta / 2 are an equally opaque rendering of height * base / 2, the triangle area formula. The unreadability of using single-letter variables grows exponentially as the formula grows in complexity, and the AllegSkill formula is certainly not a trivial formula.

A single-letter variable is only acceptable as a simple loop counter; whether it is a Greek or an ASCII single letter, I don't care. No other variable should consist solely of a single letter. I don't care if you use Greek letters for your names, but when you do use them, make sure I can associate those names with a "meaning" without needing to read an arbitrary paper somewhere else.

When in grade school, I definitely wouldn't mind seeing mathematical expressions using symbols such as +, -, ×, ÷ for basic arithmetic, and √() would be a square-root function. After I graduated grade school, I wouldn't mind the addition of a shiny new symbol: ∫ for integration. Note the trend: these are all operators. Operators are much more heavily used than variable names, but they are less often reused for an entirely different meaning (in the cases where mathematicians reuse operators, the new meaning often still holds some basic properties of the old meaning; this is not the case when reusing variable names).

In conclusion, no, it's not bad to use Unicode characters for variable names; however, it's always bad to use single letter names for variable names, and being allowed to use Unicode names is not a license to use single letter variable names.

To be honest, the formulas here do not make more sense even if I were to use `error_on_measured_skill_with_99th_percent_confidence` instead of `sigma`. — badp, Nov 01 '10 at 15:33
@badp: Long names != Good names. Nevertheless, there are occasions where it is impossible for you to choose a good name (e.g. when you only understand the formula, but don't fully comprehend what each part of the formula does (which takes a wholly different level of comprehension)); in that case, the second best alternative is to cover your ass with some comments (better than sending readers off to an external paper). Add a data dictionary that explains what the variable names refer to, e.g. `// σw = skill level measurement error`, etc — Lie Ryan, Nov 01 '10 at 15:40
@badp: To be honest, with just that information, that sigma refers to some fudge factor (so to speak), it gives me slightly better understanding of the formula than what sigma strikes me. When the formula is hard to understand to begin with, you don't want to add more opaqueness on top of it. — Lie Ryan, Nov 01 '10 at 16:36
Yes. This. Unfortunately, I overlooked it when writing my answer. — Konrad Rudolph, Nov 01 '10 at 16:54
Well, anyone working in anything related to statistics knows that *σ* means "standard deviation". It's a very well known standard symbol in that domain. — TRiG, Nov 21 '12 at 00:58
If I'm working in a scientific context where I'm clearly doing something related to wavelength, then `λ_a` and `λ_b` may be more readable for physicists than `wavelength_a` and `wavelength_b`, in particular if both occur repeatedly in a long equation. There are good reasons why mathematicians and physicists prefer single-character symbols, possibly subscripted. — gerrit, Aug 08 '16 at 19:32
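The data-dictionary suggestion from the comments could look roughly like this. It is a sketch only: the glosses are the names used in this thread (`winner_sigma`, `general_uncertainty`, `dynamics`, "skill level measurement error"), not definitions checked against the paper, and `w_win` is a hypothetical parameter standing in for the question's `Wwin` so that the snippet runs on its own.

```python
from typing import Callable


def updated_winner_sigma(
    σw: float,
    c: float,
    γ: float,
    t: float,
    e: float,
    w_win: Callable[[float, float], float],
) -> float:
    """Winner's new sigma, written with the symbols used in the question.

    Data dictionary (glosses from the thread, not verified against the paper):
      σw    -- winner's skill level measurement error ("winner_sigma")
      c     -- "general_uncertainty"
      γ     -- "dynamics"
      t, e  -- the shared intermediates from the question (t = (µw-µl)/c, e = ε/c)
      w_win -- stand-in for the question's Wwin(t, e); hypothetical parameter
    """
    return (σw**2 * (1 - (σw**2) / (c**2) * w_win(t, e)) + γ**2) ** 0.5


# With a weight of zero the update reduces to sqrt(σw² + γ²).
print(updated_winner_sigma(3.0, 5.0, 4.0, 0.0, 0.0, lambda t, e: 0.0))
```

The formula line stays visually close to the paper, while the docstring carries the meaning for readers who have not seen it.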

score 14 · Answer 4 · answered Nov 01 '10 at 11:19

Do you understand the code? Does everyone else who needs to read it? If so, there's no problem.

Personally I'd be glad to see the back of ASCII-only source code.

Done. (I assume the last line was you asking to see the ASCII-only version of the code?) – badp Nov 01 '10 at 11:24
4

@badp: No, it was me asking to see the death of ASCII-only code. – Nov 01 '10 at 11:43
until you begin to see what happens to Unicode source files when landing on a Windows 1252 system... – Nov 01 '10 at 11:46
1

@Thorbjørn: if they contain the BOM, then hopefully nothing will happen. – Nov 01 '10 at 14:04
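The Windows-1252 and BOM remarks above can be reproduced in a few lines. This is a sketch using only the standard codecs; the one-line `source` string is a made-up sample, and whether a given tool actually looks for the BOM is exactly the "hopefully" in the comment.

```python
import codecs

source = "σw = 1.0\n"  # made-up sample line with one non-ASCII identifier

plain = source.encode("utf-8")          # UTF-8 without a BOM
with_bom = source.encode("utf-8-sig")   # UTF-8 with a leading BOM

# The "utf-8-sig" codec simply prepends the three BOM bytes.
assert with_bom == codecs.BOM_UTF8 + plain

# A tool that wrongly assumes Windows-1252 reads the two bytes of σ
# as two separate characters.
mojibake = plain.decode("cp1252")

# A BOM-aware reader strips the marker and recovers the original text.
recovered = with_bom.decode("utf-8-sig")

print(repr(mojibake))
print(repr(recovered))
```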

score 10 · Answer 5 · answered Nov 01 '10 at 12:16

Yes, you are out of your mind. I would personally reference the paper and formula number in a comment, and write everything in straight ASCII. Then, anyone interested would be able to correlate the code and the formula.

zvrba


5

It was difficult for me to make sure the code and the formula _matched_ in the first place... – badp Nov 01 '10 at 15:17
This, absolutely. Unicode is a mess; it'll be 10 years before it's supported in most places, and probably 15 more years before there arises a clear winner between utf(8|16|32). – Paul Nathan Nov 01 '10 at 15:46
@badp: You could have introduced intermediate variables for subexpressions to make it easier. – zvrba Nov 01 '10 at 16:35
11

@Paul: luckily, Unicode is > 10 years old so that objection’s been taken care of. And although there’s no clear winner between the different UTFs, that’s not an issue: there wasn’t supposed to be one. Telling them apart is trivial for software. – Konrad Rudolph Nov 01 '10 at 17:00
1

@Konrad: I mean 10 years from *now*. A fair number of programs still don't support Unicode. Further, I disagree with your assertion - It is not trivial to write a generic reverse routine that handles all 3 utfs. There needs to be a clear winner. There's no sense in supporting 3 different UTFs (let us not consider the other code pages still extant). – Paul Nathan Nov 01 '10 at 22:57
3

@Paul: How often do you need to write a "generic reverse routine"? The three UTFs serve different purposes, and I don't think you're ever going to get your wish of consolidation. – Dean Harding Nov 01 '10 at 23:23
@Dean: it was an example of the difficulty of correctly writing unicode algorithms. And, if wishes were nickles, I'd be a rich, rich man. – Paul Nathan Nov 02 '10 at 00:23
8

@Paul: screw these programs. There are enough good editors that know how to handle Unicode. If some editor still hasn’t got on the bandwagon, let economic selection take care of it. And as Dean said, the UTFs serve different purposes. It’s a *good* thing that they exist. And I don’t see the point in your multiple reverse routines. You only need to write it *once* (ignoring normalization forms for now): for code points, not for individual UTFs. – Konrad Rudolph Nov 02 '10 at 07:43
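The "write it once, for code points" point from the last comment can be sketched as follows. The function name `reverse_code_points` is made up for this illustration. As that comment says, normalization forms are ignored here, so combining sequences would come out in the wrong order; this is not a complete text-reversal routine.

```python
def reverse_code_points(data: bytes, encoding: str) -> bytes:
    """Decode, reverse by code point, and re-encode in the same UTF."""
    text = data.decode(encoding)          # bytes -> sequence of code points
    return text[::-1].encode(encoding)    # one reversal, independent of the UTF


sample = "σw → γ"
for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
    reversed_bytes = reverse_code_points(sample.encode(encoding), encoding)
    print(encoding, reversed_bytes.decode(encoding))
```

The encoding only matters at the two edges (decode and encode); the reversal itself never looks at bytes.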

score 5 · Answer 6 · answered Nov 01 '10 at 13:55

I would say using Unicode variable names is a bad idea for two reasons:

They're a PITA to type.
They often look almost the same as English letters. This is the same reason why I hate seeing Greek letters in math notation. Try telling rho apart from p. It's not easy.

dsimcha


6

Depends what you're using to type them. – endolith May 26 '11 at 21:42

score 4 · Answer 7 · answered Nov 01 '10 at 17:58

In this one case, a complex maths formula, I'd say go for it.

I can say in 20 years I've never had to code something this complex, and Greek letters keep it close to the original maths. If you can't understand it, you shouldn't be maintaining it.

Saying that, if I ever have to maintain µ and σ in bog standard code that you bequeathed me, I will find out where you live...

score 3 · Answer 8 · answered Nov 01 '10 at 11:34

Pro: it looks nice
Con: the unicode characters and so the whole meaning might get lost in the tool chain (editor, code formatter, version control, older compiler)

How big is the risk for you? Does the gain outweigh the risk?

LennyProgrammers


2

Tool chain? What tool chain? – badp Nov 01 '10 at 11:36
2

Editor, code formatter, version control, older compiler. Every tool and person touching your file. I've had bad experience with tools messing up with unicode files, YMMV. – LennyProgrammers Nov 01 '10 at 12:04

score 2 · Answer 9 · answered Nov 01 '10 at 12:02

Sometime in the not too distant future, we'll all be using text editors / IDEs / web browsers that make it easy to write and edit text including Classical Greek characters, etc. (Or maybe we'll all have learned to use this "hidden" functionality in the tools we currently use ...)

But until that happens, non ASCII characters in program source code would be hard for many programmers to handle, and are therefore a bad idea if you are writing applications that might need to be maintained by someone else.

(Incidentally the reason you can have Greek characters but not square root signs in Python identifiers is simple. The Greek characters are classified as Unicode Letters, but the square root sign is a non-letter; see http://www.python.org/dev/peps/pep-3131/ )
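The letter-versus-symbol distinction described above can be inspected from Python itself. A small sketch, assuming a Python 3 interpreter; the sample characters come from the question, and `describe` is a helper name made up here, not a library function.

```python
import unicodedata
from typing import Tuple


def describe(ch: str) -> Tuple[str, bool]:
    """Return the Unicode general category of ch and whether Python
    accepts it on its own as an identifier."""
    return unicodedata.category(ch), ch.isidentifier()


for ch in "σγ√²":
    category, ok = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: category={category}, identifier={ok}")
```

Categories starting with `L` are letters; `Sm` is a math symbol and `No` is an "other number", which is why the last two are refused.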

Stephen C


I think that it would be a great idea to make an IME which could translate characters for users who can't directly input them. – AndrejaKo Nov 01 '10 at 15:15
Yeah, more or less when we'll have switched to DVORAK. :( – badp Nov 01 '10 at 15:35
1

@AndrejaKo Linux does have an IME that accepts LaTeX-style commands -- that is, you type `\mu` and it inserts `µ`. – badp Nov 01 '10 at 15:36
@badp Thanks a lot! I'll try that next time I boot! – AndrejaKo Nov 01 '10 at 15:40
Emacs supports a bunch of nice input methods that make typing Unicode symbols easy. (Including a TeX one which is what I use.) Emacs is hardly futuristic. (It *is* awesome, of course.) – Tikhon Jelvis Dec 03 '12 at 06:53

tcrosley · Answer 10 · 2010-11-01T15:28:33.430

You didn't say what language/compiler you are using, but usually the rule for variable names is that they must start with an alphabetic character or underscore, and contain only alphanumerics and underscores. A Unicode √ would not be considered alphanumeric, since it is a mathematical symbol instead of a letter. However σ might be (since it is in the Greek alphabet) and á would probably be considered alphanumeric.

score 2 · Answer 11 · edited May 23 '17 at 12:40

I posted the same kind of question on StackOverflow

I definitely think that it is worth using Unicode in heavily math-related problems, because it makes it possible to read the formula directly, which is impossible with plain ASCII.

Imagine a debugging session: of course you can always hand-write the formula the code is supposed to compute to see if it's correct. But ninety percent of the time, you won't bother and the bug can stay hidden for a long, looong time. And no one is ever willing to look at this abstruse 7-line, plain ASCII formula. Of course, using Unicode isn't as good as a TeX-rendered formula, but it is way better than plain ASCII.

The alternative of using long descriptive names is not viable because in math, if the identifier is not short, the formula will look even more complicated (why do you think people, around the XVIII century, began to replace "plus" by "+" and "minus" by "-" ?).

Personally, I would also use some subscripts and superscripts (I just copy-paste them from this page). For instance (had Python allowed √ as an identifier):

√ = math.sqrt #function alias
c² = c**2
σʷ² = σʷ**2
γ² = γ**2
σ′ʷ = √(σʷ² * (1 - (σʷ²/c²)*Wʷⁱⁿ(t, e)) + γ²)

Where I used superscripts because there is no subscript equivalent in unicode. (Unfortunately, the unicode subscript character set is very limited. I hope that one day, subscripting in unicode will be considered as diacritics, i.e. a combination of one char for subscript, and another char for the subscripted letter)
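Which of the names in the snippet above Python 3 would actually take can be checked with `str.isidentifier()`. This is a rough sketch, assuming the prime in `σ′ʷ` is U+2032 (it is written as an escape below to make that assumption explicit). The superscript digit ² and the prime are not letters, so `c²`, `γ²` and `σ′ʷ` are refused along with `√`; for the remaining names the script prints the interpreter's verdict rather than assuming one.

```python
# Names taken from the snippet above; the prime is assumed to be U+2032.
names = ["√", "c²", "γ²", "σʷ", "σ\u2032ʷ", "Wʷⁱⁿ"]

accepted = [name for name in names if name.isidentifier()]
rejected = [name for name in names if not name.isidentifier()]

print("accepted:", accepted)
print("rejected:", rejected)
```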

One last thing, I think this conversation about using non-ASCII character is primarily biased, because many programmers never deal with "formula intensive mathematical notations". So they think that this question is not that important, because they never experienced a significant portion of code that would require the use of non-ASCII identifiers. If you are one of them (and I was until recently), consider this: suppose that the letter "a" is not part of ASCII. Then you will have a pretty good idea of the problem of having none of greek letters, subscripts, superscripts when computing non-trivial math formulas.

score 1 · Answer 12 · answered May 01 '11 at 11:59

Personally, I am motivated to consider programming languages as a tool for mathematicians in this context, as I don't actually use math that looks anything like that in my life. :D And sure, why not use ɛ or σ or whatever — in that context, it is actually more legible.

(Although, I have to say, my preference would be to support superscript numbers as direct method calls, not variable names. eg 2² = 2 ** 2 = 4, etc.)

score 0 · Answer 13 · answered Nov 01 '10 at 18:05

Is this code just for your personal project? If so, go nuts, use whatever you want.

Is this code meant for others to use, i.e. an open source app of some sort? If so, you're likely just asking for trouble because different programmers use different editors, and you can't be certain all editors will support Unicode correctly. Plus, not all command shells will show it correctly when the source code file is type'd/cat'd, and you may run into issues if you need to display it within HTML.

score -2 · Answer 14 · answered Nov 01 '10 at 13:18

What the hell is σ, what is W, what is ε, c and what is γ?
You are to name your variables in a way that explains what their purpose is.
I'd personally beat up anyone who'd leave the Unicode or the ASCII-version for me to maintain, although the ASCII-version is better.

What is evil is calling variables σ or s or sigma or value or var1, because this doesn't convey any information.

Assuming you write your code in English (as I believe you should wherever you are from), ASCII should suffice to give your variables meaningful names, so there is no actual need for Unicode.

back2dos


2

what if he did a copy/paste of the paper and then made it part of his source code as a comment despite the one character variable names? – Engineer2021 Nov 01 '10 at 13:25
23

A lot of these variable names have strong meanings to those familiar with the problem domain. To someone familiar with the domain, English names might be *less* readable than names like sigma or rho. – dsimcha Nov 01 '10 at 13:50
3

I'm afraid something like `rank_error_with_99_pct_confidence` is a bit too long for this and wouldn't actually make the formulas any easier to understand. AllegSkill/TrueSkill call those sigma, so I believe it's perfectly acceptable of me to maintain the domain specific name they have. – badp Nov 01 '10 at 15:37
4

@badp: good names are concise and descriptive, but they do not have to be fully descriptive. For your sigma, it is perfectly good to use `rank_error` and put the extra detail about 99-percent confidence in the documentation/comment somewhere. – Lie Ryan Nov 01 '10 at 17:05
1

@dsimcha: I think those familiar with a particular domain are significantly rarer than those, who never heard of it. And I think those familiar with the domain will be able to cope with plain english names, whereas those not familiar with it will be completely unable to understand what's happening if everything is obfuscated by greek one-letter-variables. – back2dos Nov 01 '10 at 17:35
1

@back2dos: Would you really expect people who know nothing about some non-trivial problem domain to be able to maintain code that deals heavily w/ that problem domain? I would consider that unreasonable. Whenever I write code I write it assuming reasonable knowledge of the problem domain. – dsimcha Nov 01 '10 at 17:52
mu, theta, delta, lambda... better than their standard and accepted maths greek letters? One hopes that whoever maintains this code has some maths knowledge... – gbn Nov 01 '10 at 17:53
@dsimcha: Your argument is less than vaguely related to the situation. The code discussed is a simple formula, that could be self-explanatory if variables were named in a sensible manner and 1 or 2 lines of comments could make it even better. – back2dos Nov 01 '10 at 20:18
Anyone working on anything even vaguely related to stats knows what σ means. I'm not familiar with the meanings of the other letters, but I'm perfectly prepared to believe that they are actually *more* meaningful than your proposed alternatives to the people actually working in the area. – TRiG Apr 17 '13 at 00:30

score -2 · Answer 15 · answered Jun 15 '14 at 03:35

For variable names with well-known mathematical origins this is absolutely acceptable - even preferred. But if you ever expect to distribute the code, you should place these values in a module, class, etc. so that IDE auto-complete can handle "typing" the strange characters.

Using √ or ² in an identifier - not so much.
