How do JIT interpreters handle variable names?

Question

Let's say I am to design a JIT interpreter that translates IL or bytecode to executable instructions at runtime. Every time a variable name is encountered in the code, the JIT interpreter has to translate that into the respective memory address, right?

What technique do JIT interpreters use in order to resolve variable references in a performant enough manner? Do they use hashing, are the variables compiled to addresses ahead of time, or am I missing something altogether?

I disagree that JIT interpreters should be called compilers, due to the fact that they translate IL or bytecode to native instructions during runtime and not at compile-time. — MathuSum Mut, Jun 13 '16 at 10:54
Translating source code to bytecode or bytecode to native code is the [definition of compiler](http://www.pcmag.com/encyclopedia/term/40105/compiler). There is nothing in the definition of compiler that specifies *when* that happens or that excludes it from running concurrently with the program it's compiling, so clarifying when it runs with JIT or AOT is useful. On the other hand, "JIT interpreter" is redundant since interpreters, [by definition](https://en.wikipedia.org/wiki/Interpreter_%28computing%29), run code "just in time." TL;DR: "JIT compiler" is the useful, meaningful term here. — 8bittree, Jun 13 '16 at 15:26
Compilers in general don't care about variable names after parsing; a name just becomes an address. The exception is when a human needs it for debugging, in which case there is a table mapping what the compiler uses to what the human sees. – old_timer, Jun 14 '16 at 01:48
https://en.wikipedia.org/wiki/Interpreter_(computing) "An interpreter generally uses one of the following strategies for program execution: translate source code into some efficient intermediate representation and immediately execute this." It falls under both categories. — MathuSum Mut, Jun 14 '16 at 03:48
I agree, a JIT compiler does fall under the category of interpreter, but, "JIT interpreter" is identical to just plain "interpreter," and therefore is a broader term that includes those interpreters that don't use an intermediate representation. — 8bittree, Jun 14 '16 at 14:14
Hmm, good point, I'll use the more standard nomenclature then from now on thanks. — MathuSum Mut, Jun 14 '16 at 18:14

score 7 · Accepted Answer · answered Jun 12 '16 at 23:28

Have a look at this example from Wikipedia:

outer:
for (int i = 2; i < 1000; i++) {
    for (int j = 2; j < i; j++) {
        if (i % j == 0)
            continue outer;
    }
    System.out.println (i);
}

which roughly translates into the following byte code:

0:   iconst_2
1:   istore_1
2:   iload_1
3:   sipush          1000
6:   if_icmpge       44
9:   iconst_2
10:  istore_2
11:  iload_2
12:  iload_1
13:  if_icmpge       31
16:  iload_1
17:  iload_2
18:  irem
19:  ifne            25
22:  goto            38
25:  iinc            2, 1
28:  goto            11
31:  getstatic       #84;           // Field java/lang/System.out:Ljava/io/PrintStream;
34:  iload_1
35:  invokevirtual   #85;           // Method java/io/PrintStream.println:(I)V
38:  iinc            1, 1
41:  goto            2
44:  return

Note that it reads very much like assembly language: each local variable lives in a numbered slot (here i is slot 1 and j is slot 2), and instructions such as istore_1 and iload_2 refer to that slot number directly. There is no trace of the original variable names in the instruction stream.
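The pattern visible in the listing above can be sketched in a few lines. This is only an illustration, not how javac is actually implemented: the class SlotAllocator, its methods, and the textual "istore N" output are all hypothetical. It shows a hash lookup by name happening while the bytecode is generated, so that the emitted instructions carry slot numbers only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical front-end helper (not part of javac or any JVM): it resolves
// variable names to numbered local slots while the bytecode is generated.
class SlotAllocator {
    private final Map<String, Integer> slots = new HashMap<>();
    private final List<String> code = new ArrayList<>();

    // The hash lookup by name happens here, at compile time only.
    int slotOf(String name) {
        Integer slot = slots.get(name);
        if (slot == null) {
            slot = slots.size() + 1; // assumption: slot 0 is already taken, e.g. by an argument
            slots.put(name, slot);
        }
        return slot;
    }

    void emitStore(String name) { code.add("istore " + slotOf(name)); }
    void emitLoad(String name)  { code.add("iload " + slotOf(name)); }

    List<String> code() { return code; }

    public static void main(String[] args) {
        SlotAllocator gen = new SlotAllocator();
        gen.emitStore("i"); // int i = ...
        gen.emitStore("j"); // int j = ...
        gen.emitLoad("j");  // j < i
        gen.emitLoad("i");
        System.out.println(gen.code()); // the emitted instructions mention slot numbers only
    }
}
```

Under this scheme the cost of resolving a name is paid once, when the bytecode is produced; whatever later interprets or JIT-compiles the result only ever sees slot indices.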

To find out how Java bytecode works in excruciating detail, you can consult Oracle's documentation.

Further Reading
The Java® Virtual Machine Specification.

Robert Harvey

So variables are compiled into relative addresses at compile time? – MathuSum Mut Jun 12 '16 at 23:31
That's what it looks like to me. – Robert Harvey Jun 12 '16 at 23:31
Interesting observation :) – MathuSum Mut Jun 12 '16 at 23:32
@MathuSum Mut: in addition to compiling to relative address, the optimizer usually will also turn the IR to [Single Static Assignment Form](https://en.m.wikipedia.org/wiki/Static_single_assignment_form), where all variable assignments happen exactly once. SSA makes various optimizations easier and makes variable names mostly irrelevant. – Lie Ryan Jun 13 '16 at 04:22
I think we can add that if you debug Java code that was built without debug information, all method parameters show up as "arg0, arg1, ...". If the code was built with debug information, the variable names still exist. This even led to a bug of sorts in Spring MVC with the @PathParam annotation: if you don't specify a value and rely on the path parameter having the same name as the method's parameter, it works in a debug build but not in a regular one. – Walfrat Jun 13 '16 at 09:08
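A related effect can be observed from inside a program through reflection. The sketch below is an illustration rather than a reproduction of the debugger or Spring behaviour described in the comment: the class ParamNames and its add method are hypothetical, while getParameters(), getName() and isNamePresent() are the standard java.lang.reflect calls. As far as I know, reflection only sees real parameter names when the class was compiled with javac's -parameters option; otherwise it synthesizes names of the form argN.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;

// Hypothetical demo class; only the java.lang.reflect calls are real API.
class ParamNames {
    // Reflection target: the names "left" and "right" exist in the source only,
    // unless the compiler is asked to record them in the class file.
    static int add(int left, int right) { return left + right; }

    // Looks up a method of this class and returns its parameters.
    static Parameter[] parametersOf(String methodName, Class<?>... types) {
        try {
            Method m = ParamNames.class.getDeclaredMethod(methodName, types);
            return m.getParameters();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        for (Parameter p : parametersOf("add", int.class, int.class)) {
            // isNamePresent() reports whether the class file recorded the name;
            // if it did not, getName() returns a synthesized name such as arg0.
            System.out.println(p.getName() + " (name recorded: " + p.isNamePresent() + ")");
        }
    }
}
```

What this prints depends on how the class was compiled, which is the point: the names are optional metadata, not something the executed code needs.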

Basile Starynkevitch · Answer 2 · 2016-06-13T19:57:49.563

Variables are mostly known at parsing time, and their binding and scope are relevant for parsing. JIT compiling libraries don't really handle variables (and don't care much about their name, type and perhaps scope).

libjit handles values (which include formals & locals, as locations).
GNU lightning deals with virtual registers in its instruction set.
asmjit is tied to x86-64 and deals with registers and stack frame locations.
GCCJIT deals with lvalues (including formals & locals, as locations).
LLVM's internal language is mostly SSA.

The main point is that a JIT would deal with "locations" or "values", not with "variables". So your bytecode won't know about "variables" (except perhaps through debugging-related metadata).
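To make the "locations, not variables" point concrete, here is a minimal sketch of an execution loop that works on numbered slots and an operand stack. Everything in it is hypothetical: the class SlotMachine, the opcode constants and the three-int instruction encoding are invented for illustration and are only loosely modelled on a few JVM opcodes. A real JIT would emit native loads and stores at frame offsets or registers instead of indexing an array, but the idea that no names are involved is the same.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical mini VM, loosely modelled on a few JVM opcodes; not a real JVM API.
class SlotMachine {
    static final int ICONST = 0, ISTORE = 1, ILOAD = 2, IADD = 3, IINC = 4;

    // Each instruction is {opcode, operand1, operand2}; unused operands are 0.
    static int[] run(int[][] program, int localCount) {
        int[] locals = new int[localCount];        // the "locations": indexed by slot, no names
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack holding "values"
        for (int[] insn : program) {
            switch (insn[0]) {
                case ICONST: stack.push(insn[1]); break;
                case ISTORE: locals[insn[1]] = stack.pop(); break;
                case ILOAD:  stack.push(locals[insn[1]]); break;
                case IADD:   stack.push(stack.pop() + stack.pop()); break;
                case IINC:   locals[insn[1]] += insn[2]; break;
                default: throw new IllegalArgumentException("unknown opcode " + insn[0]);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[][] program = {
            {ICONST, 2, 0}, {ISTORE, 1, 0},                             // slot 1 = 2
            {ICONST, 5, 0}, {ISTORE, 2, 0},                             // slot 2 = 5
            {ILOAD, 1, 0}, {ILOAD, 2, 0}, {IADD, 0, 0}, {ISTORE, 3, 0}, // slot 3 = slot 1 + slot 2
            {IINC, 1, 1},                                               // slot 1 += 1
        };
        int[] locals = run(program, 4);
        System.out.println(locals[1] + " " + locals[2] + " " + locals[3]);
    }
}
```

The program passed to run never mentions a variable name; whichever front end produced it is assumed to have already decided which slot each source variable occupies.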

If you are designing a JIT (and not designing and implementing your own programming language), you should think in terms of locations and values, not variables. Perhaps you should think in terms of formal semantics (look at SECD for an example of an abstract VM).

If you know some Scheme or Lisp, I recommend reading Queinnec's book Lisp in Small Pieces. It covers the many ways to implement Lisp-like languages, including through bytecode.
