4

I have been reading every now and then about the virtual machines of programming languages like Java, Python and Lua. They all have a notion of bytecode, into which the source code is translated and which is executable on a virtual machine (register-based or stack-based).

Now on an x86 architecture, all code resides in RAM, which is addressable through the CPU's address space. An instruction pointer register points to the memory location of the instruction that is currently being executed. Jumps modify this instruction pointer register, but basically the software in memory is linearized into one piece of RAM.

With Virtual Machines, I am not so certain. Before executing the virtual machine, is all bytecode copied/linked together into a contiguous array with all instructions? Or does the VM keep various modules in different bytecode pages that are swapped/exchanged as needed?

wirrbel

1 Answer

4

Most VMs do not put all instructions into a single array, for various reasons:

  • Instructions might not have a constant size, so a piece of memory cannot necessarily be treated as a C-like array of instructions. OK, that's a bit pedantic. However, there are interpreters that don't use a sequence of opcodes at all, but a graph that's traversed during evaluation.

  • It's most sensible to have one bytecode structure per function or method. This makes it easy to attach metadata to a function, which makes debugging significantly easier. I've seen various approaches to this: there might be a metadata instruction that's otherwise a no-op, or a function might be represented as some object that contains a bytecode structure among other fields.

  • Using separate bytecode structures affords us more safety. If my VM contains a bug and doesn't stop at the end of the code unit, I might have an arbitrary code execution vulnerability. But if I have clear bounds for my bytecode structure, it's easier to detect when something is off. This is especially important if the VM allows jumps/gotos but wants to restrict them to inside the current function.

  • Most VMs support runtime loading of more code. This code has to be placed somewhere, and has to be made accessible to the rest of the code. Even more fun, some languages allow runtime code generation. That gets difficult if I have to fit that into a single non-growable array.
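
As a rough sketch of the points above (not how any particular VM does it): the toy stack machine below keeps one bytecode list per function, attaches metadata to it, checks the program counter against the bounds of the current function, and allows functions to be registered while the VM is already in use. The opcodes and the names `Function`, `VM`, `load` and `run` are invented for this illustration.

```python
from dataclasses import dataclass

# Hypothetical opcodes, made up for this sketch.
PUSH, ADD, JUMP, CALL, RET = range(5)


@dataclass
class Function:
    """One bytecode structure per function, with metadata next to it."""
    name: str
    code: list                       # list of (opcode, operand) tuples
    source_file: str = "<unknown>"   # debugging metadata


class VM:
    def __init__(self):
        # Functions live in a table, not in one big contiguous array,
        # so more code can be added at any time.
        self.functions = {}

    def load(self, fn):
        self.functions[fn.name] = fn

    def run(self, name):
        fn, pc = self.functions[name], 0
        stack = []    # operand stack, shared across calls in this toy VM
        frames = []   # saved (function, return pc) pairs
        while True:
            # Clear bounds per function: running off the end or jumping
            # outside the current function is detected immediately.
            if not 0 <= pc < len(fn.code):
                raise RuntimeError(
                    f"pc {pc} out of bounds in {fn.name} ({fn.source_file})")
            op, arg = fn.code[pc]
            pc += 1
            if op == PUSH:
                stack.append(arg)
            elif op == ADD:
                b = stack.pop()
                a = stack.pop()
                stack.append(a + b)
            elif op == JUMP:
                pc = arg              # validated at the top of the loop
            elif op == CALL:
                frames.append((fn, pc))
                fn, pc = self.functions[arg], 0
            elif op == RET:
                if not frames:
                    return stack.pop()
                fn, pc = frames.pop()
            else:
                raise RuntimeError(f"unknown opcode {op} in {fn.name}")


if __name__ == "__main__":
    vm = VM()
    vm.load(Function("two", [(PUSH, 2), (RET, None)], "lib.src"))
    vm.load(Function("main", [(CALL, "two"), (PUSH, 3), (ADD, None),
                              (RET, None)], "main.src"))
    print(vm.run("main"))
```

Here the "instruction pointer" is really a pair of (current function, index into that function's code), which is why nothing ever needs to be linearized into a single array.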

amon