4

I'm on a quest to write a LISP-ish language that compiles to Brainfuck. Well, it's a stack of intermediate compilers actually. Currently I'm trying to write the one that transforms this code:

a++b+a

into:

++>+<

It's standard Brainfuck, but with variables added.

The compiler is responsible for two things:

  1. Assigning a and b some address in the memory, say, 0 and 1
  2. Translating each occurrence of a and b into a number of <'s or >'s so the memory pointer ends up at the right location

Now, here's the thing: in order for the compiler to know how many <'s to put, it needs to know where the memory pointer currently points to. At first it points to 0, but as the code gets executed, the pointer moves around and to get to a you need to move relatively from where you're at.

So the only way to know the value of the memory pointer would be to execute the code and replace variable names as you go. "Oh, mempoint is now 4, I see an a and it's location is 1 so i need to move three places left: <<<"

If I close my eyes I can hear a voice saying: "you're trying to solve the Halting Problem"

So my question is: am I really? Is there any way around?

Andrej T.
  • 149
  • 3
  • 2
    Since Brainfuck is Turing Complete, yes, there is guaranteed to be a way around. That said, it might be ugly, and nothing springs to mind without modifying the target language slightly. – Telastyn May 23 '15 at 14:31
  • 1
    If you glance at BrainFuck in the esolang (esoteric languages) wiki under [notable implementations](https://esolangs.org/wiki/Brainfuck#Notable_implementations), you will see two instances of something to BF compilers: Basic to BF and C to BF. –  May 23 '15 at 14:51

2 Answers2

4

It depends on whether you want to replace < and > with variables or really add variables to the language. In the first case, you probably lose Turing completeness (because you only have finitely many memory cells). It may remain Turing complete if each cell can hold arbitrarily large values, but honestly I'm not sure.

Anyway, if there are no < and > operations, the translation is simple: You know which variable the code last accessed so you just need to compare the addresses of the previous and next variable. Only loops are a bit tricky, because with code like a+[b+] the current cell might be either a or b, both in the loop body and after the end of the loop. One simple workaround is forcing all paths to end on the same cell, i.e., rewriting the code to a+[b+a] or a+b[b+] (which could be simplified to a+b[+]).

If bare > and < remain available in your source language, then things are more tricky. As a comment points out, Brainfuck is Turing complete so you can just build an equivalent program with enough complication, but a simple and efficient translation may not always be possible. Statically determining the current cell at any given point generally requires knowing the trip count of all loops up to that point. Even ignoring the potential reduction from the Halting problem (I'm not convinced that that actually applies here), the trip count and thus the memory location frequently depends on the input, i.e. can not be predicted statically even by an oracle machine.

0

Okay, to answer my own question, there is a way around.

Compile time vs. runtime

Calculating the number of <'s statically is indeed impossible, as few people have suggested. Sometimes whether code is executed depends on some input and there's no way around that. An example: ,[>]

So the only way to do it is to search for cells at runtime. If you're able to find the beginning of the memory space (address 0), you can just jump 2 times to the right to get to address 2.

Finding a zero is easy in Brainfuck: [<] # do something here

However, you don't want to stop at some arbitrary cell that happens to have the value 0, you need to make sure it's the actual beginning of the memory zero!

This is really inconvenient. Whichever value you choose to represent the starting address with, it can accidentally show up in the memory.

Dividing the memory space

So I divided the memory in what I call memory space and value space. Odd cells are values, even cells are "addresses". Well, there're not exactly unique numbers identifying cells, but more of a "ladder" that you use to move around.

Here's the layout of the memory, after it's mapped:

memory = [0, 0, 1, 0, 1, 0 ...] start ^ ^ addr ^ value

Now every time you search for a location, you go left < once to enter the address space, then loop until you find a zero, then move right n times.

This schema requires you to fill the memory with alternating 1's and 0's before the program is executed. Given a fixed memory size, the compiler can generate code that does this for you.

This solution is not perfect. To begin with, you lose half of the memory. Filling it up with 1's is also annoying and costs you time at the start of each program.

Appendix: marking the start with a "1"

You might be thinking: why not invert the schema and mark address zero with a 1 and leave all other addresses to be 0? That way you don't need to fill up the memory with 1's.

Let's try.

The code we need, translated into words would be:

  1. go left one cell to enter the address space
  2. if the value is 0, then move left 2 cells (keep searching)
  3. else (you've reached the start) go right N places

You need to write an if / then / else construct. In Brainfuck this means copying some values to temporary locations (we've got plenty of those!) then executing the correct branch, then cleaning up.

But in our case "executing the code" means "go to some other location and never come back" so it's impossible to clean up the temporary variables. And if they're not 0 you won't find the start again.

I tried to find a way around this but I couldn't.

Here's the code for the compiler: https://gist.github.com/whoeverest/3b15f2a06fc32333f342

Andrej T.
  • 149
  • 3