So, I'm a little late to the game, but as a newcomer to Forth I was mystified by these questions too (particularly the ones about DOES>). Here is what I've learned and how I've implemented it:
[TL;DR: "CREATE" makes a word with a simple, default behavior. "DOES>" does not return to its caller. Instead, it uses the return address to put a "goto" in the most recent definition.]
CREATE does not take anything from the stack or return anything. It parses a word from the input and makes a dictionary entry for it. It also fills in the code for the newly created word with standard boilerplate: when the new word runs, it pushes an aligned address on the stack and simply returns. That address is the start of the word's data area, the same place that subsequent "," (comma) calls will fill in with data. In my system, the generated code for something like CREATE NewVar would look like this:
NewVar:     push_data next_addr
            return
next_addr:
Therefore, we could define (initialized) VARIABLE as:
: VARIABLE CREATE 0 , ;
or, in pseudo-machine code:
VARIABLE:   call CREATE
            push_data 0
            call comma
            return
Saying something like VARIABLE NewVar would then make NewVar a word that does the "push_data/return". The 0 , then stores a zero at the address that NewVar puts on the stack, the "next_addr" shown in the code snippet. Doing things like NewVar @ or 42 NewVar ! then reads and writes that location.
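To try this on a standard Forth without disturbing the built-in VARIABLE, here is a minimal sketch using the hypothetical name MY-VARIABLE (my own choice for illustration, not part of any system). It assumes only the standard words CREATE and comma:

```forth
\ MY-VARIABLE is a hypothetical name, so the system's own VARIABLE stays untouched.
: MY-VARIABLE ( "name" -- )
  CREATE    \ parse the next word and build its dictionary entry
  0 , ;     \ reserve one cell right after it, initialized to zero

MY-VARIABLE NewVar   \ NewVar now pushes the address of that cell
NewVar @ .           \ read the initial contents: prints 0
42 NewVar !          \ store a new value
NewVar @ .           \ read it back: prints 42
```

The cell that comma fills in is exactly the one whose address NewVar pushes, which is why @ and ! work on it directly.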
There is nothing special (at least in my system) about how the words between CREATE and DOES>, or even the words after DOES>, are compiled. A word whose definition uses CREATE and DOES> is compiled normally, with DOES> simply "call"ed in the compiled code like any other word. What makes DOES> special is what it does at run time: it finds the code location of the last-created word and overwrites that word's "return" instruction with a "jump" instruction. The destination of the jump is DOES>'s own return address, which it pops off the return stack; this happens every time DOES> is called, so each call produces one "jump". Because that return address is now gone, when DOES> returns it actually returns to whoever called the word that had DOES> in it, not to the remainder of that word's code. My implementation of DOES> looks sort of like this:
DOES>:      [find second opcode of latest definition]
            popr          ; like "R>"
            [overwrite opcode with a "jump" to TOS value]
            return
So, when we define VALUE like this:
: VALUE VARIABLE DOES> @ ;
What we get is something like this:
VALUE:      call VARIABLE
            call DOES>
            call fetch    <-- return address of call to DOES>
            return
The code will call our definition of VARIABLE, given above, which in turn calls CREATE to create the new entry. But when it calls DOES>, DOES> will pop the return address pointed to above and adjust the definition of the newly created word (say, NewVar) to jump to that location. NewVar still pushes "next_addr" on the data stack as before, but now jumps to the call to fetch instead of returning. This also means that execution of VALUE ends at the call to DOES>. When DOES> returns, it returns to the caller of VALUE, not to the remainder of VALUE's definition (@ ;).
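The split in execution can be observed from the outside on any standard Forth, whatever its internals look like. The following is a small sketch with a hypothetical defining word MAKER (the name and the literal values are mine, chosen only to make the two halves visible on the stack):

```forth
\ MAKER is a hypothetical word, used only to show where execution stops.
: MAKER ( "name" -- 1 )
  CREATE 5 ,     \ build the child and store 5 in its data cell
  1              \ runs while MAKER itself executes
  DOES>          \ MAKER's own execution ends here
  @ 2 ;          \ runs only when a child executes ( addr -- 5 2 )

MAKER Child      \ leaves 1 on the stack, never the 2
.                \ prints 1
Child . .        \ prints 2 then 5
```

The 2 never shows up when MAKER runs, and the 1 never shows up when Child runs, which matches the "DOES> does not return to its caller" behavior described above.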
Notice that CREATE is not an immediate word. Our definition of VARIABLE was CREATE 0 , but that does not create a word named "0": CREATE is merely encountered during the definition of VARIABLE, so it just gets baked into the definition. Instead, it is when VARIABLE actually executes that CREATE will attempt to retrieve the next word from input and make a new definition for it.
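A quick way to convince yourself of this is to run the same defining word twice. This sketch uses a hypothetical word MAKE-CELL (my name, not a standard one) and assumes a standard system:

```forth
\ MAKE-CELL is a hypothetical word; the 7 is compiled as a literal, not parsed as a name.
: MAKE-CELL ( "name" -- )  CREATE 7 , ;

MAKE-CELL First         \ CREATE runs now and parses "First"
MAKE-CELL Second        \ each run parses a fresh name from the input
First @ .  Second @ .   \ prints 7 twice
First Second = .        \ prints 0: the two words own different cells
```

If CREATE had parsed its name while MAKE-CELL was being compiled, there would be a single word named "7" and no way to produce First and Second from the same definition.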
Also notice that DOES> assumes a lot about the most recently defined word. I could have made it search diligently for the "return" opcode, but instead, knowing how CREATE lays out a new word, it simply uses a fixed offset into that definition. I'm leaning on the spec, which says "An ambiguous condition exists if [ the most recent definition ] was not defined with CREATE ...". In my system, that "ambiguous condition" is that some word gets a "jump" opcode as its second instruction.
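That DOES> really does act on "the most recent definition" and nothing else can be seen with two CREATEs in one defining word. This is a sketch for a standard system; MAKE-PAIR, Plain and Patched are hypothetical names picked for the illustration:

```forth
\ MAKE-PAIR is a hypothetical word: two CREATEs, one DOES>.
: MAKE-PAIR ( "name1" "name2" -- )
  CREATE 1 ,       \ first child, keeps the default behavior
  CREATE 2 ,       \ second child is now the most recent definition
  DOES> @ 10 * ;   \ so only the second child is patched

MAKE-PAIR Plain Patched
Plain @ .    \ Plain still just pushes its address: prints 1
Patched .    \ Patched fetches and scales its cell: prints 20
```

Both children were made with CREATE, so nothing here touches the ambiguous condition; the example only shows which of the two DOES> picks.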
What does DOES> consume as input?
It consumes its own return address and also uses a global variable that points to the most recent definition.
Is there some kind of 'default behaviour' that DOES> overrides?
Yes. It is the default behavior ("push_data/return") that CREATE makes.