1

EDIT: Perhaps what I am misunderstanding is what it means when people say that the code we type gets turned into machine code of 0s and 1s. If these 0s and 1s are an abstracted representation of their electric states, then I actually don't have a question (it is just amazing that a compiler can take English and turn it into a form that can impact the processor). If the compiler turns the code into a file that contains literal 0s and 1s, then I don't understand the transformation steps that occur between this file of literal 0s and 1s and the execution of the program (I understand how programs already in RAM are executed).

I have been researching an answer to this question all day. I have read a lot about the computer's hardware and understand concepts like the clock, the different states it "steps" through to execute a program, and (at a basic level) the compiler. I also know that the bits of each part of an instruction code are wired in a certain way so that it does what it is supposed to, and that the compiler takes the code we write and converts it to binary. But my question is this: how does the processor "understand" this binary (or machine-readable) code?

In the computer there are no literal 0s and 1s, obviously; 0s and 1s are themselves an abstraction representing the presence or absence of electricity. So how does a 0 go in one end of the computer and a bit being off come out the other? I understand how this may work with input from the keyboard, because I understand to some degree how computer peripherals work and how their input gets stored (the wiring underneath the keyboard accomplishes this as an extension of the wiring of the hardware that accomplishes everything the computer does), but the compiler and the other (seemingly missing) piece of the puzzle that moves the computer to action seem like a black box right now. Thanks so much in advance; I always appreciate any responses.

One last time, just to be super clear: I understand that the compiler takes our code and converts it to 0s and 1s. That's great, but how does this 0 get interpreted, by some component and/or process, so that this abstraction (the symbol 0) causes one of the computer's bits to be in a different state?

Thanks so much once again. I have searched for hours about this question on this forum and elsewhere, and I can't seem to find an answer that moves past "yeah, the compiler converts to 0s and 1s and those do some stuff". The how is what I am after.

steez
  • 1
    Computers don't work on "the ones and zeros" any more than a person reading books works on the "a to z"s. The bits are all organized into groups that have meaning, just like letters grouped into words. – whatsisname Jan 25 '21 at 23:20
  • Thank you for the response. I understand the computer does not operate on literal 0s and 1s. However, machine code is in the decimal system (i.e., when code is compiled, the resulting file is literal 0s and 1s). So how does the computer interact with and execute the machine code and turn these literal 0s into its functional form as either the presence or absence of electricity? – steez Jan 25 '21 at 23:46
  • @steez, in your last comment did you mean in the binary system? – Erik Eidt Jan 25 '21 at 23:55
  • Yes I did, apologies. – steez Jan 25 '21 at 23:56
  • I think you have the fundamental misunderstanding that there are literally 1's and 0's in the physical hardware, whether on disc or in the CPU. There are physical values that are used to represent 1's and 0's but are not actually or literally 1's and 0's. Thus, the interpretation of physical values as 1's or 0's is up to the kind of device (e.g. rotating rust, vs. dynamic ram vs. CPU registers). – Erik Eidt Jan 25 '21 at 23:57
  • I understand that there are no 0s and 1s anywhere in a computer and no concept of language. However, what I am driving at is that the code we write is in English. So then the compiler must be able to take this code (in English, or instruction code, etc.) and interact with the computer in such a way that this code is converted to a form that the computer can work with. This just seems incredible to me, so I am trying to understand how this step can occur. Thanks. – steez Jan 26 '21 at 00:01
  • Yes, the compiler thinks it is outputting 1's and 0's for the machine code, and logically speaking, it is, but those 1's and 0's are always encoded in some physical form. Physical forms can be interchanged, shared, translated to other physical forms by having a common interchange standard. – Erik Eidt Jan 26 '21 at 00:01
  • What Eric Eidt is trying to say is, yes, the compiler outputs a bunch of bits to a file, but the file itself is ultimately represented on a physical device that somehow encodes the 1s and 0s in a physical way. I think your question revolves about the different ways of physically storing binary information on different hardware components, and about how this information transferred between components. I think the full picture is more complicated than it seems, as it involves multiple levels and types of memory, and different hardware technologies. – Filip Milovanović Jan 26 '21 at 16:56
  • Reading through the link Doc Brown posted should help - but perhaps you just need a bit of context. The CPU is essentially doing really simple/stupid things, *really fast*. It doesn't "know" anything beyond the very limited instruction set. It has no notion of files. It just receives inputs via conductive elements or wires, passes it through a logic gates that ultimately produces signals in other wires. And this stream of signals then affects other components (that may have microprocesors of their own) and back and forth it goes. 1/2 – Filip Milovanović Jan 26 '21 at 17:56
  • I think the key point is: most of the real "smarts" is offloaded, stored as data in software (drivers, OS, libraries, app code) - this is essentially stored knowledge that combines to make the files be stored in a certain way, or figure out how to locate them again, transfer the data through multiple levels of hardware and physical representations, and ultimately make the right wires carry the right signals. It's all really one extremely complicated bookkeeping device, written in a language that this "stupid" thing that only supports a limited set of instructions can understand. 2/2 – Filip Milovanović Jan 26 '21 at 17:56
  • It's hard to concisely describe how many layers of abstraction you're missing or trying to unroll at once. A "zero" means completely different things on different media (inside the CPU, in RAM, on an SSD, on a magnetic HDD, coming over a network interface). The only thing they have in common is they are all _logically_ described with the symbol zero. The details of _how_ the CPU and RAM agree that a bit is zero are non-trivial. The sane way to think of it is "we call this low-voltage state zero by convention" – Useless Jan 26 '21 at 19:38
  • If you want to, for example, follow the chain from typing `0` on a keyboard to resetting a bit ... it's a long path. It'd be easier to start from thinking about the early plugboard computers (like [ENIAC](https://en.wikipedia.org/wiki/ENIAC)) and work up. – Useless Jan 26 '21 at 19:45

3 Answers

3

CPUs have instruction sets: add two numbers, jump forward 3 instructions, if this is true run the next instruction, if not skip an instruction, etc.

https://en.wikipedia.org/wiki/Instruction_set_architecture

The 1s and 0s are stored in the memory of the computer, grouped together, and read either as instructions (i.e., run instruction number 3) or as data (i.e., take that bit of memory there and do some operations on it).

Essentially, when you compile code you are turning it into a combination of these very basic instructions, which are stored on disc, loaded into memory, and then run by the CPU.

Some of these instructions or areas of memory will map to pins on connectors to devices, such as a graphics card or disc drive, and their state, on or off, will cause the device to do things.
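
The loop above can be sketched in a few lines of Python, using a made-up three-instruction machine (the opcode numbers here are invented for illustration; they are not any real CPU's):

```python
# A toy machine: "machine code" is just a list of numbers in memory.
# Made-up opcodes: 1 = LOAD n (put n in the accumulator),
#                  2 = ADD n  (add n to the accumulator),
#                  3 = HALT   (stop and return the result).
def run(memory):
    acc = 0   # accumulator register
    pc = 0    # program counter: which memory cell to read next
    while True:
        opcode = memory[pc]
        if opcode == 1:          # LOAD
            acc = memory[pc + 1]
            pc += 2
        elif opcode == 2:        # ADD
            acc += memory[pc + 1]
            pc += 2
        elif opcode == 3:        # HALT
            return acc

# "Machine code" for: load 2, add 3, halt
print(run([1, 2, 2, 3, 3]))  # → 5
```

A real CPU's fetch-decode-execute cycle is this loop done in wiring rather than in software: the numbers themselves select which circuit fires.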

Ewan
  • Thanks for the response. What I am asking, more specifically, is that these "0s" and "1s" aren't really 0s and 1s; they are the presence or absence of electricity. So when the code is compiled into binary, the file is literal 0s and 1s. So how do these literal 0s and 1s then get transformed into their matching electric states? – steez Jan 25 '21 at 23:38
  • 2
    @steez, The file is never *literally* 0's and 1's -- these values (1 & 0) are logical values, and they are encoded into some physical form on media (e.g. disc) where the media knows what physical values mean 1 vs. 0. Then they are transmitted to the processor or memory via interfaces that also have physical wires where 1's and 0's are differentiated by their voltage. – Erik Eidt Jan 25 '21 at 23:52
  • Ok this is the best answer so far, thank you. So when you type out each letter of a code like Assembly, each letter you type gets stored physically and then when there is an entire word that denotes some action, the individual characters are wired in a way that performs the desired operation? – steez Jan 26 '21 at 00:40
  • 2
    or rather, all files, all data on a digital computer, are always "1s and 0s", regardless of whether the file is displayed as binary or not. Some chip of memory or magnetic media has the on/off states which encode the data. The transform is which pixels you turn on or off on the screen when combinations of the states are read – Ewan Jan 26 '21 at 07:57
  • @steez "the individual characters are wired in a way that performs the desired operation" - not quite; the CPU doesn't understand the words/concepts we use in our programming languages. The compiler translates what we say in code to simple instructions. E.g. I might say drawLine(pointA, pointB), meaning "draw line between two points", but the CPU doesn't understand that. It's more like "go to this memory location, change the number, move to the next, change the number, move to the next, change the number, ...". It "understands" a very limited set of things supported by its circuitry. – Filip Milovanović Jan 26 '21 at 20:41
  • @Ewan Thank you both for the responses. I know that the computer does not understand the English word "ADD", for instance; there is just certain wiring within the computer that can perform some operation that we call addition (or ADD). In terms of detecting this operation when needed, that is why I was saying that maybe when I type each letter of ADD in succession, each key is wired to a gate that turns on when these 3 letters are put together, and we execute the operation. Does this make sense? – steez Jan 26 '21 at 23:24
  • @steez - it's not quite like that, that's what I'm trying to convey. It's not that a specific character combination activates something. The CPU is even "dumber" than that; for the add operation to happen, a special program needs to read those several characters, recognize them as the ADD mnemonic, and convert that to a number (an "opcode") that comes from a (relatively) small set of predefined numbers (instructions). *That* number (a string of 0s & 1s) is what ultimately activates the circuitry, that's what the CPU directly supports. The executable program is stored in that transformed form. – Filip Milovanović Jan 27 '21 at 03:28
  • Ok that is actually really really helpful. Is the compiler the "special program" that does this recognizing? That is sort of the heart of my question: what does this recognizing and how? – steez Jan 27 '21 at 03:35
  • @steez - For a higher level language like C, it's the compiler. If you're writing close to the metal in assembly lang, then it's the assembler program, but let's just use the term "translator", to avoid confusion. The following is all *very* handwavy, but, programs (translators included) can have hardcoded data embedded in them (and they can read data from a file). So you can put the string constant "ADD" in the translator before it's ever executed. You can also store the corresponding opcode as a constant. So, you can prearrange some data to form something like a lookup table, let's say. 1/3 – Filip Milovanović Jan 27 '21 at 05:19
  • So the translator program is made out of some data, and a set of prearranged instructions that allow it to, when a program in the source language is fed into it, read some characters, and compare with the "ADD" already stored in it; the CPU itself supports simple comparisons. The translator program's code is prearranged so that, if the comparisons succeed, the CPU (while executing the translator) jumps to a sequence of instructions (also part of the translator) that result in the opcode being written to the data/file representing the output program. 2/3 – Filip Milovanović Jan 27 '21 at 05:19
  • In case of assembly language, you're mostly working with mnemonics corresponding to low level instructions, so the translation process is not as complicated as for a higher level language. In case of a high level language and a compiler, a very simple looking statement like `c = a + b` may, when translated, result in a bunch of instructions. The final result is bunch of simple instructions, a long string of bytes that mostly looks like incomprehensible gibberish even to programmers, but encodes the right sequence of simple events, checks, jumps, etc., that make something interesting happen.3/3 – Filip Milovanović Jan 27 '21 at 05:19
  • @steez P.S. I have a feeling you might enjoy this video of a 1985 lecture by Richard Feynman; it's over an hour long, but it's an absolutely delightful watch. He talks about how computers work at the lowest level. It's not going to answer all of your questions, but it might help bring additional context: https://www.youtube.com/watch?v=EKWGGDXe5MA – Filip Milovanović Jan 27 '21 at 05:35
  • Again, that was insanely helpful and helps clarify what is going on under the hood with the compiler and the interpretation of code. I liked how you said it is sort of a lookup table, with the language's definition of ADD mapped to a certain op code, and once the compiler recognizes that command, the CPU knows which op code to fetch and then execute. That "recognition" step is so amazing... – steez Jan 27 '21 at 15:42
  • @steez The assembler is just a computer program. It's not special. `if(char1 == 'A' && char2 == 'D' && char3 == 'D') {outputFile.write(OPCODE_FOR_ADD);}` The CPU doesn't know the word ADD, and the assembler doesn't even tell it the word ADD. When the CPU executes the assembler, it looks for A, then it looks for D, then it looks for another D, if it finds all of those then it edits the program counter which just happens to now point to the place where the code for writing the opcode to the object file is... (because the person who wrote the assembler made it that way) – user253751 Mar 22 '21 at 10:58
2

A compiler operates at a logical level, not concerning itself with physical storage, but rather with groupings of 1's and 0's — several different kinds of groupings.  The first grouping is text files aka source code, and the second grouping is machine code.

Text is stored on disc, for example, as a sequence of bytes, where a byte is a group of 8 bits ordered from most significant to least significant (or vice versa — that's up to the media;  each bit of course is physically encoded in the media as physically measurable values differentiating 1's from 0's).  Each byte represents a number: the number represents an ASCII or UTF-8 value.  This is a logical encoding or mapping of logical characters into logical bytes — a layered encoding well above the physical encoding of bits in media.
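
As a small illustration of that character-to-byte mapping (Python, using the standard ASCII values):

```python
# Each character of source text is stored as a number (its ASCII code),
# and each number is in turn a pattern of 8 bits.
source = "ADD"
codes = [ord(c) for c in source]                 # character -> number
bits = [format(ord(c), "08b") for c in source]   # number -> 8-bit pattern
print(codes)  # → [65, 68, 68]
print(bits)   # → ['01000001', '01000100', '01000100']
```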

Machine code is also a logical encoding of command operations to numbers — also well above the physical encodings of bits in media.  It can also be stored on disc as a sequence of bytes.

When the compiler recognizes a program's syntax according to the grammar of the programming language, it builds an internal object diagram of that program.  Later it traverses that object diagram in order to generate commands in machine code to instruct the processor to follow the steps of the program.

Machine code is designed to be easily decoded (at least by comparison to other encodings) — usually there's a first field that is an opcode field that tells the processor something about how to interpret additional fields of the instruction.  The compiler knows these encodings and generates them for the processor such that it will perform the given program.
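
A sketch of that decoding step in Python (the 8-bit layout here, 3 opcode bits and 5 operand bits, is invented for illustration):

```python
# Hypothetical instruction format: top 3 bits = opcode, low 5 bits = operand.
instruction = 0b01000011   # opcode 2, operand 3 under this made-up layout

opcode = instruction >> 5          # shift the operand bits away
operand = instruction & 0b11111    # mask off the opcode bits

print(opcode, operand)  # → 2 3
```

Real instruction sets use more fields and wider words, but the principle is the same: fixed bit positions select fixed meanings, and the decoder is just wiring that routes on those positions.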

Erik Eidt
1

But my question is this: how does the processor "understand" this binary (or machine-readable) code?

It doesn’t. It turns numbers into other numbers and puts them in different places based on still other numbers.

What those numbers mean is someone else’s problem. That someone sometimes thinks of them differently based on where they are. But they’re still just numbers. The processor doesn’t need to know what they are. Just how to transform them and where to put them.

So how does a 0 go in one end of the computer and a bit being off come out the other?

From a microphone or analog joystick being sampled, or a keystroke at the keyboard, zero finds its way in many different ways.

If that joystick has a 100k potentiometer in it, you might think 100,000 values can come out. But if I’m sticking the value in one byte, then only 256 values can come out, because a byte only has 8 bits. This is called analog-to-digital conversion.
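
That quantization step can be sketched like so (the 0–100,000 reading range is hypothetical, taken from the potentiometer example above):

```python
# Map a reading in the range 0..100000 onto a single byte (0..255).
def to_byte(reading, full_scale=100_000):
    return round(reading / full_scale * 255)

print(to_byte(0))        # → 0
print(to_byte(40_000))   # → 102
print(to_byte(100_000))  # → 255
```

Readings closer together than roughly 392 units collapse onto the same byte value; that lost detail is the price of going digital.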

Another way is someone just types in a zero at the keyboard because they want a zero. They can do it in hex, octal, ASCII, heck, even binary if they want to be silly. They know where they want the zero and they know what it will mean. That doesn’t mean the processor does. All it knows is what it’s supposed to do with it.

What we programmers do is use the rules that the processor follows blindly to get some useful work done. That might be letting you see and edit what you’ve typed on the screen before you put it on paper. It might be hearing what that zero sounds like when a sound card plays it on the system speaker.

What gives that zero meaning in its different places is often something called an encoding. If we’re talking about ASCII (or UTF-8), a zero often means this is the end of the string. (Pascal strings tell you how long they are in the first byte. For them, zero means a zero-length string.)
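
A quick sketch of those two conventions using Python byte strings (the contents are made up):

```python
# The same byte value 0 means different things under different conventions.
c_style = b"hi\x00"        # C-style: a zero byte marks the end of the string
pascal_style = b"\x02hi"   # Pascal-style: the first byte stores the length

print(c_style.index(0))    # → 2 (length found by scanning for the terminator)
print(pascal_style[0])     # → 2 (length read directly from the first byte)
```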

But not everything is encoded text. Some stuff is just data and it doesn’t follow any set pattern to tell you what it means. It will only be meaningful to whatever created it.

It is impossible to simply look at a zero and know what it means. You need to know its context. The processor is no different.

The really magical thing the processor does is copy that zero around faithfully. It doesn’t let it turn into a 0.3 when no one is looking like analog stuff does. That’s the real reason we use 1 and 0. It’s on or off. Anything in between is noise to filter out.

But it takes humans to make that zero into something humans care about. The processor don’t care.

candied_orange
  • Thank you. I know that it doesn't literally understand binary code. What I am trying to get at is how code we write (in whatever human language) gets transformed into a form that makes something happen inside the CPU. – steez Jan 26 '21 at 00:04
  • @steez better now? – candied_orange Jan 26 '21 at 00:27
  • @steez, would it help to know that a computer does not literally translate from English? It translates from a specially designed language - a programming language, which is only superficially English-like - into CPU instructions. A stream of human-readable characters, typed or copied into the computer, is already encoded into binary data (often using an ASCII encoding), and the compiler simply converts (according to complex rules) that data into executable instructions (which are also a form of data). – Steve Jan 26 '21 at 11:41
  • The amazing part to me is how it recognizes the key words (or primitives) of the programming language. The human-characters that we see that execute an "add" function are already stored, but I am trying to get at how these are recognized based on our input. Is it just the keyboard input triggering certain switches being turned on and when you type "a" "d" and then "d" this triggers the execution of the code? – steez Jan 26 '21 at 23:29
  • @steez “add” is an op code. It was assigned a number when electrical engineers designed the processor. When programmers want an add to happen they put that number in memory and wait for it to be executed. – candied_orange Jan 26 '21 at 23:42
  • So when I type sum() in python it executes this op code. Does each letter get stored in its ascii equivalent in memory and then when all three are typed in succession together it triggers the execution due to the circuit having electricity present? It has to be based off of the way the keyboard input is stored and executed right? – steez Jan 27 '21 at 01:09
  • Ascii may be involved. You might type “sum” with a Japanese keyboard and end up storing a different string of numbers. This still works providing the compiler or interpreter used knows what to do with that number. If it doesn’t then it never even becomes an “add” opcode number. – candied_orange Jan 27 '21 at 01:20
  • 1
    Also, what you type in is important. Notepad will just make a file. That file might be ran later. A console will execute your line as soon as you hit enter. Either way can get to that add opcode. – candied_orange Jan 27 '21 at 01:30
  • Ok thank you, I appreciate the help. So the compiler is what is responsible for taking the input in the file - let's just say "sum" in English, but I understand what you mean that it need not be stored in ASCII - and going character by character, and then when it finds that these 3 letters actually have a corresponding op code, it can be executed and then looks to the registers to see which digits to add. Do you have any recommended books or other resources to get a look behind the hood? – steez Jan 27 '21 at 02:07
  • We don’t do recommendations here. I will say it was my computer architecture class that drilled this into me. We wrote emulators for a theoretical computer that had its own opcodes. Programmed it with a custom assembly language. Learned a lot. It’s all numbers and the context in which you find them. – candied_orange Jan 27 '21 at 02:15
  • Thanks, apologies about seeking the recommendation. Did not know about that lol. – steez Jan 27 '21 at 15:34