44

We are often told that the hardware doesn't care what language a program is written in, since it only sees the compiled binary code; however, this is not the whole truth. For example, consider the humble Z80: its extensions to the 8080 instruction set include instructions like CPIR, which is useful for scanning C-style (NULL-terminated) strings, e.g. to perform strlen(). The designers must have identified that running C programs (as opposed to Pascal, where the length of a string is stored in a header) was something their design was likely to be used for. Another classic example is the Lisp Machine.
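
To make that concrete, the byte-scan loop that a CPIR-style instruction replaces is essentially strlen(); a minimal C sketch (my_strlen is just an illustrative name):

#include <stddef.h>

/* Walk the string until the terminating NUL byte: the linear byte scan
   that a single CPIR-style search instruction can take over. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}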

What other examples are there, e.g. instructions, the number and type of registers, or addressing modes, that make a particular processor favour the conventions of a particular language? I am particularly interested in revisions of the same family.

Gaius
  • Don't forget that the Z-80 also had the LDIR instruction, very useful when copying strings when you know the length (like in Pascal, where the length was stored in the header). – TMN Jul 30 '12 at 20:45
  • 1. The Z-80 was designed in 1975, when Unix and C were an obscure operating system and language on a few computers, 3 years before the first edition of K&R. 2. There's nothing about Pascal that mandates the string length be "in a header." 3. Strings in CP/M, the major microcomputer OS at the time, were terminated with the '$' character, not '\0'. CPIR could search for any character. 4. CPIR is matched with CPDR (search backwards), as well as other -IR and -DR instructions. Conclusion: CPIR has nothing to do with the C programming language. It's just a byte search instruction. – librik Jul 31 '12 at 06:26
  • All the marketing for the Z80 referred to these as string handling instructions. – Gaius Jul 31 '12 at 08:22
  • The biggest (and one of the most annoying for hardware designers) of the things forced by C is byte addressing. CPUs would have been simpler and faster without this abomination. – SK-logic Jul 31 '12 at 08:47
  • @SK-logic: Nothing in C would stop a CPU from doing individual bit addressing. A C compiler could then decide to keep the last three address bits zero (except for bitfield operations). And if you mean addressing memory in 32-bit chunks, that too is allowed in C. The macro `CHAR_BIT` has a minimum of 8, but no maximum. The only difficult architectures for C are those with 6- and 7-bit bytes. – MSalters Jul 31 '12 at 11:55
  • @SK-logic, ISTR that the first machine to which C was ported was a word-addressed one. – AProgrammer Jul 31 '12 at 12:04
  • Yes, yes. But now it is all this legacy C code depending on byte-level access that keeps CPU designers from abandoning this otherwise useless functionality. – SK-logic Jul 31 '12 at 12:35
  • Weird, I wrote a C function (don't ask why) that dealt with the fact that the compiler was upcasting to word-level addressing (because word addressing was what the CPU did). – Paul Jul 31 '12 at 12:38
  • Wasn't a lot of the CPU design during the early Pascal/Basic/C era really influenced by the needs of ASM programmers, and their need for speed? – Paul Jul 31 '12 at 12:41
  • Not precisely language-based, but some chips (e.g., newer intel chips) have instructions like "AESENC - Perform a single round of AES encryption". – Brian Jul 31 '12 at 13:56
  • @TMN: The Z-80 still gets used. The graphing calculators I've used in school use the Z80. (edited) –  Aug 01 '12 at 03:13
  • @SK-logic: Although the POSIX standard requires byte addressing, the C standard does not. Any implementation where `sizeof(int)` equals 1 must require that type `char` be signed (since an `int` must be able to hold all values of type `char`). I've written code for a machine where `char` and `int` are both 16-bit signed integers; the biggest difficulties are that one can't use unions for type conversion, and efficient storage of a large number of bytes requires manual packing and unpacking. Those issues are minor compared with the possibility in C that sizeof(int)==sizeof(long), since... – supercat Aug 01 '12 at 15:18
  • ...that means there's no standard type which is guaranteed to hold the difference between two `unsigned int` values. C99 improved that situation, but prior to C99 there was no guaranteed-safe single-step way to compare a potentially-negative value to a value of type `unsigned int` (one would have to test whether the number was negative before doing the comparison). – supercat Aug 01 '12 at 15:25
  • @Gaius marketing most likely saw an opportunity with what they had. You asked about design time, not deploy time. –  Aug 02 '12 at 10:25
  • A professor once told me that certain CPUs were once designed (in the '70s) around functional programming paradigms. However, I can't find a good reference to an article documenting this. – Dibbeke Aug 09 '12 at 10:55

15 Answers

20

The existing answers focus on ISA changes. There are other hardware changes, too. For instance, C++ commonly uses vtables for virtual calls. Starting with the Pentium M, Intel has an "indirect branch predictor" component which accelerates virtual function calls.
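
As a rough illustration (the class and function names are made up), a virtual call in C++ compiles to a load from the object's vtable followed by an indirect call, which is exactly the kind of branch such a predictor targets:

struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

// The compiler cannot resolve this call statically: the generated code
// loads a function pointer from s's vtable and calls through it, an
// indirect branch whose target depends on the object's dynamic type.
double total_area(const Shape *s) {
    return s->area();
}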

MSalters
  • And the Berkeley RISC architecture included the concept of a "register file", so instead of making functions "spill" registers onto the stack, a block of 8 registers was given to each function. This sped up object-oriented code considerably, since it tends to consist of many method calls to short methods. – TMN Aug 01 '12 at 20:29
  • This isn't a valid example. The "Table of function pointers" design is *also* used in many dynamic linking scenarios, for example, through DLL import and export on Windows, and also used in C programs. Although I guess you could argue that it does show the processor being optimized for a specific use, it's not language-specific. – DeadMG Aug 02 '12 at 18:35
  • @DeadMG: Other cases benefitted, that's true. But until C++ became popular, CPU designs weren't _influenced_. And that was the question posed. Similarly, TMN does have a point about register files. Assembly didn't have such a clear concept of functions. Functions, as we commonly understand them today, date back to Algol 60, and therefore we can say that Algol 60 influenced the CPU register file design. – MSalters Aug 03 '12 at 07:43
14

The Intel 8086 instruction set includes a variation of "ret" which adds a value to the stack pointer after popping the return address. This is useful for many Pascal implementations where the caller of a function will push arguments onto the stack before making a function call, and pop them off afterward. If a routine would accept e.g. four bytes' worth of parameters, it could end with "RET 0004" to clean up the stack. Absent such an instruction, such a calling convention would likely have required that code pop the return address to a register, update the stack pointer, and then jump to that register.
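
The same callee-pops convention survives today as stdcall on 32-bit x86. A hedged sketch (GCC attribute syntax assumed, MSVC spells it __stdcall, and the function name is purely illustrative):

/* On a 32-bit x86 target the compiler ends this function with "ret 8",
   popping the return address and the two 4-byte arguments in one
   instruction, just like the 8086's RET n. */
__attribute__((stdcall))
int add2(int a, int b)
{
    return a + b;
}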

Interestingly, most code (including OS routines) on the original Macintosh used the Pascal calling convention despite the lack of a facilitating instruction in the 68000. Using this calling convention saved 2-4 bytes of code at a typical call site, but required an extra 4-6 bytes of code at the return site of every function that took parameters.

supercat
  • There is also the `ENTER` counterpart to this `RET n`... – herby Aug 01 '12 at 14:10
  • @herby: I don't think `ENTER` existed in the original 8086; it came with later processors. It does bring up an interesting point, though: the BP-based addressing modes are clearly designed around the use of stacked parameters and locals accessed via frame pointer. I find this convention interesting in a number of ways, especially considering that (1) pure assembly language code is more apt to use values in registers than the stack, but (2) the advantages of [BP+nn] addressing over [SP+nn] addressing are more significant for assembly-language programs that access things on the stack than... – supercat Aug 01 '12 at 14:42
  • ...for hand-written assembly code. A compiler will generally know, for every generated instruction, how SP and BP compare; if SP is BP-8, for example, it's not really any easier for the compiler to address [BP+12] than [SP+20]. If on a recompile the compiler has to add another PUSH/POP around a block of code, it can adjust SP-based offsets appropriately. On the other hand, in hand-written assembly, adding a PUSH/POP would more likely require tweaking the code between them. So frame pointers are mainly a benefit to combined high-level/asm code. – supercat Aug 01 '12 at 14:46
  • Maybe the possibility of reusing code without recompiling it is also a marginal point in favour of BP addressing. And God knows whether BP-addressing instructions are faster in circuitry than SP-addressed ones, since BP addressing is sort of standard... – herby Aug 01 '12 at 14:51
  • @herby: The 8086 doesn't offer SP-based addressing modes other than the implied address for PUSH and POP. I don't think code reuse without recompilation is an issue because in most cases parameters will be in a fixed position relative to SP when a separately-linked piece of code is entered. The issue is mainly with hand-written assembly. There would be advantages to requiring that BP always hold a valid frame address any time interrupts are enabled, but enough 8086 code violates that convention that it cannot be safely assumed (though I think Windows imposes stronger requirements). – supercat Aug 01 '12 at 15:04
  • @herby: Actually, I suspect a big part of the reason compilers have generally used frame pointers has a lot to do with debugging. To debug a program which did not use such a convention would require that the compiler generate--and the debugger use--a file listing the SP-BP offset for every instruction. Such detailed metadata is common today (and is an essential part of what makes garbage-collected languages practical) but the amount of RAM it requires would have been unacceptable 30 years ago. – supercat Aug 01 '12 at 15:08
  • @supercat one thing is meta-data, another whether it was loaded to RAM or not. Even with the original COM format it would be simple to have it at the end so the program could ignore it. –  Aug 02 '12 at 10:28
  • In the original COM format, there would have been no way to prevent the data from being loaded, though if the code and metadata totaled less than 0xFF00 bytes the code could without difficulty have overwritten the metadata once it began execution [I don't think the COM loader could do anything with a file bigger than that size]. – supercat Jan 18 '15 at 23:57
10

One example is MIPS, which has both add and addu for trapping and ignoring overflow respectively. (Also sub and subu.) It needed the first type of instruction for languages like Ada (I think--I've never actually used Ada though) which deal with overflows explicitly and the second type for languages like C that ignore overflows.

If I remember correctly, the actual CPU has some additional circuitry in the ALU for keeping track of overflows. If the only language people cared about was C, it wouldn't need this.
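
To sketch the language-side split (a hedged example; the names are mine and __builtin_add_overflow is a GCC/Clang extension): plain C arithmetic is content with addu-style wrap-or-ignore behaviour, while an Ada-style checked add has to be requested explicitly.

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int sum;

    /* Unsigned arithmetic in C must wrap, and signed overflow may simply be
       ignored, so a compiler only needs addu/subu-style instructions here. */
    unsigned int wrapped = (unsigned int)INT_MAX + 1u;

    /* A trapping/checked addition has to be asked for explicitly. */
    if (__builtin_add_overflow(INT_MAX, 1, &sum))
        puts("overflow detected");

    printf("%u\n", wrapped);
    return 0;
}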

Tikhon Jelvis
  • Not sure if related, but those instructions are probably also useful in other situations, like safe memory allocation, i.e. if you are allocating `nmemb*size+offset` bytes and need to make sure that you don't get an overflow. – NikiC Aug 04 '12 at 19:04
  • @NikC: I was thinking that the `addu` and `subu` instructions (the ones that *don't* check for overflows) were the ones that were added to make C happy. Of course, I don't really know--we only covered it vaguely in lecture and I'm certainly no expert in architecture :P. – Tikhon Jelvis Aug 04 '12 at 21:22
  • Oh yeah, I was thinking the other way around, sorry :/ – NikiC Aug 04 '12 at 22:39
8

The Burroughs 5000 series was designed to efficiently support ALGOL, and Intel's iAPX-432 was designed to efficiently execute Ada. The Inmos Transputer had its own language, Occam. I think the Parallax "Propeller" processor was designed to be programmed using its own variant of BASIC.

It's not a language, but the VAX-11 instruction set has a single instruction to load a process context, which was designed after a request from the VMS design team. I don't remember the details, but ISTR it took so many instructions to implement that it put a serious upper limit on the number of processes they could schedule.

TMN
  • What is it about these designs that makes them particularly suitable? E.g. what feature of the iAPX does Ada particularly benefit from? – Gaius Jul 30 '12 at 20:55
  • ISTR that the Ada target of iAPX-432 was more trying to save a failed design by attaching it to something with yet great expectations than anything else. – AProgrammer Jul 31 '12 at 12:21
  • @AProgrammer: I'm pretty sure the iAPX-432 was designed from the start to use Ada. I even recall some rumors that Intel wasn't going to publish the instruction set, to discourage assembly language programming and force people to use Ada for everything. – TMN Aug 01 '12 at 14:27
  • @TMN, Intel's 432 project started in 1975 and introduced in 1981 (Wikipedia). Ironman (final requirements for Ada), was published in January 1977, and green was chosen in May 1979, modified and the final result published as a military standard in July 1980. There is a timeline issue in stating that iAPX-432 was designed from the start to use Ada. (It is a late and typical "close the semantic gap" processor with the usual drawbacks at a time when alternatives started to be searched; marketing it as Ada processor was a tentative to save a failed design -- ISTR that nobody but Intel used it) – AProgrammer Aug 01 '12 at 17:28
  • @AProgrammer: Hmmm, looks like you're right. I ran across [this paper](http://dl.acm.org/citation.cfm?id=801835) from the lead architect of the 432 and in the summary he says "This close match of architecture and language did not occur because the 432 was designed to execute Ada—it was not." I'll have to dig out my old 432 book and see what it says. – TMN Aug 01 '12 at 18:48
7

One thing nobody seems to have mentioned so far is that advances in compiler optimization (where the base language is largely irrelevant) drove the shift from CISC instruction sets (which were largely designed to be coded by humans) to RISC instruction sets (which were largely designed to be coded by compilers).

5

IBM's Z series mainframe is the descendant of the IBM 360 from the 1960s.

There were several instructions put there specifically to speed up COBOL and Fortran programs. The classic example is BXLE, "Branch on Index Low or Equal", which is most of a Fortran DO loop or a COBOL PERFORM VARYING x FROM 1 BY 1 UNTIL x > n encapsulated in a single instruction.
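
The pattern BXLE captures is just the ordinary counted loop; a rough C rendering (perform_varying, body and n are only placeholders):

/* BXLE folds the whole loop tail (increment x, compare against the limit n,
   branch back to the top) into a single instruction, with the index,
   increment and limit held in registers. */
void perform_varying(int n, void (*body)(int))
{
    for (int x = 1; x <= n; x += 1)
        body(x);
}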

There is also a whole family of packed decimal instructions to support fixed point decimal arithmetic common in COBOL programs.

svick
James Anderson
  • I think you mean [descendant](http://en.wikipedia.org/wiki/IBM_System_z). – Clockwork-Muse Jul 31 '12 at 16:54
  • @X-Zero -- oops! Early morning, not enough caffeine in the system, etc. – James Anderson Aug 01 '12 at 01:41
  • More interesting is the TI 32050 DSP's block-repeat instruction. Its operand is the address of the instruction following the last one in the loop; loading a loop-count register and then performing the block-repeat instruction will cause instructions up to (but not including) the target to be repeated the specified number of times. Very strongly reminiscent of a FORTRAN `DO` loop. – supercat Aug 01 '12 at 15:11
  • @supercat Every DSP worthy of the name includes three features: zero-overhead loop, single instruction multiply-accumulate, and a bit-reversed addressing mode of some kind. Almost every DSP algorithm known to Man uses loops. The two most common algorithms are FIR filter, which is a loop around a multiply-accumulate, and FFT, for which bit-reversed addressing is critical. Many DSPs include a one-instruction radix-2 FFT butterfly operation, or a dual multiply/add that can be used to make a one-instruction butterfly. – John R. Strohm Aug 02 '12 at 18:35
  • @JohnR.Strohm: Every DSP I've seen includes a repeat-multiply-accumulate, but not all of them include more generalized zero-overhead loops. Actually, I'm not quite sure why such loops should be considered only a "DSP" feature, since they would be useful in a lot of "conventional processor" code as well. – supercat Aug 02 '12 at 18:42
5

The Motorola 68000 family introduced an autoincrement address mode that made copying data through the CPU very efficient and compact.

[Updated example]

This is the kind of C++ code that influenced 68000 assembler:

while(someCondition)
    destination[destinationOffset++] = source[sourceOffset++];

implemented in conventional assembler (pseudocode, I forgot the 68000 assembler commands)

addressRegister1 = source
addressRegister2 = destination
while(someCondition) {
    move akku, (addressRegister1)    ; load the next source byte into akku
    move (addressRegister2), akku    ; store akku to the destination
    increment(addressRegister1, 1)
    increment(addressRegister2, 1)
}

With the new address mode it became something similar to:

addressRegister1 = source
addressRegister2 = destination
while(someCondition) {
    move akku, (addressRegister1++)    ; load the next source byte, then advance the pointer
    move (addressRegister2++), akku    ; store it, then advance the destination pointer
}

Only two instructions per loop instead of four.

k3b
3

Early Intel CPUs had the following features, many of them now obsolete in 64-bit mode:

  • ENTER, LEAVE and RET nn instructions (early manuals stated explicitly that these were introduced for block-structured languages, e.g. Pascal, which supports nested procedures; see the sketch after this list)
  • instructions for speeding up BCD arithmetic (AAA, AAM, etc.); also BCD support in x87
  • JCXZ and LOOP instructions for implementing counted loops
  • INTO, for generating a trap on arithmetic overflow (e.g., in Ada)
  • XLAT for table lookups
  • BOUND for checking array bounds

The sign flag, found in the status register of many CPUs, exists to make it easy to perform both signed and unsigned arithmetic.

The SSE4.2 instruction set introduced instructions for string processing, both counted and zero-terminated (PCMPESTRI, PCMPISTRI, etc.)

Also, I could imagine that a number of system-level features were designed to support safety of compiled code (segment limit checking, call gates with parameter copying, etc.)
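
A rough sketch of what the first bullet refers to (a minimal illustration, not what any particular compiler emits; the offsets and the callee-cleanup note are assumptions): a frame-pointer-based function is bracketed by exactly the work that ENTER and LEAVE fold into single instructions.

int sum3(int a, int b, int c)      /* prologue: push bp; mov bp,sp; sub sp,N    */
{                                  /* (what ENTER N,0 does in one instruction)  */
    int t = a + b;
    return t + c;                  /* epilogue: mov sp,bp; pop bp, i.e. LEAVE,  */
}                                  /* then RET, or RET nn when the callee also  */
                                   /* cleans up its arguments (Pascal style)    */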

zvrba
3

Some ARM processors, mainly those in mobile devices, include(d) the Jazelle extension, which is a hardware JVM interpreter; it interprets Java bytecode directly. A Jazelle-aware JVM can use the hardware to speed up execution and eliminate much of the JIT, but fallback to a software VM is still ensured if the bytecode cannot be interpreted on chip.

Processors with such a unit include the BXJ instruction, which puts the processor into a special "Jazelle mode"; if activating the unit fails, it is just interpreted as a normal branch instruction. The unit reuses ARM registers to hold JVM state.

The successor to the Jazelle technology is ThumbEE.

usoban
2

As far as I know this was more common in the past.

There is a Q&A session in which James Gosling said that there were people trying to make hardware that could deal better with JVM bytecode, but then these people would find a way to do it with common "generic" Intel x86 (maybe by compiling the bytecode in some clever way).

He mentioned that there is an advantage in using the generic, popular chip (such as Intel's) because it has a large corporation throwing huge sums of money at the product.

The video is worth checking out. He talks about this at minute 19 or 20.

2

The Intel iAPX CPU was specifically designed for OO languages. Didn't quite work out, though.

The iAPX 432 (intel Advanced Processor architecture) was Intel's first 32-bit microprocessor design, introduced in 1981 as a set of three integrated circuits. It was intended to be Intel's major design for the 1980s, implementing many advanced multitasking and memory management features. The design was therefore referred to as a Micromainframe...

The iAPX 432 was "designed to be programmed entirely in high-level languages", with Ada being primary and it supported object-oriented programming and garbage collection directly in hardware and microcode. Direct support for various data structures was also intended to allow modern operating systems for the iAPX 432 to be implemented using far less program code than for ordinary processors. These properties and features resulted in a hardware and microcode design that was much more complex than most processors of the era, especially microprocessors.

Using the semiconductor technology of its day, Intel's engineers weren't able to translate the design into a very efficient first implementation. Along with the lack of optimization in a premature Ada compiler, this contributed to rather slow but expensive computer systems, performing typical benchmarks at roughly 1/4 the speed of the new 80286 chip at the same clock frequency (in early 1982).

This initial performance gap to the rather low profile and low priced 8086-line was probably the main reason why Intel's plan to replace the latter (later known as x86) with the iAPX 432 failed. Although engineers saw ways to improve a next generation design, the iAPX 432 Capability architecture had now started to be regarded more as an implementation overhead rather than as the simplifying support it was intended to be.

The iAPX 432 project was a commercial failure for Intel...

gnat
  • Reading the paper, it sounds like many aspects of the design could be useful in object-oriented frameworks such as are popular today. An architecture which used a combination of a 32-bit object-id and 32-bit offset could in many cases offer better caching performance than one where object ids were all 64 bits (in most cases, an application which would use billions of objects would be better served by instead having fewer, larger ones; one which would store billions of bytes in one object would be better served by subdividing it into smaller objects). – supercat Nov 03 '14 at 19:45
2

I did a quick page search and it seems that no one has mentioned CPUs developed specifically to execute Forth. The Forth programming language is stack-based, compact, and used in control systems.

Paddy3118
1

The 68000 had MOVEM, which was well suited to pushing multiple registers onto the stack in a single instruction, which is what many language calling conventions expected.

If you saw MOVEM (MOVE Multiple) preceding JSR (Jump SubRoutine) throughout the code, then you generally knew that you were dealing with compiled C code.

MOVEM allowed for auto-increment of the destination register, allowing each use to continue stacking on the destination, or removing from the stack in the case of auto-decrement.

http://68k.hax.com/MOVEM

Myztry
1

Atmel's AVR architecture is entirely designed from the ground up to be suitable for programming in C. For example, this application note elaborates further.

IMO this is closely related to rockets4kids' excellent answer, with the early PIC16s being developed for direct assembler programming (40 instructions total) and later families targeting C.

Vorac
1

When the 8087 numerical coprocessor was designed, it was fairly common for languages to perform all floating-point math using the highest-precision type, and only round the result to lower precision when assigning it to a lower-precision variable. In the original C standard, for example, the sequence:

float a = 16777216, b = 0.125, c = -16777216;
float d = a+b+c;

would promote a and b to double, add them, promote c to double, add it, and then store the result rounded to float. Even though it would have been faster in many cases for a compiler to generate code that would perform operations directly on type float, it was simpler to have a set of floating-point routines which would operate only on type double, along with routines to convert to/from float, than to have separate sets of routines to handle operations on float and double. The 8087 was designed around that approach to arithmetic, performing all arithmetic operations using an 80-bit floating-point type [80 bits was probably chosen because:

  1. On many 16- and 32-bit processors, it's faster to work with a 64-bit mantissa and a separate exponent than to work with a value which divides a byte between the mantissa and exponent.

  2. It's very difficult to perform computations which are accurate to the full precision of the numerical types one is using; if one is trying to e.g. compute something like log10(x), it's easier and faster to compute a result which is accurate to within 100ulp of an 80-bit type than to compute a result which is accurate to within 1ulp of a 64-bit type, and rounding the former result to 64-bit precision will yield a 64-bit value which is more accurate than the latter.]

Unfortunately, later versions of the language changed the semantics of how floating-point types should work; the 8087 semantics would have been very nice if languages had supported them consistently, but if functions f1(), f2(), etc. return type float, many compiler authors would take it upon themselves to make long double an alias for the 64-bit double type rather than the compiler's 80-bit type (and provide no other means of creating 80-bit variables), and to arbitrarily evaluate something like:

double f = f1()*f2() - f3()*f4();

in any of the following ways:

double f = (float)(f1()*f2()) - (extended_double)f3()*f4();
double f = (extended_double)f1()*f2() - (float)(f3()*f4());
double f = (float)(f1()*f2()) - (float)(f3()*f4());
double f = (extended_double)f1()*f2() - (extended_double)f3()*f4();

Note that if f3 and f4 return the same values as f1 and f2, respectively, the original expression should clearly return zero, but many of the latter expressions may not. This led to people condemning the "extra precision" of the 8087 even though the last formulation would generally be superior to the third and--with code that used the extended double type appropriately--would rarely be inferior.
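
As an illustration (assuming a compiler where long double really is the x87 80-bit type, e.g. gcc targeting 32-bit x86, and reusing the hypothetical f1()..f4() from above), spelling out the intermediates in long double pins the evaluation to the last, best formulation:

float f1(void), f2(void), f3(void), f4(void);

/* Both products are carried in extended precision before the subtraction,
   so if f3()*f4() equals f1()*f2() they cancel exactly, which is the
   behaviour the 8087 was designed to provide. */
double g(void)
{
    long double p1 = (long double)f1() * f2();
    long double p2 = (long double)f3() * f4();
    return (double)(p1 - p2);
}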

In the intervening years, Intel has responded to languages' (IMHO unfortunate) trend toward forcing intermediate results to be rounded to the operands' precision by designing their later processors to favor that behavior, to the detriment of code which would benefit from using higher precision in intermediate calculations.

supercat
  • Note that you've got an answer ([above](http://programmers.stackexchange.com/a/158797/40980)) in this post already. Are they answers that could/should be merged into one? –  Nov 03 '14 at 19:25
  • @MichaelT: I don't think so--one covers stack design, and the other covers floating-point semantics. – supercat Nov 03 '14 at 19:32
  • Just making sure. Personally, I believe that it would be possible to make one, stronger answer (using headers to separate the sections), but that's my take on it. You may wish to still use headers to clearly identify at the top what each answer part addresses (`## How the stack changed the processor` and `## How floating point changed the processor`) so that people can get in the proper mindset when reading it and are less likely to think you were either absent-minded in answering or reposting the same (or similar) answers. –  Nov 03 '14 at 19:37
  • @MichaelT: The two answers are sufficiently disjoint that I think they should be voted upon separately. Although the 80486 absorbed the functions previously performed by the 8087/80287/80387, the 8086 and 8087 were designed as separate chips with nearly-independent architectures. Although both ran code from a common instruction stream, that was handled by having the 8086 treat certain byte sequences as requests to generate address read/write requests while ignoring the data bus, and having the 8087 ignore everything else that was going on. – supercat Nov 03 '14 at 20:17