
Currently C is considered a low-level language, but back in the '70s, was it considered low level? Was the term even in use then?

Many popular higher-level languages didn't exist until the mid-'80s and beyond, so I'm curious if and how the nature of "low level" has changed over the years.

Thomas Owens
joeyfb
  • As one data point, circa 1978 some programming mentors described C to me as a glorified assembly language. – Ben Crowell Jul 01 '18 at 17:56
  • @BenCrowell I am not sure what “glorified” means in the context of your statement, but I’ve heard C called a _universal (= platform-independent) assembly language_. – Melebius Jul 02 '18 at 09:56
  • @Melebius "Glorified" in this context means "(especially of something ... unexceptional) represented in such a way as to appear more elevated or special." \[[Google](https://www.google.com/search?q=define+glorified)\]. I can't find that usage in MW or OED dictionaries, but it's common enough and Google has it (albeit it lists no sources). – Uyghur Lives Matter Jul 02 '18 at 13:42
  • @Melebius That was what C was designed to be. – Thorbjørn Ravn Andersen Jul 02 '18 at 17:22
  • Interestingly, there's a case to be made that [C Is Not a Low-level Language](https://queue.acm.org/detail.cfm?id=3212479), and is _less_ one now than it was in the 70s, because C's abstract machine is farther removed from modern hardware than it was from the PDP-11. – Tom Jul 02 '18 at 19:46
  • When thinking of this you might also want to think of the origin: Unix is written in C, C runs on Unix, and C cross-compiles Unix to other platforms. Anything not necessary to compile Unix was unnecessary overhead, and C's main goal was to make it as easy as possible to write/port a compiler. For this purpose it was the EXACT correct level, so I don't think of C as high or low level; I think of it as a tool to port Unix that, like nearly all Unix tools, is extremely adaptable to many problems. – Bill K Jul 02 '18 at 21:56
  • It's worth noting that Lisp was invented in [1958](https://en.wikipedia.org/wiki/Lisp_(programming_language)#History). – Prime Jul 03 '18 at 17:42
  • Of course, real programmers consider anything other than 8 toggle-switches and a STORE button to be ostentatiously high-level ;) – John U Jul 04 '18 at 10:18
  • When there were no high-level languages, of course nobody called it "low-level". ;) Reminds me of the Simpsons, where the grandfather said "world war one" in a flashback and was asked "Why do you keep calling it the first one? There is only one. This can't possibly happen again!" – Fabian Röling Jan 15 '19 at 10:59

7 Answers


This depends on your definition of high-level and low-level language. When C was developed, anything that was higher-level than assembly was considered a high-level language. That is a low bar to clear. Later, this terminology shifted to the point that some would nowadays consider even Java to be a low-level language.

Even within the high-level language landscape of the 70s, it is worth pointing out that C is fairly low level. The C language is basically B plus a simple type system, and B is not much more than a convenient procedural/structured syntax layer over assembly. Because the type system is a retrofit on top of the untyped B language, you can still leave out type annotations in some places, and int will be assumed.

C consciously leaves out features that were expensive or difficult to implement, even though they were already well-established at the time, such as:

  • automatic memory management
  • nested functions or closures
  • basic OOP or coroutines
  • more expressive type systems (e.g. range-restricted types, user-defined types such as record types, strong typing, …)

C does have some interesting features:

  • support for recursion (a consequence of its stack-based automatic variables, in contrast to languages where all variables have global lifetime)
  • function pointers
  • user-defined data types (structs and unions), implemented shortly after C's initial release
  • a string representation (pointer to chars) that is a huge improvement over B, which encoded multiple letters into one machine word
  • header files, which were an efficiency hack to keep compilation units small but also happen to provide a simple module system
  • assembly-style unrestricted pointers and pointer arithmetic, in contrast to safer references; pointers are an inherently unsafe feature, but also very useful for low-level programming

At the time C was developed, other innovative languages such as COBOL, Lisp, ALGOL (in various dialects), PL/I, SNOBOL, Simula, and Pascal had already been published and/or were in wide use for specific problem domains. But most of those existing languages were intended for mainframe programming, or were academic research projects. For example, when ALGOL-60 was first designed as a universal programming language, the necessary technology and computer science to implement it didn't exist yet. Some of these (some ALGOL dialects, PL/I, Pascal) were also intended for low-level programming, but they tended to require more complex compilers or were too safe (e.g. no unrestricted pointers). Pascal notably lacks good support for variable-length arrays.

Compared to those languages, C rejects “elegant” and expensive features in order to be more practical for low-level development. C was never primarily a language design research project. Instead, it was an offshoot of Unix kernel development on the PDP-11 minicomputer which was comparatively resource-constrained. For its niche (a minimalist low-level language for writing Unix with a single-pass compiler that's easy to port) C absolutely excelled – and over 45 years later it still is the lingua franca of systems programming.

amon
  • Just a small addition for people who are unaware of what was necessary for the existing ALGOL family and Pascal languages: Those languages had lexically nested functions where you could access (local) variables declared in outer functions. That meant you either had to maintain a "display" - an array of pointers to outer lexical scopes - at _each_ function call and return (that changed lexical level), _or_ you had to chain lexical scopes up the stack, and each such variable access required multiple indirect hops up the stack to find it. Expensive! C jettisoned all that. I still miss it. – davidbak Jul 01 '18 at 05:42
  • And just to add to the list of really high level innovative languages that already existed at the time: Don't forget SNOBOL (especially its SPITBOL flavor), SETL, and APL. Three personal favorites. All had automatic memory management as well as a lot of idiosyncratic personality. (SETL was never widely used; SNOBOL did have a reasonable amount of use.) – davidbak Jul 01 '18 at 05:44
  • @davidbak: The x86-64 System V ABI (aka the calling convention on Linux / OS X) defines `%r10` as the "static chain pointer", which is exactly what you're talking about. For C it's just another call-clobbered scratch register, but I guess Pascal would use it. (GNU C nested functions use it for passing a pointer to the outer scope when such a function doesn't inline (e.g. if you make a function pointer to it so the compiler creates a trampoline of machine code on the stack): [Acceptability of regular usage of r10 and r11](https://stackoverflow.com/q/49928950)) – Peter Cordes Jul 01 '18 at 08:54
  • I've also heard C referred to as a "medium-level language" and "portable assembly". – chepner Jul 01 '18 at 15:43
  • @PeterCordes Fascinating! Pascal was still widely used when System V came out (though I don't know when the formal SysV ABI was defined). Your linked answer is very informative. – davidbak Jul 01 '18 at 16:03
  • @davidbak: The generic SysV ABI (http://www.sco.com/developers/devspecs/gabi41.pdf) doesn't mention a static-chain pointer. Neither does the i386 processor supplement (i386 psABI). The x86-64 psABI was designed around 2000 / 2001 while AMD was designing the first AMD64 CPUs. The mailing-list archives include some discussion about the design of the calling convention ([links here](https://stackoverflow.com/questions/4429398/why-does-windows64-use-a-different-calling-convention-from-all-other-oses-on-x86/35619528#35619528)), but IDK who suggested adding a static-chain pointer. – Peter Cordes Jul 01 '18 at 16:22
  • @davidbak: See [Where is the x86-64 System V ABI documented?](https://stackoverflow.com/q/18133812) for links to ABI docs. – Peter Cordes Jul 01 '18 at 16:24
  • C has user-defined types (struct/union). The rest of those "features" were left out, I suspect, because they are of zero or negative use (unless you are entering an obfuscated code contest :-)), as they detract from the goal of keeping the language both simple and expressive. – jamesqf Jul 01 '18 at 17:25
  • @jamesqf: but very early C didn't have struct assignment; I guess you have to memcpy or copy the members individually instead of writing `a = b;` to copy a whole struct the way you can in ISO C89. So in early C, user-defined types were definitely second-class and could only be passed by reference as function args. [C's aversion to arrays](https://stackoverflow.com/a/35598701) and also [Why does C++ support memberwise assignment of arrays within structs, but not generally?](https://stackoverflow.com/q/3437110) – Peter Cordes Jul 01 '18 at 22:17
  • @PeterCordes - structs had no assignment or pass-by-value yet on the other hand structs had bitfields - I'm nearly 100% positive it was in early C K&R - which shows you they were quite serious about using C down to the metal. BTW, it could also be argued that copy-by-value for structs wasn't as interesting on the (mini-)computers of the day - typically with 16 to 64 KWords of memory _total_, plus much simpler ALU-to-memory architectures where there were no pipelining, bus width was the same as word width, etc. – davidbak Jul 01 '18 at 22:33
  • The systems programming language differs in some significant regards from the language specified by the C89/C99/C11 Standards. Quality implementations which are suitable for systems programming will be usable for that purpose, but other implementations may not. Unfortunately, there's no "official" distinction between the different kinds of implementations. – supercat Jul 02 '18 at 03:02
  • @supercat This depends a lot on your definition of systems programming. You are right that a lot of potentially useful idioms are technically undefined or implementation-defined behaviour as per the standards, but of course any compiler can make non-portable guarantees. E.g. the Linux kernel assumes it will be compiled with GCC. Systems programming can also describe user-space programs where exploiting UB isn't _necessary_, e.g. when writing libraries. You don't need an arcane “quality implementation” for that, the usual suspects like GCC/Clang/Visual Studio are fine. – amon Jul 02 '18 at 08:02
  • If you want to google the first two lines of this answer, the term is "generations": https://en.wikipedia.org/wiki/First-generation_programming_language – Thomas Koelle Jul 02 '18 at 13:03
  • @amon: Systems programming requires the ability to recycle address ranges for use as different types. Ritchie's language could do that without having to use compiler-specific features; the only reasons his malloc()/free() couldn't be 100% portable were that (1) K&R2 didn't define a type with worst-case alignment, and (2) a malloc() written in portable C would need to specify the size of the heap at compile time and create a static object to hold it. The language defined by the C Standard includes no way of telling a compiler "storage that was declared as type X will be used as type Y..." – supercat Jul 02 '18 at 15:00
  • ...and in fact doesn't even provide for doing much of anything. I think it's deliberate that lvalues of aggregate-member type do not have carte-blanche permission to access the parent aggregate, but way the Standard is written doesn't provide for *any* circumstances where an aggregate may be accessed using a non-character-type member lvalue. An implementation that decided that given `union U {int x; float y } u; ... u.x=1;` invokes UB because it accesses a `union U` using an lvalue of type `int` should of course be denounced as obtuse, but is there any good reason to denounce that... – supercat Jul 02 '18 at 15:05
  • ...without also denouncing one that can't reliably handle `int *p = &u.x; *p = 1;` as being likewise of needlessly-poor quality? Actually, the way 6.5p7 is written would probably make 99.9% of C programs invoke UB because requires that all accesses be performed "*by*" lvalues. While C++ treats some kinds of expressions that modify objects as lvalues, I don't think C does. An expression like `i=5` is not an lvalue, and since the lvalue within it `i` doesn't modify the stored value of `i`, the modification of `i` must be done by something that isn't an lvalue. – supercat Jul 02 '18 at 15:16
  • @supercat I'm afraid I didn't understand most of that. It seems you have some very specific gripes with the C standard, stemming from considerable experience with the language. Though I'd like to point out that there's a difference between a useful systems language (as which C is widely considered) and a 100% portable systems language (which, according to your comments, C is not). I don't think that's a fatal flaw, e.g. a standard library implementation doesn't have to be (and in general, can't be) portable. I don't know what that has to do with “quality implementations”. – amon Jul 02 '18 at 15:57
  • @amon: The authors of the Standard recognized (in the published Rationale) the possibility that an implementation could be conforming but useless, but thought that people wanting to produce quality implementations would make them useful whether or not the Standard required it. For an implementation to be useful for systems programming, there must be a variety of situations where it processes certain actions "In a documented fashion characteristic of the environment", but the Standard makes essentially no effort to indicate when any particular kind of implementation should behave that way. – supercat Jul 02 '18 at 16:15
  • @amon: There is no circumstance where the act of converting an lvalue has any defined behavior other than reading the associated bytes from memory and interpreting them as a value of the proper type, nor where the act of writing an lvalue has any defined behavior other than converting the value to a sequence of byte values and storing those to the associated area of memory. Reads and writes that don't meet some absurdly-narrow criteria, however may result in any alternative behavior that implementations see fit, but quality implementations for a particular platform and application field... – supercat Jul 02 '18 at 16:33
  • ...should behave in the common defined fashion in all cases that would be relevant to programs targeting that particular platform and field. The Standard may allow an implementation to do otherwise and still be conforming, but that would not prevent such behavior from rendering an implementation unsuitable for the particular platforms and fields it cannot reliably support. – supercat Jul 02 '18 at 16:36
  • "C's string representation (pointer-to-chars) is actually a huge improvement over B which encoded multiple letters into one machine word." ... or at least it is for machines that have byte-addressable memory, but it's worth noting that plenty of machines that are contemporary to C had only word-addressable memory, often with rather long word sizes (e.g. 60 bits on the CDC7600). B's approach is much more sensible for such machines. – Jules Jul 03 '18 at 01:25
  • The header system, while tedious to write, allows compilation to be very fast indeed. This is especially visible on big projects (e.g. an OS) when compared to any modern programming language. – Sulthan Jul 03 '18 at 09:47
  • Modula's system seems to allow faster compiles exactly because it doesn't use headers, which rely on being mail-merged into code anywhere and inherently difficult to analyse. – idrougge Jul 03 '18 at 10:03

To answer the historical aspects of the question:

The design philosophy is explained in The C Programming Language written by Brian Kernighan and C designer Dennis Ritchie, the "K&R" you may have heard of. The preface to the first edition says

C is not a "very high level" language, nor a "big" one...

and the introduction says

C is a relatively "low level" language... C provides no operations to deal directly with composite objects such as character strings, sets, lists, or arrays. There are no operations that manipulate an entire array or string...

The list goes on for a while before the text continues:

Although the absence of some of these features may seem like a grave deficiency,... keeping the language down to modest size has real benefits.

(I only have the second edition from 1988, but the comment below indicates that the quoted text is the same in the 1978 first edition.)

So, yes, the terms "high level" and "low level" were in use back then, but C was designed to fall somewhere on the spectrum in between. It was possible to write C code that was portable across hardware platforms, and that was the main criterion at the time for whether a language was considered high level. However, C lacked some features that were characteristic of high-level languages, and this was a design decision in favor of simplicity.

gatkin
  • This is an excellent answer! Verifiable historical evidence + testimony of an actor best informed of the meaning of low and high level in the period the OP refers to. By the way, I confirm that the quotes were already in the 1978 edition. – Christophe Jul 01 '18 at 16:42

In the early 1970s, C was a dazzling breath of fresh air using modern constructs so effectively that the entire UNIX system could be rewritten from assembly language into C with negligible space or performance penalty. At the time many contemporaries referred to it as a high level language.

The authors of C, primarily Dennis Ritchie, were more circumspect and in the Bell System Technical Journal article said "C is not a very high-level language." With a wry smile and intending to be provocative, Dennis Ritchie would say it was a low-level language. Chief among his design goals for C was to keep the language close to the machine yet provide portability, that is machine independence.

For more information, consult the original BSTJ article.

Thank you Dennis. May you rest in peace.

bud wonsiewicz
  • It was basically typed and nicer-syntax wrappers for PDP assembly if you ask me. – einpoklum Jun 30 '18 at 22:43
  • @einpoklum: I haven't studied PDP assembly, but reportedly it's similar to m68k, and if so it's hardly low-level. ;-) – R.. GitHub STOP HELPING ICE Jul 01 '18 at 01:37
  • @R.. [...68000 machine language itself supports C constructs, thanks in part to the similarities between the PDP-11 and the 68000](https://www.atarimagazines.com/compute/issue74/c_and_the_68000.php) *Compute! Magazine* July 1986. – tchrist Jul 01 '18 at 05:12
  • @einpoklum I tend to agree with that. I learned C back in about 1980 at university, on a PDP-11/34a as it happens, and it was described by one of the profs as a "portable assembly language." Probably because in addition to the PDP-11, we had several Superbrain CP/M machines in the lab that had C compilers available on them. https://en.wikipedia.org/wiki/Intertec_Superbrain – dgnuff Jul 01 '18 at 08:09
  • Also an excellent answer based on verifiable historical evidence (wow! there was a pre-version of the K&R available out there!). – Christophe Jul 01 '18 at 16:47
  • @einpoklum Isn't this somewhat far-fetched? I had the opportunity to write system and application code in the mid 80's in assembler (Z80 and M68K), a "portable assembler" called M (PDP11 syntax with an abstract 16 bit register and instruction set), and C. In comparison to assembler, C is definitely a high level language: the productivity of C programming was an order of magnitude higher than in assembler! Of course no strings (SNOBOL), no native datafile support (COBOL), no self-generating code (LISP), no advanced math (APL), no objects (SIMULA), so we can agree that it was not very high. – Christophe Jul 01 '18 at 17:02

As I wrote elsewhere on this site when someone referred to the malloc/free memory management pattern as "low-level programming,"

Funny how the definition of "low-level" changes over time. When I was first learning to program, any language that provided a standardized heap model that makes a simple allocate/free pattern possible was considered high-level indeed. In low-level programming, you'd have to keep track of the memory yourself, (not the allocations, but the memory locations themselves!), or write your own heap allocator if you were feeling really fancy.

For context, this was in the early 90s, well after C came out.

Mason Wheeler
  • Wasn't the standard library only a thing since standardization in the late 80s (well, based on existing Unix APIs)? Also, kernel programming (which was C's original problem domain) naturally requires low-level stuff like manual memory management. At the time C must have been the highest-level programming language that a serious kernel was written in (I think nowadays the NT kernel uses a fair amount of C++ as well). – amon Jun 30 '18 at 16:36
  • @amon The definition may vary a little bit, but the NT kernel is in C (though given how poor the MS C compiler was before, possibly compiled with a C++ compiler). – Adriano Repetti Jun 30 '18 at 18:30
  • Can you name a significant language with no equivalent or better memory allocation system, than what C had? Your answer implies most were like that in the 90's, which seems odd claim to me. – hyde Jun 30 '18 at 21:11
  • @hyde Assembly, which was still a significant programming language back then. – Mason Wheeler Jun 30 '18 at 21:45
  • @hyde, I used Algol 60, two different FORTRAN dialects, and several different BASIC dialects back in the 1970s, and none of those languages had pointers or a heap allocator. – Solomon Slow Jun 30 '18 at 23:49
  • Strictly speaking, you can still program with no `malloc()` by directly calling [`brk(2)`](http://man7.org/linux/man-pages/man2/brk.2.html) or [`mmap(2)`](http://man7.org/linux/man-pages/man2/mmap.2.html) and managing the resulting memory yourself. It's a massive PITA for no conceivable benefit (unless you happen to be *implementing* a malloc-like thing), but you can do it. – Kevin Jul 01 '18 at 03:45
  • @amon - except for the remarkable Burroughs stack-based machines that were programmed in ALGOL from bottom to top...and much earlier than Unix, too. Oh, and by the way, Multics, which was the inspiration for Unix: Written in PL/I. Similar to ALGOL, higher level than C. – davidbak Jul 01 '18 at 05:51
  • @masonwheeler Pure assembler (as opposed to assembler source files or inline asm with higher level language code) definitely wasn't major in the 90's, outside the demo scene and low-level embedded. – hyde Jul 01 '18 at 07:49
  • @jameslarge The answer talks about early 90's, not 70's... – hyde Jul 01 '18 at 07:51
  • @davidbak: Re "Multics, which was the inspiration for Unix: Written in PL/I": This may explain why few (if any) of us are using Multics these days :-) – jamesqf Jul 01 '18 at 17:29
  • @Kevin Another (admittedly fringe) use case is if you need to allocate memory in a process that you're poking with ptrace. The `mmap` syscall is safe to use, even if the code is currently in `malloc`, whereas `malloc` is famously not async-safe. – James_pic Jul 01 '18 at 18:01
  • @jamesqf - You scoff, sirrah! Multics was great - awesome for its time, and still has some useful concepts yet to be added to the OSes of our day. (A couple anyway.) It actually had many of the language compilers of the time; it's just that it itself was written in PL/I. – davidbak Jul 01 '18 at 22:25
  • Actually, the reason more of us aren't using Multics today probably has more to do with the fact that it only ran on hellishly expensive mainframes - i.e., the typical outrageous mainframe expense of the day plus the extra expense of half a cabinet of specialized hardware to implement virtual memory with security. When 32-bit minicomputers like the VAX-11 came out, everyone but the banks and the government decamped from IBM and the Seven Dwarfs and took their large-scale processing to "mini" computers. – davidbak Jul 01 '18 at 22:37
  • I'd still like this answer to actually mention the programming languages it refers to. If the original source doesn't mention them, then perhaps it isn't a very quote-worthy to begin with. – hyde Jul 02 '18 at 06:38
  • @davidbak: You misunderstand me. I'm not saying that Multics wasn't great, as I have no experience from which to form an opinion. I'm saying that the fact that it was written in PL/I explains why it is not widely used. Indeed, per Wikipedia there hasn't been a native Multics system running since 2000. And perhaps explains why PL/1 isn't widely used: apparently there are compilers available, but I've never seen a PL/1 program "in the wild" - that is, outside my long-ago introduction to programming languages class. – jamesqf Jul 02 '18 at 17:58
  • I don't consider " malloc/free" to be part of "C" as it is no different to any other memory manager a user of the C compiler can write. – Ian Jul 05 '18 at 08:35

Many answers have already referred to early articles that said things like “C is not a high level language”.

I can’t resist piling on, however: many, if not most or all, HLLs at the time - ALGOL, ALGOL-60, PL/I, Pascal - provided array bounds checking and numeric overflow detection.

Last I checked buffer and integer overflows were the root cause of many security vulnerabilities. ... Yep, still the case...

The situation for dynamic memory management was more complicated, but still, C style malloc/free was a great step backward in terms of security.

So if your definition of HLL includes “automatically prevents many low level bugs”, well, the sorry state of cybersecurity would be very different, probably better, if C and UNIX had not happened.

Krazy Glew
  • Re array bounds checking and the like, how much does this slow your code down? This may not matter much these days when you're writing user-facing code, but if you're doing serious modelling (where run times are in days or weeks) you tend to care about things like that. – jamesqf Jul 01 '18 at 17:33
  • Since I happen to have been involved in the implementation of Intel MPX and the pointer/bounds checking compiler that spun out of it, I can refer you to the papers on their performance: essentially 5-15%. Much of that involves compiler analyses that were barely possible in the 1970s and 1980s - compared to naive checks that might be 50% slower. However, I think that it is fair to say that C and UNIX set back work on such analyses by 20 years - when C became the most popular programming language, there was much less demand for safety. – Krazy Glew Jul 01 '18 at 17:45
  • @jamesqf Moreover, many machines prior to C had special hardware support for bounds checking and integer overflows. Since C did not use that HW, it eventually was deprecated and removed. – Krazy Glew Jul 01 '18 at 17:48
  • @jamesqf For example: the MIPS RISC ISA originally was based on the Stanford benchmarks, which were originally written in Pascal, only later moved to C. Since Pascal checked for signed integer overflow, therefore so did MIPS, in instructions like ADD. Circa 2010 I was working at MIPS, my boss wanted to remove unused instructions in MIPSr6, and studies showed that the overflow checks were almost never used. But it turned out that Javascript did such checks - but could not use the cheap instructions because of lack of OS support. – Krazy Glew Jul 01 '18 at 17:56
  • @KrazyGlew - It's very interesting that you bring this up since in C/C++ the fight between users and compiler writers over "undefined behavior" due to signed integer overflow is heating up, since today's compiler writers have taken the old mantra "Making a wrong program worse is no sin" and turned it up to 11. Plenty of posts on stackoverflow and elsewhere reflect this... – davidbak Jul 01 '18 at 22:45
  • @davidbak: I really like Rust's approach: you can get wrapping math if you want it, but you have to ask for it explicitly. You can also get explicit overflow-checked math. The "default" for signed and unsigned is that integer overflow is an error (checked for in debug builds, undefined behaviour in optimized builds). See [`i32.wrapping_add()`](https://doc.rust-lang.org/std/primitive.i32.html#method.wrapping_add) vs. `overflowing_add` vs. `checked_add` vs. `saturating_add`. Also for sub/neg/mul/div/rem/pow/abs/shl/shr. Rust also has built-in popcount, bitscan, rotate, etc. – Peter Cordes Jul 01 '18 at 22:53
  • If Rust had SIMD intrinsics like C, it might be a nearly ideal modern portable assembly language. C is getting worse and worse as a portable assembly language because of aggressive UB-based optimization, and failure to portably expose new primitive operations that modern CPUs support (like popcnt, count leading/trailing zeros, bit-reverse, byte-reverse, saturating math). Getting C compilers to make efficient asm for these on CPUs with HW support often requires non-portable intrinsics. Having the compiler emulate popcnt on targets without it is better than idiom-recognition to get `popcnt`. – Peter Cordes Jul 01 '18 at 22:59
  • @PeterCordes: If C had specified a means of explicitly requesting low-level semantics, and recognized but deprecated code which relied upon them without explicitly requesting them, then code which uses such semantics without requesting them could be viewed as "broken". As it is, however, compiler writers would prefer to retroactively pretend that code using constructs that implementations on most platforms had previously defined and processed in predictable and useful fashion has always been "broken". – supercat Jul 02 '18 at 03:07
  • @davidbak: The notion that "making wrong programs worse is no sin" fails to recognize that many if not most programs are subject to two primary requirements: (1) When given valid data, produce valid output; (2) When given invalid data, behave in constrained fashion. Abnormal program termination would qualify as "constrained", as would production of meaningless output. Running arbitrary code of a hostile entity's choosing, however, does not. – supercat Jul 02 '18 at 05:03
  • @davidbak: This is yet another area in which C and C++ could diverge, so it makes not much sense to talk about C/C++ as it it's one language. C++ can easily support additional types, where implementations can choose between compiler-native and library implementations. A `std::wrapped` is fairly trivial to implement even as a library function, and can be zero-overhead on most hardware. – MSalters Jul 02 '18 at 07:37
  • @MSalters: It's simple in C++ to make "wrap-overflow" or "trap-overflow" integer types that kinda-sorta work, but language-based types could offer better semantics. Although C was designed with inside-out expression evaluation, that's not a good model from a semantic standpoint. It would be more helpful to say that coercing the result of an addition, subtraction, multiply, left-shift, or bitwise operator to certain types would likewise coerce the operands or, in some cases, require that a programmer add typecasts to indicate the intention. Given `int64a = int32a+int32b;`, for example... – supercat Jul 02 '18 at 16:05
  • ...I don't think it's at all clear what behavior the programmer would be expecting if the result won't fit in a 32-bit `int`. Requiring that the programmer either force the addition to be performed as 64 bits or affirm the fact that the result is 32 bits before promoting it would make the intention clearer. – supercat Jul 02 '18 at 16:08
  • @Peter Cordes: But integer overflow & wrapping math aren't the same as array bounds checking. In the scientific/engineering modelling domain, where I've mostly worked, integers are mostly used as array indices into floating point arrays (or arrays of structs), and at least until recently you really didn't have machines with enough memory to get anywhere near to where a 32-bit overflow or wrap would be an issue. – jamesqf Jul 02 '18 at 17:34
  • @jamesqf: (1) buffer over/underflow is the big security hole, but (2) integer over/underflow can lead to buffer over/underflow. // E.g. most HW does not distinguish signed from unsigned, so if you can produce a bit pattern like 0xFFFFFFxx and use it in a memory offset, then you are doing a buffer underflow. Integer over/underflow may just be the root cause that leads to a buffer over/underflow. Checking the root cause rather than all uses may reduce the overhead you were concerned about. Although IMHO I like checking everywhere, if it does not cost. – Krazy Glew Jul 02 '18 at 17:58
  • @Krazy Glew: Sure, but that's in a different domain. Somehow I just can't see that the sort of modelling software I deal with as being a security issue. You don't produce bit patterns to use as indices... – jamesqf Jul 02 '18 at 18:07
  • 1
    @jamesqf: (1) array A[J-K] where J=0 and K=1 produces bit pattern 0xFFFFFFFF (or 0xFFFFFFFC if an array of int32_t). (2) I also work on modelling SW, albeit computer architecture related. Some of my modelling SW has gotten into products that people, external customers, use to do performance analysis => security hole, QED. (3) I don't know what modelling SW you work on, but if it is financial analysis or weather modelling, it is quite common to see such applications accessible via web APIs, sometimes to the public. "Model your retirement" or "model microclimate". – Krazy Glew Jul 02 '18 at 18:17
  • @jamesqf: I never said anything about array bounds checking; I was only talking about C quality-of-implementation issues surrounding optimization based on integer overflow vs. taking advantage of wrapping for bithacks. If you're really only using `int` for looping over arrays, then C's current behaviour is good. Integer bit hacks are sometimes useful with SIMD compare bitmaps if manually vectorizing (although you only get 32 or 64 elements with single-bit element size). – Peter Cordes Jul 02 '18 at 18:20
  • @jamesqf: Integer overflows may not be an issue when processing valid data, but may be an issue when trying to validate data. For example, something like `if (size1 + size2 <= totalSize) { handle_thing(ptr, size1); handle_thing(ptr_size1, size2); }` would reject items where `size1` is too big but `size2` isn't, or vice versa, or those where both values are somewhat too big, but may erroneously accept some items where both values are way too big. – supercat Jul 02 '18 at 20:25
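A minimal sketch of the widening pitfall discussed in the comments above (variable names `int32a`/`int32b` follow the comment's illustration): in `int64a = int32a + int32b;` the addition is performed in 32 bits and can overflow before the result is widened, so the programmer must cast an operand to force the 64-bit addition they likely meant.

```c
#include <stdint.h>

/* Casting one operand promotes both to 64 bits, so the addition
   itself is done in 64-bit arithmetic and cannot overflow here. */
int64_t add_widened(int32_t a, int32_t b) {
    return (int64_t)a + b;
}
```

Without the cast, `2000000000 + 2000000000` overflows `int32_t` (undefined behavior in standard C) before the assignment ever sees it.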
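The signed-underflow-to-huge-offset point made above can be shown concretely (a hypothetical illustration, not code from the thread): hardware address arithmetic does not care about signedness, so a slightly negative index such as `j - k` with `j = 0`, `k = 1` becomes an enormous unsigned byte offset.

```c
#include <stdint.h>

/* The element index -1, reinterpreted as the unsigned bits the
   addressing hardware actually sees. */
uint32_t index_bits(int32_t j, int32_t k) {
    return (uint32_t)(j - k);
}

/* The same index scaled for 4-byte (int32_t) elements, as it would
   appear in a byte offset. */
uint32_t byte_offset(int32_t j, int32_t k) {
    return (uint32_t)(j - k) * 4u;
}
```

This is the mechanism by which an integer underflow turns into a buffer underflow when used unchecked as an array index.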
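The validation pitfall in the last comment can be sketched with unsigned 32-bit sizes (function names here are illustrative): the naive `size1 + size2 <= totalSize` check can wrap around and accept inputs where both values are far too large, while a rearranged check cannot wrap.

```c
#include <stdint.h>

/* The naive check: size1 + size2 may wrap modulo 2^32, producing a
   small sum that passes the comparison. */
int sizes_ok_naive(uint32_t size1, uint32_t size2, uint32_t totalSize) {
    return size1 + size2 <= totalSize;
}

/* The safe check: subtraction on the right side cannot underflow
   because size1 <= totalSize is verified first. */
int sizes_ok_safe(uint32_t size1, uint32_t size2, uint32_t totalSize) {
    return size1 <= totalSize && size2 <= totalSize - size1;
}
```

With `size1 = 0xFFFFFFF0` and `size2 = 0x20`, the naive sum wraps to 16 and the check erroneously passes; the safe version rejects it.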
8

Consider older and much higher-level languages that predated C (1972):

- Fortran - 1957 (not much higher level than C)
- Lisp - 1958
- Cobol - 1959
- Fortran IV - 1961 (not much higher level than C)
- PL/1 - 1964
- APL - 1966

Plus a mid-level language like RPG (1959), mostly a programming language to replace plugboard-based unit record systems.

From this perspective, C seemed like a very low level language, only a bit above the macro assemblers used on mainframes at the time. In the case of IBM mainframes, assembler macros were used for database access such as BDAM (basic direct access method), since the database interfaces hadn't been ported to Cobol at that time. The result is a legacy mix of assembly and Cobol programs still in use today on IBM mainframes.

rcgldr
  • 191
  • 3
  • 2
    If you want to list older, higher languages, [don't forget LISP](https://en.wikipedia.org/wiki/Lisp_(programming_language)). – Deduplicator Jul 02 '18 at 21:53
  • @Deduplicator - I added it to the list. I was focusing on what was used on IBM mainframes, and I don't recall LISP being that popular, but APL was also a niche language for IBM mainframes (via time sharing consoles) and IBM 1130. Similar to LISP, APL is one of the more unique high level languages, for example look how little code it takes to create Conway's game of life with a current version of APL: https://www.youtube.com/watch?v=a9xAKttWgP4 . – rcgldr Jul 03 '18 at 08:00
  • 2
    I wouldn't count RPG or COBOL as particularly high-level, at least not prior to COBOL-85. When you scratch the surface of COBOL, you see that it is essentially a collection of very advanced assembler macros. To begin with, it lacks functions, procedures, and recursion, as well as any kind of scoping. All storage must be declared at the top of the program, leading to either extremely long overtures or painful variable reuse. – idrougge Jul 03 '18 at 11:25
  • I have some bad memories of using Fortran IV, I don't recall it being appreciably "higher level" than C. – DaveG Jul 03 '18 at 13:54
  • @idrougge - COBOL and RPG, and also mainframe instruction sets, included full support for packed or unpacked BCD, a requirement for financial software in countries like the USA. I consider the related native operators such as "move corresponding" to be high level. RPG was unusual in that you specified linkage between raw input fields and formatted output fields and/or accumulators, but not the order of operations, similar to the plugboard programming it replaced. – rcgldr Jul 03 '18 at 20:53
  • @DaveG - Fortran IV supports exponentiation as a native operator, not a function call. There's also a minimal standard set of library functions, and formatted output is native to the language. It was also common to have mainframe-specific "parallel" extensions to Fortran for specific mainframes, such as the CDC 6600 (multiple arithmetic units, but they had to be individually referenced, similar to XMM registers on current x86 processors), or the CDC 7600 (more generic native parallel math operators). – rcgldr Jul 03 '18 at 21:04
  • @rcgldr The C standard I/O library is listed in my dog-eared K&R from the 70's. I'd consider it part of the C environment, particularly if you are going to list mainframe-specific extensions for Fortran. As far as exponentiation, in the 40 or so years I've been writing software, I'm not sure I've ever used it outside of a test program. – DaveG Jul 03 '18 at 22:29
  • @DaveG - I updated my answer noting that Fortran at the time was not much higher level than C. I'll delete this comment later as it won't be needed once you read it. – rcgldr Jul 04 '18 at 02:11
  • @DaveG: If a language had a "squared" operator, I'd probably use it a fair amount; for languages that handle integer-power cases separately, I'd regard `lenSquared = (x1-x2)^2 + (y1-y2)^2` as being nicer than either `lenSquared = (x1-x2)*(x1-x2)+(y1-y2)*(y1-y2)` or `dx = x1-x2; dy=y1-y2; lensquared = dx*dx + dy*dy;`. – supercat Jul 04 '18 at 19:16
6

The answer to your question depends upon which C language you are asking about.

The language described in Dennis Ritchie's 1974 C Reference Manual was a low-level language which offered some of the programming convenience of higher-level languages. Dialects derived from that language likewise tended to be low-level programming languages.

When the 1989/1990 C Standard was published, however, it did not describe the low-level language which had become popular for programming actual machines, but instead described a higher-level language which could be--but was not required to be--implemented in lower-level terms.

As the authors of the C Standard note, one of the things that made the language useful was that many implementations could be treated as high-level assemblers. Because C was also used as an alternative to other high-level languages, and because many applications didn't require the ability to do things that high-level languages couldn't do, the authors of the Standard allowed implementations to behave in arbitrary fashion if programs tried to use low-level constructs. Consequently, the language described by the C Standard has never been a low-level programming language.

To understand this distinction, consider how Ritchie's language and C89 would view this code snippet:

    struct foo { int x,y; float z; } *p;
    ...
    p[3].y+=1;

on a platform where "char" is 8 bits, "int" is 16 bits big-endian, "float" is 32 bits, and structures have no special padding or alignment requirements so the size of "struct foo" is 8 bytes.

In Ritchie's language, the last statement would take the address stored in "p", add 3*8+2 [i.e. 26] bytes to it, fetch a 16-bit value from the byte at that address and the one following, add one to that value, and then write that 16-bit value back to the same two bytes. The behavior would be defined as acting upon the 26th and 27th bytes following the one at address p, without regard for what kind of object was stored there.

In the language defined by the C Standard, in the event that *p identifies an element of a "struct foo[]" which is followed by at least three more complete elements of that type, the last statement would add one to member y of the third element after *p. Behavior would not be defined by the Standard under any other circumstances.

Ritchie's language was a low-level programming language because, while it allowed a programmer to use abstractions like arrays and structures when convenient, it defined behavior in terms of the underlying layout of objects in memory. By contrast, the language described by C89 and later standards defines things in terms of a higher-level abstraction, and only defines the behavior of code that is consistent with that. Quality implementations suitable for low-level programming will behave usefully in more cases than mandated by the Standard, but there's no "official" document specifying what an implementation must do to be suitable for such purposes.
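The byte-oriented reading described above can be sketched in C (a hypothetical illustration, using `short` to model the example's 16-bit `int` and `offsetof` for the layout arithmetic; it assumes the no-padding, 8-byte layout the answer describes): the statement `p[3].y += 1` reduces to arithmetic on a raw byte address.

```c
#include <stddef.h>
#include <string.h>

struct foo { short x, y; float z; };   /* models: 16-bit int, 32-bit float */

/* Ritchie-era interpretation of p[3].y += 1: compute the byte offset
   3*sizeof(struct foo) + offsetof(struct foo, y) == 26, fetch the
   16-bit value at that offset, add one, and write it back, regardless
   of what object actually lives there. */
void ritchie_increment(unsigned char *p_bytes) {
    size_t off = 3 * sizeof(struct foo) + offsetof(struct foo, y);
    short v;
    memcpy(&v, p_bytes + off, sizeof v);   /* fetch the two bytes */
    v += 1;
    memcpy(p_bytes + off, &v, sizeof v);   /* write them back */
}
```

On a typical platform with no struct padding this computes the same 26-byte offset described above; the Standard-defined version of the same operation is simply `p[3].y += 1;`, with behavior defined only when `*p` really is followed by three more complete `struct foo` objects.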

The C language invented by Dennis Ritchie is thus a low-level language, and was recognized as such. The language invented by the C Standards Committee, however, has never been a low-level language in the absence of implementation-provided guarantees that go beyond the Standard's mandates.

supercat
  • 8,335
  • 22
  • 28
  • This seems to say the same language is both high- and low-level depending on what the documentation says. – user253751 Nov 05 '20 at 09:16
  • @user253751: I suppose it depends on whether one uses the terms "high-level" and "low-level" to describe what languages *have* or what they *lack*. Sorta like the question of whether a baritone is a bass who can sing high notes, or a tenor who can't. – supercat Nov 05 '20 at 15:39