
I was reading some of the documentation for the Linux kernel and stumbled upon an article about adding new syscalls. The article essentially says that any syscall in the Linux kernel must be supported permanently:

> A new system call forms part of the API of the kernel, and has to be supported indefinitely.

And that this has caused problems, because historically there have been cases where Linux syscalls were added without being designed to be extensible:

> The syscall table is littered with historical examples where this wasn’t done, together with the corresponding follow-up system calls
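
If I understand the documentation correctly, pipe(2)/pipe2(2) seems to be one of those pairs: the original pipe() takes no flags argument, so when close-on-exec behaviour was needed a second syscall, pipe2(), had to be added – and now both have to be kept around. A rough sketch of what that looks like from userspace (Linux-specific, using the glibc wrappers; the error handling is just for illustration):

```c
/* Sketch: pipe(2) left no room for flags, so pipe2(2) was added
 * (Linux 2.6.27) -- and both syscalls now have to stay supported.
 * Build on Linux with: gcc demo.c
 */
#define _GNU_SOURCE
#include <fcntl.h>      /* O_CLOEXEC */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int legacy[2], modern[2];

    /* Original syscall: no way to ask for O_CLOEXEC atomically. */
    if (pipe(legacy) == -1) {
        perror("pipe");
        return 1;
    }

    /* Follow-up syscall: same job, plus a flags argument. */
    if (pipe2(modern, O_CLOEXEC) == -1) {
        perror("pipe2");
        return 1;
    }

    printf("pipe: fds %d,%d  pipe2: fds %d,%d (close-on-exec)\n",
           legacy[0], legacy[1], modern[0], modern[1]);
    return 0;
}
```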

My question is why do these syscalls have to be supported permanently? Why can't these syscalls be deleted or rewritten entirely with a cleaner, more extensible implementation?

2 Answers


This doesn't really have anything in particular to do with Linux or with syscalls. This is true for any interface that wants to be backwards-compatible: once you add something, you can never take it out and you can never change it, because somewhere, someone wrote software that relies on some particular quirk of its behavior. In fact, in some cases, developers will even refrain from fixing bugs because there is software that relies on those bugs.

There are examples everywhere: .NET's default random number generator is cryptographically broken, but it can never be changed because there is software which relies on the current behavior. Java is still carrying around the pre-1.5 non-generic collections in Java 19.

Python 3 made some backwards-incompatible changes. Updating existing codebases did not take a lot of work, and the Python community even provided tools for automatically converting code from Python 2 to Python 3, and yet, it took over 11 years until they could afford to end-of-life Python 2 (and in reality, there are still applications that only work on Python 2).

Windows Vista enforced some security rules that had already been documented by Microsoft for many years, and still it broke tens of thousands of applications. In fact, Microsoft shipped Windows Vista with a database of over 20,000 applications which Vista detects and automatically runs in a sort-of "compatibility mode"; otherwise, Vista and every version after it would have been completely unusable.

Windows is also full of bugs that will never be fixed because too much software relies on that particular behavior.

Apple is very aggressive with deprecating APIs, but they can afford that because they have very tight control over the developer community and an almost fanatic user base that is happy to throw away older applications. Nevertheless, there are plenty of questions on Stack Exchange about how to defeat Apple's measures or how to run outdated operating systems because the current ones removed some important feature.

And of course there are plenty of examples in Linux. Compare, for example, the syscalls for the LoongArch architecture (which was added to Linux two months ago and thus only supports the newest, most modern, non-deprecated syscalls) with the ones for the x86/IA-32 architecture (which was the very first one and thus supports every syscall ever added).
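
One rough way to see this difference from userspace is to check which __NR_* syscall numbers an architecture even defines – a sketch, assuming glibc's <sys/syscall.h>: on x86/x86-64 the legacy open(2) number still exists, while recently added architectures (arm64, RISC-V, LoongArch) only wire up openat(2) and leave it to libc to emulate open() on top of it.

```c
/* Sketch: which historical syscalls an architecture still carries is
 * visible at compile time through the __NR_* macros. */
#define _GNU_SOURCE
#include <fcntl.h>          /* AT_FDCWD, O_RDONLY */
#include <stdio.h>
#include <sys/syscall.h>    /* __NR_* numbers for this architecture */
#include <unistd.h>         /* syscall(), close() */

int main(void)
{
#ifdef __NR_open
    /* Old architectures (x86, x86-64): the legacy open(2) syscall exists. */
    printf("this architecture still carries the legacy open(2) syscall\n");
    long fd = syscall(__NR_open, "/etc/hostname", O_RDONLY);
#else
    /* New architectures: only openat(2) is wired up. */
    printf("no legacy open(2) here -- only openat(2)\n");
    long fd = syscall(__NR_openat, AT_FDCWD, "/etc/hostname", O_RDONLY);
#endif
    if (fd >= 0)
        close((int)fd);
    return 0;
}
```

The same program compiles either way; the only difference is which historical syscalls the architecture is obliged to keep carrying.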

How important backwards compatibility is depends on the software in question. Breaking a JavaScript library that is only used by a couple dozen projects, all of which are actively maintained, is not a big deal – the developers will complain, but then fix their code. But, for example, an OS that essentially runs our modern world as we know it? That is a completely different game. In particular, you might not have access to the source code of all the software that runs on that OS, so you cannot migrate it to a new syscall!

As a general rule: the more "infrastructure-ish" the software, the higher the backwards compatibility requirements. That means OS kernels (e.g. Linux, NT), standard libraries (e.g. libc or the Win32 API), systems programming languages (C, C++, Java, C#), VM platforms (JVM, CLR), network protocols (IP, UDP, TCP, Ethernet, HTTP), etc.

Look at how long the transition to IPv6 is taking – probably, there are devices which will never use it. Or look at HTTP: while HTTP/2 has clear advantages over HTTP/1.1 and HTTP/3 has further advantages over HTTP/2, and while all modern servers and clients support HTTP/2 and many support HTTP/3, there are a lot of devices outside of the "browser / web server" space which will never support anything other than the most basic subset of HTTP/1.1.

In fact, HTTP/3 (or rather QUIC) is another good example: HTTP/3 is HTTP on top of QUIC as opposed to HTTP on top of TCP for HTTP/1.1 and HTTP/2. QUIC is a new layer 4 transport protocol intended to replace TCP (at least for certain use cases). So, you would think that QUIC is implemented on top of IP and side-by-side with TCP and UDP on layer 4, right? Wrong: it is actually implemented on top of UDP. Why? Because the entire Internet knows how to deal with TCP and UDP, whereas a new protocol would, for example, require firewalls, intrusion-detection systems and other so-called "middleboxes" to be updated. So, here we have a layer 4 protocol on top of a layer 4 protocol, for backwards compatibility, even though this is less efficient and the whole reason for developing QUIC is efficiency!
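
To make that layering concrete: as far as the network is concerned, a QUIC endpoint is nothing but a UDP socket. A minimal sketch (not a real QUIC handshake; port 4433 is just an arbitrary test port chosen here):

```c
/* Sketch: a "QUIC server" starts life as a plain UDP socket.  A real
 * QUIC stack would parse the received datagram as a QUIC packet, but
 * every firewall and NAT on the path only ever sees ordinary UDP. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);   /* plain UDP, layer 4 */
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(4433);                 /* arbitrary test port */

    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        return 1;
    }

    unsigned char buf[1500];
    ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
    if (n > 0)
        printf("got %zd bytes of UDP payload -- a QUIC packet, as far as "
               "a QUIC stack is concerned\n", n);

    close(sock);
    return 0;
}
```

Everything QUIC-specific happens inside the datagram payload, which is exactly why middleboxes that only understand TCP and UDP let it through unchanged.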

Jörg W Mittag
    Well, at least there are only [14 competing standards instead of 15](https://xkcd.com/927/). – Greg Burghardt Jul 29 '22 at 00:04
  • Apple also changes the behaviour of existing methods. Say a bug is fixed in iOS 15. If you build for iOS 11, a method that you call will be bug-compatible with iOS 11; if you build for iOS 15, the bug is gone (obviously only where this makes sense and people worked around the bug in a way that only worked with the bug present). – gnasher729 Jul 29 '22 at 19:29

The short answer is backwards compatibility. Otherwise, old software would no longer work on a new version, which tends to be a massive setback for companies and developers alike.

Flater