24

I am developing a server-client application where the client will run on Windows and the server probably on Linux. Maybe I'll later port the client over to Mac and Linux, but not yet.

All home computers these days run on little-endian. I googled for a while, but I couldn't really find a list of devices that run on big-endian. As far as I know, some Motorola chips still use big-endian, and maybe some phones (I do not plan on porting the app to smartphones, so this doesn't matter to me). So, why would I rearrange the bytes of every integer, every short, every float, every double, and so on, for reading and writing, when I already know that both server and client run on little-endian?

That's just unnecessary work to do. So, my question is: Can I safely ignore the endianness and just send little-endian data? What are the disadvantages?

Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
tkausl
  • 594
  • 6
  • 13
  • 4
    How will the machines know if they're receiving little-endian data instead of the usual/standard big-endian data? – Ixrec Apr 22 '16 at 23:14
  • "If you're sharing data with another platform, or (de)serializing some binary format with defined endianness, then you need to match that platform's or format's endianness..." ([Regarding little endian and big endian conversion](http://programmers.stackexchange.com/a/215537/31260)) – gnat Apr 22 '16 at 23:20
  • 2
    You need to distinguish between the metadata that is required by the network protocol, and the payload which is just a bunch of uninterpreted bytes for everyone except your code. I hope you're not rolling your own networking stack. Consequently I assume the question is only about the payload, correct? –  Apr 22 '16 at 23:29
  • 2
    @delnan yes, only talking about the payload. I'll of course still _talk in_ network-byte-order to the network-stack itself. – tkausl Apr 23 '16 at 00:00
  • 3
    Just a thought on the side: Is it really necessary for you to work at an abstraction level where endianness is a concern? It might be worthwhile to consider using protocols for which appropriate libraries exist that encapsulate all this low-level "mess". Then, you also have the additional bonus that adding further clients can be done much easier. – godfatherofpolka Apr 23 '16 at 07:31
  • @godfatherofpolka good point, however, at the moment, I'm not using any higher-level abstraction. I use plain old boost.asio for networking and I don't want that much overhead because the application - especially the server - will eventually send thousands of packets (my own packets, not network packets) every second and even a 0.01 ms overhead will probably be noticeable. – tkausl Apr 23 '16 at 07:47
  • 1
    @tkausl Just two more thoughts on the side: As a general rule, IO is extremely slow compared to computations, so any overhead introduced by working at a higher abstraction level is most likely negligible. It might even happen that some libraries outperform hand-rolled implementations due to clever resource pooling, asynchronous handling, etc. So, I would first carefully evaluate existing solutions. Furthermore, given your description, I would also spend some thought on scalability rather than performance; here you might again benefit from using higher-level protocols. – godfatherofpolka Apr 23 '16 at 08:52
  • 1
    _"even a 0.01 ms overhead will probably be noticeable"_ Over a network? I doubt it. That's more than likely _well_ within your packet delay jitter margin already. – Lightness Races in Orbit Apr 23 '16 at 11:48
  • I *believe* that DCE-RPC says that data is according to the endianness of the *sender*. This is referred to as "receiver makes right". It includes a tag to state what that order is. – Roger Lipscombe Apr 23 '16 at 17:49
  • @RogerLipscombe: Some UTF-16 files use the same trick, with a Byte Order Marker or BOM. Microsoft made this particularly popular (so much so that one sometimes finds the BOM in UTF-8 files, where it is mainly just annoying). http://unicode.org/faq/utf_bom.html#utf16-11 – torek Apr 23 '16 at 19:03

9 Answers

30

... why would I rearrange the bytes ... when I already know that both server and client run on little-endian? That's just unnecessary work to do.

It's only unnecessary if you can guarantee your code will always run on little-endian architectures. If you intend for it to have a long life, it's worth the extra effort to avoid disturbing well-proven code a decade from now when some big-endian architecture has become the "in" thing and you find it to be a good market for your application.

There is a network-standard byte ordering. It's big-endian, but nothing says you have to abide by it when designing your own protocol. If you know ahead of time that the majority of the systems running your code will be little-endian and performance is critical, declare little-endian the "tkausl standard byte ordering" and go with it. Where you'd normally call htons() to put things in the order you need, write a macro called htots() that conditionally compiles to nothing on little-endian architectures and does the rearranging on big-endian ones.
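
A minimal sketch of what that macro could look like in C, assuming a compiler that predefines `__BYTE_ORDER__` (GCC and Clang do); the `htots` name comes from the joke above (plus an `htotl` companion for 32-bit values), and the rest is illustrative rather than a definitive implementation:

```c
#include <stdint.h>

/* "host to tkausl-standard" (little-endian) conversions -- a sketch.
   Assumes __BYTE_ORDER__ is predefined (GCC/Clang); compilers that don't
   define it (e.g. MSVC) target little-endian platforms in practice and
   fall into the no-op branch. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define htots(x) ((uint16_t)((((uint16_t)(x) & 0x00ffu) << 8) | \
                             (((uint16_t)(x) & 0xff00u) >> 8)))
#define htotl(x) ((uint32_t)((((uint32_t)(x) & 0x000000fful) << 24) | \
                             (((uint32_t)(x) & 0x0000ff00ul) <<  8) | \
                             (((uint32_t)(x) & 0x00ff0000ul) >>  8) | \
                             (((uint32_t)(x) & 0xff000000ul) >> 24)))
#else
#define htots(x) (x)   /* little-endian host: nothing to rearrange */
#define htotl(x) (x)
#endif
```

Since a byte swap is its own inverse, the receiving direction (`ttohs`/`ttohl`) can simply alias these.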

Maintaining the code to do the inbound and outbound conversions isn't really a big effort. If you have a very large number of messages, find a way to express them and write a program to generate the inbound and outbound conversions.

Blrfl
  • 20,235
  • 2
  • 49
  • 75
  • 10
    The wording `when designing your protocol` is important, because it also implicitly says that this option only exists when designing a new protocol and not when implementing some existing protocol. And mentioning the need for a `htots` (and really an entire family of functions), also makes it clear that choosing a different byte ordering is not something one does to make the code simpler, but it might make it slightly faster. – kasperd Apr 23 '16 at 08:02
  • 4
    There are (non-standard but very common these days) functions `htole32()`, `htole16()`, `le16toh()`, etc., available as well. The file to include to get these declared is unfortunately even less standard: `<endian.h>` or `<sys/endian.h>` depending on platform. – torek Apr 23 '16 at 09:33
  • This answer is fine, but I think the assumption that performance might be critical in the given case is most probably wrong, based more on superstition than on facts. – Doc Brown Apr 24 '16 at 11:21
  • 1
    @DocBrown: I always like to point out that the X protocol has supported picking your own byte order for 30 years, and as tight as resources were back then, nobody ever complained that it was a problem. – Blrfl Apr 24 '16 at 11:33
8

It's your protocol.

You can't safely ignore it. But you can safely label it. You control the client and the server. You control the protocol. Doesn't it make sense not to care whether it's big-endian or little-endian so long as you know whether both sides agree?

This means overhead. Now you have to mark your endianness somehow. Do that, and I can read it on anything.
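
As an illustration (a sketch only; the tag values and helper names are invented here), marking it can be as simple as one byte in your message header, with the receiver swapping only on a mismatch (the "receiver makes right" idea from the comments above):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header tag: first byte labels the sender's byte order. */
enum { ORDER_LITTLE = 0x01, ORDER_BIG = 0x02 };

static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* inspect the first byte in memory */
    return first == 1;
}

/* Receiver side: 'raw' holds 4 wire bytes copied straight into a uint32_t.
   Swap only if the sender's labelled order differs from the host's. */
uint32_t read_u32(uint8_t order_tag, uint32_t raw)
{
    int sender_little = (order_tag == ORDER_LITTLE);
    if (sender_little == host_is_little_endian())
        return raw;                       /* orders match: nothing to do */
    return ((raw & 0x000000fful) << 24) | /* otherwise byte-swap */
           ((raw & 0x0000ff00ul) <<  8) |
           ((raw & 0x00ff0000ul) >>  8) |
           ((raw & 0xff000000ul) >> 24);
}
```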

If you don't want data overhead, and your CPU is bored and looking for something to do, then conform.

Peter Mortensen
  • 1,050
  • 2
  • 12
  • 14
candied_orange
  • 102,279
  • 24
  • 197
  • 315
6

So, my question is: Can I safely ignore the endianness and just send little-endian data?

There are two interpretations of that:

  • If you design your applications / protocols to always1 send little-endian, then you are NOT ignoring endianness.

  • If you design your applications / protocols to send / receive whatever the native endianness is, then they will work as long as you run your applications on platforms with the same native endianness.

    Is that "safe"2? That is for you to judge! But certainly there are common hardware platforms that use little-endian, big-endian or ... bi-endian.


What are the disadvantages?

The obvious disadvantage of ignoring endianness is that if you / your users need to run your applications / protocol between platforms with different native endianness, then you have a problem. The applications will break, you will need to change them to fix the problem, and you will have to deal with version compatibility problems, etcetera.

Clearly, most current generation platforms are natively little-endian, but 1) some are not, and 2) we can only guess what will happen in the future.


1 - Always ... including on platforms that are natively big-endian.

2 - Indeed, what does "safe" mean? If you are asking us to predict the future direction of hardware platforms ... I'm afraid that is not objectively answerable.

Stephen C
  • 25,180
  • 6
  • 64
  • 87
5

Various protocols used to transmit data between servers use little endian numbers:

  1. BSON
  2. Protocol Buffers
  3. Cap'n Proto

See https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats for details on various formats, some of which use little-endian numbers and some big-endian numbers.

There is absolutely nothing wrong with using a protocol based on little-endian numbers. A big-endian machine is just as capable of reading little-endian numbers as a little-endian machine is of reading big-endian numbers. Many people have done it specifically to avoid the extra computational cost of decoding big-endian numbers on little-endian machines.

If you build your protocol on top of one of these existing protocols, then you don't even have to worry about the issue yourself; it's already taken care of. When you decide to run your code on a big-endian platform, the libraries that implement these protocols will automatically take care of ensuring that you decode the values correctly.

Winston Ewert
  • 24,732
  • 12
  • 72
  • 103
3

The standard BSD networking stack in C has the hton/ntoh functions (host-to-network / network-to-host), which expand to no-ops on machines whose native byte order is already the network order (big-endian). You'd need your own counterparts to these for the scenario in which your protocol's "network" byte order is little-endian.

That's the robust way to do it.

It'd be unconventional, but I see nothing wrong with it. Networked computers always receive byte streams, and they need to agree on a protocol for how to interpret those bytes. This is just part of it.

cat
  • 734
  • 1
  • 7
  • 15
Petr Skocik
  • 240
  • 2
  • 10
3

Endianness is not the only consideration. There is the size of integers, there is packing of structs that you might want to send or receive, and so on.

You can ignore all this. Nobody can force you. On the other hand, the safe and reliable way is to document an external format, and then write code that will read or write the external format correctly, no matter what your processor, your programming language, and the implementation of your programming language are.
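
As an illustration, a minimal sketch in C of what such reading/writing code can look like for a field documented as "32-bit unsigned integer, little-endian"; it works byte by byte with shifts, so host byte order, integer sizes, and struct packing never enter into it (the helper names are made up here):

```c
#include <stdint.h>

/* Write a 32-bit value in the documented wire order (little-endian here),
   byte by byte, so host endianness and struct layout never matter. */
void put_u32le(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}

/* Read it back the same way, independent of the processor doing the reading. */
uint32_t get_u32le(const uint8_t *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```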

Usually it's not much code. But it has a huge benefit: People reading your code won't suspect that you are clueless, know nothing about interchanging external data, and write code that generally cannot be trusted.

gnasher729
  • 42,090
  • 4
  • 59
  • 119
3

I don't think any of the answers are quite precise enough. According to Wikipedia, endianness is the order of the bytes comprising a word.

Let's take 4 bytes and interpret them as an int. On a little-endian system the bytes will be interpreted from right to left (least significant byte first), and vice versa on a big-endian system. Obviously it is important to agree on which end of the int to start interpreting from.
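
A small illustration, assuming only that the host uses one of the two common orderings: the same 4 bytes in memory come out as different ints depending on which end the machine starts from.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const uint8_t bytes[4] = { 0x01, 0x02, 0x03, 0x04 };
    uint32_t value;
    memcpy(&value, bytes, sizeof value);

    /* Prints 0x04030201 on a little-endian host (0x01 is the least
       significant byte) and 0x01020304 on a big-endian host. */
    printf("0x%08x\n", (unsigned)value);
    return 0;
}
```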

Let's zoom out a little bit to modern network protocols, which could be using JSON or XML. Neither of those formats will transfer an int as 4 raw bytes. They will transfer the data as text, which will be parsed into an int on the receiving side.

So in the end, endianness doesn't matter when using JSON or XML. We still need to use big-endian for TCP headers, which is why it is called network byte order, but most programmers don't need to mess with those on a daily basis.

The most widely used text encoding today is UTF-8, which also happens to be immune to problems regarding endianness.

So I would say yes: it is safe to ignore endianness when using text-based formats transferred as UTF-8.

Esben Skov Pedersen
  • 5,098
  • 2
  • 21
  • 24
  • Two downvotes and no comments. Great. – Esben Skov Pedersen Apr 24 '16 at 06:03
  • 1
    I wasn't the downvoter but this answer seems to be ignoring/dismissing a perfectly valid question. Just because some protocols are text based doesn't mean that all protocols should be. – Peter Green Apr 24 '16 at 14:43
  • 4
    I upvoted this because it touches the fact that the payload format has nothing to do with the underlying protocols. Some people just love to dig into made-up problems. – Zdenek Apr 24 '16 at 17:09
  • I upvoted this too, because compared to the question asked, I think many people would miss the fact that one **only** needs to be concerned about byte order **when** interpreting the **raw** transmitted bytes with a `multiple bytes <--> 1 object` conversion rule. Why do we need to convert the binary IP address and port number to network byte order? Because the IP address and port number will be interpreted by all the intermediaries; for example, they may take 4 bytes and translate them into an IP address. `4 bytes -> 1 object`: in that case, byte order matters. – Rick May 18 '20 at 03:43
2

One example of a big-endian system is the MIPS chips used in routers. Both ARM and MIPS are endian-switchable, but MIPS is often big-endian because it makes building network hardware easier (the most significant part of a word is the part you receive first, so you can make a routing decision before you've received the rest of the word, rather than having to buffer the whole word).

So it depends what you mean by 'Linux', but if you ever want to run your server app on a smaller system like a router running OpenWRT then you may have to consider big endian support.

As usual, making simplifying assumptions is a perfectly sensible optimisation until such time as you hit something that doesn't fit the assumptions. Only you can say how painful it would be to unwind them if you ever come across such a problem.

user1908704
  • 129
  • 1
0

Big-endian systems seem to be on their way out. Many of the traditional Unixes used big-endian, but they have been in decline for years in favor of Linux on x86.

ARM is bi-endian, but the big-endian variant seems to be rarely seen.

MIPS exists in both variants. As far as I can tell, the big-endian variant is mostly seen in networking appliances (for historical reasons, Internet protocols generally use big-endian).

PPC was traditionally big-endian, with some parts supporting both orderings, but IBM now seems to be pushing little-endian mode for 64-bit PPC (they recently pushed ppc64el ports into Debian and Ubuntu).

SPARC is normally big-endian, but again seems to be in decline.

If you are implementing an existing protocol, then obviously you have to follow its specifications. If you want the IETF to bless your new protocol, then big-endian is likely to be easier because that is what they already use in their existing protocols, but IMO for a new "greenfield" protocol design, little-endian is the way to go.

You can either put in macros from the start that will be no-ops on little-endian systems, or you can not bother until/unless you need to port to a big-endian system.

Peter Green
  • 2,125
  • 9
  • 15