I have a query regarding big-endian and little-endian.
Basically, the conversion is used to reverse the byte order in memory.
When we need to do the conversion, do we need to convert each and every data type?
The answer is: it depends.
If you're sharing data with another platform, or (de)serializing some binary format with defined endianness, then you need to match that platform's or format's endianness.
Assuming that target endianness is well defined (and different from your native endianness), that will tell you what conversions you need.
Oh, and I'd suggest using htons, htonl and friends rather than twiddling the bytes manually; they're more likely to get optimized to a single BSWAP instruction or similar.
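For instance, here is a minimal sketch (POSIX, hence <arpa/inet.h>) of round-tripping values through network byte order, which is big-endian, using those functions:

    #include <stdio.h>
    #include <stdint.h>
    #include <arpa/inet.h>   /* htonl, htons, ntohl, ntohs */

    int main(void)
    {
        uint32_t host32 = 0x01234567;
        uint16_t host16 = 0xABCD;

        /* Host order -> network (big-endian) order, e.g. before
           writing to a socket or a big-endian file format. */
        uint32_t net32 = htonl(host32);
        uint16_t net16 = htons(host16);

        /* Network order -> host order after reading. */
        printf("round trip ok: %d\n",
               ntohl(net32) == host32 && ntohs(net16) == host16);
        return 0;
    }

On a big-endian host these calls are effectively no-ops; on a little-endian host they become a byte swap.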
The internal storage of the native integer format is one aspect of the binary representation of numbers on various processors (two's complement vs. one's complement is another). The position of the most-significant to least-significant bytes is an important aspect of data storage and transport. For example, byte order is a concern when transferring data between two systems using a binary data format (examples: XDR, Xupl, UBJSON, etc.).
We can examine 32-bit (b32) and 16-bit (b16) words on a machine. They are stored in either four bytes (byte[4]) or two bytes (byte[2]), and can be examined as an array of bytes. Consider the arrangement of the bytes in a 32-bit word (byte[4]). Label the bytes A, B, C, D; there are 4! = 24 possible permutations:
ABCD ABDC ACBD ACDB ADBC ADCB
BACD BADC BCAD BCDA BDAC BDCA
CABD CADB CBAD CBDA CDAB CDBA
DABC DACB DBAC DBCA DCAB DCBA
Two of those permutations, ABCD and DCBA, are commonly used on processors, and the order of the bytes defines the 'endianness' of the processor. Suppose label A is the most significant byte and D the least significant: then ABCD is called big-endian (most significant byte at the lowest address) and DCBA is called little-endian (least significant byte at the lowest address). For example, the 32-bit value 0x01234567 is stored as the byte sequence 01 23 45 67 on a big-endian machine and as 67 45 23 01 on a little-endian one.
The 16-bit case is much simpler: there are only two permutations, AB and BA.
Other formats have been used; the PDP-11 had a middle-endian layout, BADC.
There are many processors (ARM, PowerPC, SPARC V9+, Alpha, MIPS, PA-RISC, IA-64) which can switch between big-endian and little-endian.
Here is a short C program that will tell you the endianness of your processor:
#include <stdio.h>
#include <string.h>
#include <stdint.h>   /* fixed-width types: unsigned long may be 8 bytes */

int
main(void)
{
    /* The same four bytes, read back as one 32-bit word, give a
       different value depending on the processor's byte order. */
    uint8_t word[4] = { 0x01, 0x23, 0x45, 0x67 };
    uint32_t be = 0x01234567;   /* ABCD */
    uint32_t le = 0x67452301;   /* DCBA */
    uint32_t me = 0x23016745;   /* BADC (PDP-11 style) */
    uint32_t we, ue;

    memcpy(&we, word, 4);
    if (we == be) printf("Big-endian\n");
    if (we == le) printf("Little-endian\n");
    if (we == me) printf("Middle-endian\n");

    /* Build "UNIX" arithmetically, most significant byte first,
       then view the word's bytes in memory order; a little-endian
       machine prints "XINU". */
    char UNIX[4 + 1] = "UNIX";
    ue = (uint32_t)'U'; ue <<= 8;
    ue += (uint32_t)'N'; ue <<= 8;
    ue += (uint32_t)'I'; ue <<= 8;
    ue += (uint32_t)'X';
    printf("%s = %.4s\n", UNIX, (char *)&ue);

    /* Dump the word byte by byte, in memory order. */
    int ndx;
    uint8_t *p = word;
    printf("word at %p:\n", (void *)p);
    for (ndx = 0; ndx < (int)sizeof(we); ndx++)
        printf("[%02x] %03d:%02x\n", ndx, p[ndx], p[ndx]);

    return 0;
}
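On a typical little-endian x86-64 machine the output should look something like this (the pointer value varies per run):

    Little-endian
    UNIX = XINU
    word at 0x7ffee4a1b2c0:
    [00] 001:01
    [01] 035:23
    [02] 069:45
    [03] 103:67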
The conversion itself is usually some fancy bit-twiddling solution like this, for a 32-bit integer (the >>> is Java's unsigned right shift; in C you'd use unsigned operands and plain >>):

i = ((i & 0xff000000) >>> 24) | ((i & 0x00ff0000) >> 8) | ((i & 0x0000ff00) << 8) | ((i & 0x000000ff) << 24)
Why it is needed: some architectures are big-endian and some are little-endian, and if they want to communicate through a byte stream they need to agree on the endianness of that stream, so that a 1 doesn't become a 16,777,216 (2^24).
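Here is a minimal, self-contained C version of that shuffle (a sketch; with an unsigned operand the plain >> shift is already logical, and compilers typically recognize this pattern and emit a single BSWAP/REV instruction; GCC and Clang also provide __builtin_bswap32):

    #include <stdio.h>
    #include <stdint.h>

    /* Reverse the byte order of a 32-bit value. */
    static uint32_t bswap32(uint32_t i)
    {
        return ((i & 0xff000000u) >> 24) |
               ((i & 0x00ff0000u) >>  8) |
               ((i & 0x0000ff00u) <<  8) |
               ((i & 0x000000ffu) << 24);
    }

    int main(void)
    {
        /* Misreading the byte order turns 1 into 2^24. */
        printf("%u -> %u\n", 1u, bswap32(1u));   /* prints: 1 -> 16777216 */
        return 0;
    }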
The conversion is only applied to primitives that are larger than one byte; the other (composite) types are generally defined as a sequence of other composite types and/or primitives, which remain in order.
For example, the LAS file format is defined as little-endian. This means the first header bytes, in order, are: L, A, S, F, and so on.
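To sketch both points at once (the 4-byte "LASF" signature comes from the format; the 16-bit field following it here is hypothetical, purely for illustration): single bytes are used as-is, while a multi-byte little-endian field is assembled least-significant byte first, which yields the right value on any host.

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void)
    {
        /* First bytes of a little-endian format such as LAS: a 4-byte
           signature, then (hypothetically, for this example) a 16-bit
           unsigned field stored least-significant byte first. */
        const uint8_t header[6] = { 'L', 'A', 'S', 'F', 0x2A, 0x00 };

        /* Single bytes need no conversion; they stay in order. */
        char sig[4 + 1];
        memcpy(sig, header, 4);
        sig[4] = '\0';

        /* Multi-byte primitive: assemble from bytes so the result is
           independent of the host's endianness. */
        uint16_t field = (uint16_t)(header[4] | (header[5] << 8));

        printf("signature=%s field=%u\n", sig, field);  /* field = 42 */
        return 0;
    }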