endian dilemma

Jerry Feldman gaf at blu.org
Fri Sep 6 10:17:36 EDT 2002


One more comment: endianism issues over communications channels between 
different systems have been solved in different ways. There is an 
ITU-T/ISO standard called ASN.1 (I have not updated my knowledge on this 
recently). All data is transmitted in TLD format (type, length, data). In 
general, each data type is given a one-byte type tag, followed by a length 
of 0 or more bytes (some types may not require a length). In general, 
lengths are 1 byte if the length is 127 or less, and multi-byte when the 
length is > 127. In the multi-byte case, the first byte is the length of 
the length with the sign (high) bit turned on. Data is generally 
represented as an ASCII non-delimited string. Compound types, like 
structures, are simply groups of TLD elements. 
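
As a rough sketch of the length rule described above (one byte for lengths 
up to 127, otherwise a prefix byte with the high bit set followed by the 
length bytes), something like the following could work. The function names 
tld_put_length() and tld_get_length() are made up for this illustration, 
and I am assuming the length bytes go most significant byte first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write a TLD length field into out.
 * Lengths 0..127 take one byte. Larger lengths take a prefix byte
 * (0x80 | number of length bytes) followed by the length itself,
 * most significant byte first (an assumption of this sketch).
 * Returns the number of bytes written, or 0 if out is too small. */
size_t tld_put_length(unsigned char *out, size_t cap, unsigned long len)
{
    size_t n = 0, i;
    unsigned long tmp;

    assert(out != NULL);
    if (len <= 127) {
        if (cap < 1)
            return 0;
        out[0] = (unsigned char)len;
        return 1;
    }
    for (tmp = len; tmp != 0; tmp >>= 8)
        n++;                          /* count the bytes needed for len */
    if (cap < n + 1)
        return 0;
    out[0] = (unsigned char)(0x80 | n);
    for (i = 0; i < n; i++)
        out[1 + i] = (unsigned char)((len >> (8 * (n - 1 - i))) & 0xFF);
    return n + 1;
}

/* Hypothetical helper: read a length field written by tld_put_length().
 * Returns the number of bytes consumed, or 0 on malformed input. */
size_t tld_get_length(const unsigned char *in, size_t avail,
                      unsigned long *len)
{
    size_t n, i;
    unsigned long v = 0;

    assert(in != NULL && len != NULL);
    if (avail < 1)
        return 0;
    if ((in[0] & 0x80) == 0) {        /* short form: the byte is the length */
        *len = in[0];
        return 1;
    }
    n = in[0] & 0x7F;                 /* long form: count of length bytes */
    if (n == 0 || n > sizeof(unsigned long) || avail < n + 1)
        return 0;
    for (i = 0; i < n; i++)
        v = (v << 8) | in[1 + i];
    *len = v;
    return n + 1;
}
```

Because the bytes are written one at a time by shifting, the result does not 
depend on the byte order or word size of the machine doing the encoding.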
For example, let's assume that the type tag for a positive integer is 5, 
and the tag for a negative integer is 6. The reason for having two is that 
negatives are converted to positive numbers, so the sign is carried by the 
tag. Leading zeroes are suppressed. 
So your encoding routine on the sending side would convert an integer (e.g. 
a long) to a series of ASCII characters, zero suppressed, with the first 
digit being the most significant. The receiving side would then take the 
resulting string, make a local copy, add a NUL terminator at the end, and 
pass it into the standard C function strtoul(). This would work fine 
between big and little endian as well as 32 and 64 bit systems, provided 
the value fits in the receiver's type. Floating point data could easily be 
converted to ASCII in the same manner. Special data types, like time and 
date, exist such that they would be time zone and system neutral. Shortcut 
types might also be created to represent common elements, such as the 
decimal digits 0-9, and -1. And, as I mentioned, structures are simply 
compound types. 
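
Here is a minimal sketch of the integer round trip described above, using 
the example tags 5 (positive) and 6 (negative). The names tld_encode_long() 
and tld_decode_long() are invented for this illustration, and for simplicity 
I assume the length always fits in a single byte, which holds for the digits 
of a long.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Example type tags taken from the text; not real ASN.1 tag values. */
enum { TLD_INT_POS = 5, TLD_INT_NEG = 6 };

/* Hypothetical encoder: [tag][length][ASCII digits, most significant first].
 * Returns the number of bytes written, or 0 if out is too small. */
size_t tld_encode_long(unsigned char *out, size_t cap, long value)
{
    char digits[32];
    unsigned long mag;
    int n;

    assert(out != NULL);
    /* Take the magnitude in unsigned arithmetic so LONG_MIN is safe. */
    mag = value < 0 ? 0UL - (unsigned long)value : (unsigned long)value;
    n = sprintf(digits, "%lu", mag);  /* no leading zeroes */
    if (n <= 0 || cap < (size_t)n + 2)
        return 0;
    out[0] = value < 0 ? TLD_INT_NEG : TLD_INT_POS;
    out[1] = (unsigned char)n;
    memcpy(out + 2, digits, (size_t)n);
    return (size_t)n + 2;
}

/* Hypothetical decoder: copies the digits, adds a NUL, calls strtoul().
 * Returns the number of bytes consumed, or 0 on malformed or
 * out-of-range input. */
size_t tld_decode_long(const unsigned char *in, size_t avail, long *value)
{
    char copy[32];
    char *end;
    unsigned long mag;
    size_t len;

    assert(in != NULL && value != NULL);
    if (avail < 2 || (in[0] != TLD_INT_POS && in[0] != TLD_INT_NEG))
        return 0;
    len = in[1];
    if (len == 0 || len >= sizeof copy || avail < len + 2)
        return 0;
    memcpy(copy, in + 2, len);
    copy[len] = '\0';
    if (copy[0] < '0' || copy[0] > '9')   /* reject signs and whitespace */
        return 0;
    errno = 0;
    mag = strtoul(copy, &end, 10);
    if (errno == ERANGE || end != copy + len)
        return 0;
    if (in[0] == TLD_INT_POS) {
        if (mag > (unsigned long)LONG_MAX)
            return 0;                     /* too big for this system */
        *value = (long)mag;
    } else {
        if (mag > (unsigned long)LONG_MAX + 1UL)
            return 0;
        *value = mag == 0 ? 0 : -(long)(mag - 1) - 1;
    }
    return len + 2;
}
```

Encoding 1234 this way yields the six bytes 5, 4, '1', '2', '3', '4'. The 
range checks in the decoder cover the case where a 64 bit sender transmits a 
value that a 32 bit receiver cannot hold.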

Sending binary data, even endian neutral, can be dangerous. For instance, 
in C, the following structure illustrates the problem:
struct sample {
	char a;		/* 1 byte */
	long b;		/* 4 or 8 bytes, and usually aligned to its own size */
};
The size of the struct on a 32 bit CISC system might be as small as 5 bytes 
if the compiler packs it, but on a 32 bit RISC it would be 8 bytes long, 
because field b must be aligned on a 32 bit boundary. On a 64 bit system 
(Alpha, PA-RISC 2.0, Itanium), where a long is 64 bits, the structure would 
be 16 bytes long because of natural alignment. Additionally, structures 
themselves might be aligned on special boundaries, which could be 32, 64 or 
128 bit aligned. So, the TLD stuff outlined above solves this, but does add 
overhead. 
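
To see what your own compiler does with that structure, a small sketch like 
the following can help. The struct tag sample and the two function names are 
made up for this illustration; the numbers it reports depend entirely on the 
compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same layout as the structure in the text; the tag is hypothetical. */
struct sample {
    char a;
    long b;
};

/* Bytes of padding the compiler inserted into struct sample. */
size_t sample_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(long));
}

/* Print the layout chosen by this compiler for this target. */
void print_sample_layout(FILE *fp)
{
    assert(fp != NULL);
    fprintf(fp, "sizeof(long)          = %lu\n",
            (unsigned long)sizeof(long));
    fprintf(fp, "offsetof(sample, b)   = %lu\n",
            (unsigned long)offsetof(struct sample, b));
    fprintf(fp, "sizeof(struct sample) = %lu\n",
            (unsigned long)sizeof(struct sample));
    fprintf(fp, "padding bytes         = %lu\n",
            (unsigned long)sample_padding());
}
```

Calling print_sample_layout(stdout) on two different systems and comparing 
the output shows why writing the raw bytes of a struct to a socket or file 
is not portable, even before byte order comes into it.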
-- 
Jerry Feldman <gaf at blu.org>
Associate Director
Boston Linux and Unix user group
http://www.blu.org PGP key id:C5061EA9
PGP Key fingerprint:053C 73EC 3AC1 5C44 3E14 9245 FB00 3ED5 C506 1EA9



