Jeremy W. Sherman

stay a while, and listen

Compatibility Basics

Supporting multiple platforms – “platform” being the ramshackle combination of languages, libraries, OS, hardware, etc., that your code depends on to run – is a burden.

But you’ve no choice once you push data out of your application and off into the wide wide world. Any data you write to disk or network is potentially going to end up being read back on another platform. It could be something wildly different – data sent from an iPhone 5 to a big-endian MIPS machine on someone’s desktop halfway across the world – or it could be something less drastic, like an old file written to disk by version 1 of your application and then read back several OS and application upgrades later.

There are techniques you can adopt to mitigate problems, but many applications will founder on far simpler issues well before they reach that level of sophistication.

You likely take for granted:

  • How large an integer is.
  • What happens when you add 1 to the largest representable signed integer on your platform.
  • Whether the bytes in a word are stored big-end or little-end first.
  • That C strings are encoded using UTF-8.
  • Struct padding is always the same, right?

But all these assumptions are a trap:

  • Integer size varies from machine to machine.

    A user ditches their 32-bit machine and restores all their data to a 64-bit machine. Can your app still load its saved files from before the migration?

    OS X developers of some years past are well-acquainted with this problem, and Apple wrote the 64-Bit Transition Guide to aid them – and now you – in coping with this issue.

  • Signed integer overflow is undefined behavior in C (unsigned arithmetic, by contrast, is defined to wrap around). Unchecked overflow also often presents a security risk. When a C program is compiled for an architecture using two’s-complement representation for integers (read: pretty much everything you’ll likely ever compile for, but not everything someone else might want to compile your code for), signed integers will often wrap around from their largest to their smallest representable value on overflow.

    But not always, and sometimes the behavior changes as you change compiler optimizations, so don’t count on it!
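
    One way to avoid relying on wraparound is to test for overflow before adding. The following is a minimal sketch, not the only approach; the name checked_add_int is made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b only if the sum is representable.
 * Returns true and stores the sum in *out on success; returns false and
 * leaves *out untouched if the addition would overflow. The range check
 * happens before the addition, so the overflowing add is never executed. */
bool checked_add_int(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b) return false; /* sum would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false; /* sum would fall below INT_MIN */
    *out = a + b;
    return true;
}
```

    A caller checks the return value and decides what to do on failure (reject the input, saturate, report an error) instead of discovering a surprising value later.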

  • Endianness bites most people when they first do networking.

    They soon find ntohs/ntohl and htons/htonl, which are used to convert port numbers (s, short, 16-bit) and IPv4 addresses in numeric (l, long, 32-bit) format between net (i.e., big) and host-endian representations.

    There’s no great magic hidden in these functions, except that they bake in knowledge of whether the host platform stores bytes in network byte order natively or requires byte swapping (or more complicated hijinks) to convert to and from network byte order.
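
    As a rough illustration of how these functions are typically used, the sketch below round-trips a port number through network byte order. It assumes a POSIX system that provides <arpa/inet.h>; the helper names port_to_wire and port_from_wire are hypothetical.

```c
#include <arpa/inet.h>  /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies `port` into `wire` in network (big-endian)
 * byte order, ready to be sent or stored. */
void port_to_wire(uint16_t port, unsigned char wire[2])
{
    uint16_t net = htons(port);      /* host order -> network order */
    memcpy(wire, &net, sizeof net);  /* wire[0] is now the high byte */
}

/* Hypothetical helper: the inverse of port_to_wire. */
uint16_t port_from_wire(const unsigned char wire[2])
{
    uint16_t net;
    memcpy(&net, wire, sizeof net);
    return ntohs(net);               /* network order -> host order */
}
```

    On a big-endian host the conversions do nothing; on a little-endian host they swap the two bytes. Either way the bytes in wire come out the same.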

    A simple byte-swap requires no great magic:

    • Initialize an output value to all zeroes.
    • Repeatedly, once for each byte of the input:
      • Shift the byte you’re working on now to be rightmost.
      • Mask it off using a bit-and.
      • Shift it into the correct (mirrored about the middle) position.
      • Bit-or it into the output value.

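#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uint16_t */

/** Swaps bytes, and so swaps big/little endianness.
 *  (Hope your platform doesn't use packed binary-coded decimal!) */
uint16_t byteswap16(uint16_t in)
{
    uint16_t out = 0;
    for (size_t i = 0, e = sizeof(in); i < e; ++i) {
        /* Bring byte i down to the rightmost position. */
        uint16_t shifted_right = (in >> (i * CHAR_BIT));
        /* Keep only that byte (the 0xFF mask assumes 8-bit bytes). */
        uint16_t byte_i = shifted_right & 0xFF;
        /* Move it to the mirrored position: byte 0 <-> byte e-1. */
        uint16_t mirrored = (byte_i << ((e - (i + 1)) * CHAR_BIT));
        out |= mirrored;
    }
    return out;
}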

    But endianness is also an issue for binary data files, including many applications' document formats. If you just spit raw bytes to disk, in whatever order you find them in-memory, then you’ll run into trouble when an app in a different endian environment slurps that file in.
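
    One common way around this is to pick a byte order for the file format and write each field one byte at a time. The sketch below assumes a big-endian format and a fixed-width 32-bit field, which also sidesteps the integer-size problem; the names store_u32_be and load_u32_be are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: writes v into buf as 4 bytes, most significant
 * byte first. The shifts operate on the value, not on its in-memory
 * layout, so the output is the same on big- and little-endian hosts. */
void store_u32_be(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)((v >> 24) & 0xFF);
    buf[1] = (unsigned char)((v >> 16) & 0xFF);
    buf[2] = (unsigned char)((v >> 8) & 0xFF);
    buf[3] = (unsigned char)(v & 0xFF);
}

/* Hypothetical helper: reads back a value written by store_u32_be. */
uint32_t load_u32_be(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

    Because neither function looks at how the host stores a uint32_t, no byte swapping or endianness detection is needed; the buffer can be handed directly to fwrite or send.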

  • C strings can use whatever encoding. If you’re doing data interchange, you need to make sure the encoding is specified, and take the appropriate steps to convert strings to and fro.

All of these traps are straightforward to avoid once you’re aware of them, but are a royal pain to redress after you’ve built a mountain of code atop platform-naïve foundations.

Unless you’ve got these right, don’t even bother worrying about the more obvious (and #ifdef-multiplying) differences between platforms.