Jeremy W. Sherman

stay a while, and listen

Arithmetic Will Bite You One Day

Int: A Young Love

Early C has an innocent air. Take for example this bounds-checking function, which converts a file descriptor to a pointer, after checking that the file descriptor is a valid index:

getf(f)  /* Unix 6th edition: unix/fio.c:6619 */
{
    register *fp, rf;

    rf = f;
    if(rf<0 || rf>=NOFILE)
            goto bad;
    fp = u.u_ofile[rf];
    if(fp != NULL)
            return(fp);
bad:
    u.u_error = EBADF;
    return(NULL);
}

Want a value in a register? Use register. It does what it says on the tin. (At least it did then.)

Types? Those are a syntactic convenience. Go with the flow and use the native word size: the default type is int, and that’s nearly all you’ll need. It’s the implicit (so never spelled out) base type of both declarations in this function: rf is an int, and fp is a pointer to int. And what do you think the function’s return type is, eh?
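For comparison, here is roughly how the same function might be spelled today, with every type written out. Treat it as a sketch: struct file, struct user, and the value of NOFILE are simplified stand-ins assumed for illustration, not the real 6th edition definitions.

```c
#include <errno.h>
#include <stddef.h>

#define NOFILE 15               /* assumed per-process limit on open files */

struct file { int f_flag; };    /* hypothetical stand-in for the kernel's file struct */

struct user {                   /* hypothetical, trimmed-down per-process area */
    struct file *u_ofile[NOFILE];
    int u_error;
};

static struct user u;

/* Same logic as the original getf(), but the parameter, the local,
   and the return value all carry explicit types. */
struct file *getf(int f)
{
    struct file *fp;

    if (f < 0 || f >= NOFILE)
        goto bad;
    fp = u.u_ofile[f];
    if (fp != NULL)
        return fp;
bad:
    u.u_error = EBADF;
    return NULL;
}
```

The original, by contrast, declares no return type at all, so it returns an implicit int that just happens to carry a pointer.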

The Love That Wouldn’t Die

Modern C shows its continuing love for int in subtle ways that will one day corrupt your code.

It’s subtle, because most of the time, C arithmetic just works, to the point where you can remain unaware of what’s actually going on when you perform some innocent-looking arithmetic.

What’s (0xCAFF << 24)? Let’s see:

uint64_t mask = (0xCAFF << 24);
uint64_t expected = 0xCAFF000000;
printf("%" PRIx64 " == %" PRIx64 "? %d\n", mask, expected, mask == expected);
/* ffffffffff000000 == caff000000? 0 */

Well, that ain’t right.

And if you’ve got warnings turned on, your compiler might even be so kind as to warn you that you’re doing something boneheaded:

mask.c:7:25: warning: signed shift result (0xCAFF000000) requires 41 bits to
represent, but 'int' only has 32 bits [-Wshift-overflow]
    uint64_t mask = (0xCAFF << 24);
                     ~~~~~~ ^  ~~
1 warning generated.

Int, what int? I ordered a uint64_t, thank you muchly.

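But “‘int’ only has 32 bits”. Both 0xCAFF and 24 are int constants, so the shift is carried out in int, and the compiler ran out of bits. (Strictly speaking, overflowing a signed shift is undefined behavior; this is merely what happened here.) Then, once it got done shifting the int around, it widened the now-negative result to take up a full 64 bits, and the sign bit came with it: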

  • We wanted

    0000_00ca_ff00_0000
    
  • But our int only had room for

    ff00_0000
    
  • Which widens to the uint64_t

    ffff_ffff_ff00_0000
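
One way to get the mask we actually ordered is to make the left operand 64 bits wide before shifting, so the shift never happens in int. A minimal sketch; the function names are made up for illustration:

```c
#include <stdint.h>

/* Shift in 64 bits: UINT64_C gives the constant an unsigned type
   that is at least 64 bits wide. */
uint64_t mask_from_constant(void)
{
    return UINT64_C(0xCAFF) << 24;
}

/* Same idea for a runtime value: widen first, then shift.
   The caller must keep shift below 64. */
uint64_t mask_from_value(unsigned bits, unsigned shift)
{
    return (uint64_t)bits << shift;
}

/* What bit us above: a negative 32-bit value is sign-extended
   when it is converted to a 64-bit unsigned type. */
uint64_t widen_signed(int32_t value)
{
    return (uint64_t)value;
}
```

Here mask_from_constant() yields 0xCAFF000000, while widen_signed(-0x1000000), whose 32-bit pattern is ff00_0000, yields ffff_ffff_ff00_0000.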
    

Integer Promotions and Arithmetic Conversions

So you see, int is still very much the preferred type for integral literals and arithmetic.

This preference is embedded in the core rules underlying C arithmetic:

  • The integer promotions, which describe how integral types smaller than int get promoted to int or unsigned int
  • The usual arithmetic conversions, which describe how to pick a common type for the arguments to an arithmetic operation, and what the final result type should be.

Both rules make extensive use of the integer conversion rank of the various integral types, which we can roughly summarize as:

  • Bigger integral types have a higher rank (long long over long over int over short over char).
  • A signed type and its corresponding unsigned type, such as int and unsigned int, have the same rank.
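
To make those two rules concrete, here is a small sketch (the function names are invented for illustration) showing a promotion and a usual arithmetic conversion at work:

```c
#include <stdint.h>

/* Integer promotions: both uint8_t operands become int before the add,
   so 200 + 100 is 300 rather than wrapping around to 44. */
int add_bytes(uint8_t a, uint8_t b)
{
    return a + b;
}

/* The wrap only happens when the int result is converted back to uint8_t. */
uint8_t add_bytes_narrow(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Usual arithmetic conversions: int and unsigned int share a rank, so the
   int operand is converted to unsigned int. A negative a turns into a huge
   positive value before the comparison happens. */
int less_than(int a, unsigned int b)
{
    return a < b;
}
```

So add_bytes(200, 100) is 300, add_bytes_narrow(200, 100) is 44, and less_than(-1, 1u) is 0, because -1 has already become UINT_MAX by the time the comparison runs.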

Put all this together, and you can draw up a big spreadsheet of what value converts how with what other type of value, and how the arithmetic operation’s result type gets picked.
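
If you would rather ask the compiler than fill in the spreadsheet by hand, C11's _Generic can report the type an expression ends up with. A sketch, assuming a C11 compiler; TYPE_NAME and the helper functions are hypothetical names, not anything standard:

```c
/* Maps an expression to the name of its type. Requires C11. */
#define TYPE_NAME(expr) _Generic((expr), \
    short: "short", \
    unsigned short: "unsigned short", \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    unsigned long: "unsigned long", \
    long long: "long long", \
    unsigned long long: "unsigned long long", \
    default: "something else")

/* short + short: both operands are promoted, so the result is int. */
const char *short_plus_short(void)
{
    return TYPE_NAME((short)1 + (short)1);
}

/* int + unsigned int: same rank, so the unsigned type wins. */
const char *int_plus_unsigned(void)
{
    return TYPE_NAME(1 + 1u);
}

/* int + unsigned long long: the higher-ranked unsigned type wins. */
const char *int_plus_ull(void)
{
    return TYPE_NAME(1 + 1ull);
}
```

Mixes such as long with unsigned int are left out on purpose: the answer there depends on how wide long is on your platform.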

And that’s not even to speak of the fun we can have with the limited precision provided by a fixed-width integral type, namely overflow (INT_MAX + 1) and underflow (INT_MIN - 1).
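
Signed overflow is undefined behavior in C, so the check has to happen before the arithmetic, not after. One common shape for that check, sketched here with a made-up name, checked_add:

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false without
   touching *out if the sum would not fit in an int. */
bool checked_add(int a, int b, int *out)
{
    if (b > 0 && a > INT_MAX - b)
        return false;           /* would overflow past INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;           /* would underflow past INT_MIN */
    *out = a + b;
    return true;
}
```

With this, checked_add(INT_MAX, 1, &sum) and checked_add(INT_MIN, -1, &sum) both refuse instead of wandering off into undefined territory.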

What do you do?

  • Crank up compiler warnings (-Wall -Wextra -Werror; on Clang, -Weverything turns on the lot)
  • Be suspicious of arithmetic:
    • Are the types right, even after promotion?
    • Can this overflow?
    • Can this underflow?
  • Be doubly suspicious of external data, whether from files, the network, or even your own web service.
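
As an example of that suspicion applied to external data, here is a sketch of validating a count and element size read from a file or the network before using their product as a buffer size. The function name and the 1 MiB limit are assumptions picked for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD_BYTES (UINT32_C(1) << 20)   /* assumed policy limit: 1 MiB */

/* Computes count * elem_size for untrusted inputs. Returns false if the
   product exceeds the limit; otherwise stores it in *out. */
bool payload_size(uint32_t count, uint32_t elem_size, size_t *out)
{
    /* Widen before multiplying: two 32-bit values cannot overflow 64 bits. */
    uint64_t total = (uint64_t)count * elem_size;

    if (total > MAX_PAYLOAD_BYTES || total > SIZE_MAX)
        return false;
    *out = (size_t)total;
    return true;
}
```

Done in 32-bit arithmetic instead, 65536 * 65536 would quietly wrap to 0 and sail straight past the size check.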

Further Reading

Even CPU simulator writers can get it wrong, as shown by The cltq story.

Arithmetic problems regularly feature as security vulnerabilities; you can dig into this angle starting with INT02-C. Understand integer conversion rules from the CERT Secure Coding Standards.

And you can dive deep into the details, and several example vulnerabilities, starting with Type Conversions from the “C Language Issues for Application Security” chapter of McDonald et al.’s Art of Software Security Assessment.