Sleep Considered Harmful

11 May

I’m not talking about “I’ll sleep when I’m dead” but, with apologies to Edsger Dijkstra, if you are programming a modern computer and you find yourself using sleep() there is probably a better way to write your program.

Recently, my team has been porting fairly old software to a system with more than ten times the performance of the previous system and sleep() has bitten us several times.

How Did We Get Here?

The first computers were slow and expensive enough that the thought of deliberately slowing them down was ludicrous. But the first personal computers, slow as they were by today’s standards, interacted with people in ways that sometimes meant making them work in human time, slowed down considerably from even the 4.77 MHz that an IBM PC was clocked at. Say you wanted to blink the cursor in a text editor, alternating the current character between light on dark and dark on light every half second. That’s on, wait a half second, off, wait a half second, and repeat. That half second is nearly two and a half million clock cycles! How do you do that?

A common technique for timing things on early, single-user computers was a wait loop or busy loop. Knowing how long it took the CPU to simply increment an integer, you would construct a loop like:

/* Count to 1 million to introduce a small delay */
for (i = 0; i < 1000000; ++i) {
    /* Do nothing */
}

There are several problems with that code in the context I describe. Many or most early PC programs were written in assembly language. Also, it wasn’t possible to represent one million in an 8- or 16-bit number, so you might have nested loops that each counted to one thousand. But the real problem with that is the assumption in the algorithm: that you know how long it takes to increment i and test it against the limit. This is not invariant.

Before long, computers that were mostly compatible with the IBM PC, even other systems from IBM, had processors that ran at 6 MHz or even faster. Blinking a cursor about 25% faster isn’t a big deal, but myriad other user interactions, like the repeat rate when you hold down a key, matter a great deal and can frustrate users when they don’t behave as expected. The answer (well, one answer) is for the system to provide a means to wait a specified amount of time. This function is often called sleep() and the loop above might be replaced by:

/* Wait one second */
sleep(1);

It might be that sleep() is implemented by a busy loop, but the system can tune itself to do the right thing — loop the right number of times — to accomplish the delay desired by the programmer.

Sophisticated operating systems might allow other things to happen while the sleeping program was waiting. Early versions of Microsoft Windows did something like this. They implemented “cooperative multitasking,” wherein every program had to periodically call a yield() function to allow other programs to have some system resources.

Cooperative multitasking was an awkward stepping stone to “preemptive multitasking,” wherein you write your program without concern for other programs that may run at the “same time” and the operating system preemptively interrupts your program to let others have system resources. (A consequence of this inversion is the need to mark critical sections of your code so the system doesn’t interrupt them. But that’s a topic for another day.)

Coordination

With truly independent programs, sleep() isn’t a terrible solution. But software often has to wait for hardware, and as computers grew faster and applications grew more sophisticated, it became common for two or more programs to cooperate to achieve some goal. Programmers with sleep() in their toolbox would write code like:

/* Wait for the data server to start up */
sleep(5);

On some system at some time, five seconds may have been the right answer. But any number of things can make the server take more or less time to start. If it takes less, then the program waiting for it is wasting time and the system is not as responsive as it could be. Worse, if the server takes longer to start, this program proceeds, trying to make use of services from another program that aren’t available. The correct way to handle this is for the server to implement some “I’m ready” indicator that clients like this code can look for. Something like:

// Wait up to 30 seconds for the data server to start up
bool serverFound = false;
int timeOut = 1;  // seconds to wait per attempt
for (int i = 0; i < 30; ++i) {
    if (serverReady(timeOut)) {
        serverFound = true;
        break;
    }
}

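The serverReady() function waits timeOut seconds at most for the server to be ready. It might be implemented with a system function like select() or pthread_cond_timedwait(). If serverFound is still false after the loop, the client knows the server never came up and can report that instead of failing later.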

Timing

There’s one last scenario where sleep() is really the wrong answer: polling loops. Many data collection applications have core logic like:

// Poll for data every 100 ms
while (true) {
    // Request data
    sendQueries();
    // Handle data
    processResponses();
    // Wait before polling again
    usleep(50000);
}

Here the programmer has relied on experience that requesting and processing data take 50 milliseconds so the microsleep should take another 50 for the whole loop to take the desired 100. But we’re back to the original problem with IBM PC wait loops: assuming the performance of the system. If originally sending queries took 10 ms and processing responses took 40, a performance improvement that cut processing time by 40 percent shortens the overall loop to around 84 ms, changing the program behavior from polling 10 times a second to polling 12 times a second. A better idiom for this is to use a timer:

// now() returns system clock in milliseconds
expiration = now();
setTimer(expiration);
while (true) {
    if (timerExpired()) {
        sendQueries();
        processResponses();
        expiration += 100;
        setTimer(expiration);
    }
}

This is still not perfect. If sending and processing take more than 100 ms together, the timer is set to a time in the past and will be expired at the top of the next loop. Your application requirements will dictate whether skipping one or more scans is the best fix or if you need some other approach. Still, this algorithm is immune to better performance of the underlying system, a much more likely scenario than worse performance.
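If skipping missed scans is the right choice for your application, the deadline arithmetic might look something like the sketch below. The function nextExpiration() is a hypothetical helper, not part of any timer API. It assumes times are in milliseconds, as with now() above, and that the period is positive.

```c
#include <assert.h>
#include <stdint.h>

/* Advance the deadline by one period. If that deadline has already passed,
   skip ahead to the earliest deadline on the same 'period' grid that is
   not in the past. All values are in milliseconds. */
int64_t nextExpiration(int64_t expiration, int64_t now, int64_t period) {
    assert(period > 0);

    expiration += period;
    if (expiration < now) {
        /* Number of whole periods needed to catch up, rounded up. */
        int64_t missed = (now - expiration + period - 1) / period;
        expiration += missed * period;
    }
    return expiration;
}
```

In the loop above, the line expiration += 100 would become expiration = nextExpiration(expiration, now(), 100). Scans stay on the original 100 ms grid, so one slow pass costs the scans it overlapped rather than triggering a burst of back-to-back catch-up scans.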

Just as high-level languages provided if, for, while and such obviating the dangerous GOTO, so too do modern systems provide semaphores, timers, and such obviating the dangerous sleep(). If you are still using sleep(), wake up!

No Perfect Program
