History for UnfairLinuxThreadsAndPythonOhMy
From a post to comp.lang.python, 20 April 1999, by NeilSchemenaur:
I think I might have found part of the problem. My Debian Linux
system has glibc-2.1.3 which includes LinuxThreads 0.7. From the
LinuxThreads FAQ:
-- D.6: Scheduling seems to be very unfair when there is strong
contention on a mutex: instead of giving the mutex to each
thread in turn, it seems that it's almost always the same
thread that gets the mutex. Isn't this completely broken
behavior?
-- What happens is the following: when a thread unlocks a mutex,
all other threads that were waiting on the mutex are sent a
signal which makes them runnable. However, the kernel
scheduler may or may not restart them immediately. If the
thread that unlocked the mutex tries to lock it again
immediately afterwards, it is likely that it will succeed,
because the threads haven't yet restarted. This results in an
apparently very unfair behavior, when the same thread
repeatedly locks and unlocks the mutex, while other threads
can't lock the mutex.
-- This is perfectly acceptable behavior with respect to the
POSIX standard: for the default scheduling policy, POSIX
makes no guarantees of fairness, such as "the thread waiting
for the mutex for the longest time always acquires it first".
This allows implementations of mutexes to remain simple and
efficient. Properly written multithreaded code avoids that
kind of heavy contention on mutexes, and does not run into
fairness problems. If you need scheduling guarantees, you
should consider using the real-time scheduling policies
SCHED_RR and SCHED_FIFO, which have precisely defined
scheduling behaviors.
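The behavior the FAQ describes can be observed directly by counting which thread wins a heavily contended lock. Below is a minimal sketch in modern Python (an assumption of this example: the `threading` module rather than the 1999-era `thread` module, and an arbitrary half-second run). On some platforms the counts come out heavily skewed toward one thread, which is exactly the "unfair" pattern described above.

```python
import threading
import time
from collections import Counter

counts = Counter()          # lock acquisitions per thread name
lock = threading.Lock()
stop = threading.Event()

def worker(name):
    # Lock and unlock in a tight loop with no pause: the unlocking
    # thread often re-acquires before the waiters are rescheduled.
    while not stop.is_set():
        with lock:
            counts[name] += 1

threads = [threading.Thread(target=worker, args=("t%d" % i,))
           for i in range(4)]
for t in threads:
    t.start()
time.sleep(0.5)
stop.set()
for t in threads:
    t.join()
print(dict(counts))         # a very skewed distribution suggests unfairness
```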
Threaded Python contends heavily for a few mutexes. Adding
sched_yield() to a few strategic places seems to improve things a
lot but I don't know if it is the proper solution. Does anyone
else know better? LinuxThreads 0.8 is supposed to be more fair.
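Neil's `sched_yield()` workaround can be sketched in modern Python, where the call is exposed as `os.sched_yield()` (an assumption here: Python 3.3+ on a POSIX system; the worker function and iteration count are made up for illustration). Yielding right after the unlock gives a waiting thread a chance to be scheduled before the same thread tries to relock.

```python
import os
import threading

lock = threading.Lock()
acquired = 0

def polite_worker(iterations=1000):
    global acquired
    for _ in range(iterations):
        with lock:
            acquired += 1   # critical section
        # Hint to the scheduler: let a waiting thread run before
        # we try to take the lock again.
        os.sched_yield()

workers = [threading.Thread(target=polite_worker) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(acquired)             # 2 threads x 1000 iterations = 2000
```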
I think the attached code shows the problem (or maybe I just
don't understand threads at all :). On my _uniprocessor_ machine
I get about four stars before the new thread seems to stop
running::
======================================================================
import thread
import os
import sys

def run():
    while 1:
        if os.fork() == 0:
            sys.stderr.write('*')
            break
        os.wait()

thread.start_new_thread(run, ())
while 1:
    pass
======================================================================
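For readers on modern systems, here is a rough Python 3 rendering of the same test (assumptions in this sketch: the removed `thread` module is replaced by `threading`, the fork loop is capped at ten iterations, the child calls `os._exit()` so it cannot fall through into the spinning loop, and a deadline is added so the script terminates either way). Note that modern NPTL and scheduler changes may mean the 1999 behavior does not reproduce.

```python
import os
import sys
import threading
import time

def run(n=10):
    # Fork n children; each prints a star and exits, the parent reaps it.
    for _ in range(n):
        if os.fork() == 0:
            sys.stderr.write('*')
            os._exit(0)     # child: leave immediately
        os.wait()           # parent: reap the child, then fork again

t = threading.Thread(target=run)
t.start()
# Busy-wait like the original main thread, but with a deadline so the
# sketch terminates even if the fork thread stalls.
deadline = time.time() + 10
while t.is_alive() and time.time() < deadline:
    pass
t.join(timeout=1)
```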
Additional comments by Tony Rossignol (mailto:[email protected]), 2000-04-25
Background:
We run three Red Hat Linux servers as our Zope server farm. Two of these servers have kernel 2.2.12 with glibc 2.1.2, and the third has kernel 2.2.5 with glibc 2.0.7. All three are dual Pentium III servers, ranging in speed from 400 to 500 MHz. The machine with the older kernel is the slowest box.
Both servers with the newer kernel/glibc experience unexplained restarts; server 3, the slower/older system, does not. Frequently server 3 will remain up for 24 hours (we restart nightly when a ZODB clone is copied over).
Results:
Running Neil's script (from above) on the various servers gave the following results: on servers 1 and 2, between 2 and 20 stars were printed before the new thread seemed to stop running, and the CPU was completely consumed by the Python process; server 3 printed stars until the process was killed.
Meaning:
I don't know. But this is the first solid example I've seen that illustrates the observed differences between our servers and offers some indication as to the cause.