X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=blobdiff_plain;f=pod%2Fperlthrtut.pod;h=7cac46fc55fdd0edc3613f0d23ebe989c1ae1d74;hb=0c2f6559512b2211f892f1a6ae8db4739c5369b4;hp=6e3bcb0323d62cb1bd1a60d6699dae0f0eb805c6;hpb=6eded8f358cae623164c8467748ef5272e9fd62b;p=p5sagit%2Fp5-mst-13.2.git diff --git a/pod/perlthrtut.pod b/pod/perlthrtut.pod index 6e3bcb0..7cac46f 100644 --- a/pod/perlthrtut.pod +++ b/pod/perlthrtut.pod @@ -5,9 +5,15 @@ perlthrtut - tutorial on threads in Perl =head1 DESCRIPTION B<NOTE>: this tutorial describes the new Perl threading flavour -introduced in Perl 5.6.0 called interpreter threads, or ithreads -for short. There is another older perl threading flavour called -the 5.005 model, unsurprisingly for 5.005 versions of Perl. +introduced in Perl 5.6.0 called interpreter threads, or B<ithreads> +for short. In this model each thread runs in its own Perl interpreter, +and any data sharing between threads must be explicit. + +There is another older Perl threading flavour called the 5.005 model, +unsurprisingly for 5.005 versions of Perl. The old model is known to +have problems, deprecated, and will probably be removed around release +5.10. You are strongly encouraged to migrate any existing 5.005 +threads code to the new model as soon as possible. You can see which (or neither) threading flavour you have by running C<perl -V> and looking at the C<Platform> section. @@ -16,6 +22,15 @@ have C<use5005threads=define> you have 5.005 threads. If you have neither, you don't have any thread support built in. If you have both, you are in trouble. +The user-level interface to the 5.005 threads was via the L<Thread> +class, while ithreads uses the L<threads> class. Note the change in case. + +=head1 Status + +The ithreads code has been available since Perl 5.6.0, and is considered +stable. The user-level interface to ithreads (the L<threads> classes) +appeared in the 5.8.0 release, and as of this time is considered stable +although it should be treated with caution as with all new features. =head1 What Is A Thread Anyway? 
@@ -87,82 +102,7 @@ another thread. Prime and Fibonacci generators both map well to this form of the pipeline model. (A version of a prime number generator is presented later on.) -=head1 Native threads - -There are several different ways to implement threads on a system. How -threads are implemented depends both on the vendor and, in some cases, -the version of the operating system. Often the first implementation -will be relatively simple, but later versions of the OS will be more -sophisticated. - -While the information in this section is useful, it's not necessary, -so you can skip it if you don't feel up to it. - -There are three basic categories of threads: user-mode threads, kernel -threads, and multiprocessor kernel threads. - -User-mode threads are threads that live entirely within a program and -its libraries. In this model, the OS knows nothing about threads. As -far as it's concerned, your process is just a process. - -This is the easiest way to implement threads, and the way most OSes -start. The big disadvantage is that, since the OS knows nothing about -threads, if one thread blocks they all do. Typical blocking activities -include most system calls, most I/O, and things like sleep(). - -Kernel threads are the next step in thread evolution. The OS knows -about kernel threads, and makes allowances for them. The main -difference between a kernel thread and a user-mode thread is -blocking. With kernel threads, things that block a single thread don't -block other threads. This is not the case with user-mode threads, -where the kernel blocks at the process level and not the thread level. - -This is a big step forward, and can give a threaded program quite a -performance boost over non-threaded programs. Threads that block -performing I/O, for example, won't block threads that are doing other -things. Each process still has only one thread running at once, -though, regardless of how many CPUs a system might have. 
- -Since kernel threading can interrupt a thread at any time, they will -uncover some of the implicit locking assumptions you may make in your -program. For example, something as simple as C<$a = $a + 2> can behave -unpredictably with kernel threads if $a is visible to other -threads, as another thread may have changed $a between the time it -was fetched on the right hand side and the time the new value is -stored. - -Multiprocessor kernel threads are the final step in thread -support. With multiprocessor kernel threads on a machine with multiple -CPUs, the OS may schedule two or more threads to run simultaneously on -different CPUs. - -This can give a serious performance boost to your threaded program, -since more than one thread will be executing at the same time. As a -tradeoff, though, any of those nagging synchronization issues that -might not have shown with basic kernel threads will appear with a -vengeance. - -In addition to the different levels of OS involvement in threads, -different OSes (and different thread implementations for a particular -OS) allocate CPU cycles to threads in different ways. - -Cooperative multitasking systems have running threads give up control -if one of two things happen. If a thread calls a yield function, it -gives up control. It also gives up control if the thread does -something that would cause it to block, such as perform I/O. In a -cooperative multitasking implementation, one thread can starve all the -others for CPU time if it so chooses. - -Preemptive multitasking systems interrupt threads at regular intervals -while the system decides which thread should run next. In a preemptive -multitasking system, one thread usually won't monopolize the CPU. - -On some systems, there can be cooperative and preemptive threads -running simultaneously. (Threads running with realtime priorities -often behave cooperatively, for example, while threads running at -normal priorities behave preemptively.) 
- -=head1 What kind of threads are perl threads? +=head1 What kind of threads are Perl threads? If you have experience with other thread implementations, you might find that things aren't quite what you expect. It's very important to @@ -183,33 +123,38 @@ do it. However it is important to remember that Perl threads cannot magically do things unless your operating systems threads allows it. So if your -system blocks the entire process on sleep(), perl usually will as well. +system blocks the entire process on sleep(), Perl usually will as well. + +Perl Threads Are Different. -=head1 Threadsafe Modules +=head1 Thread-Safe Modules -The addition of threads has changed Perl's internals +The addition of threads has changed Perl's internals substantially. There are implications for people who write -modules with XS code or external libraries. However, since the threads -do not share data, pure Perl modules that don't interact with external -systems should be safe. Modules that are not tagged as thread-safe should -be tested or code reviewed before being used in production code. +modules with XS code or external libraries. However, since perl data is +not shared among threads by default, Perl modules stand a high chance of +being thread-safe or can be made thread-safe easily. Modules that are not +tagged as thread-safe should be tested or code reviewed before being used +in production code. Not all modules that you might use are thread-safe, and you should always assume a module is unsafe unless the documentation says otherwise. This includes modules that are distributed as part of the core. Threads are a new feature, and even some of the standard -modules aren't thread-safe. (*** I think ActiveState checked this for -psuedofork, check with GSAR) +modules aren't thread-safe. -Even if a module is threadsafe, it doesn't mean that the module is optimized +Even if a module is thread-safe, it doesn't mean that the module is optimized to work well with threads. 
A module could possibly be rewritten to utilize the new features in threaded Perl to increase performance in a threaded environment. If you're using a module that's not thread-safe for some reason, you -can protect yourself by using semaphores and lots of programming -discipline to control access to the module. Semaphores are covered -later in the article. Perl Threads Are Different +can protect yourself by using it from one, and only one thread at all. +If you need multiple threads to access such a module, you can use semaphores and +lots of programming discipline to control access to it. Semaphores +are covered in L. + +See also L. =head1 Thread Basics @@ -230,36 +175,49 @@ Your programs can use the Config module to check whether threads are enabled. If your program can't run without them, you can say something like: - $Config{useithreads} or die "Recompile Perl with threads to run this program."; + $Config{useithreads} or die "Recompile Perl with threads to run this program."; A possibly-threaded program using a possibly-threaded module might have code like this: - use Config; - use MyMod; - - if ($Config{useithreads}) { - # We have threads - require MyMod_threaded; - import MyMod_threaded; - } else { - require MyMod_unthreaded; - import MyMod_unthreaded; - } + use Config; + use MyMod; + + BEGIN { + if ($Config{useithreads}) { + # We have threads + require MyMod_threaded; + import MyMod_threaded; + } else { + require MyMod_unthreaded; + import MyMod_unthreaded; + } + } Since code that runs both with and without threads is usually pretty messy, it's best to isolate the thread-specific code in its own module. In our example above, that's what MyMod_threaded is, and it's only imported if we're running on a threaded Perl. +=head2 A Note about the Examples + +Although thread support is considered to be stable, there are still a number +of quirks that may startle you when you try out any of the examples below. 
+In a real situation, care should be taken that all threads are finished +executing before the program exits. That care has B<not> been taken in these +examples in the interest of simplicity. Running these examples "as is" will +produce error messages, usually caused by the fact that there are still +threads running when the program exits. You should not be alarmed by this. +Future versions of Perl may fix this problem. + =head2 Creating Threads The L<threads> package provides the tools you need to create new -threads. Like any other module, you need to tell Perl you want to use +threads. Like any other module, you need to tell Perl that you want to use it; C<use threads> imports all the pieces you need to create basic threads. -The simplest, straightforward way to create a thread is with new(): +The simplest, most straightforward way to create a thread is with new(): use threads; @@ -278,10 +236,11 @@ part of the thread startup. Just include the list of parameters as part of the C<threads::new> call, like this: use threads; + $Param3 = "foo"; $thr = threads->new(\&sub1, "Param 1", "Param 2", $Param3); $thr = threads->new(\&sub1, @ParamList); - $thr = threads->new(\&sub1, qw(Param1 Param2 $Param3)); + $thr = threads->new(\&sub1, qw(Param1 Param2 Param3)); sub sub1 { my @InboundParameters = @_; @@ -295,38 +254,7 @@ off several threads using the same subroutine. Each thread executes the same subroutine, but in a separate thread with a separate environment and potentially separate arguments. -=head2 Giving up control - -There are times when you may find it useful to have a thread -explicitly give up the CPU to another thread. Your threading package -might not support preemptive multitasking for threads, for example, or -you may be doing something compute-intensive and want to make sure -that the user-interface thread gets called frequently. Regardless, -there are times that you might want a thread to give up the processor. - -Perl's threading package provides the yield() function that does -this. 
yield() is pretty straightforward, and works like this: - - use threads; - - sub loop { - my $thread = shift; - my $foo = 50; - while($foo--) { print "in thread $thread\n" } - threads->yield(); - $foo = 50; - while($foo--) { print "in thread $thread\n" } - } - - my $thread1 = threads->new(\&loop, 'first'); - my $thread2 = threads->new(\&loop, 'second'); - my $thread3 = threads->new(\&loop, 'third'); - -It is important to remember that yield() is only a hint to give up the CPU, -it depends on your hardware, OS and threading libraries what actually happens. -Therefore it is important to note that one should not build the scheduling of -the threads around yield() calls. It might work on your platform but it won't -work on another platform. +C is a synonym for C. =head2 Waiting For A Thread To Exit @@ -335,6 +263,7 @@ for a thread to exit and extract any values it might return, you can use the join() method: use threads; + $thr = threads->new(\&sub1); @ReturnData = $thr->join; @@ -349,7 +278,7 @@ any OS cleanup necessary for the thread. That cleanup might be important, especially for long-running programs that spawn lots of threads. If you don't want the return values and don't want to wait for the thread to finish, you should call the detach() method -instead. detach() is covered later in the article. +instead, as described next. =head2 Ignoring A Thread @@ -364,11 +293,12 @@ it'll run until it's finished, then Perl will clean up after it automatically. use threads; + $thr = threads->new(\&sub1); # Spawn the thread $thr->detach; # Now we officially don't care any more - sub sub1 { + sub sub1 { $a = 0; while (1) { $a++; @@ -377,9 +307,8 @@ automatically. } } - -Once a thread is detached, it may not be joined, and any output that -it might have produced (if it was done and waiting for a join) is +Once a thread is detached, it may not be joined, and any return data +that it might have produced (if it was done and waiting for a join) is lost. 
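As a minimal sketch of the join() and detach() behaviour described above: the subroutine names C<compute> and C<background> are hypothetical, and the code assumes a Perl built with ithreads. Creating the thread in list context (the parentheses around C<$joined>) is an assumption that matters on later versions of the threads module, where the calling context of the creation call decides what join() returns.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: its return list is handed back by join().
sub compute {
    my ($n) = @_;
    return ($n, $n * $n);
}

# Hypothetical worker whose return data nobody will collect.
sub background {
    return "discarded";
}

# List-context creation so that join() can return a list.
my ($joined) = threads->new(\&compute, 7);
my @result   = $joined->join;    # waits for the thread, then collects its data

my $detached = threads->new(\&background);
$detached->detach;               # its return data is lost; no join is possible

sleep 1;                         # give the detached thread time to finish
print "joined thread returned: @result\n";
```

Calling join() on C<$detached> after the detach() would be an error, which is the trade-off the text describes: detach() frees you from waiting, at the price of losing whatever the thread would have returned.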
 =head1 Threads And Data @@ -390,23 +319,56 @@ access that non-threaded programs never need to worry about. =head2 Shared And Unshared Data -The biggest difference between perl threading and the old 5.005 style -threading, or most other threading systems out there, is that all data -is not shared. When a new perl thread is created all data is cloned -and is private to that thread! +The biggest difference between Perl ithreads and the old 5.005 style +threading, or for that matter, to most other threading systems out there, +is that by default, no data is shared. When a new perl thread is created, +all the data associated with the current thread is copied to the new +thread, and is subsequently private to that new thread! +This is similar in feel to what happens when a UNIX process forks, +except that in this case, the data is just copied to a different part of +memory within the same process rather than a real fork taking place. + +To make use of threading however, one usually wants the threads to share +at least some data between themselves. This is done with the +L<threads::shared> module and the C< : shared> attribute: + + use threads; + use threads::shared; + + my $foo : shared = 1; + my $bar = 1; + threads->new(sub { $foo++; $bar++ })->join; + + print "$foo\n"; #prints 2 since $foo is shared + print "$bar\n"; #prints 1 since $bar is not shared + +In the case of a shared array, all the array's elements are shared, and for +a shared hash, all the keys and values are shared. This places +restrictions on what may be assigned to shared array and hash elements: only +simple values or references to shared variables are allowed - this is +so that a private variable can't accidentally become shared. A bad +assignment will cause the thread to die. For example: + + use threads; + use threads::shared; + + my $var = 1; + my $svar : shared = 2; + my %hash : shared; + + ... create some threads ... 
-To make use of threading however, one usually want the threads to share -data between each other. This is done with the L module -and the C< : shared> attribute: + $hash{a} = 1; # all threads see exists($hash{a}) and $hash{a} == 1 + $hash{a} = $var # okay - copy-by-value: same effect as previous + $hash{a} = $svar # okay - copy-by-value: same effect as previous + $hash{a} = \$svar # okay - a reference to a shared variable + $hash{a} = \$var # This will die + delete $hash{a} # okay - all threads will see !exists($hash{a}) - use threads; - use threads::shared; - my $foo : shared = 1; - my $bar = 1; - threads->new(sub { $foo++; $bar++ })->join; - - print "$foo\n"; #prints 2 since $foo is shared - print "$bar\n"; #prints 1 since $bar is not shared +Note that a shared variable guarantees that if two or more threads try to +modify it at the same time, the internal state of the variable will not +become corrupted. However, there are no guarantees beyond this, as +explained in the next section. =head2 Thread Pitfalls: Races @@ -415,6 +377,7 @@ number of pitfalls. One pitfall is the race condition: use threads; use threads::shared; + my $a : shared = 1; $thr1 = threads->new(\&sub1); $thr2 = threads->new(\&sub2); @@ -423,8 +386,8 @@ number of pitfalls. One pitfall is the race condition: $thr2->join; print "$a\n"; - sub sub1 { $foo = $a; $a = $foo + 1; } - sub sub2 { $bar = $a; $a = $bar + 1; } + sub sub1 { my $foo = $a; $a = $foo + 1; } + sub sub2 { my $bar = $a; $a = $bar + 1; } What do you think $a will be? The answer, unfortunately, is "it depends." Both sub1() and sub2() access the global variable $a, once @@ -444,51 +407,69 @@ possibility of error: my $c : shared; my $thr1 = threads->create(sub { $b = $a; $a = $b + 1; }); my $thr2 = threads->create(sub { $c = $a; $a = $c + 1; }); - $thr1->join(); - $thr2->join(); + $thr1->join; + $thr2->join; Two threads both access $a. Each thread can potentially be interrupted at any point, or be executed in any order. 
 At the end, $a could be 3 or 4, and both $b and $c could be 2 or 3. +Even C<$a += 5> or C<$a++> are not guaranteed to be atomic. + Whenever your program accesses data or resources that can be accessed by other threads, you must take steps to coordinate access or risk -data corruption and race conditions. +data inconsistency and race conditions. Note that Perl will protect its +internals from your race conditions, but it won't protect you from you. + +=head1 Synchronization and control + +Perl provides a number of mechanisms to coordinate the interactions +between themselves and their data, to avoid race conditions and the like. +Some of these are designed to resemble the common techniques used in thread +libraries such as C<pthreads>; others are Perl-specific. Often, the +standard techniques are clumsy and difficult to get right (such as +condition waits). Where possible, it is usually easier to use Perlish +techniques such as queues, which remove some of the hard work involved. =head2 Controlling access: lock() The lock() function takes a shared variable and puts a lock on it. -No other thread may lock the variable until the locking thread exits -the innermost block containing the lock. -Using lock() is straightforward: +No other thread may lock the variable until the variable is unlocked +by the thread holding the lock. Unlocking happens automatically +when the locking thread exits the outermost block that contains +C<lock()> function. Using lock() is straightforward: this example has +several threads doing some calculations in parallel, and occasionally +updating a running total: + + use threads; + use threads::shared; + + my $total : shared = 0; + + sub calc { + for (;;) { + my $result; + # (... do some calculations and set $result ...) 
+ { + lock($total); # block until we obtain the lock + $total += $result; + } # lock implicitly released at end of scope + last if $result == 0; + } + } + + my $thr1 = threads->new(\&calc); + my $thr2 = threads->new(\&calc); + my $thr3 = threads->new(\&calc); + $thr1->join; + $thr2->join; + $thr3->join; + print "total=$total\n"; - use threads; - my $a : shared = 4; - $thr1 = threads->new(sub { - $foo = 12; - { - lock ($a); # Block until we get access to $a - $b = $a; - $a = $b * $foo; - } - print "\$foo was $foo\n"; - }); - $thr2 = threads->new(sub { - $bar = 7; - { - lock ($a); # Block until we can get access to $a - $c = $a; - $a = $c * $bar; - } - print "\$bar was $bar\n"; - }); - $thr1->join; - $thr2->join; - print "\$a is $a\n"; lock() blocks the thread until the variable being locked is available. When lock() returns, your thread can be sure that no other -thread can lock that variable until the innermost block containing the +thread can lock that variable until the outermost block containing the lock exits. It's important to note that locks don't prevent access to the variable @@ -500,42 +481,75 @@ You may lock arrays and hashes as well as scalars. Locking an array, though, will not block subsequent locks on array elements, just lock attempts on the array itself. -Finally, locks are recursive, which means it's okay for a thread to +Locks are recursive, which means it's okay for a thread to lock a variable more than once. The lock will last until the outermost -lock() on the variable goes out of scope. +lock() on the variable goes out of scope. 
For example: + + my $x : shared; + doit(); + + sub doit { + { + { + lock($x); # wait for lock + lock($x); # NOOP - we already have the lock + { + lock($x); # NOOP + { + lock($x); # NOOP + lockit_some_more(); + } + } + } # *** implicit unlock here *** + } + } + + sub lockit_some_more { + lock($x); # NOOP + } # nothing happens here + +Note that there is no unlock() function - the only way to unlock a +variable is to allow it to go out of scope. -=head2 Thread Pitfall: Deadlocks +A lock can either be used to guard the data contained within the variable +being locked, or it can be used to guard something else, like a section +of code. In this latter case, the variable in question does not hold any +useful data, and exists only for the purpose of being locked. In this +respect, the variable behaves like the mutexes and basic semaphores of +traditional thread libraries. -Locks are a handy tool to synchronize access to data. Using them +=head2 A Thread Pitfall: Deadlocks + +Locks are a handy tool to synchronize access to data, and using them properly is the key to safe shared data. Unfortunately, locks aren't -without their dangers. Consider the following code: +without their dangers, especially when multiple locks are involved. +Consider the following code: use threads; + my $a : shared = 4; my $b : shared = "foo"; my $thr1 = threads->new(sub { lock($a); - yield; sleep 20; - lock ($b); + lock($b); }); my $thr2 = threads->new(sub { lock($b); - yield; sleep 20; - lock ($a); + lock($a); }); This program will probably hang until you kill it. The only way it -won't hang is if one of the two async() routines acquires both locks +won't hang is if one of the two threads acquires both locks first. A guaranteed-to-hang version is more complicated, but the principle is the same. -The first thread spawned by async() will grab a lock on $a then, a -second or two later, try to grab a lock on $b. Meanwhile, the second -thread grabs a lock on $b, then later tries to grab a lock on $a. 
The -second lock attempt for both threads will block, each waiting for the -other to release its lock. +The first thread will grab a lock on $a, then, after a pause during which +the second thread has probably had time to do some work, try to grab a +lock on $b. Meanwhile, the second thread grabs a lock on $b, then later +tries to grab a lock on $a. The second lock attempt for both threads will +block, each waiting for the other to release its lock. This condition is called a deadlock, and it occurs whenever two or more threads are trying to get locks on resources that the others @@ -549,6 +563,9 @@ order. If, for example, you lock variables $a, $b, and $c, always lock $a before $b, and $b before $c. It's also best to hold on to locks for as short a period of time to minimize the risks of deadlock. +The other synchronization primitives described below can suffer from +similar problems. + =head2 Queues: Passing Data Around A queue is a special thread-safe object that lets you put data in one @@ -557,9 +574,9 @@ synchronization issues. They're pretty straightforward, and look like this: use threads; - use threads::shared::queue; + use Thread::Queue; - my $DataQueue = new threads::shared::queue; + my $DataQueue = Thread::Queue->new; $thr = threads->new(sub { while ($DataElement = $DataQueue->dequeue) { print "Popped $DataElement off the queue\n"; @@ -571,9 +588,9 @@ this: $DataQueue->enqueue(\$thr); sleep 10; $DataQueue->enqueue(undef); - $thr->join(); + $thr->join; -You create the queue with C. Then you can +You create the queue with C. Then you can add lists of scalars onto the end with enqueue(), and pop scalars off the front of it with dequeue(). A queue has no fixed size, and can grow as needed to hold everything pushed on to it. @@ -582,36 +599,26 @@ If a queue is empty, dequeue() blocks until another thread enqueues something. This makes queues ideal for event loops and other communications between threads. 
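The producer/consumer pattern that queues make possible can be sketched as follows. This is an illustrative example rather than part of the patch: the doubling step is made up, and the C<defined> test is an addition so that a false but valid item such as 0 does not end the loop early. It assumes a Perl built with ithreads and the Thread::Queue module.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $queue = Thread::Queue->new;

# Consumer: dequeue() blocks while the queue is empty; undef ends the loop.
# List-context creation lets join() hand back the whole result list.
my ($worker) = threads->new(sub {
    my @seen;
    while (defined(my $item = $queue->dequeue)) {
        push @seen, $item * 2;    # made-up "work": double each item
    }
    return @seen;
});

$queue->enqueue($_) for 1 .. 5;   # producer side
$queue->enqueue(undef);           # sentinel telling the consumer to stop

my @doubled = $worker->join;
print "@doubled\n";
```

No explicit lock() appears anywhere: the queue does its own synchronization, which is why the text recommends queues over hand-rolled locking where they fit.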
- -=head1 Threads And Code - -In addition to providing thread-safe access to data via locks and -queues, threaded Perl also provides general-purpose semaphores for -coarser synchronization than locks provide and thread-safe access to -entire subroutines. - =head2 Semaphores: Synchronizing Data Access -Semaphores are a kind of generic locking mechanism. Unlike lock, which -gets a lock on a particular scalar, Perl doesn't associate any -particular thing with a semaphore so you can use them to control -access to anything you like. In addition, semaphores can allow more -than one thread to access a resource at once, though by default -semaphores only allow one thread access at a time. - -=over 4 +Semaphores are a kind of generic locking mechanism. In their most basic +form, they behave very much like lockable scalars, except that they +can't hold data, and that they must be explicitly unlocked. In their +advanced form, they act like a kind of counter, and can allow multiple +threads to have the 'lock' at any one time. -=item Basic semaphores +=head2 Basic semaphores -Semaphores have two methods, down and up. down decrements the resource -count, while up increments it. down calls will block if the +Semaphores have two methods, down() and up(): down() decrements the resource +count, while up() increments it. Calls to down() will block if the semaphore's current count would decrement below zero. 
This program gives a quick demonstration: - use threads qw(yield); - use threads::shared::semaphore; - my $semaphore = new threads::shared::semaphore; - $GlobalVariable = 0; + use threads; + use Thread::Semaphore; + + my $semaphore = new Thread::Semaphore; + my $GlobalVariable : shared = 0; $thr1 = new threads \&sample_sub, 1; $thr2 = new threads \&sample_sub, 2; @@ -626,7 +633,6 @@ gives a quick demonstration: $semaphore->down; $LocalCopy = $GlobalVariable; print "$TryCount tries left for sub $SubNumber (\$GlobalVariable is $GlobalVariable)\n"; - yield; sleep 2; $LocalCopy++; $GlobalVariable = $LocalCopy; @@ -634,15 +640,15 @@ gives a quick demonstration: } } - $thr1->join(); - $thr2->join(); - $thr3->join(); + $thr1->join; + $thr2->join; + $thr3->join; The three invocations of the subroutine all operate in sync. The semaphore, though, makes sure that only one thread is accessing the global variable at once. -=item Advanced Semaphores +=head2 Advanced Semaphores By default, semaphores behave like locks, letting only one thread down() them at a time. However, there are other uses for semaphores. @@ -653,8 +659,8 @@ one, and up() increments by one. However, we can override any or all of these defaults simply by passing in different values: use threads; - use threads::shared::semaphore; - my $semaphore = threads::shared::semaphore->new(5); + use Thread::Semaphore; + my $semaphore = Thread::Semaphore->new(5); # Creates a semaphore with the counter set to five $thr1 = threads->new(\&sub1); @@ -666,8 +672,8 @@ of these defaults simply by passing in different values: $semaphore->up(5); # Increment the counter by five } - $thr1->detach(); - $thr2->detach(); + $thr1->detach; + $thr2->detach; If down() attempts to decrement the counter below zero, it blocks until the counter is large enough. Note that while a semaphore can be created @@ -700,7 +706,46 @@ threads quietly block and unblock themselves. 
 Larger increments or decrements are handy in those cases where a thread needs to check out or return a number of resources at once. -=back +=head2 cond_wait() and cond_signal() + +These two functions can be used in conjunction with locks to notify +co-operating threads that a resource has become available. They are +very similar in use to the functions found in C<pthreads>. However +for most purposes, queues are simpler to use and more intuitive. See +L<threads::shared> for more details. + +=head2 Giving up control + +There are times when you may find it useful to have a thread +explicitly give up the CPU to another thread. You may be doing something +processor-intensive and want to make sure that the user-interface thread +gets called frequently. Regardless, there are times that you might want +a thread to give up the processor. + +Perl's threading package provides the yield() function that does +this. yield() is pretty straightforward, and works like this: + + use threads; + + sub loop { + my $thread = shift; + my $foo = 50; + while($foo--) { print "in thread $thread\n" } + threads->yield; + $foo = 50; + while($foo--) { print "in thread $thread\n" } + } + + my $thread1 = threads->new(\&loop, 'first'); + my $thread2 = threads->new(\&loop, 'second'); + my $thread3 = threads->new(\&loop, 'third'); + +It is important to remember that yield() is only a hint to give up the CPU, +it depends on your hardware, OS and threading libraries what actually happens. +B Therefore it is important +to note that one should not build the scheduling of the threads around +yield() calls. It might work on your platform but it won't work on another +platform. =head1 General Thread Utility Routines @@ -711,8 +756,8 @@ really fit in anyplace else. =head2 What Thread Am I In? -The C<threads->self> method provides your program with a way to get an -object representing the thread it's currently in. 
You can use this +The C<< threads->self >> class method provides your program with a way to +get an object representing the thread it's currently in. You can use this object in the same way as the ones returned from thread creation. =head2 Thread IDs @@ -734,7 +779,7 @@ comparison on them as you would with normal objects. =head2 What Threads Are Running? -threads->list returns a list of thread objects, one for each thread +C<< threads->list >> returns a list of thread objects, one for each thread that's currently running and not detached. Handy for a number of things, including cleaning up at the end of your program: @@ -746,8 +791,8 @@ including cleaning up at the end of your program: } } -If some threads have not finished running when the main perl thread -ends, perl will warn you about it and die, since it is impossible for perl +If some threads have not finished running when the main Perl thread +ends, Perl will warn you about it and die, since it is impossible for Perl to clean up itself while other threads are running =head1 A Complete Example @@ -761,9 +806,9 @@ things we've covered. This program finds prime numbers using threads. 4 use strict; 5 6 use threads; - 7 use threads::shared::queue; + 7 use Thread::Queue; 8 - 9 my $stream = new threads::shared::queue; + 9 my $stream = new Thread::Queue; 10 my $kid = new threads(\&check_num, $stream, 2); 11 12 for my $i ( 3 .. 1000 ) { @@ -771,12 +816,12 @@ things we've covered. This program finds prime numbers using threads. 14 } 15 16 $stream->enqueue(undef); - 17 $kid->join(); + 17 $kid->join; 18 19 sub check_num { 20 my ($upstream, $cur_prime) = @_; 21 my $kid; - 22 my $downstream = new threads::shared::queue; + 22 my $downstream = new Thread::Queue; 23 while (my $num = $upstream->dequeue) { 24 next unless $num % $cur_prime; 25 if ($kid) { @@ -787,13 +832,13 @@ things we've covered. This program finds prime numbers using threads. 
    30         }
    31     }
    32     $downstream->enqueue(undef) if $kid;
-   33     $kid->join() if $kid;
+   33     $kid->join if $kid;
    34 }
 
 This program uses the pipeline model to generate prime numbers. Each
 thread in the pipeline has an input queue that feeds numbers to be
 checked, a prime number that it's responsible for, and an output queue
-that into which it funnels numbers that have failed the check. If the thread
+into which it funnels numbers that have failed the check. If the thread
 has a number that's failed its check and there's no child thread, then
 the thread must have found a new prime number. In that case, a new
 child thread is created for that prime and stuck on the end of the
@@ -837,6 +882,141 @@ child has died, we know that we're done once we return from the join.
 That's how it works. It's pretty simple; as with many Perl programs,
 the explanation is much longer than the program.
 
+=head1 Different implementations of threads
+
+Some background on thread implementations from the operating system
+viewpoint. There are three basic categories of threads: user-mode threads,
+kernel threads, and multiprocessor kernel threads.
+
+User-mode threads are threads that live entirely within a program and
+its libraries. In this model, the OS knows nothing about threads. As
+far as it's concerned, your process is just a process.
+
+This is the easiest way to implement threads, and the way most OSes
+start. The big disadvantage is that, since the OS knows nothing about
+threads, if one thread blocks they all do. Typical blocking activities
+include most system calls, most I/O, and things like sleep().
+
+Kernel threads are the next step in thread evolution. The OS knows
+about kernel threads, and makes allowances for them. The main
+difference between a kernel thread and a user-mode thread is
+blocking. With kernel threads, things that block a single thread don't
+block other threads.
This is not the case with user-mode threads,
+where the kernel blocks at the process level and not the thread level.
+
+This is a big step forward, and can give a threaded program quite a
+performance boost over non-threaded programs. Threads that block
+performing I/O, for example, won't block threads that are doing other
+things. Each process still has only one thread running at once,
+though, regardless of how many CPUs a system might have.
+
+Since kernel threading can interrupt a thread at any time, it will
+uncover some of the implicit locking assumptions you may make in your
+program. For example, something as simple as C<$a = $a + 2> can behave
+unpredictably with kernel threads if $a is visible to other
+threads, as another thread may have changed $a between the time it
+was fetched on the right hand side and the time the new value is
+stored.
+
+Multiprocessor kernel threads are the final step in thread
+support. With multiprocessor kernel threads on a machine with multiple
+CPUs, the OS may schedule two or more threads to run simultaneously on
+different CPUs.
+
+This can give a serious performance boost to your threaded program,
+since more than one thread will be executing at the same time. As a
+tradeoff, though, any of those nagging synchronization issues that
+might not have shown with basic kernel threads will appear with a
+vengeance.
+
+In addition to the different levels of OS involvement in threads,
+different OSes (and different thread implementations for a particular
+OS) allocate CPU cycles to threads in different ways.
+
+Cooperative multitasking systems have running threads give up control
+if one of two things happens. If a thread calls a yield function, it
+gives up control. It also gives up control if the thread does
+something that would cause it to block, such as perform I/O. In a
+cooperative multitasking implementation, one thread can starve all the
+others for CPU time if it so chooses.
+
+Preemptive multitasking systems interrupt threads at regular intervals
+while the system decides which thread should run next. In a preemptive
+multitasking system, one thread usually won't monopolize the CPU.
+
+On some systems, there can be cooperative and preemptive threads
+running simultaneously. (Threads running with realtime priorities
+often behave cooperatively, for example, while threads running at
+normal priorities behave preemptively.)
+
+Most modern operating systems support preemptive multitasking nowadays.
+
+=head1 Performance considerations
+
+The main thing to bear in mind when comparing ithreads to other threading
+models is the fact that for each new thread created, a complete copy of
+all the variables and data of the parent thread has to be taken. Thus
+thread creation can be quite expensive, both in terms of memory usage and
+time spent in creation. The ideal way to reduce these costs is to have a
+relatively small number of long-lived threads, all created fairly early
+on - before the base thread has accumulated too much data. Of course, this
+may not always be possible, so compromises have to be made. However, after
+a thread has been created, its performance and extra memory usage should
+be little different than ordinary code.
+
+Also note that under the current implementation, shared variables
+use a little more memory and are a little slower than ordinary variables.
+
+=head1 Process-scope Changes
+
+Note that while threads themselves are separate execution threads and
+Perl data is thread-private unless explicitly shared, the threads can
+affect process-scope state, affecting all the threads.
+
+The most common example of this is changing the current working
+directory using chdir(). One thread calls chdir(), and the working
+directory of all the threads changes.
+
+An even more drastic example of a process-scope change is chroot():
+the root directory of all the threads changes, and no thread can
+undo it (as opposed to chdir()).
+
+Further examples of process-scope changes include umask() and
+changing uids/gids.
+
+Thinking of mixing fork() and threads? Please lie down and wait
+until the feeling passes-- but in case you really want to know,
+the semantics vary: POSIX fork() copies only the calling thread into the
+child, while some other systems copy all the current threads.
+
+Similarly, mixing signals and threads should not be attempted.
+Implementations are platform-dependent, and even the POSIX
+semantics may not be what you expect (and Perl doesn't even
+give you the full POSIX API).
+
+=head1 Thread-Safety of System Libraries
+
+Whether various library calls are thread-safe is outside the control
+of Perl. Calls often suffering from not being thread-safe include:
+localtime(), gmtime(), get{gr,host,net,proto,serv,pw}*(), readdir(),
+rand(), and srand() -- in general, calls that depend on some global
+external state.
+
+If the system Perl is compiled on has thread-safe variants of such
+calls, they will be used. Beyond that, Perl is at the mercy of
+the thread-safety or -unsafety of the calls. Please consult your
+C library call documentation.
+
+On some platforms the thread-safe library interfaces may fail if the
+result buffer is too small (for example the user group databases may
+be rather large, and the reentrant interfaces may have to carry around
+a full snapshot of those databases). Perl will start with a small
+buffer, but keep retrying and growing the result buffer
+until the result fits. If this limitless growing sounds bad for
+security or memory consumption reasons you can recompile Perl with
+PERL_REENTRANT_MAXSIZE defined to the maximum number of bytes you will
+allow.
+
 =head1 Conclusion
 
 A complete thread tutorial could fill a book (and has, many times),
@@ -887,12 +1067,18 @@ Silberschatz, Abraham, and Peter B. Galvin. Operating System Concepts,
 Arnold, Ken and James Gosling. The Java Programming Language, 2nd
 ed. Addison-Wesley, 1998, ISBN 0-201-31006-6.
 
+comp.programming.threads FAQ,
+L
+
 Le Sergent, T. and B. Berthomieu. "Incremental MultiThreaded Garbage
 Collection on Virtually Shared Memory Architectures" in Memory
 Management: Proc. of the International Workshop IWMM 92, St. Malo,
 France, September 1992, Yves Bekkers and Jacques Cohen, eds. Springer,
 1992, ISBN 3540-55940-X (real-life thread applications).
 
+Artur Bergman, "Where Wizards Fear To Tread", June 11, 2002,
+L
+
 =head1 Acknowledgements
 
 Thanks (in no particular order) to Chaim Frenkel, Steve Fink, Gurusamy
@@ -903,17 +1089,21 @@ of the prime number generator.
 
 =head1 AUTHOR
 
-Dan Sugalski E<lt>sugalskd@ous.eduE<gt>
+Dan Sugalski E<lt>dan@sidhe.orgE<gt>
 
 Slightly modified by Arthur Bergman to fit the new thread model/module.
 
-=head1 Copyrights
+Reworked slightly by Jörg Walter E<lt>jwalt@cpan.orgE<gt> to be more concise
+about thread-safety of perl code.
 
-This article originally appeared in The Perl Journal #10, and is
-copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and
-The Perl Journal. This document may be distributed under the same terms
-as Perl itself.
+Rearranged slightly by Elizabeth Mattijsen E<lt>liz@dijkmat.nlE<gt> to put
+less emphasis on yield().
 
+=head1 Copyrights
 
-For more information please see L and L.
+The original version of this article appeared in The Perl
+Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy
+of Jon Orwant and The Perl Journal. This document may be distributed
+under the same terms as Perl itself.
 
+For more information please see L and L.