From: Dave Mitchell Date: Sat, 11 May 2002 20:00:51 +0000 (+0100) Subject: updated threads docs X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=bfce6503c5d952d8f1af33465734e82b1483281e;p=p5sagit%2Fp5-mst-13.2.git updated threads docs Message-ID: <20020511200050.D14841@fdgroup.com> p4raw-id: //depot/perl@16543 --- diff --git a/Configure b/Configure index 4c636e5..3176359 100755 --- a/Configure +++ b/Configure @@ -20,7 +20,7 @@ # $Id: Head.U,v 3.0.1.9 1997/02/28 15:02:09 ram Exp $ # -# Generated on Thu May 9 17:42:14 EET DST 2002 [metaconfig 3.0 PL70] +# Generated on Sat May 11 22:27:01 EET DST 2002 [metaconfig 3.0 PL70] # (with additional metaconfig patches by perlbug@perl.org) cat >c1$$ <: Threading is an experimental feature. Both the interface -and implementation are subject to change drastically. In fact, this -documentation describes the flavor of threads that was in version -5.005. Perl 5.6.0 and later have the beginnings of support for -interpreter threads, which (when finished) is expected to be -significantly different from what is described here. The information -contained here may therefore soon be obsolete. Use at your own risk! - -One of the most prominent new features of Perl 5.005 is the inclusion -of threads. Threads make a number of things a lot easier, and are a -very useful addition to your bag of programming tricks. +B: +This tutorial describes the old-style thread model that was introduced in +release 5.005. This model is now deprecated, and will be removed, probably +in version 5.10. The interfaces described here were considered +experimental, and are likely to be buggy. + +For information about the new interpreter threads ("ithreads") model, see +the F tutorial, and the L and L +modules. + +You are strongly encouraged to migrate any existing threads code to the +new model as soon as possible. =head1 What Is A Thread Anyway? 
diff --git a/pod/perlthrtut.pod b/pod/perlthrtut.pod index 6e3bcb0..2fb09c9 100644 --- a/pod/perlthrtut.pod +++ b/pod/perlthrtut.pod @@ -6,8 +6,11 @@ perlthrtut - tutorial on threads in Perl B: this tutorial describes the new Perl threading flavour introduced in Perl 5.6.0 called interpreter threads, or ithreads -for short. There is another older perl threading flavour called +for short. There is another older Perl threading flavour called the 5.005 model, unsurprisingly for 5.005 versions of Perl. +The old model is deprecated, and will probably be removed around release +5.10. You are strongly encouraged to migrate any existing 5.005 threads +code to the new model as soon as possible. You can see which (or neither) threading flavour you have by running C and looking at the C section. @@ -16,6 +19,15 @@ have C you have 5.005 threads. If you have neither, you don't have any thread support built in. If you have both, you are in trouble. +The user-level interface to the 5.005 threads was via the L +class, while ithreads uses the L class. Note the change in case. + +=head1 Status + +The ithreads code has been available since Perl 5.6.0, and is considered +stable. The user-level interface to ithreads (the L classes) +appeared in the 5.8.0 release, and as of this time is considered stable, +although as with all new features, should be treated with caution. =head1 What Is A Thread Anyway? @@ -162,7 +174,7 @@ running simultaneously. (Threads running with realtime priorities often behave cooperatively, for example, while threads running at normal priorities behave preemptively.) -=head1 What kind of threads are perl threads? +=head1 What kind of threads are Perl threads? If you have experience with other thread implementations, you might find that things aren't quite what you expect. It's very important to @@ -183,7 +195,7 @@ do it. However it is important to remember that Perl threads cannot magically do things unless your operating systems threads allows it. 
So if your -system blocks the entire process on sleep(), perl usually will as well. +system blocks the entire process on sleep(), Perl usually will as well. =head1 Threadsafe Modules @@ -198,8 +210,7 @@ Not all modules that you might use are thread-safe, and you should always assume a module is unsafe unless the documentation says otherwise. This includes modules that are distributed as part of the core. Threads are a new feature, and even some of the standard -modules aren't thread-safe. (*** I think ActiveState checked this for -psuedofork, check with GSAR) +modules aren't thread-safe. Even if a module is threadsafe, it doesn't mean that the module is optimized to work well with threads. A module could possibly be rewritten to utilize @@ -278,6 +289,7 @@ part of the thread startup. Just include the list of parameters as part of the C call, like this: use threads; + $Param3 = "foo"; $thr = threads->new(\&sub1, "Param 1", "Param 2", $Param3); $thr = threads->new(\&sub1, @ParamList); @@ -295,6 +307,8 @@ off several threads using the same subroutine. Each thread executes the same subroutine, but in a separate thread with a separate environment and potentially separate arguments. +C is a synonym for C + =head2 Giving up control There are times when you may find it useful to have a thread @@ -309,18 +323,18 @@ this. 
yield() is pretty straightforward, and works like this: use threads; - sub loop { - my $thread = shift; - my $foo = 50; - while($foo--) { print "in thread $thread\n" } - threads->yield(); - $foo = 50; - while($foo--) { print "in thread $thread\n" } - } + sub loop { + my $thread = shift; + my $foo = 50; + while($foo--) { print "in thread $thread\n" } + threads->yield(); + $foo = 50; + while($foo--) { print "in thread $thread\n" } + } - my $thread1 = threads->new(\&loop, 'first'); - my $thread2 = threads->new(\&loop, 'second'); - my $thread3 = threads->new(\&loop, 'third'); + my $thread1 = threads->new(\&loop, 'first'); + my $thread2 = threads->new(\&loop, 'second'); + my $thread3 = threads->new(\&loop, 'third'); It is important to remember that yield() is only a hint to give up the CPU, it depends on your hardware, OS and threading libraries what actually happens. @@ -335,6 +349,7 @@ for a thread to exit and extract any values it might return, you can use the join() method: use threads; + $thr = threads->new(\&sub1); @ReturnData = $thr->join; @@ -349,7 +364,7 @@ any OS cleanup necessary for the thread. That cleanup might be important, especially for long-running programs that spawn lots of threads. If you don't want the return values and don't want to wait for the thread to finish, you should call the detach() method -instead. detach() is covered later in the article. +instead, as described next. =head2 Ignoring A Thread @@ -364,6 +379,7 @@ it'll run until it's finished, then Perl will clean up after it automatically. use threads; + $thr = threads->new(\&sub1); # Spawn the thread $thr->detach; # Now we officially don't care any more @@ -378,8 +394,8 @@ automatically. } -Once a thread is detached, it may not be joined, and any output that -it might have produced (if it was done and waiting for a join) is +Once a thread is detached, it may not be joined, and any return data +that it might have produced (if it was done and waiting for a join) is lost. 
=head1 Threads And Data @@ -390,23 +406,56 @@ access that non-threaded programs never need to worry about. =head2 Shared And Unshared Data -The biggest difference between perl threading and the old 5.005 style -threading, or most other threading systems out there, is that all data -is not shared. When a new perl thread is created all data is cloned -and is private to that thread! +The biggest difference between Perl ithreads and the old 5.005 style +threading, or for that matter most other threading systems out there, +is that by default, no data is shared. When a new Perl thread is created, +all the data associated with the current thread is copied to the new +thread, and is subsequently private to that new thread! +This is similar in feel to what happens when a UNIX process forks, +except that in this case, the data is just copied to a different part of +memory within the same process rather than a real fork taking place. To make use of threading however, one usually wants the threads to share -data between each other. This is done with the L<threads::shared> module -and the C< : shared> attribute: - - use threads; - use threads::shared; - my $foo : shared = 1; - my $bar = 1; - threads->new(sub { $foo++; $bar++ })->join; - - print "$foo\n"; #prints 2 since $foo is shared - print "$bar\n"; #prints 1 since $bar is not shared +at least some data between themselves. This is done with the +L<threads::shared> module and the C< : shared> attribute: + + use threads; + use threads::shared; + + my $foo : shared = 1; + my $bar = 1; + threads->new(sub { $foo++; $bar++ })->join; + + print "$foo\n"; #prints 2 since $foo is shared + print "$bar\n"; #prints 1 since $bar is not shared + +In the case of a shared array, all the array's elements are shared, and for +a shared hash, all the keys and values are shared. 
This places +restrictions on what may be assigned to shared array and hash elements: only +simple values or references to shared variables are allowed - this is +so that a private variable can't accidentally become shared. A bad +assignment will cause the thread to die. For example: + + use threads; + use threads::shared; + + my $var = 1; + my $svar : shared = 2; + my %hash : shared; + + # ... create some threads ... + + $hash{a} = 1; # all threads see exists($hash{a}) and $hash{a} == 1 + $hash{a} = $var; # okay - copy-by-value: same effect as previous + $hash{a} = $svar; # okay - copy-by-value: same effect as previous + $hash{a} = \$svar; # okay - a reference to a shared variable + $hash{a} = \$var; # This will die + delete $hash{a}; # okay - all threads will see !exists($hash{a}) + +Note that a shared variable guarantees that if two or more threads try to +modify it at the same time, the internal state of the variable will not +become corrupted. However, there are no guarantees beyond this, as +explained in the next section. =head2 Thread Pitfalls: Races @@ -415,6 +464,7 @@ number of pitfalls. One pitfall is the race condition: use threads; use threads::shared; + my $a : shared = 1; $thr1 = threads->new(\&sub1); $thr2 = threads->new(\&sub2); @@ -423,8 +473,8 @@ number of pitfalls. One pitfall is the race condition: $thr2->join; print "$a\n"; - sub sub1 { $foo = $a; $a = $foo + 1; } - sub sub2 { $bar = $a; $a = $bar + 1; } + sub sub1 { my $foo = $a; $a = $foo + 1; } + sub sub2 { my $bar = $a; $a = $bar + 1; } What do you think $a will be? The answer, unfortunately, is "it depends." Both sub1() and sub2() access the global variable $a, once @@ -451,44 +501,62 @@ Two threads both access $a. Each thread can potentially be interrupted at any point, or be executed in any order. At the end, $a could be 3 or 4, and both $b and $c could be 2 or 3. +Even C<$a += 5> or C<$a++> are not guaranteed to be atomic. 
+ Whenever your program accesses data or resources that can be accessed by other threads, you must take steps to coordinate access or risk -data corruption and race conditions. +data inconsistency and race conditions. Note that Perl will protect its +internals from your race conditions, but it won't protect you from you. + +=head1 Synchronisation and control + +Perl provides a number of mechanisms to coordinate the interactions +between threads and their data, to avoid race conditions and the like. +Some of these are designed to resemble the common techniques used in thread +libraries such as C<pthreads>; others are Perl-specific. Often, the +standard techniques are clumsy and difficult to get right (such as +condition waits). Where possible, it is usually easier to use Perlish +techniques such as queues, which remove some of the hard work involved. =head2 Controlling access: lock() The lock() function takes a shared variable and puts a lock on it. -No other thread may lock the variable until the locking thread exits -the innermost block containing the lock. -Using lock() is straightforward: +No other thread may lock the variable until the variable is unlocked +by the thread holding the lock. Unlocking happens automatically +when the locking thread exits the outermost block that contains the +C<lock()> function. Using lock() is straightforward: this example has +several threads doing some calculations in parallel, and occasionally +updating a running total: + + use threads; + use threads::shared; + + my $total : shared = 0; + + sub calc { + for (;;) { + my $result; + # (... do some calculations and set $result ...) 
+ { + lock($total); # block until we obtain the lock + $total += $result; + } # lock implicitly released at end of scope + last if $result == 0; + } + } + + my $thr1 = threads->new(\&calc); + my $thr2 = threads->new(\&calc); + my $thr3 = threads->new(\&calc); + $thr1->join; + $thr2->join; + $thr3->join; + print "total=$total\n"; - use threads; - my $a : shared = 4; - $thr1 = threads->new(sub { - $foo = 12; - { - lock ($a); # Block until we get access to $a - $b = $a; - $a = $b * $foo; - } - print "\$foo was $foo\n"; - }); - $thr2 = threads->new(sub { - $bar = 7; - { - lock ($a); # Block until we can get access to $a - $c = $a; - $a = $c * $bar; - } - print "\$bar was $bar\n"; - }); - $thr1->join; - $thr2->join; - print "\$a is $a\n"; lock() blocks the thread until the variable being locked is available. When lock() returns, your thread can be sure that no other -thread can lock that variable until the innermost block containing the +thread can lock that variable until the outermost block containing the lock exits. It's important to note that locks don't prevent access to the variable @@ -500,42 +568,77 @@ You may lock arrays and hashes as well as scalars. Locking an array, though, will not block subsequent locks on array elements, just lock attempts on the array itself. -Finally, locks are recursive, which means it's okay for a thread to +Locks are recursive, which means it's okay for a thread to lock a variable more than once. The lock will last until the outermost -lock() on the variable goes out of scope. 
For example: + + my $x : shared; + doit(); + + sub doit { + { + { + lock($x); # wait for lock + lock($x); # NOOP - we already have the lock + { + lock($x); # NOOP + { + lock($x); # NOOP + lockit_some_more(); + } + } + } # *** implicit unlock here *** + } + } + + sub lockit_some_more { + lock($x); # NOOP + } # nothing happens here + +Note that there is no unlock() function - the only way to unlock a +variable is to allow it to go out of scope. + +A lock can either be used to guard the data contained within the variable +being locked, or it can be used to guard something else, like a section +of code. In this latter case, the variable in question does not hold any +useful data, and exists only for the purpose of being locked. In this +respect, the variable behaves like the mutexes and basic semaphores of +traditional thread libraries. -=head2 Thread Pitfall: Deadlocks +=head2 A Thread Pitfall: Deadlocks -Locks are a handy tool to synchronize access to data. Using them +Locks are a handy tool to synchronize access to data, and using them properly is the key to safe shared data. Unfortunately, locks aren't -without their dangers. Consider the following code: +without their dangers, especially when multiple locks are involved. +Consider the following code: use threads; + my $a : shared = 4; my $b : shared = "foo"; my $thr1 = threads->new(sub { lock($a); - yield; + threads->yield; sleep 20; - lock ($b); + lock($b); }); my $thr2 = threads->new(sub { lock($b); - yield; + threads->yield; sleep 20; - lock ($a); + lock($a); }); This program will probably hang until you kill it. The only way it -won't hang is if one of the two async() routines acquires both locks +won't hang is if one of the two threads acquires both locks first. A guaranteed-to-hang version is more complicated, but the principle is the same. -The first thread spawned by async() will grab a lock on $a then, a -second or two later, try to grab a lock on $b. 
Meanwhile, the second -thread grabs a lock on $b, then later tries to grab a lock on $a. The -second lock attempt for both threads will block, each waiting for the -other to release its lock. +The first thread will grab a lock on $a, then, after a pause during which +the second thread has probably had time to do some work, try to grab a +lock on $b. Meanwhile, the second thread grabs a lock on $b, then later +tries to grab a lock on $a. The second lock attempt for both threads will +block, each waiting for the other to release its lock. This condition is called a deadlock, and it occurs whenever two or more threads are trying to get locks on resources that the others @@ -549,6 +652,9 @@ order. If, for example, you lock variables $a, $b, and $c, always lock $a before $b, and $b before $c. It's also best to hold on to locks for as short a period of time as possible to minimize the risks of deadlock. +The other synchronisation primitives described below can suffer from +similar problems. + =head2 Queues: Passing Data Around A queue is a special thread-safe object that lets you put data in one @@ -559,7 +665,7 @@ this: use threads; use threads::shared::queue; - my $DataQueue = new threads::shared::queue; + my $DataQueue = threads::shared::queue->new(); $thr = threads->new(sub { while ($DataElement = $DataQueue->dequeue) { print "Popped $DataElement off the queue\n"; @@ -582,36 +688,26 @@ If a queue is empty, dequeue() blocks until another thread enqueues something. This makes queues ideal for event loops and other communications between threads. - -=head1 Threads And Code - -In addition to providing thread-safe access to data via locks and -queues, threaded Perl also provides general-purpose semaphores for -coarser synchronization than locks provide and thread-safe access to -entire subroutines. - =head2 Semaphores: Synchronizing Data Access -Semaphores are a kind of generic locking mechanism. 
Unlike lock, which -gets a lock on a particular scalar, Perl doesn't associate any -particular thing with a semaphore so you can use them to control -access to anything you like. In addition, semaphores can allow more -than one thread to access a resource at once, though by default -semaphores only allow one thread access at a time. +Semaphores are a kind of generic locking mechanism. In their most basic +form, they behave very much like lockable scalars, except that they +can't hold data, and that they must be explicitly unlocked. In their +advanced form, they act like a kind of counter, and can allow multiple +threads to have the 'lock' at any one time. -=over 4 +=head2 Basic semaphores -=item Basic semaphores - -Semaphores have two methods, down and up. down decrements the resource -count, while up increments it. down calls will block if the +Semaphores have two methods, down() and up(): down() decrements the resource +count, while up() increments it. Calls to down() will block if the semaphore's current count would decrement below zero. This program gives a quick demonstration: use threads qw(yield); use threads::shared::semaphore; + my $semaphore = new threads::shared::semaphore; - $GlobalVariable = 0; + my $GlobalVariable : shared = 0; $thr1 = new threads \&sample_sub, 1; $thr2 = new threads \&sample_sub, 2; @@ -642,7 +738,7 @@ The three invocations of the subroutine all operate in sync. The semaphore, though, makes sure that only one thread is accessing the global variable at once. -=item Advanced Semaphores +=head2 Advanced Semaphores By default, semaphores behave like locks, letting only one thread down() them at a time. However, there are other uses for semaphores. @@ -700,7 +796,13 @@ threads quietly block and unblock themselves. Larger increments or decrements are handy in those cases where a thread needs to check out or return a number of resources at once. 
-=back +=head2 cond_wait() and cond_signal() + +These two functions can be used in conjunction with locks to notify +co-operating threads that a resource has become available. They are +very similar in use to the functions found in C<pthreads>. However, +for most purposes, queues are simpler to use and more intuitive. See +L<threads::shared> for more details. =head1 General Thread Utility Routines @@ -711,8 +813,8 @@ really fit in anyplace else. =head2 What Thread Am I In? -The C<threads->self> method provides your program with a way to get an -object representing the thread it's currently in. You can use this +The C<< threads->self >> class method provides your program with a way to +get an object representing the thread it's currently in. You can use this object in the same way as the ones returned from thread creation. =head2 Thread IDs @@ -734,7 +836,7 @@ comparison on them as you would with normal objects. =head2 What Threads Are Running? -threads->list returns a list of thread objects, one for each thread +C<< threads->list >> returns a list of thread objects, one for each thread that's currently running and not detached. Handy for a number of things, including cleaning up at the end of your program: @@ -746,8 +848,8 @@ including cleaning up at the end of your program: } } -If some threads have not finished running when the main perl thread -ends, perl will warn you about it and die, since it is impossible for perl +If some threads have not finished running when the main Perl thread +ends, Perl will warn you about it and die, since it is impossible for Perl to clean up itself while other threads are running. =head1 A Complete Example @@ -837,6 +939,22 @@ child has died, we know that we're done once we return from the join. That's how it works. It's pretty simple; as with many Perl programs, the explanation is much longer than the program. 
+=head1 Performance considerations + +The main thing to bear in mind when comparing ithreads to other threading +models is the fact that for each new thread created, a complete copy of +all the variables and data of the parent thread has to be taken. Thus +thread creation can be quite expensive, both in terms of memory usage and +time spent in creation. The ideal way to reduce these costs is to have a +relatively small number of long-lived threads, all created fairly early +on - before the base thread has accumulated too much data. Of course, this +may not always be possible, so compromises have to be made. However, after +a thread has been created, its performance and extra memory usage should +be little different from those of ordinary code. + +Also note that under the current implementation, shared variables +use a little more memory and are a little slower than ordinary variables. + =head1 Conclusion A complete thread tutorial could fill a book (and has, many times), @@ -909,11 +1027,10 @@ Slightly modified by Arthur Bergman to fit the new thread model/module. =head1 Copyrights -This article originally appeared in The Perl Journal #10, and is -copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and -The Perl Journal. This document may be distributed under the same terms -as Perl itself. - +The original version of this article appeared in The Perl +Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy +of Jon Orwant and The Perl Journal. This document may be distributed +under the same terms as Perl itself. For more information please see L<threads> and L<threads::shared>.
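
The patched tutorial says that even C<$a++> on a shared variable is not atomic and that lock() is held until the enclosing block exits. The following is a minimal sketch of that pattern, not part of the commit itself. It assumes a Perl built with ithreads; the names add_many and @workers are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared counter: visible to every thread, unlike an ordinary "my" variable.
my $total : shared = 0;

# Hypothetical worker: increments the shared counter $n times.
sub add_many {
    my ($n) = @_;
    for (1 .. $n) {
        lock($total);   # blocks until this thread holds the lock
        $total++;       # read-modify-write is safe while the lock is held
    }                   # lock released at the end of each loop iteration
}

# Start four workers, then wait for all of them to finish.
my @workers = map { threads->create(\&add_many, 1000) } 1 .. 4;
$_->join for @workers;

print "total=$total\n";
```

Because every increment happens under the lock, the final count should not depend on how the threads are scheduled. Removing the lock() call would reintroduce the race described in the "Thread Pitfalls: Races" section.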