B<NOTE>: this tutorial describes the new Perl threading flavour
introduced in Perl 5.6.0 called interpreter threads, or ithreads
-for short. There is another older perl threading flavour called
+for short. There is another older Perl threading flavour called
the 5.005 model, unsurprisingly for 5.005 versions of Perl.
+The old model is deprecated, and will probably be removed around release
+5.10. You are strongly encouraged to migrate any existing 5.005 threads
+code to the new model as soon as possible.
You can see which (or neither) threading flavour you have by
running C<perl -V> and looking at the C<Platform> section.
If you have neither, you don't have any thread support built in.
If you have both, you are in trouble.
+The user-level interface to the 5.005 threads was via the L<Thread>
+class, while ithreads uses the L<threads> class. Note the change in case
+and the added "s".
+
+=head1 Status
+
+The ithreads code has been available since Perl 5.6.0, and is considered
+stable. The user-level interface to ithreads (the L<threads> classes)
+appeared in the 5.8.0 release, and as of this time is considered stable,
+although, as with all new features, it should be treated with caution.
=head1 What Is A Thread Anyway?
often behave cooperatively, for example, while threads running at
normal priorities behave preemptively.)
-=head1 What kind of threads are perl threads?
+=head1 What kind of threads are Perl threads?
If you have experience with other thread implementations, you might
find that things aren't quite what you expect. It's very important to
However it is important to remember that Perl threads cannot magically
-do things unless your operating systems threads allows it. So if your
+do things unless your operating system's threads allow it. So if your
-system blocks the entire process on sleep(), perl usually will as well.
+system blocks the entire process on sleep(), Perl usually will as well.
=head1 Threadsafe Modules
always assume a module is unsafe unless the documentation says
otherwise. This includes modules that are distributed as part of the
core. Threads are a new feature, and even some of the standard
-modules aren't thread-safe. (*** I think ActiveState checked this for
-psuedofork, check with GSAR)
+modules aren't thread-safe.
Even if a module is threadsafe, it doesn't mean that the module is optimized
to work well with threads. A module could possibly be rewritten to utilize
part of the C<threads::new> call, like this:
use threads;
+
$Param3 = "foo";
$thr = threads->new(\&sub1, "Param 1", "Param 2", $Param3);
$thr = threads->new(\&sub1, @ParamList);
the same subroutine, but in a separate thread with a separate
environment and potentially separate arguments.
+C<create()> is a synonym for C<new()>.
+
=head2 Giving up control
There are times when you may find it useful to have a thread
use threads;
- sub loop {
- my $thread = shift;
- my $foo = 50;
- while($foo--) { print "in thread $thread\n" }
- threads->yield();
- $foo = 50;
- while($foo--) { print "in thread $thread\n" }
- }
+ sub loop {
+ my $thread = shift;
+ my $foo = 50;
+ while($foo--) { print "in thread $thread\n" }
+ threads->yield();
+ $foo = 50;
+ while($foo--) { print "in thread $thread\n" }
+ }
- my $thread1 = threads->new(\&loop, 'first');
- my $thread2 = threads->new(\&loop, 'second');
- my $thread3 = threads->new(\&loop, 'third');
+ my $thread1 = threads->new(\&loop, 'first');
+ my $thread2 = threads->new(\&loop, 'second');
+ my $thread3 = threads->new(\&loop, 'third');
It is important to remember that yield() is only a hint to give up the CPU,
it depends on your hardware, OS and threading libraries what actually happens.
use the join() method:
use threads;
+
$thr = threads->new(\&sub1);
@ReturnData = $thr->join;
important, especially for long-running programs that spawn lots of
threads. If you don't want the return values and don't want to wait
for the thread to finish, you should call the detach() method
-instead. detach() is covered later in the article.
+instead, as described next.
=head2 Ignoring A Thread
automatically.
use threads;
+
$thr = threads->new(\&sub1); # Spawn the thread
$thr->detach; # Now we officially don't care any more
}
-Once a thread is detached, it may not be joined, and any output that
-it might have produced (if it was done and waiting for a join) is
+Once a thread is detached, it may not be joined, and any return data
+that it might have produced (if it was done and waiting for a join) is
lost.
=head1 Threads And Data
=head2 Shared And Unshared Data
-The biggest difference between perl threading and the old 5.005 style
-threading, or most other threading systems out there, is that all data
-is not shared. When a new perl thread is created all data is cloned
-and is private to that thread!
+The biggest difference between Perl ithreads and the old 5.005 style
+threading, or for that matter, most other threading systems out there,
+is that by default, no data is shared. When a new Perl thread is created,
+all the data associated with the current thread is copied to the new
+thread, and is subsequently private to that new thread!
+This is similar in feel to what happens when a UNIX process forks,
+except that in this case, the data is just copied to a different part of
+memory within the same process rather than a real fork taking place.
-To make use of threading however, one usually want the threads to share
+To make use of threading, however, one usually wants the threads to share
-data between each other. This is done with the L<threads::shared> module
-and the C< : shared> attribute:
-
- use threads;
- use threads::shared;
- my $foo : shared = 1;
- my $bar = 1;
- threads->new(sub { $foo++; $bar++ })->join;
-
- print "$foo\n"; #prints 2 since $foo is shared
- print "$bar\n"; #prints 1 since $bar is not shared
+at least some data between themselves. This is done with the
+L<threads::shared> module and the C< : shared> attribute:
+
+ use threads;
+ use threads::shared;
+
+ my $foo : shared = 1;
+ my $bar = 1;
+ threads->new(sub { $foo++; $bar++ })->join;
+
+ print "$foo\n"; #prints 2 since $foo is shared
+ print "$bar\n"; #prints 1 since $bar is not shared
+
+In the case of a shared array, all the array's elements are shared, and for
+a shared hash, all the keys and values are shared. This places
+restrictions on what may be assigned to shared array and hash elements: only
+simple values or references to shared variables are allowed - this is
+so that a private variable can't accidentally become shared. A bad
+assignment will cause the thread to die. For example:
+
+ use threads;
+ use threads::shared;
+
+ my $var = 1;
+ my $svar : shared = 2;
+ my %hash : shared;
+
+ # ... create some threads ...
+
+ $hash{a} = 1;      # all threads see exists($hash{a}) and $hash{a} == 1
+ $hash{a} = $var;   # okay - copy-by-value: same effect as previous
+ $hash{a} = $svar;  # okay - copy-by-value: same effect as previous
+ $hash{a} = \$svar; # okay - a reference to a shared variable
+ $hash{a} = \$var;  # This will die
+ delete $hash{a};   # okay - all threads will see !exists($hash{a})
+
+Note that a shared variable guarantees that if two or more threads try to
+modify it at the same time, the internal state of the variable will not
+become corrupted. However, there are no guarantees beyond this, as
+explained in the next section.
=head2 Thread Pitfalls: Races
use threads;
use threads::shared;
+
my $a : shared = 1;
$thr1 = threads->new(\&sub1);
$thr2 = threads->new(\&sub2);
$thr2->join;
print "$a\n";
- sub sub1 { $foo = $a; $a = $foo + 1; }
- sub sub2 { $bar = $a; $a = $bar + 1; }
+ sub sub1 { my $foo = $a; $a = $foo + 1; }
+ sub sub2 { my $bar = $a; $a = $bar + 1; }
What do you think $a will be? The answer, unfortunately, is "it
depends." Both sub1() and sub2() access the global variable $a, once
at any point, or be executed in any order. At the end, $a could be 3
or 4, and both $b and $c could be 2 or 3.
+Even C<$a += 5> or C<$a++> are not guaranteed to be atomic.
+
Whenever your program accesses data or resources that can be accessed
by other threads, you must take steps to coordinate access or risk
-data corruption and race conditions.
+data inconsistency and race conditions. Note that Perl will protect its
+internals from your race conditions, but it won't protect you from you.
+
+=head1 Synchronisation and control
+
+Perl provides a number of mechanisms to coordinate the interactions
+between threads and their data, to avoid race conditions and the like.
+Some of these are designed to resemble the common techniques used in thread
+libraries such as C<pthreads>; others are Perl-specific. Often, the
+standard techniques are clumsy and difficult to get right (such as
+condition waits). Where possible, it is usually easier to use Perlish
+techniques such as queues, which remove some of the hard work involved.
=head2 Controlling access: lock()
The lock() function takes a shared variable and puts a lock on it.
-No other thread may lock the variable until the locking thread exits
-the innermost block containing the lock.
-Using lock() is straightforward:
+No other thread may lock the variable until the variable is unlocked
+by the thread holding the lock. Unlocking happens automatically
+when the locking thread exits the outermost block that contains
+the C<lock()> function. Using lock() is straightforward: this example has
+several threads doing some calculations in parallel, and occasionally
+updating a running total:
+
+ use threads;
+ use threads::shared;
+
+ my $total : shared = 0;
+
+ sub calc {
+ for (;;) {
+ my $result;
+ # (... do some calculations and set $result ...)
+ {
+ lock($total); # block until we obtain the lock
+ $total += $result;
+ } # lock implicitly released at end of scope
+ last if $result == 0;
+ }
+ }
+
+ my $thr1 = threads->new(\&calc);
+ my $thr2 = threads->new(\&calc);
+ my $thr3 = threads->new(\&calc);
+ $thr1->join;
+ $thr2->join;
+ $thr3->join;
+ print "total=$total\n";
- use threads;
- my $a : shared = 4;
- $thr1 = threads->new(sub {
- $foo = 12;
- {
- lock ($a); # Block until we get access to $a
- $b = $a;
- $a = $b * $foo;
- }
- print "\$foo was $foo\n";
- });
- $thr2 = threads->new(sub {
- $bar = 7;
- {
- lock ($a); # Block until we can get access to $a
- $c = $a;
- $a = $c * $bar;
- }
- print "\$bar was $bar\n";
- });
- $thr1->join;
- $thr2->join;
- print "\$a is $a\n";
lock() blocks the thread until the variable being locked is
available. When lock() returns, your thread can be sure that no other
-thread can lock that variable until the innermost block containing the
+thread can lock that variable until the outermost block containing the
lock exits.
It's important to note that locks don't prevent access to the variable
though, will not block subsequent locks on array elements, just lock
attempts on the array itself.
-Finally, locks are recursive, which means it's okay for a thread to
+Locks are recursive, which means it's okay for a thread to
lock a variable more than once. The lock will last until the outermost
-lock() on the variable goes out of scope.
+lock() on the variable goes out of scope. For example:
+
+ my $x : shared;
+ doit();
+
+ sub doit {
+ {
+ {
+ lock($x); # wait for lock
+ lock($x); # NOOP - we already have the lock
+ {
+ lock($x); # NOOP
+ {
+ lock($x); # NOOP
+ lockit_some_more();
+ }
+ }
+ } # *** implicit unlock here ***
+ }
+ }
+
+ sub lockit_some_more {
+ lock($x); # NOOP
+ } # nothing happens here
+
+Note that there is no unlock() function - the only way to unlock a
+variable is to allow the lock to go out of scope.
+
+A lock can either be used to guard the data contained within the variable
+being locked, or it can be used to guard something else, like a section
+of code. In this latter case, the variable in question does not hold any
+useful data, and exists only for the purpose of being locked. In this
+respect, the variable behaves like the mutexes and basic semaphores of
+traditional thread libraries.
-=head2 Thread Pitfall: Deadlocks
+=head2 A Thread Pitfall: Deadlocks
-Locks are a handy tool to synchronize access to data. Using them
+Locks are a handy tool to synchronize access to data, and using them
properly is the key to safe shared data. Unfortunately, locks aren't
-without their dangers. Consider the following code:
+without their dangers, especially when multiple locks are involved.
+Consider the following code:
use threads;
+ use threads::shared;
+
my $a : shared = 4;
my $b : shared = "foo";
my $thr1 = threads->new(sub {
lock($a);
- yield;
+ threads->yield;
sleep 20;
- lock ($b);
+ lock($b);
});
my $thr2 = threads->new(sub {
lock($b);
- yield;
+ threads->yield;
sleep 20;
- lock ($a);
+ lock($a);
});
This program will probably hang until you kill it. The only way it
-won't hang is if one of the two async() routines acquires both locks
+won't hang is if one of the two threads acquires both locks
first. A guaranteed-to-hang version is more complicated, but the
principle is the same.
-The first thread spawned by async() will grab a lock on $a then, a
-second or two later, try to grab a lock on $b. Meanwhile, the second
-thread grabs a lock on $b, then later tries to grab a lock on $a. The
-second lock attempt for both threads will block, each waiting for the
-other to release its lock.
+The first thread will grab a lock on $a, then, after a pause during which
+the second thread has probably had time to do some work, try to grab a
+lock on $b. Meanwhile, the second thread grabs a lock on $b, then later
+tries to grab a lock on $a. The second lock attempt for both threads will
+block, each waiting for the other to release its lock.
This condition is called a deadlock, and it occurs whenever two or
more threads are trying to get locks on resources that the others
$a before $b, and $b before $c. It's also best to hold on to locks for
as short a period of time to minimize the risks of deadlock.
+The other synchronisation primitives described below can suffer from
+similar problems.
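+
+The lock-ordering advice above can be illustrated with a minimal sketch.
+It reuses the two shared variables from the deadlock example; the sub name
+C<worker> is invented for this illustration. Because every thread takes
+the locks in the same order, none can hold $b while waiting for $a:
+
+    use threads;
+    use threads::shared;
+
+    my $a : shared = 4;
+    my $b : shared = "foo";
+
+    # hypothetical helper: always lock $a first, then $b
+    sub worker {
+        lock($a);
+        threads->yield;
+        lock($b);
+        return "$a $b";
+    }   # both locks are released when the sub's block exits
+
+    my $thr1 = threads->new(\&worker);
+    my $thr2 = threads->new(\&worker);
+    $thr1->join;
+    $thr2->join;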
+
=head2 Queues: Passing Data Around
A queue is a special thread-safe object that lets you put data in one
use threads;
use threads::shared::queue;
- my $DataQueue = new threads::shared::queue;
+ my $DataQueue = threads::shared::queue->new();
$thr = threads->new(sub {
while ($DataElement = $DataQueue->dequeue) {
print "Popped $DataElement off the queue\n";
something. This makes queues ideal for event loops and other
communications between threads.
-
-=head1 Threads And Code
-
-In addition to providing thread-safe access to data via locks and
-queues, threaded Perl also provides general-purpose semaphores for
-coarser synchronization than locks provide and thread-safe access to
-entire subroutines.
-
=head2 Semaphores: Synchronizing Data Access
-Semaphores are a kind of generic locking mechanism. Unlike lock, which
-gets a lock on a particular scalar, Perl doesn't associate any
-particular thing with a semaphore so you can use them to control
-access to anything you like. In addition, semaphores can allow more
-than one thread to access a resource at once, though by default
-semaphores only allow one thread access at a time.
+Semaphores are a kind of generic locking mechanism. In their most basic
+form, they behave very much like lockable scalars, except that they
+can't hold data, and that they must be explicitly unlocked. In their
+advanced form, they act like a kind of counter, and can allow multiple
+threads to have the 'lock' at any one time.
-=over 4
+=head2 Basic semaphores
-=item Basic semaphores
-
-Semaphores have two methods, down and up. down decrements the resource
-count, while up increments it. down calls will block if the
+Semaphores have two methods, down() and up(): down() decrements the resource
+count, while up() increments it. Calls to down() will block if the
semaphore's current count would decrement below zero. This program
gives a quick demonstration:
use threads qw(yield);
use threads::shared::semaphore;
+ use threads::shared;
+
my $semaphore = new threads::shared::semaphore;
- $GlobalVariable = 0;
+ my $GlobalVariable : shared = 0;
$thr1 = new threads \&sample_sub, 1;
$thr2 = new threads \&sample_sub, 2;
semaphore, though, makes sure that only one thread is accessing the
global variable at once.
-=item Advanced Semaphores
+=head2 Advanced Semaphores
By default, semaphores behave like locks, letting only one thread
down() them at a time. However, there are other uses for semaphores.
Larger increments or decrements are handy in those cases where a
thread needs to check out or return a number of resources at once.
-=back
+=head2 cond_wait() and cond_signal()
+
+These two functions can be used in conjunction with locks to notify
+co-operating threads that a resource has become available. They are
+very similar in use to the functions found in C<pthreads>. However,
+for most purposes, queues are simpler to use and more intuitive. See
+L<threads::shared> for more details.
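+
+As a rough sketch of how the two calls fit together, one thread can wait
+on a shared flag while another sets it. The names C<$ready> and
+C<$waiter> are invented for this illustration and are not part of any
+API:
+
+    use threads;
+    use threads::shared;
+
+    my $ready : shared = 0;     # hypothetical flag, guarded by its own lock
+
+    my $waiter = threads->new(sub {
+        lock($ready);
+        # cond_wait() unlocks $ready while waiting and relocks it on
+        # wakeup; the flag is tested first, so a signal sent before we
+        # got here is not missed
+        cond_wait($ready) until $ready;
+        return "resource available";
+    });
+
+    {
+        lock($ready);
+        $ready = 1;
+        cond_signal($ready);    # wake one thread waiting on $ready
+    }
+    print $waiter->join, "\n";
+
+Testing the flag in a loop, rather than calling cond_wait() once, guards
+against waking up before the condition actually holds.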
=head1 General Thread Utility Routines
=head2 What Thread Am I In?
-The C<threads->self> method provides your program with a way to get an
-object representing the thread it's currently in. You can use this
+The C<< threads->self >> class method provides your program with a way to
+get an object representing the thread it's currently in. You can use this
object in the same way as the ones returned from thread creation.
=head2 Thread IDs
=head2 What Threads Are Running?
-threads->list returns a list of thread objects, one for each thread
+C<< threads->list >> returns a list of thread objects, one for each thread
that's currently running and not detached. Handy for a number of things,
including cleaning up at the end of your program:
}
}
-If some threads have not finished running when the main perl thread
-ends, perl will warn you about it and die, since it is impossible for perl
+If some threads have not finished running when the main Perl thread
+ends, Perl will warn you about it and die, since it is impossible for Perl
to clean up itself while other threads are running
=head1 A Complete Example
That's how it works. It's pretty simple; as with many Perl programs,
the explanation is much longer than the program.
+=head1 Performance considerations
+
+The main thing to bear in mind when comparing ithreads to other threading
+models is the fact that for each new thread created, a complete copy of
+all the variables and data of the parent thread has to be taken. Thus
+thread creation can be quite expensive, both in terms of memory usage and
+time spent in creation. The ideal way to reduce these costs is to have a
+relatively small number of long-lived threads, all created fairly early
+on - before the base thread has accumulated too much data. Of course, this
+may not always be possible, so compromises have to be made. However, after
+a thread has been created, its performance and extra memory usage should
+be little different from those of ordinary code.
+
+Also note that under the current implementation, shared variables
+use a little more memory and are a little slower than ordinary variables.
+
=head1 Conclusion
A complete thread tutorial could fill a book (and has, many times),
=head1 Copyrights
-This article originally appeared in The Perl Journal #10, and is
-copyright 1998 The Perl Journal. It appears courtesy of Jon Orwant and
-The Perl Journal. This document may be distributed under the same terms
-as Perl itself.
-
+The original version of this article appeared in The Perl
+Journal #10, and is copyright 1998 The Perl Journal. It appears courtesy
+of Jon Orwant and The Perl Journal. This document may be distributed
+under the same terms as Perl itself.
For more information please see L<threads> and L<threads::shared>.