=head1 NAME

perlfork - Perl's fork() emulation

=head1 SYNOPSIS

    NOTE:  As of the 5.8.0 release, fork() emulation has considerably
    matured.  However, there are still a few known bugs and differences
    from real fork() that might affect you.  See the "BUGS" and
    "CAVEATS AND LIMITATIONS" sections below.

Perl provides a fork() keyword that corresponds to the Unix system call
of the same name.  On most Unix-like platforms where the fork() system
call is available, Perl's fork() simply calls it.

On some platforms such as Windows where the fork() system call is not
available, Perl can be built to emulate fork() at the interpreter level.
While the emulation is designed to be as compatible as possible with the
real fork() at the level of the Perl program, there are certain
important differences that stem from the fact that all the pseudo child
"processes" created this way live in the same real process as far as the
operating system is concerned.

This document provides a general overview of the capabilities and
limitations of the fork() emulation.  Note that the issues discussed here
are not applicable to platforms where a real fork() is available and Perl
has been configured to use it.

=head1 DESCRIPTION

The fork() emulation is implemented at the level of the Perl interpreter.
What this means in general is that running fork() will actually clone the
running interpreter and all its state, and run the cloned interpreter in
a separate thread, beginning execution in the new thread just after the
point where the fork() was called in the parent.  We will refer to the
thread that implements this child "process" as the pseudo-process.

To the Perl program that called fork(), all this is designed to be
transparent.  The parent returns from the fork() with a pseudo-process
ID that can be subsequently used in any process manipulation functions;
the child returns from the fork() with a value of C<0> to signify that
it is the child pseudo-process.
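
As a minimal sketch of the return-value convention described above, a
program can branch on the result of fork() and collect the child's status
with waitpid().  The exit status C<3> is an arbitrary value chosen for
illustration.

    use strict;
    use warnings;

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: fork() returned 0
        exit(3);
    }

    # parent: $pid holds the ID of the child (pseudo-)process
    my $reaped = waitpid($pid, 0);
    my $status = $? >> 8;           # exit status reported by the child
    print "child $reaped exited with status $status\n";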

=head2 Behavior of other Perl features in forked pseudo-processes

Most Perl features behave in a natural way within pseudo-processes.

=over 8

=item $$ or $PROCESS_ID

This special variable is correctly set to the pseudo-process ID.
It can be used to identify pseudo-processes within a particular
session.  Note that this value is subject to recycling if any
pseudo-processes are launched after others have been wait()-ed on.

=item %ENV

Each pseudo-process maintains its own virtual environment.  Modifications
to %ENV affect the virtual environment, and are only visible within that
pseudo-process, and in any processes (or pseudo-processes) launched from
it.

=item chdir() and all other builtins that accept filenames

Each pseudo-process maintains its own virtual idea of the current directory.
Modifications to the current directory using chdir() are only visible within
that pseudo-process, and in any processes (or pseudo-processes) launched from
it.  All file and directory accesses from the pseudo-process will correctly
map the virtual working directory to the real working directory.

=item wait() and waitpid()

wait() and waitpid() can be passed a pseudo-process ID returned by fork().
These calls will properly wait for the termination of the pseudo-process
and return its status.

=item kill()

kill() can be used to terminate a pseudo-process by passing it the ID returned
by fork().  This should not be used except under dire circumstances, because
the operating system may not guarantee integrity of the process resources
when a running thread is terminated.  Note that using kill() on a
pseudo-process may typically cause memory leaks, because the thread that
implements the pseudo-process does not get a chance to clean up its resources.

=item exec()

Calling exec() within a pseudo-process actually spawns the requested
executable in a separate process and waits for it to complete before
exiting with the same exit status as that process.  This means that the
process ID reported within the running executable will be different from
what the earlier Perl fork() might have returned.  Similarly, any process
manipulation functions applied to the ID returned by fork() will affect the
waiting pseudo-process that called exec(), not the real process it is
waiting for after the exec().

=item exit()

exit() always exits just the executing pseudo-process, after automatically
wait()-ing for any outstanding child pseudo-processes.  Note that this means
that the process as a whole will not exit unless all running pseudo-processes
have exited.  See below for some limitations with open filehandles.

=item Open handles to files, directories and network sockets

All open handles are dup()-ed in pseudo-processes, so that closing
any handles in one process does not affect the others.  See below for
some limitations.

=back
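
The isolation of %ENV and of the current directory described in the list
above can be checked with a short sketch like the following.  The variable
name C<FORK_DEMO> is hypothetical, and the temporary directory is used only
as a convenient place to chdir() to.

    use strict;
    use warnings;
    use Cwd qw(getcwd);
    use File::Spec;

    my $before = getcwd();
    $ENV{FORK_DEMO} = 'parent';     # hypothetical variable name

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: these changes are not visible to the parent
        $ENV{FORK_DEMO} = 'child';
        chdir(File::Spec->tmpdir) or die "chdir failed: $!";
        exit(0);
    }

    waitpid($pid, 0);
    my $after = getcwd();
    print "FORK_DEMO is still '$ENV{FORK_DEMO}'\n";
    print "working directory is unchanged\n" if $after eq $before;

The same code also runs on platforms with a real fork(), where the
operating system provides the separation.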

=head2 Resource limits

In the eyes of the operating system, pseudo-processes created via the fork()
emulation are simply threads in the same process.  This means that any
process-level limits imposed by the operating system apply to all
pseudo-processes taken together.  This includes any limits imposed by the
operating system on the number of open file, directory and socket handles,
limits on disk space usage, limits on memory size, limits on CPU utilization,
and so on.

=head2 Killing the parent process

If the parent process is killed (either using Perl's kill() builtin, or
using some external means) all the pseudo-processes are killed as well,
and the whole process exits.

=head2 Lifetime of the parent process and pseudo-processes

During the normal course of events, the parent process and every
pseudo-process started by it will wait for their respective pseudo-children
to complete before they exit.  This means that the parent and every
pseudo-child created by it that is also a pseudo-parent will only exit
after their pseudo-children have exited.

A way to mark pseudo-processes as running detached from their parent (so
that the parent would not have to wait() for them if it doesn't want to)
will be provided in future.

=head2 CAVEATS AND LIMITATIONS

=over 8

=item BEGIN blocks

The fork() emulation will not work entirely correctly when called from
within a BEGIN block.  The forked copy will run the contents of the
BEGIN block, but will not continue parsing the source stream after the
BEGIN block.  For example, consider the following code:

    BEGIN {
        fork and exit;          # fork child and exit the parent
        print "inner\n";
    }
    print "outer\n";

This will print:

    inner

rather than the expected:

    inner
    outer

This limitation arises from fundamental technical difficulties in
cloning and restarting the stacks used by the Perl parser in the
middle of a parse.

=item Open filehandles

Any filehandles open at the time of the fork() will be dup()-ed.  Thus,
the files can be closed independently in the parent and child, but beware
that the dup()-ed handles will still share the same seek pointer.  Changing
the seek position in the parent will change it in the child and vice-versa.
One can avoid this by opening files that need distinct seek pointers
separately in the child.

On some operating systems, notably Solaris and Unixware, calling C<exit()>
from a child process will flush and close open filehandles in the parent,
thereby corrupting the filehandles.  On these systems, calling C<_exit()>
is suggested instead.  C<_exit()> is available in Perl through the
C<POSIX> module.  Please consult your system's manpages for more information
on this.

=item Forking pipe open() not yet implemented

The C<open(FOO, "|-")> and C<open(BAR, "-|")> constructs are not yet
implemented.  This limitation can be easily worked around in new code
by creating a pipe explicitly.  The following example shows how to
write to a forked child:

    # simulate open(FOO, "|-")
    sub pipe_to_fork ($) {
	my $parent = shift;
	pipe my $child, $parent or die;
	my $pid = fork();
	die "fork() failed: $!" unless defined $pid;
	if ($pid) {
	    close $child;
	}
	else {
	    close $parent;
	    open(STDIN, "<&=" . fileno($child)) or die;
	}
	$pid;
    }

    if (pipe_to_fork('FOO')) {
	# parent
	print FOO "pipe_to_fork\n";
	close FOO;
    }
    else {
	# child
	while (<STDIN>) { print; }
	exit(0);
    }

And this one reads from the child:

    # simulate open(FOO, "-|")
    sub pipe_from_fork ($) {
	my $parent = shift;
	pipe $parent, my $child or die;
	my $pid = fork();
	die "fork() failed: $!" unless defined $pid;
	if ($pid) {
	    close $child;
	}
	else {
	    close $parent;
	    open(STDOUT, ">&=" . fileno($child)) or die;
	}
	$pid;
    }

    if (pipe_from_fork('BAR')) {
	# parent
	while (<BAR>) { print; }
	close BAR;
    }
    else {
	# child
	print "pipe_from_fork\n";
	exit(0);
    }

Forking pipe open() constructs will be supported in future.

=item Global state maintained by XSUBs 

External subroutines (XSUBs) that maintain their own global state may
not work correctly.  Such XSUBs will either need to maintain locks to
protect simultaneous access to global data from different pseudo-processes,
or maintain all their state on the Perl symbol table, which is copied
naturally when fork() is called.  A callback mechanism that provides
extensions an opportunity to clone their state will be provided in the
near future.

=item Interpreter embedded in larger application

The fork() emulation may not behave as expected when it is executed in an
application which embeds a Perl interpreter and calls Perl APIs that can
evaluate bits of Perl code.  This stems from the fact that the emulation
only has knowledge about the Perl interpreter's own data structures and
knows nothing about the containing application's state.  For example, any
state carried on the application's own call stack is out of reach.

=item Thread-safety of extensions

Since the fork() emulation runs code in multiple threads, extensions
calling into non-thread-safe libraries may not work reliably when
calling fork().  As Perl's threading support gradually becomes more
widely adopted even on platforms with a native fork(), such extensions
are expected to be fixed for thread-safety.

=back
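
The shared seek pointer mentioned under "Open filehandles" can be
illustrated with the sketch below.  It uses sysseek() on an unbuffered
temporary file so that PerlIO buffering does not obscure the result.  The
file contents and offsets are arbitrary, and the child also shows the
suggested workaround of opening the file separately.

    use strict;
    use warnings;
    use Fcntl qw(SEEK_SET SEEK_CUR);
    use File::Temp qw(tempfile);

    my ($fh, $path) = tempfile(UNLINK => 1);
    syswrite($fh, "0123456789") or die "write failed: $!";
    sysseek($fh, 0, SEEK_SET)   or die "seek failed: $!";

    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;

    if ($pid == 0) {
        # child: moving the inherited handle also moves it in the parent
        sysseek($fh, 5, SEEK_SET);

        # a handle opened separately has its own seek pointer
        open(my $own, '<', $path) or die "open failed: $!";
        sysseek($own, 8, SEEK_SET);
        close $own;
        exit(0);
    }

    waitpid($pid, 0);
    my $pos = sysseek($fh, 0, SEEK_CUR);    # position as seen by the parent
    print "parent position: ", $pos + 0, "\n";

With a real fork() the parent reports position 5, the offset set by the
child on the inherited handle; the seek on the separately opened handle has
no effect on it.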

=head1 BUGS

=over 8

=item *

Having pseudo-process IDs be negative integers breaks down for the integer
C<-1> because the wait() and waitpid() functions treat this number as
being special.  The tacit assumption in the current implementation is that
the system never allocates a thread ID of C<1> for user threads.  A better
representation for pseudo-process IDs will be implemented in future.

=item *

In certain cases, the OS-level handles created by the pipe(), socket(),
and accept() operators are apparently not duplicated accurately in
pseudo-processes.  This only happens in some situations, but where it
does happen, it may result in deadlocks between the read and write ends
of pipe handles, or inability to send or receive data across socket
handles.

=item *

This document may be incomplete in some respects.

=back

=head1 AUTHOR

Support for concurrent interpreters and the fork() emulation was implemented
by ActiveState, with funding from Microsoft Corporation.

This document is authored and maintained by Gurusamy Sarathy
E<lt>gsar@activestate.comE<gt>.

=head1 SEE ALSO

L<perlfunc/"fork">, L<perlipc>

=cut