various fixes for race conditions under threads: mutex locks based
on PL_threadnum were seriously flawed, since it means more than one
thread could enter the critical region; PL_na was global instead of
thread-local; child thread could finish and free thr structures
before Thread->new() got around to creating the Thread object;
cv_clone() needed locking, as it mucks with PL_comppad and other
global data; new_struct_thread() needed to lock template-thread's
mutex while copying its data
p4raw-id: //depot/perl@2385