From: Jarkko Hietaniemi Date: Sun, 9 Jul 2006 19:42:10 +0000 (+0300) Subject: some coding guidelines/tips to perlhack (+ one perltodo) X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=d7889f521df6df292ab04aa2a200fc0a28ed85d8;p=p5sagit%2Fp5-mst-13.2.git some coding guidelines/tips to perlhack (+ one perltodo) Message-ID: <44B131E2.8050805@iki.fi> p4raw-id: //depot/perl@28516 --- diff --git a/pod/perlhack.pod b/pod/perlhack.pod index ee1e362..7949bf6 100644 --- a/pod/perlhack.pod +++ b/pod/perlhack.pod @@ -2352,6 +2352,448 @@ running 'make test_notty'. =back +=head2 Common problems when patching Perl source code + +Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In +some cases we have to take pre-ANSI requirements into consideration. +You don't care about some particular platform having broken Perl? +I hear there is still a strong demand for J2EE programmers. + +=head2 Perl environment problems + +=over 4 + +=item * + +Not compiling with threading + +Compiling with threading (-Duseithreads) completely rewrites +the function prototypes of Perl. You better try your changes +with that. Related to this is the difference between "Perl_"-less +and "Perl_-ly" APIs, for example: + + Perl_sv_setiv(aTHX_ ...); + sv_setiv(...); + +The first one explicitly passes in the context, which is needed for +e.g. threaded builds. The second one does that implicitly; do not get +them mixed. + +See L +for further discussion about context. + +=item * + +Not compiling with -DDEBUGGING + +The DEBUGGING define exposes more code to the compiler, +therefore more ways for things to go wrong. + +=item * + +Not exporting your new function + +Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any +function that is part of the public API (the shared Perl library) +to be explicitly marked as exported. See the discussion about +F in L. + +=item * + +Exporting your new function + +The new shiny result of either genuine new functionality or your +arduous refactoring is now ready and correctly exported. So what +could possibly be wrong? + +Maybe simply that your function did not need to be exported in the +first place. Perl has a long and not so glorious history of exporting +functions that it should not have. + +If the function is used only inside one source code file, make it +static. See the discussion about F in L. + +If the function is used across several files, but intended only for +Perl's internal use (and this should be the common case), do not +export it to the public API. See the discussion about F +in L. + +=back + +=head Portability problems + +The following are common causes of compilation and/or execution +failures, not common to Perl as such. The C FAQ is good bedtime +reading. Please test your changes with as many C compilers as +possible -- we will, anyway, and it's nice to save oneself from +public embarrassment. + +=over 4 + +=item * + +Casting pointers to integers or casting integers to pointers + + void castaway(U8* p) + { + IV i = p; + +or + + void castaway(U8* p) + { + IV i = (IV)p; + +Either are bad, and broken, and unportable. Use the PTR2IV() +macro that does it right. (Likewise, there are PTR2UV(), PTR2NV(), +INT2PTR(), and NUM2PTR().) + +=item * + +Technically speaking casting between function pointers and data +pointers is unportable and undefined, but practically speaking +it seems to work, but you should use the FPTR2DPTR() and DPTR2FPTR() +macros. + +=item * + +Assuming sizeof(int) == sizeof(long) + +There are platforms where longs are 64 bits, and platforms where ints +are 64 bits, and while we are out to shock you, even platforms where +shorts are 64 bits. This is all legal according to the C standard. +(In other words, "long long" is not a portable way to specify 64 bits, +and "long long" is not even guaranteed to be any wider than "long".) +Use definitions like IVSIZE, I32SIZE, and so forth. + +=item * + +Assuming one can dereference any type of pointer for any type of data + + char *p = ...; + long pony = *p; + +Many platforms, quite rightly so, will give you a core dump instead +of a pony if the p happens not be correctly aligned. + +=item * + +Lvalue casts + + (int)*p = ...; + +Simply not portable. Get your lvalue to be of the right type, +or maybe use temporary variables. + +=item * + +Using //-comments + + // This function bamfoodles the zorklator. + +That is C99 or C++. Perl is C89. Using the //-comments is silently +allowed by many C compilers but cranking up the ANSI strictness (which +we like to do) causes the compilation to fail. + +=item * + +Mixing declarations and code + + void zorklator() + { + int n = 3; + set_zorkmids(n); + int q = 4; + +That is C99 or C++. Some compilers allow that, but you shouldn't. + +=item * + +Mixing signed char pointers with unsigned char pointers + + int foo(char *s) { ... } + ... + unsigned char *t = ...; /* Or U8* t = ... */ + foo(t); + +While this is legal practice, it is certainly dubious, and downright +fatal in at least one platform: for example VMS cc considers this a +fatal error. One cause for people often making this mistake is that a +"naked char" and therefore deferencing a "naked char pointer" have an +undefined sign: it depends on the compilers and the platform whether +the result is signed or unsigned. + +=item * + +Macros that have string constants and their arguments as substrings of +the string constants + + #define FOO(n) printf("number = %d\n", n) + FOO(10); + +Pre-ANSI semantics for that was equivalent to + + printf("10umber = %d\10"); + +which is probably not what you were expecting. Unfortunately at +least one C compiler does real backward compatibility here, in AIX +that is what still happens even though the rest of the AIX compiler +is very happily C89. + +=back + +=head2 Security problems + +Last but not least, here are various tips for safer coding. + +=over 4 + +=item * + +Do not use gets() + +Or we will publicly ridicule you. Seriously. + +=item * + +Do not use strcpy() or strcat() + +Where there still linger some uses of these in the Perl source code, +we have inspected them for safety and are very, very ashamed of them, +and plan to get rid of them. In places where there are strlcpy() +and strlcat() we prefer to use them, and there is a plan to integrate +the strlcpy/strlcat implementation of INN. + +=item * + +Do not use sprintf() or vsprintf() + +Use my_snprintf() and my_vnsprintf() instead, which will try to use +snprintf() and vsnprintf() if those safer APIs are available. + +=back + +=head2 Common problems when patching Perl source code + +Perl source plays by ANSI C89 rules: no C99 (or C++) extensions. In +some cases we have to take pre-ANSI requirements into consideration. +You don't care about some particular platform having broken Perl? +I hear there is still a strong demand for J2EE programmers. + +=head2 Perl environment problems + +=over 4 + +=item * + +Not compiling with threading + +Compiling with threading (-Duseithreads) completely rewrites +the function prototypes of Perl. You better try your changes +with that. Related to this is the difference between "Perl_"-less +and "Perl_-ly" APIs, for example: + + Perl_sv_setiv(aTHX_ ...); + sv_setiv(...); + +The first one explicitly passes in the context, which is needed for +e.g. threaded builds. The second one does that implicitly; do not get +them mixed. + +See L +for further discussion about context. + +=item * + +Not compiling with -DDEBUGGING + +The DEBUGGING define exposes more code to the compiler, +therefore more ways for things to go wrong. + +=item * + +Not exporting your new function + +Some platforms (Win32, AIX, VMS, OS/2, to name a few) require any +function that is part of the public API (the shared Perl library) +to be explicitly marked as exported. See the discussion about +F in L. + +=item * + +Exporting your new function + +The new shiny result of either genuine new functionality or your +arduous refactoring is now ready and correctly exported. So what +could possibly be wrong? + +Maybe simply that your function did not need to be exported in the +first place. Perl has a long and not so glorious history of exporting +functions that it should not have. + +If the function is used only inside one source code file, make it +static. See the discussion about F in L. + +If the function is used across several files, but intended only for +Perl's internal use (and this should be the common case), do not +export it to the public API. See the discussion about F +in L. + +=back + +=head Portability problems + +The following are common causes of compilation and/or execution +failures, not common to Perl as such. The C FAQ is good bedtime +reading. Please test your changes with as many C compilers as +possible -- we will, anyway, and it's nice to save oneself from +public embarrassment. + +=over 4 + +=item * + +Casting pointers to integers or casting integers to pointers + + void castaway(U8* p) + { + IV i = p; + +or + + void castaway(U8* p) + { + IV i = (IV)p; + +Either are bad, and broken, and unportable. Use the PTR2IV() +macro that does it right. (Likewise, there are PTR2UV(), PTR2NV(), +INT2PTR(), and NUM2PTR().) + +=item * + +Technically speaking casting between function pointers and data +pointers is unportable and undefined, but practically speaking +it seems to work, but you should use the FPTR2DPTR() and DPTR2FPTR() +macros. + +=item * + +Assuming sizeof(int) == sizeof(long) + +There are platforms where longs are 64 bits, and platforms where ints +are 64 bits, and while we are out to shock you, even platforms where +shorts are 64 bits. This is all legal according to the C standard. +(In other words, "long long" is not a portable way to specify 64 bits, +and "long long" is not even guaranteed to be any wider than "long".) +Use definitions like IVSIZE, I32SIZE, and so forth. + +=item * + +Assuming one can dereference any type of pointer for any type of data + + char *p = ...; + long pony = *p; + +Many platforms, quite rightly so, will give you a core dump instead +of a pony if the p happens not be correctly aligned. + +=item * + +Lvalue casts + + (int)*p = ...; + +Simply not portable. Get your lvalue to be of the right type, +or maybe use temporary variables. + +=item * + +Using //-comments + + // This function bamfoodles the zorklator. + +That is C99 or C++. Perl is C89. Using the //-comments is silently +allowed by many C compilers but cranking up the ANSI strictness (which +we like to do) causes the compilation to fail. + +=item * + +Mixing declarations and code + + void zorklator() + { + int n = 3; + set_zorkmids(n); + int q = 4; + +That is C99 or C++. Some compilers allow that, but you shouldn't. + +=item * + +Mixing signed char pointers with unsigned char pointers + + int foo(char *s) { ... } + ... + unsigned char *t = ...; /* Or U8* t = ... */ + foo(t); + +While this is legal practice, it is certainly dubious, and downright +fatal in at least one platform: for example VMS cc considers this a +fatal error. One cause for people often making this mistake is that a +"naked char" and therefore deferencing a "naked char pointer" have an +undefined sign: it depends on the compilers and the platform whether +the result is signed or unsigned. + +=item * + +Macros that have string constants and their arguments as substrings of +the string constants + + #define FOO(n) printf("number = %d\n", n) + FOO(10); + +Pre-ANSI semantics for that was equivalent to + + printf("10umber = %d\10"); + +which is probably not what you were expecting. Unfortunately at +least one C compiler does real backward compatibility here, in AIX +that is what still happens even though the rest of the AIX compiler +is very happily C89. + +=back + +=head2 Security problems + +Last but not least, here are various tips for safer coding. + +=over 4 + +=item * + +Do not use gets() + +Or we will publicly ridicule you. Seriously. + +=item * + +Do not use strcpy() or strcat() + +Where there still linger some uses of these in the Perl source code, +we have inspected them for safety and are very, very ashamed of them, +and plan to get rid of them. In places where there are strlcpy() +and strlcat() we prefer to use them, and there is a plan to integrate +the strlcpy/strlcat implementation of INN. + +=item * + +Do not use sprintf() or vsprintf() + +Use my_snprintf() and my_vnsprintf() instead, which will try to use +snprintf() and vsnprintf() if those safer APIs are available. + +=back + =head1 EXTERNAL TOOLS FOR DEBUGGING PERL Sometimes it helps to use external tools while debugging and diff --git a/pod/perltodo.pod b/pod/perltodo.pod index cbf750c..bade067 100644 --- a/pod/perltodo.pod +++ b/pod/perltodo.pod @@ -620,9 +620,9 @@ set. The pad API only takes a C pointer, so that's all bytes too. The tokeniser ignores the UTF-8-ness of C, or any SVs returned from source filters. All this could be fixed. +=head2 Integrate Russ Allbery's strlcat/strlcpy implementation - - +And remove the last remaining uses of strcat() and strcpy(). =head1 Big projects