This section deals with questions related to networking, the internet,
and the web.

=head2 What is the correct form of response from a CGI script?

(Alan Flavell <flavell+www@a5.ph.gla.ac.uk> answers...)

The Common Gateway Interface (CGI) specifies a software interface between
a program ("CGI script") and a web server (HTTPD). It is not specific
to Perl, and has its own FAQs, tutorials, and Usenet group,
comp.infosystems.www.authoring.cgi.

The CGI specification is outlined in an informational RFC:
http://www.ietf.org/rfc/rfc3875

Other relevant documentation is listed in: http://www.perl.org/CGI_MetaFAQ.html

These Perl FAQs very selectively cover some CGI issues. However, Perl
programmers are strongly advised to use the C<CGI.pm> module, to take care
of the details for them.

The similarity between CGI response headers (defined in the CGI
specification) and HTTP response headers (defined in the HTTP
specification, RFC2616) is intentional, but can sometimes be confusing.

The CGI specification defines two kinds of script: the "Parsed Header"
script, and the "Non Parsed Header" (NPH) script. Check your server
documentation to see what it supports. "Parsed Header" scripts are
simpler in various respects. The CGI specification allows any of the
usual newline representations in the CGI response (it's the server's
job to create an accurate HTTP response based on it). So "\n" written in
text mode is technically correct, and recommended. NPH scripts are more
tricky: they must put out a complete and accurate set of HTTP
transaction response headers; the HTTP specification calls for records
to be terminated with carriage-return and line-feed, i.e., ASCII \015\012
written in binary mode.

Using C<CGI.pm> gives excellent platform independence, including EBCDIC
systems. C<CGI.pm> selects an appropriate newline representation
(C<$CGI::CRLF>) and sets binmode as appropriate.

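
As a rough illustration of the "Parsed Header" case described above, the
following sketch builds a response by hand, using plain "\n" line endings
and a blank line to end the headers. The helper name C<cgi_response> and
the sample body are assumptions made for this example; they are not part
of the CGI specification or of C<CGI.pm>.

```perl
use strict;
use warnings;

# Hypothetical helper: build a minimal "Parsed Header" CGI response.
# The server, not the script, turns this into a full HTTP response.
sub cgi_response {
    my ($body) = @_;
    return "Content-Type: text/html\n"    # CGI response header
         . "\n"                           # blank line ends the headers
         . $body;
}

print cgi_response("<html><body><p>Hello, world</p></body></html>\n");
```

An NPH script could not use this shape as it stands, since it would have
to emit the HTTP status line and CRLF-terminated headers itself.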
=head2 My CGI script runs from the command line but not the browser. (500 Server Error)

Several things could be wrong. You can go through the "Troubleshooting
Perl CGI scripts" guide at

    http://www.perl.org/troubleshooting_CGI.html

If, after that, you can demonstrate that you've read the FAQs and that
your problem isn't something simple that can be easily answered, you'll
probably receive a courteous and useful reply to your question if you
post it on comp.infosystems.www.authoring.cgi (if it's something to do
with HTTP or the CGI protocols). Questions that appear to be Perl
questions but are really CGI ones that are posted to comp.lang.perl.misc
are not so well received.

The useful FAQs, related documents, and troubleshooting guides are
listed in the CGI Meta FAQ:

    http://www.perl.org/CGI_MetaFAQ.html

=head2 How can I get better error messages from a CGI program?

Use the C<CGI::Carp> module. It replaces C<warn> and C<die>, plus the
normal C<Carp> module's C<carp>, C<croak>, and C<confess> functions, with
more verbose and safer versions. It still sends them to the normal
server error log.

    use CGI::Carp;
    warn "This is a complaint";
    die "But this one is serious";

The following use of C<CGI::Carp> also redirects errors to a file of your choice,
placed in a C<BEGIN> block to catch compile-time warnings as well:

    BEGIN {
        use CGI::Carp qw(carpout);
        open(LOG, ">>/var/local/cgi-logs/mycgi-log")
            or die "Unable to append to mycgi-log: $!\n";
        carpout(*LOG);
    }

You can even arrange for fatal errors to go back to the client browser,
which is nice for your own debugging, but might confuse the end user.

    use CGI::Carp qw(fatalsToBrowser);

Even if the error happens before you get the HTTP header out, the module
will try to take care of this to avoid the dreaded server 500 errors.
Normal warnings still go out to the server error log (or wherever
you've sent them with C<carpout>) with the application name and date
stamp prepended.

=head2 How do I remove HTML from a string?

The most correct way (albeit not the fastest) is to use C<HTML::Parser>
from CPAN. Another mostly correct
way is to use C<HTML::FormatText> which not only removes HTML but also
attempts to do a little simple formatting of the resulting plain text.

Many folks attempt a simple-minded regular expression approach, like
C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities, like C<&lt;> for example.

Here's one "simple-minded" approach that works for most files:

    #!/usr/bin/perl -p0777
    s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
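
To make the difference concrete, here is a small sketch that runs both
patterns over one of the awkward inputs: a tag whose quoted attribute
contains a C<< > >>. The sample string and variable names are made up for
this illustration.

```perl
use strict;
use warnings;

my $html = qq{<p>Compare <IMG SRC = "foo.gif" ALT = "A > B"> these</p>};

# Naive pattern: stops at the first ">" even inside a quoted attribute,
# so part of the tag leaks into the output.
(my $naive = $html) =~ s/<.*?>//g;

# Quote-aware pattern from the text: skips over quoted attribute values.
(my $aware = $html) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "naive: $naive\n";
print "aware: $aware\n";
```

Neither pattern understands comments, scripts, or entities, so this only
shows why the quote-aware version survives one more case than the naive one.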

If you want a more complete solution, see the 3-stage striphtml
program at

    http://www.cpan.org/authors/Tom_Christiansen/scripts/striphtml.gz

Here are some tricky cases that you should think about when picking
a solution:

    <IMG SRC = "foo.gif" ALT = "A > B">

    <script>if (a<b && a>c)</script>

    <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

If HTML comments include other tags, those solutions would also break
on text like this:

    <!-- This section commented out.
        <B>You can't see me!</B>
    -->

=head2 How do I extract URLs?

You can easily extract all sorts of URLs from HTML with
C<HTML::SimpleLinkExtor> which handles anchors, images, objects,
frames, and many other tags that can contain a URL. If you need
anything more complex, you can create your own subclass of
C<HTML::LinkExtor> or C<HTML::Parser>. You might even use
C<HTML::SimpleLinkExtor> as an example for something specifically
suited to your needs.

You can use C<URI::Find> to extract URLs from an arbitrary text document.

Less complete solutions involving regular expressions can save
you a lot of processing time if you know that the input is simple. One
solution from Tom Christiansen runs 100 times faster than most
module-based approaches but only extracts URLs from anchors where the first
attribute is HREF and there are no other attributes.

    #!/usr/bin/perl -n00
    # qxurl - tchrist@perl.com
    print "$2\n" while m{
        < \s*
          A \s+ HREF \s* = \s* (["']) (.*?) \1
        \s* >
    }gsix;

=head2 How do I download a file from the user's machine? How do I open a file on another machine?

In this case, download means to use the file upload feature of HTML
forms. You allow the web surfer to specify a file to send to your web
server. To you it looks like a download, and to the user it looks
like an upload. No matter what you call it, you do it with what's
known as B<multipart/form-data> encoding. The C<CGI.pm> module (which
comes with Perl as part of the Standard Library) supports this in the
C<start_multipart_form()> method, which isn't the same as the C<startform()>
method.

See the section in the C<CGI.pm> documentation on file uploads for code
examples and details.

=head2 How do I make an HTML pop-up menu with Perl?

(contributed by brian d foy)

The C<CGI.pm> module (which comes with Perl) has functions to create
the HTML form widgets. See the C<CGI.pm> documentation for more
examples.

    use CGI qw/:standard/;
    print header,
        start_html('Favorite Animals'),
        start_form,
            "What's your favorite animal? ",
            popup_menu(
                -name   => 'animal',
                -values => [ qw( Llama Alpaca Camel Ram ) ]
                ),
            submit,
        end_form,
        end_html;

=head2 How do I fetch an HTML file?

(contributed by brian d foy)

Use the libwww-perl distribution. The C<LWP::Simple> module can fetch web
resources and give their content back to you as a string:

    use LWP::Simple qw(get);

    my $html = get( "http://www.example.com/index.html" );

It can also store the resource directly in a file:

    use LWP::Simple qw(getstore);

    getstore( "http://www.example.com/index.html", "foo.html" );

If you need to do something more complicated, you can use the
C<LWP::UserAgent> module to create your own user-agent (e.g. browser)
to get the job done. If you want to simulate an interactive web
browser, you can use the C<WWW::Mechanize> module.

=head2 How do I automate an HTML form submission?

If you are doing something complex, such as moving through many pages
and forms on a web site, you can use C<WWW::Mechanize>. See its
documentation for all the details.

If you're submitting values using the GET method, create a URL and encode
the form using the C<query_form> method:

    use LWP::Simple;
    use URI::URL;

    my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
    $url->query_form(module => 'DB_File', readme => 1);
    $content = get($url);

If you're using the POST method, create your own user agent and encode
the content appropriately.

    use HTTP::Request::Common qw(POST);
    use LWP::UserAgent;

    $ua = LWP::UserAgent->new();
    my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
                   [ module => 'DB_File', readme => 1 ];
    $content = $ua->request($req)->as_string;

=head2 How do I decode or create those %-encodings on the web?
X<URI> X<CGI.pm> X<CGI> X<URI::Escape> X<RFC 2396>

(contributed by brian d foy)

Those C<%> encodings handle reserved characters in URIs, as described
in RFC 2396, Section 2. This encoding replaces the reserved character
with the hexadecimal representation of the character's number from
the US-ASCII table. For instance, a colon, C<:>, becomes C<%3A>.

In CGI scripts, you don't have to worry about decoding URIs if you are
using C<CGI.pm>. You shouldn't have to process the URI yourself,
either on the way in or the way out.

If you have to encode a string yourself, remember that you should
never try to encode an already-composed URI. You need to escape the
components separately then put them together. To encode a string, you
can use the C<URI::Escape> module. The C<uri_escape> function
returns the escaped string:

    use URI::Escape;

    my $original = "Colon : Hash # Percent %";

    my $escaped = uri_escape( $original );

    print "$escaped\n"; # 'Colon%20%3A%20Hash%20%23%20Percent%20%25'

To decode the string, use the C<uri_unescape> function:

    my $unescaped = uri_unescape( $escaped );

    print $unescaped; # back to original

If you wanted to do it yourself, you simply need to replace the
reserved characters with their encodings. A global substitution
is one way to do it:

    # encode (always two uppercase hex digits per byte)
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;

    # decode
    $string =~ s/%([A-Fa-f\d]{2})/chr hex $1/eg;
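
The advice to escape components separately and then assemble them can be
sketched as follows. This is only an illustration: the helper name
C<escape_component>, the form fields, and the C<www.example.com> URL are
all invented here, and the helper assumes its input is a string of bytes
rather than wide characters.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode one URI component.
# Every byte outside the unreserved set becomes %XX (two hex digits).
sub escape_component {
    my ($string) = @_;
    $string =~ s/([^A-Za-z0-9\-_.!~*'()])/ sprintf "%%%02X", ord $1 /eg;
    return $string;
}

my %form = ( module => 'DB_File', query => 'a&b=c d' );

# Escape each name and value on its own, then join with the delimiters.
my $query = join '&',
    map { escape_component($_) . '=' . escape_component( $form{$_} ) }
    sort keys %form;

my $url = "http://www.example.com/search?$query";
print "$url\n";
```

Escaping the finished URL instead would also encode the C<?>, C<&>, and
C<=> that separate the components, which is why the pieces are escaped
before they are joined.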

=head2 How do I redirect to another page?

Specify the complete URL of the destination (even if it is on the same
server). This is one of the two different kinds of CGI "Location:"
responses which are defined in the CGI specification for a Parsed Headers
script. The other kind (an absolute URLpath) is resolved internally to
the server without any HTTP redirection. The CGI specifications do not
allow relative URLs in either case.

Use of C<CGI.pm> is strongly recommended. This example shows redirection
with a complete URL. This redirection is handled by the web browser.

    use CGI qw/:standard/;

    my $url = 'http://www.cpan.org/';
    print redirect($url);

This example shows a redirection with an absolute URLpath. This
redirection is handled by the local web server.

    use CGI qw/:standard/;

    my $url = '/CPAN/index.html';
    print redirect($url);

But if coded directly, it could be as follows (the final "\n" is
shown separately, for clarity), using either a complete URL or
an absolute URLpath.

    print "Location: $url\n";   # CGI response header
    print "\n";                 # end of headers
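
Since relative URLs are not allowed in either kind of "Location:"
response, a hand-coded script may want to check the target first. The
sketch below is one possible way to do that; the helper name
C<location_response> and its pattern are assumptions for this example,
not something defined by the CGI specification or C<CGI.pm>.

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI "Location:" response by hand.
# Accepts a complete URL (scheme://...) or an absolute URL path (/...),
# and refuses anything else as a relative URL.
sub location_response {
    my ($target) = @_;
    die "relative URL not allowed: $target\n"
        unless $target =~ m{^(?:[A-Za-z][A-Za-z0-9+.-]*://|/(?!/))};
    return "Location: $target\n"    # CGI response header
         . "\n";                    # end of headers
}

print location_response('http://www.cpan.org/');
```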

=head2 How do I put a password on my web pages?

To enable authentication for your web server, you need to configure
your web server. The configuration is different for different sorts
of web servers: Apache does it differently from iPlanet, which does
it differently from IIS. Check your web server documentation for
the details for your particular server.

=head2 How do I edit my .htpasswd and .htgroup files with Perl?

The C<HTTPD::UserAdmin> and C<HTTPD::GroupAdmin> modules provide a
consistent OO interface to these files, regardless of how they're
stored. Databases may be text, dbm, Berkeley DB or any database with
a DBI compatible driver. C<HTTPD::UserAdmin> supports files used by the
"Basic" and "Digest" authentication schemes. Here's an example:

    use HTTPD::UserAdmin ();
    HTTPD::UserAdmin
        ->new(DB => "/foo/.htpasswd")
        ->add($username => $password);

=head2 How do I make sure users can't enter values into a form that cause my CGI script to do bad things?

See the security references listed in the CGI Meta FAQ

    http://www.perl.org/CGI_MetaFAQ.html

=head2 How do I parse a mail header?

For a quick-and-dirty solution, try this solution derived
from L<perlfunc/split>:

    $/ = '';
    $header = <MSG>;
    $header =~ s/\n\s+/ /g;     # merge continuation lines
    %head = ( UNIX_FROM_LINE, split /^([-\w]+):\s*/m, $header );

That solution doesn't do well if, for example, you're trying to
maintain all the Received lines. A more complete approach is to use
the C<Mail::Header> module from CPAN (part of the C<MailTools> package).
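
If installing a module isn't an option, one way to keep repeated fields
such as Received is to store every value in a list. The following is a
minimal sketch of that idea, not a replacement for C<Mail::Header>; the
helper name C<parse_header> and the sample header are invented for the
example, and it ignores the Unix "From " line and other edge cases.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a header block into a hash of array refs,
# keyed by lowercased field name, so repeated fields keep every value.
sub parse_header {
    my ($header) = @_;
    $header =~ s/\n[ \t]+/ /g;    # merge continuation lines
    my %head;
    for my $line ( split /\n/, $header ) {
        next unless $line =~ /^([-\w]+):\s*(.*)/;
        push @{ $head{ lc $1 } }, $2;
    }
    return \%head;
}

my $header = <<'EOF';
Received: from a.example.com by b.example.com;
    Mon, 1 Jan 2001 00:00:01 +0000
Received: from c.example.com by a.example.com;
    Mon, 1 Jan 2001 00:00:00 +0000
Subject: Hello
EOF

my $head = parse_header($header);
print scalar @{ $head->{received} }, " Received lines\n";
print "Subject: $head->{subject}[0]\n";
```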

=head2 How do I decode a CGI form?

(contributed by brian d foy)

Use the C<CGI.pm> module that comes with Perl. It's quick,
it's easy, and it actually does quite a bit of work to
ensure things happen correctly. It handles GET, POST, and
HEAD requests, multipart forms, multivalued fields, query
string and message body combinations, and many other things
you probably don't want to think about.

It doesn't get much easier: the C<CGI.pm> module automatically
parses the input and makes each value available through the
C<param()> function.

    use CGI qw(:standard);

    my $total = param( 'price' ) + param( 'shipping' );

    my @items = param( 'item' ); # multiple values, same field name

If you want an object-oriented approach, C<CGI.pm> can do that too.

    use CGI;

    my $cgi = CGI->new();

    my $total = $cgi->param( 'price' ) + $cgi->param( 'shipping' );

    my @items = $cgi->param( 'item' );

You might also try C<CGI::Minimal> which is a lightweight version
of the same thing. Other CGI::* modules on CPAN might work better
for you, too.

Many people try to write their own decoder (or copy one from
another program) and then run into one of the many "gotchas"
of the task. It's much easier and less hassle to use C<CGI.pm>.

=head2 How do I check a valid mail address?

(partly contributed by Aaron Sherman)

This isn't as simple a question as it sounds. There are two parts:

a) How do I verify that an email address is correctly formatted?

b) How do I verify that an email address targets a valid recipient?

Without sending mail to the address and seeing whether there's a human
on the other end to answer you, you cannot fully answer part I<b>, but
either the C<Email::Valid> or the C<RFC::RFC822::Address> module will do
both part I<a> and part I<b> as far as you can in real-time.

If you want to just check part I<a> to see that the address is valid
according to the mail header standard with a simple regular expression,
you can have problems, because there are deliverable addresses that
aren't RFC-2822 (the latest mail header standard) compliant, and
undeliverable addresses that are compliant. However, the
following will match valid RFC-2822 addresses that do not have comments,
folding whitespace, or any other obsolete or non-essential elements.
This I<just> matches the address itself:

    my $atom       = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
    my $dot_atom   = qr{$atom(?:\.$atom)*};
    my $quoted     = qr{"(?:\\[^\r\n]|[^\\"])*"};
    my $local      = qr{(?:$dot_atom|$quoted)};
    my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
    my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
    my $domain     = qr{(?:$dot_atom|$domain_lit)};
    my $addr_spec  = qr{$local\@$domain};

Just match an address against C</^${addr_spec}$/> to see if it follows
the RFC2822 specification. However, because it is impossible to be
sure that such a correctly formed address is actually the correct way
to reach a particular person or even has a mailbox associated with it,
you must be very careful about how you use this.
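
As a usage sketch, the patterns can be wrapped in a small predicate and
tried on a few sample strings. The function name C<is_well_formed> and
the sample addresses are made up for this example; a true result says
only that the string has the right shape, not that mail to it will be
delivered.

```perl
use strict;
use warnings;

# Patterns repeated from the text so that this example stands alone.
my $atom       = qr{[a-zA-Z0-9_!#\$\%&'*+/=?\^`{}~|\-]+};
my $dot_atom   = qr{$atom(?:\.$atom)*};
my $quoted     = qr{"(?:\\[^\r\n]|[^\\"])*"};
my $local      = qr{(?:$dot_atom|$quoted)};
my $quotedpair = qr{\\[\x00-\x09\x0B-\x0c\x0e-\x7e]};
my $domain_lit = qr{\[(?:$quotedpair|[\x21-\x5a\x5e-\x7e])*\]};
my $domain     = qr{(?:$dot_atom|$domain_lit)};
my $addr_spec  = qr{$local\@$domain};

# Hypothetical helper: true if the whole string matches $addr_spec.
sub is_well_formed {
    my ($address) = @_;
    return $address =~ /^${addr_spec}$/ ? 1 : 0;
}

for my $address ( 'user@example.com', '"john doe"@example.com',
                  'a..b@example.com', 'no-at-sign' ) {
    printf "%-26s %s\n", $address,
        is_well_formed($address) ? 'well-formed' : 'malformed';
}
```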

Our best advice for verifying a person's mail address is to have them
enter their address twice, just as you normally do to change a
password. This usually weeds out typos. If both versions match, send
mail to that address with a personal message. If you get the message
back and they've followed your directions, you can be reasonably
assured that it's real.

A related strategy that's less open to forgery is to give them a PIN
(personal ID number). Record the address and PIN (best that it be a
random one) for later processing. In the mail you send, ask them to
include the PIN in their reply. But if it bounces, or the message is
included via a "vacation" script, it'll be there anyway. So it's
best to ask them to mail back a slight alteration of the PIN, such as
with the characters reversed, one added or subtracted to each digit, etc.
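
The altered-PIN idea might be sketched like this, using the "characters
reversed" variant. Both helper names are hypothetical, and C<rand> is
used only to keep the example short; it is not a cryptographically
strong source of randomness.

```perl
use strict;
use warnings;

# Hypothetical helper: make a random numeric PIN of the given length.
# Palindromes are skipped, since reversing them changes nothing and a
# bounced message would then appear to contain a valid reply.
sub make_pin {
    my ($digits) = @_;
    my $pin;
    do {
        $pin = join '', map { int( rand(10) ) } 1 .. $digits;
    } while ( $pin eq scalar reverse $pin );
    return $pin;
}

# Hypothetical helper: accept a reply only if it contains the PIN with
# its characters reversed, as a separate word.
sub reply_is_valid {
    my ( $pin, $reply ) = @_;
    my $expected = reverse $pin;
    return $reply =~ /\b\Q$expected\E\b/ ? 1 : 0;
}

my $pin = make_pin(6);
print "PIN to mail out: $pin\n";
```

A bounce or auto-reply that merely quotes the original message contains
the PIN as sent, so C<reply_is_valid> rejects it.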

=head2 How do I decode a MIME/BASE64 string?

The C<MIME-Base64> package (available from CPAN) handles this as well as
the MIME/QP encoding. Decoding BASE64 becomes as simple as:

    use MIME::Base64;
    $decoded = decode_base64($encoded);

The C<MIME-Tools> package (available from CPAN) supports extraction with
decoding of BASE64 encoded attachments and content directly from email
messages.

If the string to decode is short (less than 84 bytes long)
a more direct approach is to use the C<unpack()> function's "u"
format after minor transliterations:

    tr#A-Za-z0-9+/##cd;                   # remove non-base64 chars
    tr#A-Za-z0-9+/# -_#;                  # convert to uuencoded format
    $len = pack("c", 32 + 0.75*length);   # compute length byte
    print unpack("u", $len . $_);         # uudecode and print
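
For comparison, the same steps can be wrapped in a function that works
on its argument instead of C<$_> and checked against C<decode_base64>.
The function name C<decode_base64_short> and the sample string are
assumptions for this sketch.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical wrapper around the unpack("u") technique from the text.
# Intended only for short inputs that fit in one uuencoded line.
sub decode_base64_short {
    my ($encoded) = @_;
    $encoded =~ tr#A-Za-z0-9+/##cd;     # remove non-base64 chars
    $encoded =~ tr#A-Za-z0-9+/# -_#;    # convert to uuencoded format
    my $len = pack( "c", 32 + 0.75 * length($encoded) );    # length byte
    return unpack( "u", $len . $encoded );                  # uudecode
}

my $encoded = "SGVsbG8h";    # base64 for "Hello!"
print decode_base64_short($encoded), "\n";
print decode_base64($encoded), "\n";
```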

=head2 How do I return the user's mail address?

On systems that support getpwuid, the C<< $< >> variable, and the
C<Sys::Hostname> module (which is part of the standard perl distribution),
you can probably try using something like this:

    use Sys::Hostname;
    $address = sprintf('%s@%s', scalar getpwuid($<), hostname);

Company policies on mail address can mean that this generates addresses
that the company's mail system will not accept, so you should ask for
users' mail addresses when this matters. Furthermore, not all systems
on which Perl runs are so forthcoming with this information as is Unix.

The C<Mail::Util> module from CPAN (part of the C<MailTools> package) provides a
C<mailaddress()> function that tries to guess the mail address of the user.
It makes a more intelligent guess than the code above, using information
given when the module was installed, but it could still be incorrect.
Again, the best way is often just to ask the user.

=head2 How do I send mail?

Use the C<sendmail> program directly:

    open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
        or die "Can't fork for sendmail: $!\n";
    print SENDMAIL <<"EOF";
    From: User Originating Mail <me\@host>
    To: Final Destination <you\@otherhost>
    Subject: A relevant subject line

    Body of the message goes here after the blank line
    in as many lines as you like.
    EOF
    close(SENDMAIL) or warn "sendmail didn't close nicely";

The B<-oi> option prevents C<sendmail> from interpreting a line consisting
of a single dot as "end of message". The B<-t> option says to use the
headers to decide who to send the message to, and B<-odq> says to put
the message into the queue. This last option means your message won't
be immediately delivered, so leave it out if you want immediate
delivery.

Alternate, less convenient approaches include calling C<mail> (sometimes
called C<mailx>) directly or simply opening up port 25 and having an
intimate conversation between just you and the remote SMTP daemon,
probably C<sendmail>.

Or you might be able to use the CPAN module C<Mail::Mailer>:

    use Mail::Mailer;

    $mailer = Mail::Mailer->new();
    $mailer->open({ From    => $from_address,
                    To      => $to_address,
                    Subject => $subject,
                  })
        or die "Can't open: $!\n";
    print $mailer $body;
    $mailer->close();

The C<Mail::Internet> module uses C<Net::SMTP> which is less Unix-centric than
C<Mail::Mailer>, but less reliable. Avoid raw SMTP commands. There
are many reasons to use a mail transport agent like C<sendmail>. These
include queuing, MX records, and security.

=head2 How do I use MIME to make an attachment to a mail message?

This answer is extracted directly from the C<MIME::Lite> documentation.
Create a multipart message (i.e., one with attachments).

    use MIME::Lite;

    ### Create a new multipart message:
    $msg = MIME::Lite->new(
        From    =>'me@myhost.com',
        To      =>'you@yourhost.com',
        Cc      =>'some@other.com, some@more.com',
        Subject =>'A message with 2 parts...',
        Type    =>'multipart/mixed'
    );

    ### Add parts (each "attach" has same arguments as "new"):
    $msg->attach(Type     =>'TEXT',
                 Data     =>"Here's the GIF file you wanted"
    );
    $msg->attach(Type     =>'image/gif',
                 Path     =>'aaa000123.gif',
                 Filename =>'logo.gif'
    );

    $text = $msg->as_string;

C<MIME::Lite> also includes a method for sending these things.

    $msg->send;

This defaults to using C<sendmail> but can be customized to use
SMTP via L<Net::SMTP>.

=head2 How do I read mail?

While you could use the C<Mail::Folder> module from CPAN (part of the
C<MailFolder> package) or the C<Mail::Internet> module from CPAN (part
of the C<MailTools> package), often a module is overkill. Here's a
mail sorter.

    #!/usr/bin/perl

    my(@msgs, @sub);
    my $msgno = -1;
    $/ = '';                    # paragraph reads
    while (<>) {
        if (/^From /m) {
            /^Subject:\s*(?:Re:\s*)*(.*)/mi;
            $sub[++$msgno] = lc($1) || '';
        }
        $msgs[$msgno] .= $_;
    }
    for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
        print $msgs[$i];
    }

Or more succinctly,

    #!/usr/bin/perl -n00
    # bysub2 - awkish sort-by-subject
    BEGIN { $msgno = -1 }
    $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
    $msg[$msgno] .= $_;
    END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }

=head2 How do I find out my hostname, domainname, or IP address?
X<hostname, domainname, IP address, host, domain, hostfqdn, inet_ntoa,
gethostbyname, Socket, Net::Domain, Sys::Hostname>

(contributed by brian d foy)

The C<Net::Domain> module, which is part of the standard distribution starting
in perl5.7.3, can get you the fully qualified domain name (FQDN), the host
name, or the domain name.

    use Net::Domain qw(hostname hostfqdn hostdomain);

    my $host = hostfqdn();

The C<Sys::Hostname> module, included in the standard distribution since
perl5.6, can also get the hostname.

    use Sys::Hostname;

    $host = hostname();

To get the IP address, you can use the C<gethostbyname> built-in function
to turn the name into a number. To turn that number into the dotted octet
form (a.b.c.d) that most people expect, use the C<inet_ntoa> function
from the C<Socket> module, which also comes with perl.

    use Socket;

    my $address = inet_ntoa(
        scalar gethostbyname( $host || 'localhost' )
    );
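
A slightly more defensive variant checks whether the lookup succeeded
before formatting the result, since C<inet_ntoa> dies when handed an
undefined value. This sketch swaps in C<inet_aton> from C<Socket> for
the lookup, which accepts either a host name or a dotted-quad string; the
helper name C<host_to_ip> is an assumption for this example.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# Hypothetical helper: turn a host name (or dotted quad) into a
# dotted-quad string, returning undef when the lookup fails.
sub host_to_ip {
    my ($host) = @_;
    my $packed = inet_aton($host);    # 4-byte packed address, or undef
    return defined $packed ? inet_ntoa($packed) : undef;
}

# A literal address needs no resolver, so this line works offline.
print host_to_ip('127.0.0.1'), "\n";
```

What a real host name resolves to depends on the local resolver
configuration, so the example deliberately prints only a literal address.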

=head2 How do I fetch a news article or the active newsgroups?

Use the C<Net::NNTP> or C<News::NNTPClient> modules, both available from CPAN.
This can make tasks like fetching the newsgroup list as simple as

    perl -MNews::NNTPClient \
      -e 'print News::NNTPClient->new->list("newsgroups")'

=head2 How do I fetch/put an FTP file?

C<LWP::Simple> (available from CPAN) can fetch but not put. C<Net::FTP> (also
available from CPAN) is more complex but can put as well as fetch.

=head2 How can I do RPC in Perl?

(Contributed by brian d foy)

Use one of the RPC modules you can find on CPAN (
http://search.cpan.org/search?query=RPC&mode=all ).

=head1 AUTHOR AND COPYRIGHT

Copyright (c) 1997-2010 Tom Christiansen, Nathan Torkington, and
other authors as noted. All rights reserved.

This documentation is free; you can redistribute it and/or modify it
under the same terms as Perl itself.

Irrespective of its distribution, all code examples in this file
are hereby placed into the public domain. You are permitted and
encouraged to use this code in your own programs for fun
or for profit as you see fit. A simple comment in the code giving
credit would be courteous but is not required.