=head1 NAME
-perlfaq9 - Networking ($Revision: 1.16 $, $Date: 1997/04/23 18:12:06 $)
+perlfaq9 - Networking ($Revision: 1.26 $, $Date: 1999/05/23 16:08:30 $)
=head1 DESCRIPTION
This section deals with questions related to networking, the internet,
and a few on the web.
-=head2 My CGI script runs from the command line but not the browser. Can you help me fix it?
+=head2 My CGI script runs from the command line but not the browser. (500 Server Error)
-Sure, but you probably can't afford our contracting rates :-)
+If you can demonstrate that you've read the following FAQs and that
+your problem isn't something simple that can be easily answered, you'll
+probably receive a courteous and useful reply to your question if you
+post it on comp.infosystems.www.authoring.cgi (if it's something to do
+with HTTP, HTML, or the CGI protocols). Questions that appear to be Perl
+questions but are really CGI ones that are posted to comp.lang.perl.misc
+may not be so well received.
-Seriously, if you can demonstrate that you've read the following FAQs
-and that your problem isn't something simple that can be easily
-answered, you'll probably receive a courteous and useful reply to your
-question if you post it on comp.infosystems.www.authoring.cgi (if it's
-something to do with HTTP, HTML, or the CGI protocols). Questions that
-appear to be Perl questions but are really CGI ones that are posted to
-comp.lang.perl.misc may not be so well received.
+The useful FAQs and related documents are:
-The useful FAQs are:
+ CGI FAQ
+ http://www.webthing.com/tutorials/cgifaq.html
- http://www.perl.com/perl/faq/idiots-guide.html
- http://www3.pair.com/webthing/docs/cgi/faqs/cgifaq.shtml
- http://www.perl.com/perl/faq/perl-cgi-faq.html
- http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html
- http://www.boutell.com/faq/
+ Web FAQ
+ http://www.boutell.com/faq/
+
+ WWW Security FAQ
+ http://www.w3.org/Security/Faq/
+
+ HTTP Spec
+ http://www.w3.org/pub/WWW/Protocols/HTTP/
+
+ HTML Spec
+ http://www.w3.org/TR/REC-html40/
+ http://www.w3.org/pub/WWW/MarkUp/
+
+ CGI Spec
+ http://www.w3.org/CGI/
+
+ CGI Security FAQ
+ http://www.go2net.com/people/paulp/cgi-security/safe-cgi.txt
+
+=head2 How can I get better error messages from a CGI program?
+
+Use the CGI::Carp module. It replaces C<warn> and C<die>, plus the
+normal Carp module's C<carp>, C<croak>, and C<confess> functions with
+more verbose and safer versions. It still sends them to the normal
+server error log.
+
+ use CGI::Carp;
+ warn "This is a complaint";
+ die "But this one is serious";
+
+The following use of CGI::Carp also redirects errors to a file of your choice,
+placed in a BEGIN block to catch compile-time warnings as well:
+
+ BEGIN {
+ use CGI::Carp qw(carpout);
+ open(LOG, ">>/var/local/cgi-logs/mycgi-log")
+ or die "Unable to append to mycgi-log: $!\n";
+ carpout(*LOG);
+ }
+
+You can even arrange for fatal errors to go back to the client browser,
+which is nice for your own debugging, but might confuse the end user.
+
+ use CGI::Carp qw(fatalsToBrowser);
+ die "Bad error here";
+
+Even if the error happens before you get the HTTP header out, the module
+will try to take care of this to avoid the dreaded server 500 errors.
+Normal warnings still go out to the server error log (or wherever
+you've sent them with C<carpout>) with the application name and date
+stamp prepended.
=head2 How do I remove HTML from a string?
-The most correct way (albeit not the fastest) is to use HTML::Parse
-from CPAN (part of the libwww-perl distribution, which is a must-have
-module for all web hackers).
+The most correct way (albeit not the fastest) is to use HTML::Parser
+from CPAN. Another mostly correct
+way is to use HTML::FormatText which not only removes HTML but also
+attempts to do a little simple formatting of the resulting plain text.
Many folks attempt a simple-minded regular expression approach, like
-C<s/E<lt>.*?E<gt>//g>, but that fails in many cases because the tags
+C<< s/<.*?>//g >>, but that fails in many cases because the tags
may continue over line breaks, they may contain quoted angle-brackets,
or HTML comments may be present. Plus, folks forget to convert
entities, like C<&lt;> for example.
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/striphtml.gz
.
+Here are some tricky cases that you should think about when picking
+a solution:
+
+ <IMG SRC = "foo.gif" ALT = "A > B">
+
+ <IMG SRC = "foo.gif"
+ ALT = "A > B">
+
+ <!-- <A comment> -->
+
+ <script>if (a<b && a>c)</script>
+
+ <# Just data #>
+
+ <![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
+
+If HTML comments include other tags, those solutions would also break
+on text like this:
+
+ <!-- This section commented out.
+ <B>You can't see me!</B>
+ -->
+
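To see why the simple-minded regex falls short, here is a small sketch
that runs it over the first tricky case above. The helper name
C<naive_strip> is made up for illustration; it shows the broken
approach, not a recommended one.

```perl
use strict;
use warnings;

# naive_strip() is a hypothetical helper wrapping the simple-minded regex.
sub naive_strip {
    my ($html) = @_;
    $html =~ s/<.*?>//gs;   # stops at the first ">", even one inside quotes
    return $html;
}

my $tag  = '<IMG SRC = "foo.gif" ALT = "A > B">';
my $left = naive_strip($tag);
print "leftover: [$left]\n";   # prints: leftover: [ B">]
```

The match ends at the C<< > >> inside the quoted ALT text, so the tail of
the tag leaks into the output. A real parser such as HTML::Parser tracks
quoting and does not have this problem.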
=head2 How do I extract URLs?
A quick but imperfect approach is
}gsix;
This version does not adjust relative URLs, understand alternate
-bases, deal with HTML comments, deal with HREF and NAME attributes in
-the same tag, or accept URLs themselves as arguments. It also runs
-about 100x faster than a more "complete" solution using the LWP suite
-of modules, such as the
-http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz
-program.
+bases, deal with HTML comments, deal with HREF and NAME attributes
+in the same tag, understand extra qualifiers like TARGET, or accept
+URLs themselves as arguments. It also runs about 100x faster than a
+more "complete" solution using the LWP suite of modules, such as the
+http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/xurl.gz program.
=head2 How do I download a file from the user's machine? How do I open a file on another machine?
=head2 How do I make a pop-up menu in HTML?
-Use the B<E<lt>SELECTE<gt>> and B<E<lt>OPTIONE<gt>> tags. The CGI.pm
+Use the B<< <SELECT> >> and B<< <OPTION> >> tags. The CGI.pm
module (available from CPAN) supports this widget, as well as many
others, including some that it cleverly synthesizes on its own.
$html_code = `lynx -source $url`;
$text_data = `lynx -dump $url`;
-The libwww-perl (LWP) modules from CPAN provide a more powerful way to
-do this. They work through proxies, and don't require lynx:
+The libwww-perl (LWP) modules from CPAN provide a more powerful way
+to do this. They don't require lynx, but like lynx, can still work
+through proxies:
- # print HTML from a URL
+ # simplest version
use LWP::Simple;
- getprint "http://www.sn.no/libwww-perl/";
+ $content = get($URL);
- # print ASCII from HTML from a URL
+ # or print HTML from a URL
use LWP::Simple;
- use HTML::Parse;
+ getprint "http://www.linpro.no/lwp/";
+
+ # or print ASCII from HTML from a URL
+ # also need HTML-Tree package from CPAN
+ use LWP::Simple;
+ use HTML::Parse; # exports parse_html(); part of HTML-Tree
use HTML::FormatText;
my ($html, $ascii);
$html = get("http://www.perl.com/");
$ascii = HTML::FormatText->new->format(parse_html($html));
print $ascii;
-=head2 how do I decode or create those %-encodings on the web?
+=head2 How do I automate an HTML form submission?
+
+If you're submitting values using the GET method, create a URL and encode
+the form using the C<query_form> method:
+
+ use LWP::Simple;
+ use URI::URL;
+
+ my $url = url('http://www.perl.com/cgi-bin/cpan_mod');
+ $url->query_form(module => 'DB_File', readme => 1);
+ $content = get($url);
+
+If you're using the POST method, create your own user agent and encode
+the content appropriately.
+
+ use HTTP::Request::Common qw(POST);
+ use LWP::UserAgent;
+
+ $ua = LWP::UserAgent->new();
+ my $req = POST 'http://www.perl.com/cgi-bin/cpan_mod',
+ [ module => 'DB_File', readme => 1 ];
+ $content = $ua->request($req)->as_string;
+
+=head2 How do I decode or create those %-encodings on the web?
Here's an example of decoding:
It's important that characters with special meaning like C</> and C<?>
I<not> be translated. Probably the easiest way to get this right is
to avoid reinventing the wheel and just use the URI::Escape module,
-which is part of the libwww-perl package (LWP) available from CPAN.
+available from CPAN.
=head2 How do I redirect to another page?
Note that relative URLs in these headers can cause strange effects
because of "optimizations" that servers do.
+ $url = "http://www.perl.com/CPAN/";
+ print "Location: $url\n\n";
+ exit;
+
+To target a particular frame in a frameset, include a "Window-target:"
+field in the header.
+
+ print <<EOF;
+ Location: http://www.domain.com/newpage
+ Window-target: <FrameName>
+
+ EOF
+
+To be correct to the spec, each of those virtual newlines should really be
+physical C<"\015\012"> sequences by the time you hit the client browser.
+Except for NPH scripts, though, that local newline should get translated
+by your server into standard form, so you shouldn't have a problem
+here, even if you are stuck on MacOS. Everybody else probably won't
+even notice.
+
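For the NPH case, where no server fixes up the line endings for you,
one way to be explicit is to build the header with literal CR LF pairs.
This is only a sketch: the helper name C<header_block> and the status
line shown are assumptions for illustration, not something your server
requires in exactly this form.

```perl
use strict;
use warnings;

my $CRLF = "\015\012";   # literal CR LF, the same bytes on every platform

# header_block() is a hypothetical helper: it joins header lines with
# CRLF and appends the empty line that ends the header.
sub header_block {
    my (@lines) = @_;
    return join($CRLF, @lines) . $CRLF . $CRLF;
}

binmode(STDOUT);   # keep the I/O layer from translating line endings
print header_block(
    "HTTP/1.0 302 Moved Temporarily",   # NPH scripts send their own status line
    "Location: http://www.perl.com/CPAN/",
);
```

Using the octal escapes rather than C<"\r\n"> avoids surprises on
platforms where C<"\n"> is not a plain line feed.
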
=head2 How do I put a password on my web pages?
That depends. You'll need to read the documentation for your web
single-argument form of system() or exec(). Instead, supply the
command and arguments as a list, which prevents shell globbing.
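
As a rough illustration of the list form, the sketch below passes a
hostile-looking string to an external command without any shell being
involved. It uses the list form of a piped open (available in reasonably
modern perls) so the output can be captured; C<system> and C<exec>
accept the same kind of list. The helper name C<run_list_form> and the
choice of C<echo> as the command are assumptions made for the example.

```perl
use strict;
use warnings;

# run_list_form() is a hypothetical helper. With a command plus at least
# one argument, this piped open execs the program directly, so shell
# metacharacters in the arguments are never interpreted.
sub run_list_form {
    my (@cmd) = @_;
    open(my $out, '-|', @cmd) or die "Can't run $cmd[0]: $!\n";
    my $text = join '', <$out>;
    close($out) or warn "$cmd[0] exited with status $?\n";
    return $text;
}

# Stands in for untrusted data, such as a form field.
my $user_input = '*; echo injected';

print run_list_form('echo', $user_input);

# The same idea with system(): the string stays a single argument.
system('echo', $user_input) == 0 or die "echo failed: $?\n";
```

With the single-argument form, the same string would be handed to the
shell, which would expand the C<*> and run the second command.
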
-=head2 How do I parse an email header?
+=head2 How do I parse a mail header?
For a quick-and-dirty solution, try this one, derived
from page 222 of the 2nd edition of "Programming Perl":
=head2 How do I decode a CGI form?
-A lot of people are tempted to code this up themselves, so you've
-probably all seen a lot of code involving C<$ENV{CONTENT_LENGTH}> and
-C<$ENV{QUERY_STRING}>. It's true that this can work, but there are
-also a lot of versions of this floating around that are quite simply
-broken!
-
-Please do not be tempted to reinvent the wheel. Instead, use the
-CGI.pm or CGI_Lite.pm (available from CPAN), or if you're trapped in
-the module-free land of perl1 .. perl4, you might look into cgi-lib.pl
-(available from http://www.bio.cam.ac.uk/web/form.html).
-
-=head2 How do I check a valid email address?
-
-You can't.
-
-Without sending mail to the address and seeing whether it bounces (and
-even then you face the halting problem), you cannot determine whether
-an email address is valid. Even if you apply the email header
-standard, you can have problems, because there are deliverable
-addresses that aren't RFC-822 (the mail header standard) compliant,
-and addresses that aren't deliverable which are compliant.
-
-Many are tempted to try to eliminate many frequently-invalid email
-addresses with a simple regexp, such as
-C</^[\w.-]+\@([\w.-]\.)+\w+$/>. However, this also throws out many
-valid ones, and says nothing about potential deliverability, so is not
-suggested. Instead, see
+You use a standard module, probably CGI.pm. Under no circumstances
+should you attempt to do so by hand!
+
+You'll see a lot of CGI programs that blindly read from STDIN the number
+of bytes equal to CONTENT_LENGTH for POSTs, or grab QUERY_STRING for
+decoding GETs. These programs are very poorly written. They only work
+sometimes. They typically forget to check the return value of the read()
+system call, which is a cardinal sin. They don't handle HEAD requests.
+They don't handle multipart forms used for file uploads. They don't deal
+with GET/POST combinations where query fields are in more than one place.
+They don't deal with keywords in the query string.
+
+In short, they're bad hacks. Resist them at all costs. Please do not be
+tempted to reinvent the wheel. Instead, use the CGI.pm or CGI_Lite.pm
+(available from CPAN), or if you're trapped in the module-free land
+of perl1 .. perl4, you might look into cgi-lib.pl (available from
+http://cgi-lib.stanford.edu/cgi-lib/ ).
+
+Make sure you know whether to use a GET or a POST in your form.
+GETs should only be used for something that doesn't update the server.
+Otherwise you can get mangled databases and repeated feedback mail
+messages. The fancy word for this is ``idempotency''. This simply
+means that there should be no difference between making a GET request
+for a particular URL once or multiple times. This is because the
+HTTP protocol definition says that a GET request may be cached by the
+browser, or server, or an intervening proxy. POST requests cannot be
+cached, because each request is independent and matters. Typically,
+POST requests change or depend on state on the server (query or update
+a database, send mail, or purchase a computer).
+
+=head2 How do I check a valid mail address?
+
+You can't, at least, not in real time. Bummer, eh?
+
+Without sending mail to the address and seeing whether there's a human
+on the other end to answer you, you cannot determine whether a mail
+address is valid. Even if you apply the mail header standard, you
+can have problems, because there are deliverable addresses that aren't
+RFC-822 (the mail header standard) compliant, and addresses that aren't
+deliverable which are compliant.
+
+Many are tempted to try to eliminate many frequently-invalid
+mail addresses with a simple regex, such as
+C</^[\w.-]+\@([\w.-]\.)+\w+$/>. It's a very bad idea: this also
+throws out many valid ones, and says nothing about potential
+deliverability, so it is not suggested. Instead, see
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz ,
which actually checks against the full RFC spec (except for nested
-comments), looks for addresses you may not wish to accept email to
+comments), looks for addresses you may not wish to accept mail to
(say, Bill Clinton or your postmaster), and then makes sure that the
-hostname given can be looked up in DNS. It's not fast, but it works.
+hostname given can be looked up in the DNS MX records. It's not fast,
+but it works for what it tries to do.
+
+Our best advice for verifying a person's mail address is to have them
+enter their address twice, just as you normally do to change a password.
+This usually weeds out typos. If both versions match, send
+mail to that address with a personal message that looks somewhat like:
+
+ Dear someuser@host.com,
-Here's an alternative strategy used by many CGI script authors: Check
-the email address with a simple regexp (such as the one above). If
-the regexp matched the address, accept the address. If the regexp
-didn't match the address, request confirmation from the user that the
-email address they entered was correct.
+ Please confirm the mail address you gave us Wed May 6 09:38:41
+ MDT 1998 by replying to this message. Include the string
+ "Rumpelstiltskin" in that reply, but spelled in reverse; that is,
+ start with "Nik...". Once this is done, your confirmed address will
+ be entered into our records.
+
+If you get the message back and they've followed your directions,
+you can be reasonably assured that it's real.
+
+A related strategy that's less open to forgery is to give them a PIN
+(personal ID number). Record the address and PIN (best that it be a
+random one) for later processing. In the mail you send, ask them to
+include the PIN in their reply. But if it bounces, or the message is
+included via a ``vacation'' script, it'll be there anyway. So it's
+best to ask them to mail back a slight alteration of the PIN, such as
+with the characters reversed, or with one added to or subtracted from each digit.
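
A minimal sketch of the reversed-PIN check might look like the
following. The helper names C<make_pin> and C<reply_confirms> are
invented for this example, and C<rand> is used only to keep the sketch
short; it is not a strong source of randomness.

```perl
use strict;
use warnings;

# make_pin() is a hypothetical helper: returns a random string of digits.
# Palindromes are rejected, since reversing one gives back the original
# and a bounced message would then look like a valid confirmation.
sub make_pin {
    my ($digits) = @_;
    die "need at least 2 digits\n" if $digits < 2;
    my $pin;
    do {
        $pin = join '', map { int rand 10 } 1 .. $digits;
    } while ($pin eq scalar reverse $pin);
    return $pin;
}

# reply_confirms() is a hypothetical helper: true only if the reply
# body contains the PIN spelled backwards.
sub reply_confirms {
    my ($pin, $reply_body) = @_;
    my $expected = scalar reverse $pin;
    return index($reply_body, $expected) >= 0;
}

my $pin = make_pin(6);
print "PIN to mail out: $pin\n";
```

Record the address and PIN when you send the message, then call
C<reply_confirms> on whatever comes back. A bounce or vacation reply that
merely quotes your message contains the PIN as sent, not reversed, so
it does not pass.
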
=head2 How do I decode a MIME/BASE64 string?
$len = pack("c", 32 + 0.75*length); # compute length byte
print unpack("u", $len . $_); # uudecode and print
-=head2 How do I return the user's email address?
+=head2 How do I return the user's mail address?
-On systems that support getpwuid, the $E<lt> variable and the
+On systems that support getpwuid, the $< variable and the
Sys::Hostname module (which is part of the standard perl distribution),
you can probably try using something like this:
use Sys::Hostname;
- $address = sprintf('%s@%s', getpwuid($<), hostname);
+ $address = sprintf('%s@%s', scalar getpwuid($<), hostname);
-Company policies on email address can mean that this generates addresses
-that the company's email system will not accept, so you should ask for
-users' email addresses when this matters. Furthermore, not all systems
+Company policies on mail addresses can mean that this generates addresses
+that the company's mail system will not accept, so you should ask for
+users' mail addresses when this matters. Furthermore, not all systems
on which Perl runs are so forthcoming with this information as is Unix.
The Mail::Util module from CPAN (part of the MailTools package) provides a
given when the module was installed, but it could still be incorrect.
Again, the best way is often just to ask the user.
-=head2 How do I send/read mail?
-
-Sending mail: the Mail::Mailer module from CPAN (part of the MailTools
-package) is UNIX-centric, while Mail::Internet uses Net::SMTP which is
-not UNIX-centric. Reading mail: use the Mail::Folder module from CPAN
-(part of the MailFolder package) or the Mail::Internet module from
-CPAN (also part of the MailTools package).
-
- # sending mail
- use Mail::Internet;
- use Mail::Header;
- # say which mail host to use
- $ENV{SMTPHOSTS} = 'mail.frii.com';
- # create headers
- $header = new Mail::Header;
- $header->add('From', 'gnat@frii.com');
- $header->add('Subject', 'Testing');
- $header->add('To', 'gnat@frii.com');
- # create body
- $body = 'This is a test, ignore';
- # create mail object
- $mail = new Mail::Internet(undef, Header => $header, Body => \[$body]);
- # send it
- $mail->smtpsend or die;
+=head2 How do I send mail?
+
+Use the C<sendmail> program directly:
+
+ open(SENDMAIL, "|/usr/lib/sendmail -oi -t -odq")
+ or die "Can't fork for sendmail: $!\n";
+ print SENDMAIL <<"EOF";
+ From: User Originating Mail <me\@host>
+ To: Final Destination <you\@otherhost>
+ Subject: A relevant subject line
+
+ Body of the message goes here after the blank line
+ in as many lines as you like.
+ EOF
+ close(SENDMAIL) or warn "sendmail didn't close nicely";
+
+The B<-oi> option prevents sendmail from interpreting a line consisting
+of a single dot as "end of message". The B<-t> option says to use the
+headers to decide who to send the message to, and B<-odq> says to put
+the message into the queue. This last option means your message won't
+be immediately delivered, so leave it out if you want immediate
+delivery.
+
+Alternate, less convenient approaches include calling mail (sometimes
+called mailx) directly or simply opening up port 25 and having an
+intimate conversation between just you and the remote SMTP daemon,
+probably sendmail.
+
+Or you might be able to use the CPAN module Mail::Mailer:
+
+ use Mail::Mailer;
+
+ $mailer = Mail::Mailer->new();
+ $mailer->open({ From => $from_address,
+ To => $to_address,
+ Subject => $subject,
+ })
+ or die "Can't open: $!\n";
+ print $mailer $body;
+ $mailer->close();
+
+The Mail::Internet module uses Net::SMTP which is less Unix-centric than
+Mail::Mailer, but less reliable. Avoid raw SMTP commands. There
+are many reasons to use a mail transport agent like sendmail. These
+include queueing, MX records, and security.
+
+=head2 How do I read mail?
+
+While you could use the Mail::Folder module from CPAN (part of the
+MailFolder package) or the Mail::Internet module from CPAN (part
+of the MailTools package), often a module is overkill. Here's a
+mail sorter.
+
+ #!/usr/bin/perl
+ # bysub1 - simple sort by subject
+ my(@msgs, @sub);
+ my $msgno = -1;
+ $/ = ''; # paragraph reads
+ while (<>) {
+ if (/^From/m) {
+ /^Subject:\s*(?:Re:\s*)*(.*)/mi;
+ $sub[++$msgno] = lc($1) || '';
+ }
+ $msgs[$msgno] .= $_;
+ }
+ for my $i (sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msgs)) {
+ print $msgs[$i];
+ }
+
+Or more succinctly,
+
+ #!/usr/bin/perl -n00
+ # bysub2 - awkish sort-by-subject
+ BEGIN { $msgno = -1 }
+ $sub[++$msgno] = (/^Subject:\s*(?:Re:\s*)*(.*)/mi)[0] if /^From/m;
+ $msg[$msgno] .= $_;
+ END { print @msg[ sort { $sub[$a] cmp $sub[$b] || $a <=> $b } (0 .. $#msg) ] }
=head2 How do I find out my hostname/domainname/IP address?
-A lot of code has historically cavalierly called the C<`hostname`>
-program. While sometimes expedient, this isn't very portable. It's
-one of those tradeoffs of convenience versus portability.
+The normal way to find your own hostname is to call the C<`hostname`>
+program. While sometimes expedient, this has some problems, such as
+not knowing whether you've got the canonical name or not. It's one of
+those tradeoffs of convenience versus portability.
The Sys::Hostname module (part of the standard perl distribution) will
give you the hostname after which you can find out the IP address
use Socket;
use Sys::Hostname;
my $host = hostname();
- my $addr = inet_ntoa(scalar(gethostbyname($name)) || 'localhost');
+ my $addr = inet_ntoa(scalar gethostbyname($host || 'localhost'));
Probably the simplest way to learn your DNS domain name is to grok
it out of /etc/resolv.conf, at least under Unix. Of course, this
A DCE::RPC module is being developed (but is not yet available), and
will be released as part of the DCE-Perl package (available from
-CPAN). No ONC::RPC module is known.
+CPAN). The rpcgen suite, available from CPAN/authors/id/JAKE/, is
+an RPC stub generator and includes an RPC::ONC module.
=head1 AUTHOR AND COPYRIGHT
-Copyright (c) 1997 Tom Christiansen and Nathan Torkington.
-All rights reserved. See L<perlfaq> for distribution information.
-
+Copyright (c) 1997-1999 Tom Christiansen and Nathan Torkington.
+All rights reserved.
+
+When included as part of the Standard Version of Perl, or as part of
+its complete documentation whether printed or otherwise, this work
+may be distributed only under the terms of Perl's Artistic License.
+Any distribution of this file or derivatives thereof I<outside>
+of that package requires that special arrangements be made with
+the copyright holder.
+
+Irrespective of its distribution, all code examples in this file
+are hereby placed into the public domain. You are permitted and
+encouraged to use this code in your own programs for fun
+or for profit as you see fit. A simple comment in the code giving
+credit would be courteous but is not required.