Commit | Line | Data |
bfe16a1a |
1 | =head1 NAME |
2 | |
3 | perlintro -- a brief introduction and overview of Perl |
4 | |
5 | =head1 DESCRIPTION |
6 | |
7 | This document is intended to give you a quick overview of the Perl |
8 | programming language, along with pointers to further documentation. It |
9 | is intended as a "bootstrap" guide for those who are new to the |
10 | language, and provides just enough information for you to be able to |
11 | read other people's Perl and understand roughly what it's doing, or |
12 | write your own simple scripts. |
13 | |
14 | This introductory document does not aim to be complete. It does not |
15 | even aim to be entirely accurate. In some cases perfection has been |
16 | sacrificed for the sake of getting the general idea across. You are |
98fcdafd |
17 | I<strongly> advised to follow this introduction with more information |
bfe16a1a |
18 | from the full Perl manual, the table of contents to which can be found |
19 | in L<perltoc>. |
20 | |
21 | Throughout this document you'll see references to other parts of the |
22 | Perl documentation. You can read that documentation using the C<perldoc> |
98fcdafd |
23 | command or whatever method you're using to read this document. |
bfe16a1a |
24 | |
25 | =head2 What is Perl? |
26 | |
27 | Perl is a general-purpose programming language originally developed for |
28 | text manipulation and now used for a wide range of tasks including |
29 | system administration, web development, network programming, GUI |
30 | development, and more. |
31 | |
98fcdafd |
32 | The language is intended to be practical (easy to use, efficient, |
33 | complete) rather than beautiful (tiny, elegant, minimal). Its major |
34 | features are that it's easy to use, supports both procedural and |
35 | object-oriented (OO) programming, has powerful built-in support for text |
36 | processing, and has one of the world's most impressive collections of |
37 | third-party modules. |
bfe16a1a |
38 | |
39 | Different definitions of Perl are given in L<perl>, L<perlfaq1> and |
40 | no doubt other places. From this we can determine that Perl is different |
41 | things to different people, but that lots of people think it's at least |
42 | worth writing about. |
43 | |
44 | =head2 Running Perl programs |
45 | |
46 | To run a Perl program from the Unix command line: |
47 | |
48 | perl progname.pl |
49 | |
50 | Alternatively, put this as the first line of your script: |
51 | |
52 | #!/usr/bin/env perl |
53 | |
54 | ... and run the script as C</path/to/script.pl>. Of course, it'll need |
55 | to be executable first, so C<chmod 755 script.pl> (under Unix). |
56 | |
57 | For more information, including instructions for other platforms such as |
58 | Windows and MacOS, read L<perlrun>. |
59 | |
60 | =head2 Basic syntax overview |
61 | |
62 | A Perl script or program consists of one or more statements. These |
63 | statements are simply written in the script in a straightforward |
98fcdafd |
64 | fashion. There is no need to have a C<main()> function or anything of |
65 | that kind. |
bfe16a1a |
66 | |
67 | Perl statements end in a semicolon: |
68 | |
69 | print "Hello, world"; |
70 | |
71 | Comments start with a hash symbol and run to the end of the line: |
72 | |
73 | # This is a comment |
74 | |
75 | Whitespace is irrelevant: |
76 | |
77 | print |
78 | "Hello, world" |
79 | ; |
80 | |
81 | ... except inside quoted strings: |
82 | |
83 | # this would print with a linebreak in the middle |
84 | print "Hello |
85 | world"; |
86 | |
87 | Double quotes or single quotes may be used around literal strings: |
88 | |
89 | print "Hello, world"; |
90 | print 'Hello, world'; |
91 | |
92 | However, only double quotes "interpolate" variables and special |
93 | characters such as newlines (C<\n>): |
94 | |
95 | print "Hello, $name\n"; # works fine |
96 | print 'Hello, $name\n'; # prints $name\n literally |
97 | |
98 | Numbers don't need quotes around them: |
99 | |
100 | print 42; |
101 | |
102 | You can use parentheses for functions' arguments or omit them |
103 | according to your personal taste. They are only required |
104 | occasionally to clarify issues of precedence. |
105 | |
106 | print("Hello, world\n"); |
107 | print "Hello, world\n"; |
108 | |
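As a small sketch of the precedence point above: without parentheses, a user-defined function takes everything to its right as its argument list, so the two calls below give different results. The helper C<add> is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: returns the sum of two numbers.
sub add {
    my ($left, $right) = @_;
    return $left + $right;
}

my $with_parens    = add(1, 2) * 10;   # (1 + 2) * 10
my $without_parens = add 1, 2 * 10;    # parsed as add(1, 2 * 10)

print "with parentheses: $with_parens\n";        # prints 30
print "without parentheses: $without_parens\n";  # prints 21
```

When in doubt, adding the parentheses costs nothing and makes the intent obvious.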
109 | More detailed information about Perl syntax can be found in L<perlsyn>. |
110 | |
111 | =head2 Perl variable types |
112 | |
113 | Perl has three main variable types: scalars, arrays, and hashes. |
114 | |
115 | =over 4 |
116 | |
117 | =item Scalars |
118 | |
119 | A scalar represents a single value: |
120 | |
121 | my $animal = "camel"; |
122 | my $answer = 42; |
123 | |
124 | Scalar values can be strings, integers or floating point numbers, and Perl |
125 | will automatically convert between them as required. There is no need |
126 | to pre-declare your variable types. |
127 | |
128 | Scalar values can be used in various ways: |
129 | |
130 | print $animal; |
131 | print "The animal is $animal\n"; |
132 | print "The square of $answer is ", $answer * $answer, "\n"; |
133 | |
134 | There are a number of "magic" scalars with names that look like |
135 | punctuation or line noise. These special variables are used for all |
136 | kinds of purposes, and are documented in L<perlvar>. The only one you |
137 | need to know about for now is C<$_> which is the "default variable". |
138 | It's used as the default argument to a number of functions in Perl, and |
139 | it's set implicitly by certain looping constructs. |
140 | |
141 | print; # prints contents of $_ by default |
142 | |
143 | =item Arrays |
144 | |
145 | An array represents a list of values: |
146 | |
147 | my @animals = ("camel", "llama", "owl"); |
148 | my @numbers = (23, 42, 69); |
149 | my @mixed = ("camel", 42, 1.23); |
150 | |
151 | Arrays are zero-indexed. Here's how you get at elements in an array: |
152 | |
153 | print $animals[0]; # prints "camel" |
154 | print $animals[1]; # prints "llama" |
155 | |
156 | The special variable C<$#array> tells you the index of the last element |
157 | of an array: |
158 | |
159 | print $mixed[$#mixed]; # last element, prints 1.23 |
160 | |
161 | You might be tempted to use C<$#array + 1> to tell you how many items there |
162 | are in an array. Don't bother. As it happens, using C<@array> where Perl |
163 | expects to find a scalar value ("in scalar context") will give you the number |
164 | of elements in the array: |
165 | |
166 | if (@animals < 5) { ... } |
167 | |
168 | The elements we're getting from the array start with a C<$> because |
169 | we're getting just a single value out of the array -- you ask for a scalar, |
170 | you get a scalar. |
171 | |
d1be9408 |
172 | To get multiple values from an array: |
bfe16a1a |
173 | |
174 | @animals[0,1]; # gives ("camel", "llama"); |
175 | @animals[0..2]; # gives ("camel", "llama", "owl"); |
176 | @animals[1..$#animals]; # gives all except the first element |
177 | |
178 | This is called an "array slice". |
179 | |
180 | You can do various useful things to lists: |
181 | |
182 | my @sorted = sort @animals; |
183 | my @backwards = reverse @numbers; |
184 | |
185 | There are a couple of special arrays too, such as C<@ARGV> (the command |
186 | line arguments to your script) and C<@_> (the arguments passed to a |
187 | subroutine). These are documented in L<perlvar>. |
188 | |
189 | =item Hashes |
190 | |
191 | A hash represents a set of key/value pairs: |
192 | |
193 | my %fruit_color = ("apple", "red", "banana", "yellow"); |
194 | |
195 | You can use whitespace and the C<< => >> operator to lay them out more |
196 | nicely: |
197 | |
198 | my %fruit_color = ( |
199 | apple => "red", |
200 | banana => "yellow", |
201 | ); |
202 | |
203 | To get at hash elements: |
204 | |
205 | $fruit_color{"apple"}; # gives "red" |
206 | |
207 | You can get at lists of keys and values with C<keys()> and |
208 | C<values()>. |
209 | |
210 | my @fruits = keys %fruit_color; |
211 | my @colors = values %fruit_color; |
212 | |
213 | Hashes have no particular internal order, though you can sort the keys |
214 | and loop through them. |
215 | |
216 | Just like special scalars and arrays, there are also special hashes. |
217 | The most well known of these is C<%ENV> which contains environment |
218 | variables. Read all about it (and other special variables) in |
219 | L<perlvar>. |
220 | |
221 | =back |
222 | |
223 | Scalars, arrays and hashes are documented more fully in L<perldata>. |
224 | |
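A brief sketch of two points made above, using the same sample data as this section: an array used in scalar context yields its element count, and a hash can be walked in a predictable order by sorting its keys.

```perl
use strict;
use warnings;

my @animals = ("camel", "llama", "owl");
my %fruit_color = (
    apple  => "red",
    banana => "yellow",
);

my $count      = @animals;     # scalar context gives the number of elements
my $last_index = $#animals;    # index of the last element
print "There are $count animals; the last index is $last_index\n";

# Hashes have no particular order, so sort the keys for repeatable output.
foreach my $fruit (sort keys %fruit_color) {
    print "$fruit is $fruit_color{$fruit}\n";
}
```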
225 | More complex data types can be constructed using references, which allow |
226 | you to build lists and hashes within lists and hashes. |
227 | |
228 | A reference is a scalar value and can refer to any other Perl data |
229 | type. So by storing a reference as the value of an array or hash |
230 | element, you can easily create lists and hashes within lists and |
231 | hashes. The following example shows a two-level hash-of-hashes |
232 | structure using anonymous hash references. |
233 | |
234 | my $variables = { |
235 | scalar => { |
236 | description => "single item", |
237 | sigil => '$', |
238 | }, |
239 | array => { |
240 | description => "ordered list of items", |
241 | sigil => '@', |
242 | }, |
243 | hash => { |
244 | description => "key/value pairs", |
245 | sigil => '%', |
246 | }, |
247 | }; |
248 | |
249 | print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n"; |
250 | |
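The same idea works for arrays inside hashes. The following is a minimal sketch, with made-up data, of a hash whose values are references to anonymous arrays, and of the two usual ways to get back at the contents: C<@{ ... }> for the whole array and C<< -> >> for a single element.

```perl
use strict;
use warnings;

# A hash whose values are references to anonymous arrays (sample data).
my %herd = (
    camels => ["dromedary", "bactrian"],
    llamas => ["guanaco"],
);

# Dereference with @{ ... } to treat the reference as an array.
push @{ $herd{llamas} }, "alpaca";

foreach my $kind (sort keys %herd) {
    my @members = @{ $herd{$kind} };
    my $count   = @members;
    print "$kind ($count): @members\n";
}

# The arrow operator reaches a single element through the reference.
print "First camel: $herd{camels}->[0]\n";
```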
251 | Exhaustive information on the topic of references can be found in |
252 | L<perlreftut>, L<perllol>, L<perlref> and L<perldsc>. |
253 | |
254 | =head2 Variable scoping |
255 | |
256 | Throughout the previous section all the examples have used the syntax: |
257 | |
258 | my $var = "value"; |
259 | |
260 | The C<my> is actually not required; you could just use: |
261 | |
262 | $var = "value"; |
263 | |
264 | However, the above usage will create global variables throughout your |
265 | program, which is bad programming practice. C<my> creates lexically |
266 | scoped variables instead. The variables are scoped to the block |
267 | (i.e. a bunch of statements surrounded by curly-braces) in which they |
268 | are defined. |
269 | |
270 | my $x = "foo"; |
271 | if ($some_condition) { |
272 | my $y = "bar"; |
273 | print $x; # prints "foo" |
274 | print $y; # prints "bar" |
275 | } |
276 | print $x; # prints "foo" |
277 | print $y; # prints nothing; $y has fallen out of scope |
278 | |
279 | Using C<my> in combination with a C<use strict;> at the top of |
280 | your Perl scripts means that the interpreter will pick up certain common |
281 | programming errors. For instance, in the example above, the final |
282 | C<print $y> would cause a compile-time error and prevent you from |
283 | running the program. Using C<strict> is highly recommended. |
284 | |
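The sketch below illustrates both halves of that advice under stated assumptions: the variable names are made up, and the string form of C<eval> is used only so that the compile-time failure can be observed without stopping the program. A bare block is enough to create a new scope, and C<strict> rejects a variable that was never declared.

```perl
use strict;
use warnings;

my $name = "outer";
{
    my $name = "inner";      # a new variable that shadows the outer one
    print "$name\n";         # prints "inner"
}
print "$name\n";             # prints "outer"

# Compile a snippet that assigns to an undeclared variable. Under strict
# the snippet fails to compile, so eval returns undef and sets $@.
my $compiled = eval q{ my $declared = 1; $undeclared = 2; 1 };
print "strict refused the snippet: $@" unless $compiled;
```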
285 | =head2 Conditional and looping constructs |
286 | |
287 | Perl has most of the usual conditional and looping constructs except for |
98fcdafd |
288 | case/switch (but if you really want it, there is a Switch module on CPAN, |
289 | which was also bundled with Perl 5.8 through 5.12. See the section on modules, |
290 | below, for more information about modules and CPAN). |
bfe16a1a |
291 | |
292 | The conditions can be any Perl expression. See the list of operators in |
293 | the next section for information on comparison and boolean logic operators, |
294 | which are commonly used in conditional statements. |
295 | |
296 | =over 4 |
297 | |
298 | =item if |
299 | |
300 | if ( condition ) { |
301 | ... |
302 | } elsif ( other condition ) { |
303 | ... |
304 | } else { |
305 | ... |
306 | } |
307 | |
308 | There's also a negated version of it: |
309 | |
310 | unless ( condition ) { |
311 | ... |
312 | } |
313 | |
2cd1776c |
314 | This is provided as a more readable version of C<if (!I<condition>)>. |
bfe16a1a |
315 | |
316 | Note that the braces are required in Perl, even if you've only got one |
317 | line in the block. However, there is a clever way of making your one-line |
318 | conditional blocks more English-like: |
319 | |
320 | # the traditional way |
321 | if ($zippy) { |
322 | print "Yow!"; |
323 | } |
324 | |
325 | # the Perlish post-condition way |
326 | print "Yow!" if $zippy; |
327 | print "We have no bananas" unless $bananas; |
328 | |
329 | =item while |
330 | |
331 | while ( condition ) { |
332 | ... |
333 | } |
334 | |
335 | There's also a negated version, for the same reason we have C<unless>: |
336 | |
337 | until ( condition ) { |
338 | ... |
339 | } |
340 | |
341 | You can also use C<while> in a post-condition: |
342 | |
343 | print "LA LA LA\n" while 1; # loops forever |
344 | |
345 | =item for |
346 | |
347 | Exactly like C: |
348 | |
349 | for (my $i = 0; $i <= $max; $i++) { |
350 | ... |
351 | } |
352 | |
353 | The C-style for loop is rarely needed in Perl since Perl provides |
da75cd15 |
354 | the more friendly list scanning C<foreach> loop. |
bfe16a1a |
355 | |
356 | =item foreach |
357 | |
358 | foreach (@array) { |
359 | print "This element is $_\n"; |
360 | } |
361 | |
362 | # you don't have to use the default $_ either... |
363 | foreach my $key (keys %hash) { |
364 | print "The value of $key is $hash{$key}\n"; |
365 | } |
366 | |
367 | =back |
368 | |
369 | For more detail on looping constructs (and some that weren't mentioned in |
370 | this overview) see L<perlsyn>. |
371 | |
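To tie the constructs above together, here is a short sketch over sample data that walks the same array with a C-style C<for> and with C<foreach>, and uses the post-condition forms of C<unless> and C<while>.

```perl
use strict;
use warnings;

my @animals = ("camel", "llama", "owl");

# C-style loop: manage the index by hand.
for (my $i = 0; $i <= $#animals; $i++) {
    print "$i: $animals[$i]\n";
}

# foreach loop: no index needed.
foreach my $animal (@animals) {
    print "Saw a $animal\n" unless $animal eq "owl";   # post-condition form
}

# Post-condition while: repeat a single statement until the test fails.
my $count = 0;
$count++ while $count < 5;
print "Counted to $count\n";   # prints "Counted to 5"
```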
372 | =head2 Builtin operators and functions |
373 | |
374 | Perl comes with a wide selection of builtin functions. Some of the ones |
375 | we've already seen include C<print>, C<sort> and C<reverse>. A list of |
376 | them is given at the start of L<perlfunc> and you can easily read |
2cd1776c |
377 | about any given function by using C<perldoc -f I<functionname>>. |
bfe16a1a |
378 | |
379 | Perl operators are documented in full in L<perlop>, but here are a few |
380 | of the most common ones: |
381 | |
382 | =over 4 |
383 | |
384 | =item Arithmetic |
385 | |
386 | + addition |
387 | - subtraction |
388 | * multiplication |
389 | / division |
390 | |
391 | =item Numeric comparison |
392 | |
393 | == equality |
394 | != inequality |
395 | < less than |
396 | > greater than |
397 | <= less than or equal |
398 | >= greater than or equal |
399 | |
400 | =item String comparison |
401 | |
402 | eq equality |
403 | ne inequality |
404 | lt less than |
405 | gt greater than |
406 | le less than or equal |
407 | ge greater than or equal |
408 | |
409 | (Why do we have separate numeric and string comparisons? Because we don't |
410 | have special variable types, and Perl needs to know whether to sort |
411 | numerically (where 99 is less than 100) or alphabetically (where 100 comes |
412 | before 99). |
413 | |
414 | =item Boolean logic |
415 | |
416 | && and |
417 | || or |
418 | ! not |
419 | |
420 | (C<and>, C<or> and C<not> aren't just in the above table as descriptions |
421 | of the operators -- they're also supported as operators in their own |
422 | right. They're more readable than the C-style operators, but have |
423 | different precedence to C<&&> and friends. Check L<perlop> for more |
424 | detail.) |
425 | |
426 | =item Miscellaneous |
427 | |
428 | = assignment |
429 | . string concatenation |
430 | x string multiplication |
431 | .. range operator (creates a list of numbers) |
432 | |
433 | =back |
434 | |
435 | Many operators can be combined with a C<=> as follows: |
436 | |
437 | $a += 1; # same as $a = $a + 1 |
438 | $a -= 1; # same as $a = $a - 1 |
439 | $a .= "\n"; # same as $a = $a . "\n"; |
440 | |
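The difference between numeric and string comparison is easiest to see when sorting. This sketch uses C<< <=> >> and C<cmp>, the three-way numeric and string comparison operators; they are not in the tables above but are documented in L<perlop>. The values are sample data.

```perl
use strict;
use warnings;

my @values = (100, 99, 9);

# cmp compares as strings, <=> compares as numbers.
my @as_strings = sort { $a cmp $b } @values;
my @as_numbers = sort { $a <=> $b } @values;

print "string order:  @as_strings\n";   # prints 100 9 99
print "numeric order: @as_numbers\n";   # prints 9 99 100

print "99 <  100 is true\n"  if 99 < 100;
print "99 lt 100 is false\n" unless 99 lt 100;

# Concatenation, repetition and a compound assignment together.
my $rule = "-" x 10;
$rule .= "\n";
print "Total: " . (1 + 2) . "\n" . $rule;
```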
441 | =head2 Files and I/O |
442 | |
443 | You can open a file for input or output using the C<open()> function. |
444 | It's documented in extravagant detail in L<perlfunc> and L<perlopentut>, |
445 | but in short: |
446 | |
447 | open(INFILE, "input.txt") or die "Can't open input.txt: $!"; |
448 | open(OUTFILE, ">output.txt") or die "Can't open output.txt: $!"; |
449 | open(LOGFILE, ">>my.log") or die "Can't open logfile: $!"; |
450 | |
451 | You can read from an open filehandle using the C<< <> >> operator. In |
452 | scalar context it reads a single line from the filehandle, and in list |
453 | context it reads the whole file in, assigning each line to an element of |
454 | the list: |
455 | |
456 | my $line = <INFILE>; |
457 | my @lines = <INFILE>; |
458 | |
459 | Reading in the whole file at one time is called slurping. It can |
460 | be useful but it may be a memory hog. Most text file processing |
461 | can be done a line at a time with Perl's looping constructs. |
462 | |
463 | The C<< <> >> operator is most often seen in a C<while> loop: |
464 | |
465 | while (<INFILE>) { # assigns each line in turn to $_ |
466 | print "Just read in this line: $_"; |
467 | } |
468 | |
469 | We've already seen how to print to standard output using C<print()>. |
470 | However, C<print()> can also take an optional first argument specifying |
471 | which filehandle to print to: |
472 | |
473 | print STDERR "This is your final warning.\n"; |
474 | print OUTFILE $record; |
475 | print LOGFILE $logmessage; |
476 | |
477 | When you're done with your filehandles, you should C<close()> them |
478 | (though to be honest, Perl will clean up after you if you forget): |
479 | |
480 | close INFILE; |
481 | |
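The examples above use bareword filehandles and the two-argument form of C<open()>. As a hedged alternative sketch, C<open()> also accepts a separate mode argument and can store the handle in an ordinary C<my> variable, which keeps it scoped like any other variable. The scratch file comes from the core C<File::Temp> module so the example touches no real data; C<chomp> removes the trailing newline from each line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the example does not touch real data.
my ($scratch, $path) = tempfile(UNLINK => 1);
close $scratch;

# Write two lines; '>' truncates the file, '>>' would append instead.
open(my $out, '>', $path) or die "Can't open $path for writing: $!";
print $out "first line\n";
print $out "second line\n";
close $out or die "Can't close $path: $!";

# Read the file back one line at a time.
open(my $in, '<', $path) or die "Can't open $path for reading: $!";
my @seen;
while (my $line = <$in>) {
    chomp $line;              # remove the trailing newline
    push @seen, $line;
}
close $in;

print "Read ", scalar(@seen), " lines: @seen\n";
```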
482 | =head2 Regular expressions |
483 | |
484 | Perl's regular expression support is both broad and deep, and is the |
485 | subject of lengthy documentation in L<perlrequick>, L<perlretut>, and |
486 | elsewhere. However, in short: |
487 | |
488 | =over 4 |
489 | |
490 | =item Simple matching |
491 | |
492 | if (/foo/) { ... } # true if $_ contains "foo" |
493 | if ($a =~ /foo/) { ... } # true if $a contains "foo" |
494 | |
495 | The C<//> matching operator is documented in L<perlop>. It operates on |
496 | C<$_> by default, or can be bound to another variable using the C<=~> |
497 | binding operator (also documented in L<perlop>). |
498 | |
499 | =item Simple substitution |
500 | |
501 | s/foo/bar/; # replaces foo with bar in $_ |
502 | $a =~ s/foo/bar/; # replaces foo with bar in $a |
503 | $a =~ s/foo/bar/g; # replaces ALL INSTANCES of foo with bar in $a |
504 | |
505 | The C<s///> substitution operator is documented in L<perlop>. |
506 | |
507 | =item More complex regular expressions |
508 | |
509 | You don't just have to match on fixed strings. In fact, you can match |
510 | on just about anything you could dream of by using more complex regular |
511 | expressions. These are documented at great length in L<perlre>, but for |
512 | the meantime, here's a quick cheat sheet: |
513 | |
514 | . a single character |
515 | \s a whitespace character (space, tab, newline) |
516 | \S non-whitespace character |
517 | \d a digit (0-9) |
518 | \D a non-digit |
519 | \w a word character (a-z, A-Z, 0-9, _) |
520 | \W a non-word character |
521 | [aeiou] matches a single character in the given set |
522 | [^aeiou] matches a single character outside the given set |
523 | (foo|bar|baz) matches any of the alternatives specified |
524 | |
525 | ^ start of string |
526 | $ end of string |
527 | |
528 | Quantifiers can be used to specify how many of the previous thing you |
529 | want to match on, where "thing" means either a literal character, one |
530 | of the metacharacters listed above, or a group of characters or |
531 | metacharacters in parentheses. |
532 | |
533 | * zero or more of the previous thing |
534 | + one or more of the previous thing |
535 | ? zero or one of the previous thing |
536 | {3} matches exactly 3 of the previous thing |
537 | {3,6} matches between 3 and 6 of the previous thing |
538 | {3,} matches 3 or more of the previous thing |
539 | |
540 | Some brief examples: |
541 | |
542 | /^\d+/ string starts with one or more digits |
543 | /^$/ nothing in the string (start and end are adjacent) |
544 | /(\d\s){3}/ three digits, each followed by a whitespace |
545 | character (eg "3 4 5 ") |
546 | /(a.)+/ matches a string in which every odd-numbered letter |
547 | is a (eg "abacadaf") |
548 | |
549 | # This loop reads from STDIN, and prints non-blank lines: |
550 | while (<>) { |
551 | next if /^$/; |
552 | print; |
553 | } |
554 | |
555 | =item Parentheses for capturing |
556 | |
557 | As well as grouping, parentheses serve a second purpose. They can be |
558 | used to capture the results of parts of the regexp match for later use. |
559 | The results end up in C<$1>, C<$2> and so on. |
560 | |
561 | # a cheap and nasty way to break an email address up into parts |
562 | |
9086c882 |
563 | if ($email =~ /([^@]+)@(.+)/) { |
bfe16a1a |
564 | print "Username is $1\n"; |
565 | print "Hostname is $2\n"; |
566 | } |
567 | |
568 | =item Other regexp features |
569 | |
570 | Perl regexps also support backreferences, lookaheads, and all kinds of |
571 | other complex details. Read all about them in L<perlrequick>, |
572 | L<perlretut>, and L<perlre>. |
573 | |
574 | =back |
575 | |
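A short sketch pulling together matching, quantifiers, capturing and substitution. The text and the email address are made up for the example.

```perl
use strict;
use warnings;

my $text = "Released on 2004-03-15, patched on 2004-04-01";

# Captures: each parenthesised group lands in $1, $2, $3.
if ($text =~ /(\d{4})-(\d{2})-(\d{2})/) {
    print "First date: year $1, month $2, day $3\n";
}

# In list context with /g, a match returns every captured substring.
my @dates = $text =~ /(\d{4}-\d{2}-\d{2})/g;
print "Found ", scalar(@dates), " dates: @dates\n";

# Substitution on a copy, leaving $text untouched.
my $masked = $text;
$masked =~ s/\d/#/g;
print "$masked\n";

# A username/hostname split, using a made-up address.
my $email = 'user@example.com';
my ($user, $host) = $email =~ /([^@]+)@(.+)/;
print "Username is $user, hostname is $host\n";
```

Assigning the match to a list, as in the last step, is a common alternative to reading C<$1> and C<$2> directly.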
576 | =head2 Writing subroutines |
577 | |
578 | Writing subroutines is easy: |
579 | |
580 | sub logger { |
581 | my $logmessage = shift; |
582 | print LOGFILE $logmessage; |
583 | } |
584 | |
585 | What's that C<shift>? Well, the arguments to a subroutine are available |
586 | to us as a special array called C<@_> (see L<perlvar> for more on that). |
587 | The default argument to the C<shift> function just happens to be C<@_>. |
588 | So C<my $logmessage = shift;> shifts the first item off the list of |
589 | arguments and assigns it to C<$logmessage>. |
590 | |
591 | We can manipulate C<@_> in other ways too: |
592 | |
593 | my ($logmessage, $priority) = @_; # common |
594 | my $logmessage = $_[0]; # uncommon, and ugly |
595 | |
596 | Subroutines can also return values: |
597 | |
598 | sub square { |
599 | my $num = shift; |
600 | my $result = $num * $num; |
601 | return $result; |
602 | } |
603 | |
604 | For more information on writing subroutines, see L<perlsub>. |
605 | |
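As a further sketch, a subroutine can take a variable number of arguments and return a list. Both helpers below are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the smallest and largest of its arguments.
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

# Hypothetical helper: a named first argument, the rest collected as a list.
sub describe {
    my ($label, @values) = @_;
    my ($min, $max) = min_max(@values);
    return "$label: min $min, max $max";
}

my $summary = describe("scores", 23, 42, 69, 7);
print "$summary\n";   # prints "scores: min 7, max 69"
```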
606 | =head2 OO Perl |
607 | |
608 | OO Perl is relatively simple and is implemented using references which |
609 | know what sort of object they are based on Perl's concept of packages. |
610 | However, OO Perl is largely beyond the scope of this document. |
611 | Read L<perlboot>, L<perltoot>, L<perltooc> and L<perlobj>. |
612 | |
613 | As a beginning Perl programmer, your most common use of OO Perl will be |
614 | in using third-party modules, which are documented below. |
615 | |
616 | =head2 Using Perl modules |
617 | |
618 | Perl modules provide a range of features to help you avoid reinventing |
619 | the wheel, and can be downloaded from CPAN (http://www.cpan.org). A |
620 | number of popular modules are included with the Perl distribution |
621 | itself. |
622 | |
623 | Categories of modules range from text manipulation to network protocols |
624 | to database integration to graphics. A categorized list of modules is |
625 | also available from CPAN. |
626 | |
627 | To learn how to install modules you download from CPAN, read |
628 | L<perlmodinstall>. |
629 | |
2cd1776c |
630 | To learn how to use a particular module, use C<perldoc I<Module::Name>>. |
631 | Typically you will want to C<use I<Module::Name>>, which will then give |
632 | you access to exported functions or an OO interface to the module. |
bfe16a1a |
633 | |
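For instance, here is a minimal sketch using C<List::Util>, a module that ships with Perl. The import list after the module name asks for three of its exported functions by name; the numbers are sample data.

```perl
use strict;
use warnings;
use List::Util qw(sum max first);   # import three functions by name

my @numbers = (23, 42, 69);

my $total     = sum(@numbers);
my $largest   = max(@numbers);
my $first_big = first { $_ > 30 } @numbers;   # first element passing the test

print "Sum: $total\n";               # prints 134
print "Largest: $largest\n";         # prints 69
print "First over 30: $first_big\n"; # prints 42
```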
634 | L<perlfaq> contains questions and answers related to many common |
635 | tasks, and often provides suggestions for good CPAN modules to use. |
636 | |
637 | L<perlmod> describes Perl modules in general. L<perlmodlib> lists the |
638 | modules which came with your Perl installation. |
639 | |
640 | If you feel the urge to write Perl modules, L<perlnewmod> will give you |
641 | good advice. |
642 | |
643 | =head1 AUTHOR |
644 | |
645 | Kirrily "Skud" Robert <skud@cpan.org> |