Somewhere along the line, I learned about a way to slurp files faster
than by setting $/ to undef. The method is very simple: you do a single
read call with the size of the file (which the -s operator provides).
This bypasses the I/O loop inside perl that checks for EOF and does all
sorts of processing. I then decided to experiment and found that
sysread is even faster, as you would expect. sysread bypasses all of
Perl's stdio and reads the file from the kernel buffers directly into a
Perl scalar. This is why the slurp code in File::Slurp uses
sysopen/sysread/syswrite. All the rest of the code is just to support
the various options and data passing techniques.
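
The two techniques described above can be sketched as follows. This is a
minimal illustration that uses only core modules, not the code from
File::Slurp; the names slurp_read and slurp_sysread and the temporary
test file are made up for this example.

```perl
use strict;
use warnings;
use Fcntl qw( O_RDONLY );
use File::Temp qw( tempfile );

# Hypothetical helper: one buffered read() sized by the -s operator.
sub slurp_read {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    binmode $fh;
    my $size = -s $fh || 0;    # -s yields an empty string for empty files
    my $buf  = '';
    my $got  = read( $fh, $buf, $size );
    die "Can't read $path: $!" unless defined $got;
    close $fh;
    return $buf;
}

# Hypothetical helper: sysopen/sysread, which skip Perl's buffered I/O layer.
sub slurp_sysread {
    my ($path) = @_;
    sysopen( my $fh, $path, O_RDONLY ) or die "Can't open $path: $!";
    my $size   = -s $fh || 0;
    my $buf    = '';
    my $offset = 0;

    # sysread may return fewer bytes than requested, so keep appending
    # at $offset until the whole file has been read or EOF is hit.
    while ( $offset < $size ) {
        my $got = sysread( $fh, $buf, $size - $offset, $offset );
        die "Can't read $path: $!" unless defined $got;
        last if $got == 0;
        $offset += $got;
    }
    close $fh;
    return $buf;
}

# Small demonstration on a throwaway file.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} "line $_\n" for 1 .. 3;
close $tmp_fh or die "Can't close $tmp_name: $!";

print "same contents\n"
    if slurp_read($tmp_name) eq slurp_sysread($tmp_name);
```

Both helpers return the file contents by value, which costs one extra copy
of the buffer on return.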

Benchmarks can be enlightening, informative, frustrating and
deceiving. It would make no sense to create a new and more complex slurp
module unless it also gained significantly in speed. So I created a
benchmark script which compares various slurp methods with differing
file sizes and calling contexts. This script can be run from the main
directory of the tarball like this:

    perl -Ilib extras/slurp_bench.pl

If you pass in an argument on the command line, it will be passed to
timethese() and it will control the duration. It defaults to -2, which
makes each benchmark run for at least 2 seconds of CPU time.
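
As a rough sketch of how a command-line argument can drive timethese(),
the following passes its first argument (or -2 when none is given) as the
count. A negative count tells the core Benchmark module to run each entry
for at least that many CPU seconds. The run_benchmarks helper and the two
toy entries are made up for illustration and are not taken from
extras/slurp_bench.pl.

```perl
use strict;
use warnings;
use Benchmark qw( timethese );

# Hypothetical helper, not the code in extras/slurp_bench.pl.
sub run_benchmarks {
    my ($duration) = @_;

    # Same default the text describes: at least 2 CPU seconds per entry.
    $duration = -2 unless defined $duration;

    my $line = 'abc' x 30 . "\n";

    # Toy entries, chosen only so there is something to time.
    return timethese(
        $duration,
        {
            join_list => sub { my $s = join '', ($line) x 100 },
            repeat    => sub { my $s = $line x 100 },
        }
    );
}

# The first command-line argument, if any, controls the duration.
run_benchmarks( $ARGV[0] );
```

Running the sketch with an argument of -5 would time each entry for at
least 5 CPU seconds; a positive argument is treated by timethese() as an
iteration count instead.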

The following numbers are from a run I did on my 300MHz SPARC. You will
most likely get much faster counts on your boxes but the relative speeds
shouldn't change by much. If you see major differences on your
benchmarks, please send me the results and your Perl and OS
versions. You can also play with the benchmark script and add more slurp
variations or data files.

The rest of this section discusses the results of the
benchmarks. You can refer to extras/slurp_bench.pl to see the code for
the individual benchmarks. If the benchmark name starts with cpan_, it
is either from Slurp.pm or File::Slurp.pm. Those starting with new_ are
from the new File::Slurp.pm. Those that start with file_contents_ are
from a client's code base. The rest are variations I created to
highlight certain aspects of the benchmarks.

The short and long file data is made like this:

    my @lines = ( 'abc' x 30 . "\n") x 100 ;
    my $text = join( '', @lines ) ;

    @lines = ( 'abc' x 40 . "\n") x 1000 ;
    $text = join( '', @lines ) ;

So the short file is 9,100 bytes and the long file is 121,000
bytes.
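
The sizes follow from the line lengths: 30 x 3 + 1 = 91 bytes per short
line times 100 lines, and 40 x 3 + 1 = 121 bytes per long line times 1000
lines. A quick check, using separate variable names (chosen here for
clarity) for the two data sets:

```perl
use strict;
use warnings;

# Same construction as in the text, with distinct names per data set.
my @short_lines = ( 'abc' x 30 . "\n" ) x 100;
my $short_text  = join( '', @short_lines );

my @long_lines = ( 'abc' x 40 . "\n" ) x 1000;
my $long_text  = join( '', @long_lines );

printf "short: %d lines, %d bytes\n", scalar @short_lines, length $short_text;
printf "long:  %d lines, %d bytes\n", scalar @long_lines,  length $long_text;
```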

=head3 Scalar Slurp of Short File

    file_contents          651/s
    file_contents_no_OO    828/s
    cpan_read_file        1866/s
    cpan_slurp            1934/s
    read_file             2079/s
    new                   2270/s
    new_buf_ref           2403/s
    new_scalar_ref        2415/s
    sysread_file          2572/s

=head3 Scalar Slurp of Long File

    file_contents_no_OO   82.9/s
    file_contents         85.4/s
    cpan_read_file         250/s
    cpan_slurp             257/s
    read_file              323/s
    new                    468/s
    sysread_file           489/s
    new_scalar_ref         766/s
    new_buf_ref            767/s

The primary inference you get from looking at the numbers above is that
when slurping a file into a scalar, the longer the file, the more time
you save by returning the result via a scalar reference. The time for
the extra buffer copy can add up. The new module came out on top overall
except for the very simple sysread_file entry, which was added to
highlight the overhead of the more flexible new module (which isn't that
much). The file_contents entries are always the worst since they do a
list slurp and then a join, a classic newbie and cargo-culted style
that is extremely slow. Also, the OO code in file_contents slows
it down even more (I added the file_contents_no_OO entry to show this).
The two CPAN modules are decent with small files but they are laggards
compared to the new module when the file gets much larger.
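
To make the two styles discussed above concrete, here is a small sketch
that contrasts a list slurp followed by a join with a single read that
hands back a scalar reference. It uses only core Perl; slurp_join and
slurp_ref are hypothetical names for this illustration and are neither
the client's file_contents code nor the File::Slurp implementation.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Hypothetical stand-in for the slow style: read a list of lines,
# then join them back into one string.
sub slurp_join {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    binmode $fh;
    my @lines = <$fh>;
    close $fh;
    return join( '', @lines );
}

# Hypothetical faster style: one read() into a buffer, returned by
# reference so the caller does not pay for another copy of the data.
sub slurp_ref {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "Can't open $path: $!";
    binmode $fh;
    my $size = -s $fh || 0;
    my $buf  = '';
    my $got  = read( $fh, $buf, $size );
    die "Can't read $path: $!" unless defined $got;
    close $fh;
    return \$buf;
}

# Write the "long file" data from the text to a throwaway file.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;
print {$tmp_fh} ( 'abc' x 40 . "\n" ) x 1000;
close $tmp_fh or die "Can't close $tmp_name: $!";

my $joined   = slurp_join($tmp_name);
my $text_ref = slurp_ref($tmp_name);

print "same contents\n" if $joined eq ${$text_ref};
```

The caller dereferences the result with ${$text_ref} only where the text
is needed, so a large buffer is never copied just to return it.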

=head3 List Slurp of Short File

    cpan_read_file             589/s
    cpan_slurp_to_array        620/s
    read_file                  824/s
    new_array_ref              824/s
    sysread_file               828/s
    new                        829/s
    new_in_anon_array          833/s
    cpan_slurp_to_array_ref    836/s

=head3 List Slurp of Long File

    cpan_read_file            62.4/s
    cpan_slurp_to_array       62.7/s
    read_file                 92.9/s
    sysread_file              94.8/s
    new_array_ref             95.5/s
    new                       96.2/s
    cpan_slurp_to_array_ref   96.3/s
    new_in_anon_array         97.2/s

=head3 Scalar Spew of Short File

    cpan_write_file    1035/s
    print_file         1055/s
    syswrite_file      1135/s
    new                1519/s
    print_join_file    1766/s
    new_ref            1900/s
    syswrite_file2     2138/s

=head3 Scalar Spew of Long File

    cpan_write_file     164/s   20
    print_file          211/s   26
    syswrite_file       236/s   25
    print_join_file     277/s   2
    new                 295/s   2
    syswrite_file2      428/s   25
    new_ref             608/s   2

=head3 List Spew of Short File

    cpan_write_file     794/s
    syswrite_file      1000/s
    print_file         1013/s
    new                1399/s
    print_join_file    1557/s

=head3 List Spew of Long File

    cpan_write_file     112/s   12
    print_file          179/s   21
    syswrite_file       181/s   19
    print_join_file     205/s   2
    new                 228/s   2