5 title: A Little Past History
7 *: TPC3 in Monterey, CA 1999
8 *: Co-wrote and presented "A Fresh Look at Efficient Perl Sorting"
10 <A href="http://www.sysarch.com/perl/sort_paper.html">http://www.sysarch.com/perl/sort_paper.html</A>
11 *: Co-author was Larry Rosler (retired from HP and Perl)
12 *: Introduced the GRT as a way to speed up Perl sorts
13 *: It won the prize for Best Technical Paper
14 *: A module Sort::Records was promised which would implement the GRT
15 *: Sort::Records was started but a code and design review showed many
17 *: Sort::Records was shelved during development because of a code review
18 *: People still write to me asking about that module
24 *: I have been thinking about this module for 5 years
25 *: Solved the major design flaw - have the user provide key extraction
27 *: A thread in perl6-language gave needed inspiration
28 *: Sort::Maker emerges
29 *: Much help in the API design and some coding tricks from Damian Conway
32 title: Introducing Sort::Maker
34 *: Easy description of sorts
35 *: Generates a sort sub
36 *: Choice of generated sort styles
37 *: Can generate the highest speed sorts in Perl
38 *: Generated source can be printed
40 **: For pasting in code to save the generation time overhead
41 *: These slides can be found at
44 <A href="http://www.stemsystems.com/sort/slides/slides/index.html">http://www.stemsystems.com/slides/slides/index.html</A><BR>
45 *: The current tarball is at
48 <A href="http://www.stemsystems.com/sort/Sort-Maker-0.01.tar.gz">http://www.stemsystems.com/sort/Sort-Maker-0.01.tar.gz</A><BR>
52 title: Sort::Maker Synopsis
54 *: Exports a single sub 'make_sorter'
55 *: Each key has its own description
56 *: Call make_sorter with sort description arguments and it returns a
58 *: Call that reference with unsorted input records
59 *: Returns sorted records
63 my $sorter = make_sorter(
71 my @sorted = $sorter->( @unsorted ) ;
75 title: Key Descriptions
77 *: Each key has a description
78 *: Keys must be in the order they will sort (higher level keys must be
80 *: The key type must be specified as 'string' or 'number'
81 *: Keys can have attributes such as 'descending' or 'case'
82 *: Attributes can have defaults set for all the keys
83 *: Default attributes can be overridden in any key description
87 title: Extraction Code
89 *: Each key needs code to extract it from a record
90 *: Each record is aliased to $_ (via map)
91 *: Key extraction code operates on $_ and gets the value for this key
92 *: The extraction code is executed in list context so m/(foo)/ works
93 *: The code is inside a do{} block so you can have multiple statements
99 *: Four different sorting styles to choose from
102 **: Schwartian Transform (ST)
103 **: Guttman-Rosler Transform (GRT)
104 *: Each has its uses and advantages
105 *: Styles are really different ways to cache extracted keys
106 *: Caching keys moves key extraction from O( N log N ) to O( N )
107 *: In larger sizes of sort sets, caching keys is a very big win
108 *: This is a classic sort of arrays of numbers
109 **: Compare this code to the generated code for the different sort styles
112 sort { $a->[0] cmp $b->[0] ||
121 *: Similar to common sorts with a code block
122 *: Good for small sort sets as there is no caching
123 *: Pass 'plain' option to make_sorter
130 do{ my( $left, $right ) = map { $_->[0] } $a, $b;
133 do{ my( $left, $right ) = map { $_->[1] } $a, $b;
141 title: Orcish Manouvre Sort
143 *: Caches extracted keys in a hash
144 *: Assignments to the hash in ||=
145 *: Called the orcish because of Or-Cache
146 *: Created by Joseph Hall
147 *: It will re-extract keys for any record with a false value
148 *: The caching hash must be cleared beforehand
149 **: Sort::Maker declares a caching hash in the anonymous sub
150 *: Hash lookups are still O( N log N )
151 *: Good for medium sort sets
152 *: Pass 'orcish' option to make_sorter
162 do{ my ($val) = map { $_->[0] } $a ; $val } )
165 do{ my ($val) = map { $_->[0] } $b ; $val } )
170 do{ my ($val) = map { $_->[1] } $a ; $val } )
173 do{ my ($val) = map { $_->[1] } $b ; $val } )
181 title: Schwartian Transform (ST) Sort
183 *: Caches extracted keys in an anonymous array for each input record
184 *: Stores the record itself in slot 0 of the array
185 *: Uses the map/sort/map idiom
186 *: Popularized by Randal Schwartz
187 *: Good for medium to large sort sets
188 *: Key extraction is O( N )
189 *: Does a full callback to Perl in the comparison block
190 *: Pass 'ST' option to make_sorter
202 do{ my ($val) = $_->[0] ; $val },
203 do{ my ($val) = $_->[1] ; $val }
209 title: Guttman-Rosler Transform (GRT) Sort
211 *: Caches extracted keys in a single string for each input record
212 *: String records can be stored in that string
213 *: Reference records have their index stored in the cache string
214 **: The sorted indices are used to slice into the input array to create
216 *: Key extraction is O( N )
217 *: The comparison code is internal and uses C with no Perl callback (big
219 *: More complex to use properly and efficiently
220 *: GRT has special key description attributes for optimization
221 **: Numbers can be integer/float, signed/unsigned
222 **: Strings can be fixed/varying
223 **: Fastest sort style for larger sets of sort records
224 *: Pass 'GRT' option to make_sorter
230 map unpack( 'N', substr( $_, -4 ) ),
233 do{ my( $val ) = $_->[0] ; ( $val ) },
234 do{ my( $val ) = $_->[1] ; ( $val ) },
242 title: Tests and Benchmarks
244 *: Test system is table driven
245 *: Scripts can do both tests and benchmarks
246 *: Provide a set of data
248 **: Generate in place with map
249 **: Generate with a anonymous sub
250 *: Provide a hand written 'golden' sort sub
251 *: Provide the arguments to make_sorter()
252 *: Many tests are in but more are wanted
253 **: Send in tests if you want
254 *: More complex tests are needed
259 name => 'arrays of multiple strings',
262 [ rand_token( 8, 20 ), rand_token( 8, 20 ), ]
265 gold => sub { $a->[0] cmp $b->[0] ||
266 $a->[1] cmp $b->[1] },
267 args => [ qw( string $_->[0] string $_->[1] ) ],