Commit | Line | Data |
7468c584 |
1 | |
2 | book: Sort::Maker |
3 | chapter: Sort::Maker |
4 | |
5 | title: A Little Past History |
6 | |
7 | *: TPC3 in Monterey, CA 1999 |
8 | *: Co-wrote and presented "A Fresh Look at Efficient Perl Sorting" |
9 | html: |
10 | <A href="http://www.sysarch.com/perl/sort_paper.html">http://www.sysarch.com/perl/sort_paper.html</A> |
11 | *: Co-author was Larry Rosler (retired from HP and Perl) |
12 | *: Introduced the GRT as a way to speed up Perl sorts |
13 | *: It won the prize for Best Technical Paper |
14 | *: A module Sort::Records was promised which would implement the GRT |
15 | *: Sort::Records was started but a code and design review showed many |
16 | problems. |
17 | *: Sort::Records was shelved during development because of a code review |
18 | *: People still write to me asking about that module |
19 | |
20 | PAGE_END |
21 | |
22 | title: Recent History |
23 | |
24 | *: I have been thinking about this module for 5 years |
25 | *: Solved the major design flaw - have the user provide key extraction |
26 | code |
27 | *: A thread in perl6-language gave needed inspiration |
28 | *: Sort::Maker emerges |
29 | *: Much help in the API design and some coding tricks from Damian Conway |
30 | PAGE_END |
31 | |
32 | title: Introducing Sort::Maker |
33 | |
34 | *: Easy description of sorts |
35 | *: Generates a sort sub |
36 | *: Choice of generated sort styles |
37 | *: Can generate the highest speed sorts in Perl |
38 | *: Generated source can be printed |
39 | **: For study |
40 | **: For pasting in code to save the generation time overhead |
41 | *: These slides can be found at |
42 | |
43 | html: |
44 | <A href="http://www.stemsystems.com/sort/slides/slides/index.html">http://www.stemsystems.com/slides/slides/index.html</A><BR> |
45 | *: The current tarball is at |
46 | |
47 | html: |
48 | <A href="http://www.stemsystems.com/sort/Sort-Maker-0.01.tar.gz">http://www.stemsystems.com/sort/Sort-Maker-0.01.tar.gz</A><BR> |
49 | |
50 | PAGE_END |
51 | |
52 | title: Sort::Maker Synopsis |
53 | |
54 | *: Exports a single sub 'make_sorter' |
55 | *: Each key has its own description |
56 | *: Call make_sorter with sort description arguments and it returns a |
57 | code reference. |
58 | *: Call that reference with unsorted input records |
59 | *: Returns sorted records |
60 | |
61 | code: |
62 | |
63 | my $sorter = make_sorter( |
64 | qw( plain ), |
65 | number => [ |
66 | code => '/(\d+)/', |
67 | 'descending', |
68 | ], |
69 | ) ; |
70 | |
71 | my @sorted = $sorter->( @unsorted ) ; |
72 | |
73 | PAGE_END |
74 | |
75 | title: Key Descriptions |
76 | |
77 | *: Each key has a description |
78 | *: Keys must be in the order they will sort (higher level keys must be |
79 | earlier) |
80 | *: The key type must be specified as 'string' or 'number' |
81 | *: Keys can have attributes such as 'descending' or 'case' |
82 | *: Attributes can have defaults set for all the keys |
83 | *: Default attributes can be overridden in any key description |
84 | |
85 | PAGE_END |
86 | |
87 | title: Extraction Code |
88 | |
89 | *: Each key needs code to extract it from a record |
90 | *: Each record is aliased to $_ (via map) |
91 | *: Key extraction code operates on $_ and gets the value for this key |
92 | *: The extraction code is executed in list context so m/(foo)/ works |
93 | *: The code is inside a do{} block so you can have multiple statements |
94 | |
95 | PAGE_END |
96 | |
97 | title: Sort Styles |
98 | |
99 | *: Four different sorting styles to choose from |
100 | **: Plain |
101 | **: Orcish Manouvre |
102 | **: Schwartian Transform (ST) |
103 | **: Guttman-Rosler Transform (GRT) |
104 | *: Each has its uses and advantages |
105 | *: Styles are really different ways to cache extracted keys |
106 | *: Caching keys moves key extraction from O( N log N ) to O( N ) |
107 | *: In larger sizes of sort sets, caching keys is a very big win |
108 | *: This is a classic sort of arrays of numbers |
109 | **: Compare this code to the generated code for the different sort styles |
110 | code: |
111 | |
112 | sort { $a->[0] cmp $b->[0] || |
113 | $a->[1] cmp $b->[1] |
114 | } |
115 | |
116 | PAGE_END |
117 | |
118 | title: Plain Sort |
119 | |
120 | *: No key caching |
121 | *: Similar to common sorts with a code block |
122 | *: Good for small sort sets as there is no caching |
123 | *: Pass 'plain' option to make_sorter |
124 | |
125 | code: |
126 | |
127 | sub { |
128 | |
129 | sort { |
130 | do{ my( $left, $right ) = map { $_->[0] } $a, $b; |
131 | $left cmp $right } |
132 | || |
133 | do{ my( $left, $right ) = map { $_->[1] } $a, $b; |
134 | $left cmp $right } |
135 | |
136 | } @_ ; |
137 | } |
138 | |
139 | PAGE_END |
140 | |
141 | title: Orcish Manouvre Sort |
142 | |
143 | *: Caches extracted keys in a hash |
144 | *: Assignments to the hash in ||= |
145 | *: Called the orcish because of Or-Cache |
146 | *: Created by Joseph Hall |
147 | *: It will re-extract keys for any record with a false value |
148 | *: The caching hash must be cleared beforehand |
149 | **: Sort::Maker declares a caching hash in the anonymous sub |
150 | *: Hash lookups are still O( N log N ) |
151 | *: Good for medium sort sets |
152 | *: Pass 'orcish' option to make_sorter |
153 | |
154 | code: |
155 | sub { |
156 | my %or_cache ; |
157 | |
158 | |
159 | sort { |
160 | ( |
161 | ( $or_cache{$a} ||= |
162 | do{ my ($val) = map { $_->[0] } $a ; $val } ) |
163 | cmp |
164 | ( $or_cache{$b} ||= |
165 | do{ my ($val) = map { $_->[0] } $b ; $val } ) |
166 | ) |
167 | || |
168 | ( |
169 | ( $or_cache{$a} ||= |
170 | do{ my ($val) = map { $_->[1] } $a ; $val } ) |
171 | cmp |
172 | ( $or_cache{$b} ||= |
173 | do{ my ($val) = map { $_->[1] } $b ; $val } ) |
174 | ) |
175 | |
176 | } @_ ; |
177 | } |
178 | |
179 | PAGE_END |
180 | |
181 | title: Schwartian Transform (ST) Sort |
182 | |
183 | *: Caches extracted keys in an anonymous array for each input record |
184 | *: Stores the record itself in slot 0 of the array |
185 | *: Uses the map/sort/map idiom |
186 | *: Popularized by Randal Schwartz |
187 | *: Good for medium to large sort sets |
188 | *: Key extraction is O( N ) |
189 | *: Does a full callback to Perl in the comparison block |
190 | *: Pass 'ST' option to make_sorter |
191 | |
192 | code: |
193 | sub { |
194 | map $_->[0], |
195 | sort { |
196 | $a->[1] cmp $b->[1] |
197 | || |
198 | $a->[2] cmp $b->[2] |
199 | |
200 | } |
201 | map [ $_, |
202 | do{ my ($val) = $_->[0] ; $val }, |
203 | do{ my ($val) = $_->[1] ; $val } |
204 | ], @_ ; |
205 | } |
206 | |
207 | PAGE_END |
208 | |
209 | title: Guttman-Rosler Transform (GRT) Sort |
210 | |
211 | *: Caches extracted keys in a single string for each input record |
212 | *: String records can be stored in that string |
213 | *: Reference records have their index stored in the cache string |
214 | **: The sorted indices are used to slice into the input array to create |
215 | sorted records |
216 | *: Key extraction is O( N ) |
217 | *: The comparison code is internal and uses C with no Perl callback (big |
218 | win) |
219 | *: More complex to use properly and efficiently |
220 | *: GRT has special key description attributes for optimization |
221 | **: Numbers can be integer/float, signed/unsigned |
222 | **: Strings can be fixed/varying |
223 | **: Fastest sort style for larger sets of sort records |
224 | *: Pass 'GRT' option to make_sorter |
225 | code: |
226 | sub { |
227 | my $rec_ind = 0 ; |
228 | |
229 | return @_[ |
230 | map unpack( 'N', substr( $_, -4 ) ), |
231 | sort |
232 | map pack( "Z*Z*N", |
233 | do{ my( $val ) = $_->[0] ; ( $val ) }, |
234 | do{ my( $val ) = $_->[1] ; ( $val ) }, |
235 | $rec_ind++ |
236 | ), @_ |
237 | ] ; |
238 | } |
239 | |
240 | PAGE_END |
241 | |
242 | title: Tests and Benchmarks |
243 | |
244 | *: Test system is table driven |
245 | *: Scripts can do both tests and benchmarks |
246 | *: Provide a set of data |
247 | **: Manually |
248 | **: Generate in place with map |
249 | **: Generate with a anonymous sub |
250 | *: Provide a hand written 'golden' sort sub |
251 | *: Provide the arguments to make_sorter() |
252 | *: Many tests are in but more are wanted |
253 | **: Send in tests if you want |
254 | *: More complex tests are needed |
255 | code: |
256 | |
257 | { |
258 | skip => 0, |
259 | name => 'arrays of multiple strings', |
260 | source => 1, |
261 | data => [ map { |
262 | [ rand_token( 8, 20 ), rand_token( 8, 20 ), ] |
263 | } 1 .. 100 |
264 | ], |
265 | gold => sub { $a->[0] cmp $b->[0] || |
266 | $a->[1] cmp $b->[1] }, |
267 | args => [ qw( string $_->[0] string $_->[1] ) ], |
268 | }, |
269 | |
270 | PAGE_END |
271 | |