1 Outline for TPJ Sort Article
11 1. O(n) order function
13 2. Constant and linear overhead for initialization (and post
16 3. Constant overhead per compare
18 4. Why reducing compare count is critical
20 5. Why amount of sort data matters
22 D. Common sort algorithms
28 5. quicksort O(n log n)
32 1. Knuth on sorting (Vol. 3, Chap. 5)
34 2. perlfaq5: "How do I sort an array by (anthing)?"
36 3. Any good textbook on sorting and/or algorithms
38 II. Perl's sort function
40 A. Internally uses C qsort
42 B. Builtin compare is cmp
44 1. C's memcmp for chars
45 2. something larry mentioned for locale
47 C. Custom compare code
55 4. sub ref (to be done)
57 C. Special vars $a and $b in compare
59 III Basic Ways to sort in Perl
61 A. Default ascending lexographic order
63 B. Trivial sorts with callback compare code
71 { $b cmp $a } OR reverse sort @unsorted (depending on input size)
75 { $hash{$a} cmp $hash{$b} }
77 larry said this about hash value: (please clafiry)
78 This is a derived-key sort.
82 { $array[$a] cmp $array[$b] }
84 C. Basic multi-field sorts
86 1. Multi-field compares
88 { $a->[0] cmp $b->[0] ||
91 2. Multi-field compares with processing (bad)
93 larry comments: (please clarify)
94 No single-valued sort key per datum. Beyond the scope of
98 IV. Efficient sorting in Perl
100 A. When an investment in improving efficiency is not warranted
104 2. Rarely executed program
106 3. Low cost in CPU time compared to other parts of the program
110 1. builtin vs. {$a cmp $b}
112 2. complex compare code
114 C. General ideas on how to speed up sort by compare reduction
116 1. Reducing sort compare to default or trivial compare
118 2. Pre- and Postprocessing is O(n) vs. complex compare's O(n log n)
120 3. Most these methods are isomorphic as they are O(n) processing
121 wrapped around O(n log n) sorting
123 D. Reduction to trivial sorts
125 1. Schwartzian transform (from Randal Schwartz)
127 a. Converts input to LOL (original data/ref is one element of list)
129 b. array index sort (trival sort 4)
131 c. convert LOL to output {extract saved data/ref)
133 d. uses maps or temp arrays
135 e. examples and benchmarks (IP addresses)
137 2. Orcish (Or-cache) manoever (from Joseph Hall)
139 a. Converts sort data in compare code
141 b. Converts only once per unique sort data
143 c. Stores converted keys in private hash
145 d. hash value sort (trival sort 3)
147 e. No need for post processing, as sort is on real data
149 f. Can be converted use pre- and post processing (unknown
152 g. Requires a single sort key (to hash to converted key)
153 (single key can be a ref since that will hash uniquely)
155 h. examples and benchmarks (IP addresses)
157 E. Reduction to default sorts
159 1. Most efficient since no perl code is called during compare
161 2. Can sort multiple keys, string or integers, IEEE floats,
162 ascending or descending (per key).
164 3. Preprocessing of input data to sortable strings
166 a. uses pack or sprintf (or other perl code line . and join) to
167 create sort key (in a map or foreach/push)
169 b. Converted key can be a combination of binary and ascii
171 c. Original data is appended onto created sort key
173 4. Extraction of original data
175 d. Use substr or unpack (in a map or foreach/push)
177 5. Can't sort refs directly, but an array index sort works around
180 Store the refs in an array. store the array index for each ref
181 as the original data in the string.
185 a. primarily uses sprintf to create printable keys
187 b. fully portable as is
189 c. need to blank/zero pad integers to keep created keys same size
191 d. reverse sort order via ??? (larry, clarify your
192 idea here) (also can floats be reversed?)
194 e. examples and benchmarks (IP addresses)
198 a. primarily uses pack to create binary keys
200 b. needs to handle porting issues (endian, IEEE floats)
202 c. no padding needed for binary key parts
204 d. reverse sort order via ??? (larry, clarify your
205 idea here) (also can floats be reversed?)
207 e. examples and benchmarks (IP addresses)