paper/sort_outline

   1                       Outline for TPJ Sort Article
   2
   3 I. Overview of sorting
   4
   5   A. What is sorting
   6
   7   B. Why is it needed
   8
   9   C. Sort theory
  10
  11      1. O(n) order function
  12
  13      2. Constant and linear overhead for initialization (and post
  14         processing)
  15
  16      3. Constant overhead per compare
  17
  18      4. Why reducing compare count is critical
  19
  20      5. Why amount of sort data matters
  21
  22   D. Common sort algorithms
  23
  24     1. bubble           O(n**2)
  25     2. insertion        O(n**2)
  26     3. tree             O(n log n)
  27     4. shell            O(n log n)
  28     5. quicksort        O(n log n)
  29
  30   E. Other information
  31
  32     1. Knuth on sorting (Vol. 3, Chap. 5)
  33
  34     2. perlfaq5: "How do I sort an array by (anthing)?"
  35
  36     3. Any good textbook on sorting and/or algorithms
  37
  38 II. Perl's sort function
  39
  40   A. Internally uses C qsort
  41
  42   B. Builtin compare is cmp
  43
  44     1. C's memcmp for chars
  45     2. something larry mentioned for locale
  46
  47   C. Custom compare code
  48
  49         1. block of code
  50
  51         2. sub name
  52
  53         3. sub type glob
  54
  55         4. sub ref (to be done)
  56
  57   C. Special vars $a and $b in compare
  58
  59 III Basic Ways to sort in Perl
  60
  61   A. Default ascending lexographic order
  62
  63   B. Trivial sorts with callback compare code
  64
  65     1. Numeric sort
  66
  67         { $a <=> $b }
  68
  69     2. Reverse sort
  70
  71         { $b cmp $a } OR reverse sort @unsorted (depending on input size)
  72
  73     3. Hash value sort
  74
  75         { $hash{$a} cmp $hash{$b} }
  76
  77 larry said this about hash value: (please clafiry)
  78 This is a derived-key sort.
  79
  80     4. Array index sort
  81
  82         { $array[$a] cmp $array[$b] }
  83
  84   C. Basic multi-field sorts
  85
  86     1. Multi-field compares
  87
  88         { $a->[0] cmp $b->[0] ||
  89           $a->[1] cmp $b->[1] }
  90
  91     2. Multi-field compares with processing (bad)
  92
  93 larry comments: (please clarify)
  94 No single-valued sort key per datum.  Beyond the scope of
  95 this article.
  96
  97
  98 IV. Efficient sorting in Perl
  99
 100   A. When an investment in improving efficiency is not warranted
 101
 102      1. Short data sets
 103
 104      2. Rarely executed program
 105
 106      3. Low cost in CPU time compared to other parts of the program
 107
 108   B. Sort overhead
 109
 110     1. builtin vs. {$a cmp $b}
 111
 112     2. complex compare code
 113
 114   C. General ideas on how to speed up sort by compare reduction
 115
 116     1. Reducing sort compare to default or trivial compare
 117
 118     2. Pre- and Postprocessing is O(n) vs. complex compare's O(n log n)
 119
 120     3. Most these methods are isomorphic as they are O(n) processing
 121        wrapped around O(n log n) sorting
 122
 123   D. Reduction to trivial sorts
 124
 125     1. Schwartzian transform (from Randal Schwartz)
 126
 127       a. Converts input to LOL (original data/ref  is one element of list)
 128
 129       b. array index sort (trival sort 4)
 130
 131       c. convert LOL to output {extract saved data/ref)
 132
 133       d. uses maps or temp arrays
 134
 135       e. examples and benchmarks (IP addresses)
 136
 137     2. Orcish (Or-cache) manoever (from Joseph Hall)
 138
 139       a. Converts sort data in compare code
 140
 141       b. Converts only once per unique sort data
 142
 143       c. Stores converted keys in private hash
 144
 145       d. hash value sort (trival sort 3)
 146
 147       e. No need for post processing, as sort is on real data
 148
 149       f. Can be converted use pre- and post processing (unknown
 150          benchmark)
 151
 152       g. Requires a single sort key (to hash to converted key)
 153          (single key can be a ref since that will hash uniquely)
 154
 155       h. examples and benchmarks (IP addresses)
 156
 157   E. Reduction to default sorts
 158
 159     1. Most efficient since no perl code is called during compare
 160
 161     2. Can sort multiple keys, string or integers, IEEE floats,
 162        ascending or descending (per key).
 163
 164     3. Preprocessing of input data to sortable strings
 165
 166       a. uses pack or sprintf (or other perl code line . and join) to
 167          create sort key (in a map or foreach/push)
 168
 169       b. Converted key can be a combination of binary and ascii
 170
 171       c. Original data is appended onto created sort key
 172
 173     4. Extraction of original data
 174
 175       d. Use substr or unpack (in a map or foreach/push)
 176
 177     5. Can't sort refs directly, but an array index sort works around
 178        that:
 179
 180         Store the refs in an array. store the array index for each ref
 181         as the original data in the string.
 182
 183     6. Printed sort
 184
 185       a. primarily uses sprintf to create printable keys
 186
 187       b. fully portable as is
 188
 189       c. need to blank/zero pad integers to keep created keys same size
 190
 191       d. reverse sort order via ??? (larry, clarify your
 192          idea here) (also can floats be reversed?)
 193
 194       e. examples and benchmarks (IP addresses)
 195
 196     7. Packed sort
 197
 198       a. primarily uses pack to create binary keys
 199
 200       b. needs to handle porting issues (endian, IEEE floats)
 201
 202       c. no padding needed for binary key parts
 203
 204       d. reverse sort order via ??? (larry, clarify your
 205          idea here) (also can floats be reversed?)
 206
 207       e. examples and benchmarks (IP addresses)