book: Sort::Maker chapter: Sort::Maker title: A Little Past History *: TPC3 in Monterey, CA 1999 *: Co-wrote and presented "A Fresh Look at Efficient Perl Sorting" html: http://www.sysarch.com/perl/sort_paper.html *: Co-author was Larry Rosler (retired from HP and Perl) *: Introduced the GRT as a way to speed up Perl sorts *: It won the prize for Best Technical Paper *: A module Sort::Records was promised which would implement the GRT *: Sort::Records was started but a code and design review showed many problems. *: Sort::Records was shelved during development because of a code review *: People still write to me asking about that module PAGE_END title: Recent History *: I have been thinking about this module for 5 years *: Solved the major design flaw - have the user provide key extraction code *: A thread in perl6-language gave needed inspiration *: Sort::Maker emerges *: Much help in the API design and some coding tricks from Damian Conway PAGE_END title: Introducing Sort::Maker *: Easy description of sorts *: Generates a sort sub *: Choice of generated sort styles *: Can generate the highest speed sorts in Perl *: Generated source can be printed **: For study **: For pasting in code to save the generation time overhead *: These slides can be found at html: http://www.stemsystems.com/slides/slides/index.html
*: The current tarball is at html: http://www.stemsystems.com/sort/Sort-Maker-0.01.tar.gz
PAGE_END title: Sort::Maker Synopsis *: Exports a single sub 'make_sorter' *: Each key has its own description *: Call make_sorter with sort description arguments and it returns a code reference. *: Call that reference with unsorted input records *: Returns sorted records code: my $sorter = make_sorter( qw( plain ), number => [ code => '/(\d+)/', 'descending', ], ) ; my @sorted = $sorter->( @unsorted ) ; PAGE_END title: Key Descriptions *: Each key has a description *: Keys must be in the order they will sort (higher level keys must be earlier) *: The key type must be specified as 'string' or 'number' *: Keys can have attributes such as 'descending' or 'case' *: Attributes can have defaults set for all the keys *: Default attributes can be overridden in any key description PAGE_END title: Extraction Code *: Each key needs code to extract it from a record *: Each record is aliased to $_ (via map) *: Key extraction code operates on $_ and gets the value for this key *: The extraction code is executed in list context so m/(foo)/ works *: The code is inside a do{} block so you can have multiple statements PAGE_END title: Sort Styles *: Four different sorting styles to choose from **: Plain **: Orcish Manouvre **: Schwartian Transform (ST) **: Guttman-Rosler Transform (GRT) *: Each has its uses and advantages *: Styles are really different ways to cache extracted keys *: Caching keys moves key extraction from O( N log N ) to O( N ) *: In larger sizes of sort sets, caching keys is a very big win *: This is a classic sort of arrays of numbers **: Compare this code to the generated code for the different sort styles code: sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] } PAGE_END title: Plain Sort *: No key caching *: Similar to common sorts with a code block *: Good for small sort sets as there is no caching *: Pass 'plain' option to make_sorter code: sub { sort { do{ my( $left, $right ) = map { $_->[0] } $a, $b; $left cmp $right } || do{ my( $left, $right ) = map { $_->[1] } $a, $b; $left cmp $right } } @_ ; } PAGE_END title: Orcish Manouvre Sort *: Caches extracted keys in a hash *: Assignments to the hash in ||= *: Called the orcish because of Or-Cache *: Created by Joseph Hall *: It will re-extract keys for any record with a false value *: The caching hash must be cleared beforehand **: Sort::Maker declares a caching hash in the anonymous sub *: Hash lookups are still O( N log N ) *: Good for medium sort sets *: Pass 'orcish' option to make_sorter code: sub { my %or_cache ; sort { ( ( $or_cache{$a} ||= do{ my ($val) = map { $_->[0] } $a ; $val } ) cmp ( $or_cache{$b} ||= do{ my ($val) = map { $_->[0] } $b ; $val } ) ) || ( ( $or_cache{$a} ||= do{ my ($val) = map { $_->[1] } $a ; $val } ) cmp ( $or_cache{$b} ||= do{ my ($val) = map { $_->[1] } $b ; $val } ) ) } @_ ; } PAGE_END title: Schwartian Transform (ST) Sort *: Caches extracted keys in an anonymous array for each input record *: Stores the record itself in slot 0 of the array *: Uses the map/sort/map idiom *: Popularized by Randal Schwartz *: Good for medium to large sort sets *: Key extraction is O( N ) *: Does a full callback to Perl in the comparison block *: Pass 'ST' option to make_sorter code: sub { map $_->[0], sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] } map [ $_, do{ my ($val) = $_->[0] ; $val }, do{ my ($val) = $_->[1] ; $val } ], @_ ; } PAGE_END title: Guttman-Rosler Transform (GRT) Sort *: Caches extracted keys in a single string for each input record *: String records can be stored in that string *: Reference records have their index stored in the cache string **: The sorted indices are used to slice into the input array to create sorted records *: Key extraction is O( N ) *: The comparison code is internal and uses C with no Perl callback (big win) *: More complex to use properly and efficiently *: GRT has special key description attributes for optimization **: Numbers can be integer/float, signed/unsigned **: Strings can be fixed/varying **: Fastest sort style for larger sets of sort records *: Pass 'GRT' option to make_sorter code: sub { my $rec_ind = 0 ; return @_[ map unpack( 'N', substr( $_, -4 ) ), sort map pack( "Z*Z*N", do{ my( $val ) = $_->[0] ; ( $val ) }, do{ my( $val ) = $_->[1] ; ( $val ) }, $rec_ind++ ), @_ ] ; } PAGE_END title: Tests and Benchmarks *: Test system is table driven *: Scripts can do both tests and benchmarks *: Provide a set of data **: Manually **: Generate in place with map **: Generate with a anonymous sub *: Provide a hand written 'golden' sort sub *: Provide the arguments to make_sorter() *: Many tests are in but more are wanted **: Send in tests if you want *: More complex tests are needed code: { skip => 0, name => 'arrays of multiple strings', source => 1, data => [ map { [ rand_token( 8, 20 ), rand_token( 8, 20 ), ] } 1 .. 100 ], gold => sub { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] }, args => [ qw( string $_->[0] string $_->[1] ) ], }, PAGE_END