Text::Soundex - Implementation of the soundex algorithm.

Basic Usage:

    Soundex performs a one-way transformation of a name, converting
    the character string given as input into a code representing the
    identifiable sounds those characters might make.

    For example:

        use Text::Soundex;

        print soundex("Mark"), "\n";    # prints: M620
        print soundex("Marc"), "\n";    # prints: M620

        print soundex("Hansen"), "\n";  # prints: H525
        print soundex("Hanson"), "\n";  # prints: H525
        print soundex("Henson"), "\n";  # prints: H525

    In many situations, code such as the following:

        if ($name1 eq $name2) {
            ...
        }

    can be replaced with:

        if (soundex($name1) eq soundex($name2)) {
            ...
        }

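    The codes shown above can be reproduced with a short pure-Perl
    sketch. It only illustrates the classic letter-to-digit mapping and
    assumes ASCII input; simple_soundex is a hypothetical helper name,
    not part of Text::Soundex, and the module's own implementation may
    differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Soundex): a simplified
# soundex encoder for a single ASCII name.
sub simple_soundex {
    my ($name) = @_;
    my $code = uc $name;
    $code =~ tr/A-Z//cd;                # drop everything that is not A-Z
    return unless length $code;         # nothing left to encode

    my $first = substr($code, 0, 1);

    # Map each letter to its digit; vowels, H, W and Y become 0.
    # The /s flag squeezes a run of identical digits into one.
    $code =~ tr/AEHIOUWYBFPVCGJKQSXZDTLMNR/00000000111122222222334556/s;

    # Put the original first letter back, then drop the 0 placeholders.
    $code = $first . substr($code, 1);
    $code =~ tr/0//d;

    # Pad with zeros and keep exactly four characters.
    return substr($code . '000', 0, 4);
}

for my $name (qw(Mark Marc Hansen Hanson Henson)) {
    printf "%-7s %s\n", $name, simple_soundex($name);
}
```

    Run as a script, this prints M620 for the first two names and H525
    for the other three, so names that sound alike compare equal.
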
Installation:

    Once the archive has been unpacked, the following steps build, test
    and install the module (run them in the directory that contains
    Makefile.PL):

        perl Makefile.PL
        make
        make test

    If make test succeeds, the next step may need to be run as root
    (on a Unix-like system) or with special privileges on other systems:

        make install

    If you do not want to use the XS code (for whatever reason), do the
    following instead of the above:

        perl Makefile.PL --no-xs
        make
        make test
        make install

    If any of the tests report 'not ok' and you are running perl 5.6.0 or
    later, please contact Mark Mielke <mark@mielke.cc>.

History:

    Version 3.03:
        Updated to allow the XS implementation to work properly under an
        EBCDIC/EBCDIC-UTF8 character set environment.

        Updated documentation to better describe the history of the
        soundex algorithm and how it applies to this module.

    Version 3.02:
        3.01 and 3.00 used the 'U8' type incorrectly, causing some strict
        compilers to complain or refuse to compile the XS code. Also, Unicode
        support did not work properly for Perl 5.6.x. Both of these problems
        are now fixed.

    Version 3.01:
        A bug with non-UTF-8 strings that contain non-ASCII alphabetic
        characters was fixed. The soundex_unicode() and soundex_nara_unicode()
        wrapper routines were included, and the documentation refers the user
        to the excellent Text::Unidecode module to perform soundex encodings
        using Unicode strings. The Perl versions of the routines have been
        further optimized, and now correctly handle an edge case involving
        non-alphabetic characters at the beginning of the string.

    Version 3.00:
        Support for UTF-8 strings (Unicode strings) is now in place. Note
        that this allows UTF-8 strings to be passed to the XS version of
        the soundex() routine. The Soundex algorithm treats characters
        outside the ASCII range (0x00 - 0x7F) as if they were not
        alphabetical.

        The interface has been simplified. In order to explicitly use the
        non-XS implementation of soundex():

            use Text::Soundex ();
            $code = Text::Soundex::soundex_noxs($name);

        In order to use the NARA soundex algorithm:

            use Text::Soundex 'soundex_nara';
            $code = soundex_nara($name);

        Use of the ':NARA-Ruleset' import directive is now obsolete. To
        emulate the old behaviour:

            use Text::Soundex ();
            *soundex = \&Text::Soundex::soundex_nara;
            $code = soundex($name);

    Version 2.20:
        This version includes support for the algorithm used to index
        the U.S. Federal Censuses. There is a slight discrepancy in the
        definition of a soundex code, which is not commonly known or
        recognized, involving similar-sounding letters being separated
        by the characters H or W. This is defined as the NARA ruleset,
        as this discrepancy was discovered by them. (Calling it "the
        US Census ruleset" was too unwieldy...)

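        The effect of the H/W rule can be seen in a minimal sketch,
        assuming ASCII input. sketch_soundex is a hypothetical helper and
        not the module's code; a true second argument applies the NARA
        treatment of H and W described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): encode one ASCII name.
# When $nara is true, equal codes separated only by H or W are merged.
sub sketch_soundex {
    my ($name, $nara) = @_;
    my $code = uc $name;
    $code =~ tr/A-Z//cd;                # keep only the letters A-Z
    return unless length $code;

    my $first = substr($code, 0, 1);

    # Vowels and Y become 0; H and W become 9 so they can be told apart.
    # The /s flag squeezes a run of identical digits into one.
    $code =~ tr/AEIOUYHWBFPVCGJKQSXZDTLMNR/00000099111122222222334556/s;

    if ($nara) {
        # NARA rule: two equal codes with only H/W between them count once.
        1 while $code =~ s/(.)9\1/$1/;
    }

    # Restore the first letter, drop the placeholders, pad to four.
    $code = $first . substr($code, 1);
    $code =~ tr/09//d;
    return substr($code . '000', 0, 4);
}

print sketch_soundex("Ashcraft"),    "\n";   # prints: A226
print sketch_soundex("Ashcraft", 1), "\n";   # prints: A261
```

        In "Ashcraft" the S and C share a code and are separated only by
        H, so the two rulesets produce different results.
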
        NARA can be found at:
            http://www.nara.gov/genealogy/

        The algorithm used by NARA can be found at:
            http://home.utah-inter.net/kinsearch/Soundex.html

    Version 2.00:
        This version is a full rewrite of the 1.0 engine by Mark Mielke.
        The goal was speed... and this was achieved. There is an optional
        XS module, which can be used completely transparently by the user
        and offers a further speed increase of a factor of more than 7.5X.

    Version 1.00:
        This version can be found in the perl core distribution for at
        least Perl 5.8.0 and earlier. It was written by Mike Stok. It can
        be identified by the fact that it does not contain a $VERSION
        at the beginning of the module, and that it uses an RCS
        tag with a version of 1.x. This version, before some perl5'ish
        packaging was introduced, was actually written for perl4.