1 | Text::Soundex Version 3.02 |
2 | |
3 | NOTE: Users of Text::Soundex Version 2.x should consult the 'History' |
4 | section at the end of this document before installing this module. |
5 | The interface has been simplified, and existing code that takes |
6 | advantage of Version 2.x features may need to be altered to function |
7 | properly. |
8 | |
9 | This is a Perl 5 module implementing the Soundex algorithm described by |
10 | Knuth. The algorithm is used quite often for locating a person by name |
11 | where the actual spelling of the name is not known. |
12 | |
13 | This version directly supersedes the version of Text::Soundex found in |
14 | the core of Perl 5.8.0 and earlier, and is a drop-in replacement for |
15 | it. |
16 | |
17 | The algorithm used by soundex() is NOT fully compatible with the |
18 | algorithm used to index names for US Censuses. Use the soundex_nara() |
19 | subroutine to return codes for this purpose. |
20 | |
21 | Basic Usage: |
22 | |
23 | Soundex performs a one-way transformation of a name, converting the |
24 | character string given as input into a short code representing the |
25 | identifiable sounds those characters might make. |
26 | |
27 | For example: |
28 | |
29 | use Text::Soundex; |
30 | |
31 | print soundex("Mark"), "\n"; # prints: M620 |
32 | print soundex("Marc"), "\n"; # prints: M620 |
33 | |
34 | print soundex("Hansen"), "\n"; # prints: H525 |
35 | print soundex("Hanson"), "\n"; # prints: H525 |
36 | print soundex("Henson"), "\n"; # prints: H525 |
37 | |
38 | In many situations, code such as the following: |
39 | |
40 | if ($name1 eq $name2) { |
41 | ... |
42 | } |
43 | |
44 | can be replaced with: |
45 | |
46 | if (soundex($name1) eq soundex($name2)) { |
47 | ... |
48 | } |
49 | |
50 | Installation: |
51 | |
52 | Once the archive has been unpacked, the following steps are needed to |
53 | build, test and install the module (run them in the directory that |
54 | contains Makefile.PL): |
55 | |
56 | perl Makefile.PL |
57 | make |
58 | make test |
59 | |
60 | If 'make test' succeeds, the next step may need to be run as root |
61 | (on a Unix-like system) or with special privileges on other systems: |
62 | |
63 | make install |
64 | |
65 | If you do not want to use the XS code (for whatever reason), do the following |
66 | instead of the above: |
67 | |
68 | perl Makefile.PL --no-xs |
69 | make |
70 | make test |
71 | make install |
72 | |
73 | If any of the tests report 'not ok' and you are running Perl 5.6.0 or later, |
74 | please contact Mark Mielke <mark@mielke.cc>. |
75 | |
76 | History: |
77 | |
78 | Version 3.02: |
79 | 3.01 and 3.00 used the 'U8' type incorrectly, causing some strict |
80 | compilers to complain or refuse to compile the XS code. Also, Unicode |
81 | support did not work properly for Perl 5.6.x. Both of these problems |
82 | are now fixed. |
83 | |
84 | Version 3.01: |
85 | A bug with non-UTF-8 strings that contain non-ASCII alphabetic characters |
86 | was fixed. The soundex_unicode() and soundex_nara_unicode() wrapper |
87 | routines were included, and the documentation refers the user to the |
88 | excellent Text::Unidecode module to perform soundex encodings using |
89 | Unicode strings. The Perl versions of the routines have been further |
90 | optimized, and now correctly handle an edge case involving non-alphabetic |
91 | characters at the beginning of the string. |
92 | |
93 | Version 3.00: |
94 | Support for UTF-8 strings (Unicode strings) is now in place. Note |
95 | that this allows UTF-8 strings to be passed to the XS version of |
96 | the soundex() routine. The Soundex algorithm treats characters |
97 | outside the ASCII range (0x00 - 0x7F) as if they were not |
98 | alphabetical. |
99 | |
100 | The interface has been simplified. In order to explicitly use the |
101 | non-XS implementation of soundex(): |
102 | |
103 | use Text::Soundex (); |
104 | $code = Text::Soundex::soundex_noxs($name); |
105 | |
106 | In order to use the NARA soundex algorithm: |
107 | |
108 | use Text::Soundex 'soundex_nara'; |
109 | $code = soundex_nara($name); |
110 | |
111 | Use of the ':NARA-Ruleset' import directive is now obsolete. To |
112 | emulate the old behaviour: |
113 | |
114 | use Text::Soundex (); |
115 | *soundex = \&Text::Soundex::soundex_nara; |
116 | $code = soundex($name); |
117 | |
118 | Version 2.20: |
119 | This version includes support for the algorithm used to index |
120 | the U.S. Federal Censuses. There is a slight discrepancy in the |
121 | definition of a soundex code, not commonly known or recognized, |
122 | involving similar-sounding letters separated by the characters |
123 | H or W. This is defined as the NARA ruleset, as this discrepancy |
124 | was discovered by them. (Calling it "the US Census ruleset" was |
125 | too unwieldy...) |
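
The H/W rule described above can be illustrated with a small pure-Perl
sketch. This is not the module's implementation: sketch_soundex() and its
$nara flag are hypothetical names introduced here, and the sketch assumes
the usual Soundex letter groups, with H and W acting as separators (like
vowels) under the soundex() rules and being skipped entirely under the
NARA rules.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of Text::Soundex.
# Pass a true second argument to apply the NARA treatment of H and W.
sub sketch_soundex {
    my ($name, $nara) = @_;

    (my $letters = uc $name) =~ tr/A-Z//cd;    # keep ASCII letters only
    return unless length $letters;             # nothing to encode

    my $first = substr($letters, 0, 1);

    # Vowels and Y become 0, H and W become 9, and every other
    # consonant becomes the digit of its Soundex group.
    (my $code = $letters) =~
        tr/AEIOUYHWBFPVCGJKQSXZDTLMNR/00000099111122222222334556/;

    if ($nara) {
        # NARA rules: H and W after the first letter vanish, so the
        # letters on either side of them become adjacent.
        $code =~ s/(?<=.)9//g;
    }
    else {
        # soundex() rules: H and W separate letters, as vowels do.
        $code =~ tr/9/0/;
    }

    $code =~ tr/0-6//s;           # collapse runs of the same digit
    $code = substr($code, 1);     # the first letter is kept as a letter
    $code =~ tr/0//d;             # drop the vowel placeholders
    return substr($first . $code . '000', 0, 4);
}

print sketch_soundex('Ashcraft'), "\n";       # prints: A226
print sketch_soundex('Ashcraft', 1), "\n";    # prints: A261
```

In "Ashcraft" the S and C share a group and are separated only by an H,
so the two rulesets disagree on whether the C is counted a second time.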
126 | |
127 | NARA can be found at: |
128 | http://www.nara.gov/genealogy/ |
129 | |
130 | The algorithm requested by NARA can be found at: |
131 | http://home.utah-inter.net/kinsearch/Soundex.html |
132 | |
133 | Ways to use it in your code: |
134 | |
135 | Transparently change existing code like this: |
136 | ============================================= |
137 | use Text::Soundex qw(:NARA-Ruleset); |
138 | |
139 | ... soundex(...) ... |
140 | |
141 | -- |
142 | |
143 | Make the change visibly distinct like this: |
144 | =========================================== |
145 | use Text::Soundex qw(soundex_nara); |
146 | |
147 | ... soundex_nara(...) ... |
148 | |
149 | Version 2.00: |
150 | This version is a full rewrite of the 1.0 engine by Mark Mielke. |
151 | The goal was speed, and this was achieved. There is an optional |
152 | XS module, usable completely transparently by the user, which |
153 | offers a further speedup by a factor of more than 7.5. |
154 | |
155 | Version 1.00: |
156 | This version can be found in the Perl core distribution in releases |
157 | up to at least Perl 5.8.0. It was written by Mike Stok. It can be |
158 | identified by the fact that it does not contain a $VERSION |
159 | at the beginning of the module, and that it uses an RCS |
160 | tag with a version of 1.x. This version, before some perl5'ish |
161 | packaging was introduced, was actually written for perl4. |