1 | Text::Soundex - Implementation of the soundex algorithm. |
2 | |
3 | Basic Usage: |
4 | |
5 | Soundex performs a one-way transformation of a name, converting the |
6 | character string given as input into a code that represents the |
7 | identifiable sounds those characters might make. |
8 | |
9 | For example: |
10 | |
11 | use Text::Soundex; |
12 | |
13 | print soundex("Mark"), "\n"; # prints: M620 |
14 | print soundex("Marc"), "\n"; # prints: M620 |
15 | |
16 | print soundex("Hansen"), "\n"; # prints: H525 |
17 | print soundex("Hanson"), "\n"; # prints: H525 |
18 | print soundex("Henson"), "\n"; # prints: H525 |
19 | |
20 | In many situations, code such as the following: |
21 | |
22 | if ($name1 eq $name2) { |
23 | ... |
24 | } |
25 | |
26 | can be substituted with: |
27 | |
28 | if (soundex($name1) eq soundex($name2)) { |
29 | ... |
30 | } |
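
The same comparison can be used to bucket a whole list of names. The following is a minimal sketch, assuming Text::Soundex is installed; the name list and the %by_code hash are hypothetical and only serve as an illustration.

```perl
use strict;
use warnings;
use Text::Soundex;

# Hypothetical sample data; any list of surnames would do.
my @names = qw(Hansen Hanson Henson Mark Marc);

# Bucket each name under its soundex code.
my %by_code;
for my $name (@names) {
    my $code = soundex($name);
    # Assumption: input without any letters may yield no code (undef).
    next unless defined $code;
    push @{ $by_code{$code} }, $name;
}

# Print one line per code, followed by the names that share it.
for my $code (sort keys %by_code) {
    print "$code: @{ $by_code{$code} }\n";
}
```

With the codes listed in the examples above, the three H525 names end up in one bucket and the two M620 names in another.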
31 | |
32 | Installation: |
33 | |
34 | Once the archive has been unpacked, the following steps are needed |
35 | to build, test, and install the module (run them in the directory that |
36 | contains Makefile.PL): |
37 | |
38 | perl Makefile.PL |
39 | make |
40 | make test |
41 | |
42 | If the make test succeeds then the next step may need to be run as root |
43 | (on a Unix-like system) or with special privileges on other systems. |
44 | |
45 | make install |
46 | |
47 | If you do not want to use the XS code (for whatever reason) do the following |
48 | instead of the above: |
49 | |
50 | perl Makefile.PL --no-xs |
51 | make |
52 | make test |
53 | make install |
54 | |
55 | If any of the tests report 'not ok' and you are running perl 5.6.0 or later |
56 | then please contact Mark Mielke <mark@mielke.cc> |
57 | |
58 | History: |
59 | |
60 | Version 3.03: |
61 | Updated to allow the XS implementation to work properly under an |
62 | EBCDIC/EBCDIC-UTF8 character set environment. |
63 | |
64 | Updated documentation to better describe the history of the |
65 | soundex algorithm and how it applies to this module. |
66 | |
67 | Version 3.02: |
68 | 3.01 and 3.00 used the 'U8' type incorrectly causing some strict |
69 | compilers to complain or refuse to compile the XS code. Also, Unicode |
70 | support did not work properly for Perl 5.6.x. Both of these problems |
71 | are now fixed. |
72 | |
73 | Version 3.01: |
74 | A bug with non-UTF-8 strings that contain non-ASCII alphabetic characters |
75 | was fixed. The soundex_unicode() and soundex_nara_unicode() wrapper |
76 | routines were included and the documentation refers the user to the |
77 | excellent Text::Unidecode module to perform soundex encodings using |
78 | Unicode strings. The Perl versions of the routines have been further |
79 | optimized, and now handle a border case involving non-alphabetic characters |
80 | at the beginning of the string correctly. |
81 | |
82 | Version 3.00: |
83 | Support for UTF-8 strings (Unicode strings) is now in place. Note |
84 | that this allows UTF-8 strings to be passed to the XS version of |
85 | the soundex() routine. The Soundex algorithm treats characters |
86 | outside the ASCII range (0x00 - 0x7F) as if they were not |
87 | alphabetical. |
88 | |
89 | The interface has been simplified. In order to explicitly use the |
90 | non-XS implementation of soundex(): |
91 | |
92 | use Text::Soundex (); |
93 | $code = Text::Soundex::soundex_noxs($name); |
94 | |
95 | In order to use the NARA soundex algorithm: |
96 | |
97 | use Text::Soundex 'soundex_nara'; |
98 | $code = soundex_nara($name); |
99 | |
100 | Use of the ':NARA-Ruleset' import directive is now obsolete. To |
101 | emulate the old behaviour: |
102 | |
103 | use Text::Soundex (); |
104 | *soundex = \&Text::Soundex::soundex_nara; |
105 | $code = soundex($name); |
106 | |
107 | Version 2.20: |
108 | This version includes support for the algorithm used to index |
109 | the U.S. Federal Censuses. There is a slight discrepancy in the |
110 | definition of a soundex code, not commonly known or recognized, |
111 | involving similar-sounding letters being separated by the |
112 | characters H or W. This is defined as the NARA ruleset, |
113 | as this discrepancy was discovered by them. (Calling it "the |
114 | US Census ruleset" was too unwieldy...) |
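
To make the H/W discrepancy concrete, here is a small sketch that encodes one name under both rulesets. It assumes a 3.x release of Text::Soundex, where soundex_nara() can be imported by name as described under Version 3.00; the name "Ashcraft" is chosen because its S and C are separated by an H.

```perl
use strict;
use warnings;
use Text::Soundex qw(soundex soundex_nara);

# In "Ashcraft" the letters S and C share a code and are separated by H.
my $name = "Ashcraft";

# Default ruleset: the H separates S and C, so both are encoded.
print "standard: ", soundex($name), "\n";        # prints: standard: A226

# NARA ruleset: the H is ignored, so S and C collapse into one code.
print "NARA:     ", soundex_nara($name), "\n";   # prints: NARA:     A261
```

Names without such an H or W separation should encode identically under both rulesets.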
115 | |
116 | NARA can be found at: |
117 | http://www.nara.gov/genealogy/ |
118 | |
119 | The algorithm used by NARA can be found at: |
120 | http://home.utah-inter.net/kinsearch/Soundex.html |
121 | |
122 | Version 2.00: |
123 | This version is a full re-write of the 1.0 engine by Mark Mielke. |
124 | The goal was speed, and this was achieved. There is an optional |
125 | XS module which can be used completely transparently by the user |
126 | which offers a further speed increase by a factor of more than 7.5. |
127 | |
128 | Version 1.00: |
129 | This version can be found in the perl core distribution up to at |
130 | least Perl 5.8.0. It was written by Mike Stok. It can be |
131 | identified by the fact that it does not contain a $VERSION |
132 | at the beginning of the module, and it also uses an RCS |
133 | tag with a version of 1.x. This version, before some perl5'ish |
134 | packaging was introduced, was actually written for perl4. |