From: Nigel Metheringham
Date: Mon, 15 Mar 2010 17:36:44 +0000 (+0000)
Subject: Documentation on Unicode use with DBIC
X-Git-Tag: v0.08121~61
X-Git-Url: http://git.shadowcat.co.uk/gitweb/gitweb.cgi?a=commitdiff_plain;h=7c14c3cf3b25759613e2ecef2e2b50029b555745;hp=ef8d02eb7856f32ef1d738aba4923a167c6ed2df;p=dbsrgits%2FDBIx-Class.git

Documentation on Unicode use with DBIC
---

diff --git a/Changes b/Changes
index b300e60..df3f225 100644
--- a/Changes
+++ b/Changes
@@ -21,6 +21,7 @@ Revision history for DBIx::Class
           (RT#54063)
         - Support add_columns('+colname' => { ... }) to augment column
           definitions.
+        - Unicode support documentation in Cookbook and UTF8Columns
 
 0.08120 2010-02-24 08:58:00 (UTC)
         - Make sure possibly overwritten deployment_statements methods in
diff --git a/lib/DBIx/Class/Manual/Cookbook.pod b/lib/DBIx/Class/Manual/Cookbook.pod
index 3b3eb55..5524c18 100644
--- a/lib/DBIx/Class/Manual/Cookbook.pod
+++ b/lib/DBIx/Class/Manual/Cookbook.pod
@@ -1741,6 +1741,75 @@ the bind values (the C<[1, 2, 3]> arrayref in the above example) wrapped in
 arrayrefs together with the column name, like this:
 C<< [column_name => value] >>.
 
+=head2 Using Unicode
+
+When using unicode character data there are two alternatives -
+either your database supports unicode characters (including setting
+the utf8 flag on the returned string), or you need to encode/decode
+data appropriately each time a string field is inserted into or
+retrieved from the database. It is better to avoid
+encoding/decoding data and to use your database's own unicode
+capabilities if at all possible.
+
+The L<DBIx::Class::UTF8Columns> component handles storing selected
+unicode columns in a database that does not directly support
+unicode. If used with a database that does correctly handle unicode
+then strange and unexpected data corruption B<will> occur.
+
+The Catalyst Wiki Unicode page at
+L
+has additional information on the use of Unicode with Catalyst and
+DBIx::Class.
+
+The following databases do correctly handle unicode data:-
+
+=head3 MySQL
+
+MySQL supports unicode, and will correctly flag utf8 data from the
+database if the C<mysql_enable_utf8> is set in the connect options.
+
+  my $schema = My::Schema->connection('dbi:mysql:dbname=test',
+                                      $user, $pass,
+                                      { mysql_enable_utf8 => 1} );
+
+
+When set, data retrieved from a textual column type (char,
+varchar, etc) will have the UTF-8 flag turned on if necessary. This
+enables character semantics on that string. You will also need to
+ensure that your database / table / column is configured to use
+UTF8. See Chapter 10 of the mysql manual for details.
+
+See L<DBD::mysql> for further details.
+
+=head3 Oracle
+
+Information about Oracle support for unicode can be found in
+L.
+
+=head3 PostgreSQL
+
+PostgreSQL supports unicode if the character set is correctly set
+at database creation time. Additionally the C<pg_enable_utf8>
+should be set to ensure unicode data is correctly marked.
+
+  my $schema = My::Schema->connection('dbi:Pg:dbname=test',
+                                      $user, $pass,
+                                      { pg_enable_utf8 => 1} );
+
+Further information can be found in L<DBD::Pg>.
+
+=head3 SQLite
+
+SQLite version 3 and above natively use unicode internally. To
+correctly mark unicode strings taken from the database, the
+C<sqlite_unicode> flag should be set at connect time (in versions
+of L<DBD::SQLite> prior to 1.27 this attribute was named
+C<unicode>).
+
+  my $schema = My::Schema->connection('dbi:SQLite:/tmp/test.db',
+                                      '', '',
+                                      { sqlite_unicode => 1} );
+
 =head1 BOOTSTRAPPING/MIGRATING
 
 =head2 Easy migration from class-based to schema-based setup
diff --git a/lib/DBIx/Class/Manual/FAQ.pod b/lib/DBIx/Class/Manual/FAQ.pod
index 464040d..9281a99 100644
--- a/lib/DBIx/Class/Manual/FAQ.pod
+++ b/lib/DBIx/Class/Manual/FAQ.pod
@@ -56,6 +56,12 @@ Create your classes manually, as above. Write a script that calls
 L. See there for details, or the
 L.
 
+=item .. store/retrieve Unicode data in my database?
+
+Make sure your database supports Unicode and set the connect
+attributes appropriately - see
+L<DBIx::Class::Manual::Cookbook/Using Unicode>
+
 =item .. connect to my database?
 
 Once you have created all the appropriate table/source classes, and an
diff --git a/lib/DBIx/Class/UTF8Columns.pm b/lib/DBIx/Class/UTF8Columns.pm
index a25ac39..63471e9 100644
--- a/lib/DBIx/Class/UTF8Columns.pm
+++ b/lib/DBIx/Class/UTF8Columns.pm
@@ -9,6 +9,9 @@ __PACKAGE__->mk_classdata( '_utf8_columns' );
 
 DBIx::Class::UTF8Columns - Force UTF8 (Unicode) flag on columns
 
+    Please ensure you understand the purpose of this module before use.
+    Read the warnings below to prevent data corruption through misuse.
+
 =head1 SYNOPSIS
 
     package Artist;
@@ -23,9 +26,24 @@ DBIx::Class::UTF8Columns - Force UTF8 (Unicode) flag on columns
 
 =head1 DESCRIPTION
 
-This module allows you to get columns data that have utf8 (Unicode) flag.
+This module allows you to get and store utf8 (unicode) column data
+in a database that does not natively support unicode. It ensures
+that column data is correctly serialised as a byte stream when
+stored and de-serialised to unicode strings on retrieval.
+
+=head2 Warning - Native Database Unicode Support
+
+If your database natively supports Unicode (as does SQLite with the
+C<sqlite_unicode> connect flag, MySQL with C<mysql_enable_utf8>
+connect flag or Postgres with the C<pg_enable_utf8> connect flag),
+then this component should B<not> be used, and will corrupt unicode
+data in a subtle and unexpected manner.
+
+It is far better to do Unicode support within the database if
+possible rather than convert data into and out of the database on every
+round trip.
 
-=head2 Warning
+=head2 Warning - Component Overloading
 
 Note that this module overloads L in a way that may
 prevent other components overloading the same method from working