Added parsing of field.extra
[dbsrgits/SQL-Translator.git] / lib / SQL / Translator / Manual.pod
CommitLineData
2e2fc2b4 1=head1 NAME
2
3SQL::Translator::Manual
4
5=head1 SYNOPSIS
6
cfdcb09e 7SQL::Translator (AKA "SQLFairy") is a collection of modules for
8transforming (mainly) SQL DDL files into a variety of other formats,
9including other SQL dialects, documentation, images, and code. In
10this manual, we will attempt to address how to use SQLFairy for common
11tasks. For a lower-level discussion of how the code works, please
12read the documentation for L<SQL::Translator>.
2e2fc2b4 13
cfdcb09e 14It may prove helpful to have a general understanding of the SQLFairy
15code before continuing. The code can be broken into three conceptual
16groupings:
2e2fc2b4 17
cfdcb09e 18=over 4
2e2fc2b4 19
cfdcb09e 20=item * Parsers
2e2fc2b4 21
cfdcb09e 22The parsers are responsible for reading the input files and describing
23them to the Schema object middleware.
2e2fc2b4 24
cfdcb09e 25=item * Producers
2e2fc2b4 26
cfdcb09e 27The producers create the output as described by the Schema middleware.
2e2fc2b4 28
cfdcb09e 29=item * Schema objects
30
31The Schema objects bridge the communication between the Parsers and
32Producers by representing any parsed file through a standard set of
33generic objects to represent concepts like Tables, Fields (columns),
34Indices, Constraints, etc.
35
36=back
37
38It's not necessary to understand how to write or manipulate any
39of these for most common tasks, but you should aware of the concepts
40as they will be referenced later in this document.
41
42=head1 SQLFAIRY SCRIPTS
43
44Most common tasks can be accomplished through the use of the script
45interfaces to the SQL::Translator code. All SQLFairy scripts begin
46with "sqlt." Here are the scripts and a description of what they each
47do:
48
49=over 4
50
51=item * sqlt
52
53This is the main interface for text-to-text translations, e.g.,
54converting a MySQL schema to Oracle.
55
56=item * sqlt-diagram
57
58This is a tailored interface for the Diagram producer and its many
59myriad options.
60
61=item * sqlt-diff
62
63This script will examine two schemas and report the SQL commands
64(ALTER, CREATE) needed to turn the first schema into the second.
65
66=item * sqlt-dumper
67
68This script generates a Perl script that can be used to connect to a
69database and dump the data in each table in different formats, similar
70to the "mysqldump" program.
71
72=item * sqlt-graph
73
74This is an interface to the GraphViz visualization tool and its myriad
75options.
76
77=item * sqlt.cgi
78
79This is a CGI script that presents an HTML form for uploading or
80pasting a schema and choosing an output and the output options.
81
82=back
83
84To read the full documentation for each script, use "perldoc" (or
85execute any of the command-line scripts with the "--help" flag).
86
87=head1 CONVERTING SQL DIALECTS
88
89Probably the most common task SQLFairy is used for is to convert one
90dialect of SQL to another. If you have a text description of an SQL
91database (AKA a "DDL" -- "Data Definition Language"), then you should
92use the "sqlt" script with switches to indicate the parser and
93producer and the name of the text file as the final argument. For
94example, to convert the "foo.sql" MySQL schema to a version suitable
95for PostgreSQL, you would do the following:
96
97 $ sqlt -f MySQL -t PostgreSQL foo.sql > foo-pg.sql
98
99The "from" and "to" options are case-sensitive and must match exactly
100the names of the Parser and Producer classes in SQL::Translator. For
101a complete listing of your options, execute "sqlt" with the "--list"
102flag.
103
104=head1 EXTRACT SQL SCHEMAS DIRECTLY FROM DATABASE
105
106It is possible to extract some schemas directly from the database
107without parsing a text file (the "foo.sql" in the above example).
108This can prove significantly faster than parsing a text file. To
109do this, use the "DBI" parser and provide the necessary arguments to
110connect to the database and indicate the producer class, like so:
111
112 $ sqlt -f DBI --dsn dbi:mysql:FOO --db-user guest \
113 --db-password p4ssw0rd -t PostgreSQL > foo
114
115The "--list" option to "sqlt" will show the databases supported by
116DBI parsers.
117
118=head1 HANDLING NON-SQL DATA
119
120Certain structured document formats can be easily thought of as
121tables. SQLFairy can parse Microsoft Excel spreadsheets and
122arbitrarily delimited text files just as if they were schemas which
123contained only one table definition. The column names are normalized
124to something sane for most databases (whitespace is converted to
125underscores and non-word characters are removed), and the data in each
126field is scanned to determine the appropriate data type (character,
127integer, or float) and size. For instance, to convert a
128comma-separated file to an SQLite database, do the following:
129
130 $ sqlt -f xSV --fs ',' -t SQLite foo.csv > foo-sqlite.sql
131
132Additionally, there are non-SQL represenations of relational schemas
133such as XML and XMI. Currently the XMI support in SQLFairy is
134experimental and not released. Additionally, the only XML supported
135is our own version; however, it would be fairly easy to add an XML
136parser for something like the TorqueDB (http://db.apache.org/torque/)
137project. The actual parsing of XML should be trivial given the number
138of XML parsers available, so all that would be left would be to map
139the specific concepts in the source file to the Schema objects in
140SQLFairy.
141
142To convert a schema in SQLFairy's XML dialect to Oracle, do the following:
143
144 $ sqlt -f XML-SQLFairy -t Oracle foo.xml > foo-oracle.sql
145
146=head1 SERIALIZING SCHEMAS
2e2fc2b4 147
cfdcb09e 148Parsing a schema is generally the most computationally expensive
149operation performed by SQLFairy, so it may behoove you to serialize a
150parsed schema if you need to perform repeated conversions. For
151example, as part of a build process the author converts a MySQL schema
152first to YAML, then to PostgreSQL, Oracle, SQLite and Sybase.
153Additionally, a variety of documention in HTML and images is produced.
154This can be accomplished like so:
155
156 $ sqlt -f MySQL -t YAML schema-mysql.sql > schema.yaml
157 $ sqlt -f YAML -t Oracle schema.yaml > schema-oracle.sql
158 $ sqlt -f YAML -t PostgreSQL schema.yaml > schema-postgresql.sql
159 $ ...
160
161SQLFairy has three serialization producers, none of which is superior
162to the other in their description of a schema.
163
164=over 4
165
166=item * XML-SQLFairy
167
168This is the aforementioned XML format. It is essentially a direct
169mapping of the Schema objects into XML. This can also provide a very
170convenient bridge to describing a schema to a non-Perl application.
171Providing a producer argument to "sqlt" of just "XML" will default to
172using "XML-SQLFairy."
173
174=item * Storable
175
176This producer stores the Schema object using Perl's Storable.pm module
177available on CPAN.
178
179=item * YAML
180
181This producer serialized the Schema object with the very readable
182structured data format of YAML (http://www.yaml.org/). Earlier
183examples show serializing to YAML.
184
185=back
186
187=head1 VISUALIZING SQL SCHEMAS
188
189The visualization tools in SQLFairy can graphically represent the
190tables, fields, datatypes and sizes, constraints, and foreign key
191relationships in a very compact and intuitive format. This can be
192very beneficial in understanding and document large or small schemas.
193Two producers in SQLFairy will create pseudo-E/R (entity-relationship)
194diagrams:
195
196=over 4
197
198=item * Diagram
199
200The first visualization tool in SQLFairy, this producer uses libgd to
201draw a picture of the schema. The tables are evenly distributed in
202definition order running in columns (i.e., no graphing algorithms are
30bda799 203used), so the many of the lines showing the foreign key relationships
204may cross over each other and the table boxes. Please read the
205documentation of the "sqlt-diagram" script for all the options
206available to this producer.
cfdcb09e 207
208=item * GraphViz
209
210The layout of the GraphViz producer is far superior to the Diagram
211producer as it uses the Graphviz binary from Bell Labs to create very
212professional-looking graphs. There are several different layout
213algorithms and node shapes available. Please see the documentation of
214the "sqlt-graph" script for more information.
215
216=back
217
218=head1 AUTOMATED CODE-GENERATION
219
220Given that so many applications interact with SQL databases, it's no
221wonder that people have automated code to deal with this interaction.
222Class::DBI from CPAN is one such module that allows a developer to
223describe the relationships between tables and fields in class
224declarations and then generates all the SQL to interact (SELECT,
225UPDATE, DELETE, INSERT statements) at runtime. Obviously, the schema
226already describes itself, so it only makes sense that you should be
227able to generate this kind of code directly from the schema. The
228"ClassDBI" producer in SQLFairy does just this, creating a Perl module
229that inherits from Class::DBI and sets up most of the code needed to
230interact with the database. Here is an example of how to do this:
231
232 $ sqlt -f MySQL -t ClassDBI foo.sql > Foo.pm
233
234Then simply edit Foo.pm as needed and include it in your code.
235
21521f16 236=head1 CREATING A DATA DUMPER SCRIPT
237
238The Dumper producer creates a Perl script that can select the fields
239in each table and then create "INSERT" statements for each record in
240the database similar to the output generated by MySQL's "mysqldump"
241program:
242
243 $ sqlt -f YAML -t Dumper --dumper-db-user guest \
244 > --dumper-db-pass p4ssw0rd --dumper-dsn dbi:mysql:FOO \
245 > foo.yaml > foo-dumper.pl
246
247And then execute the resulting script to dump the data:
248
249 $ chmod +x foo-dumper.pl
250 $ ./foo-dumper.pl > foo-data.sql
251
252The dumper script also has a number of options available. Execute the
253script with the "--help" flag to read about them.
254
cfdcb09e 255=head1 DOCUMENTING WITH SQL::TRANSLATOR
2e2fc2b4 256
cfdcb09e 257SQLFairy offers two producers to help document schemas:
2e2fc2b4 258
cfdcb09e 259=over 4
2e2fc2b4 260
cfdcb09e 261=item * HTML
2e2fc2b4 262
cfdcb09e 263This producer creates a single HTML document which uses HTML
264formatting to describe the Schema objects and to create hyperlinks on
265foreign key relationships. This can be a surprisingly useful
266documentation aid as it creates a very readable format that allows one
267to jump easily to specific tables and fields. It's also possible to
268plugin your own CSS to further control the presentation of the HTML.
2e2fc2b4 269
cfdcb09e 270=item * POD
2e2fc2b4 271
cfdcb09e 272This is arguably not that useful of a producer by itself, but the
273number of POD-conversion tools could be used to further transform the
274POD into something more interesting. The schema is basically
275represented in POD sections where tables are broken down into fields,
276indices, constraints, foreign keys, etc.
2e2fc2b4 277
cfdcb09e 278=back
2e2fc2b4 279
cfdcb09e 280=head1 TEMPLATE-BASED MANIPULATION OF SCHEMA OBJECTS
2e2fc2b4 281
cfdcb09e 282All of the producers which create text output could have been coded
283using a templating system to mix in the dynamic output with static
284text. CPAN offers several diverse templating systems, but few are as
285powerful as Template Toolkit (http://www.template-toolkit.org/). You
286can easily create your own producer without writing any Perl code at
287all simply by writing a template using Template Toolkit's syntax. The
288template will be passed a reference to the Schema object briefly
289described at the beginning of this document and mentioned many times
290throughout. For example, you could create a template that simply
291prints the name of each table and field that looks like this:
2e2fc2b4 292
cfdcb09e 293 # file: schema.tt
294 [% FOREACH table IN schema.get_tables %]
295 Table: [% table.name %]
296 Fields:
297 [% FOREACH field IN table.get_fields -%]
298 [% field.name %]
299 [% END -%]
300 [% END %]
2e2fc2b4 301
cfdcb09e 302And the process it like so:
2e2fc2b4 303
cfdcb09e 304 $ sqlt -f YAML -t TTSchema --template schema.tt foo.yaml
2e2fc2b4 305
cfdcb09e 306To create output like this:
2e2fc2b4 307
cfdcb09e 308 Table: foo
309 Fields:
310 foo_id
311 foo_name
2e2fc2b4 312
cfdcb09e 313For more information on Template Toolkit, please install the
314"Template" module and read the POD.
2e2fc2b4 315
1b0fe900 316=head1 FINDING THE DIFFERENCES BETWEEN TWO SCHEMAS
317
318As mentioned above, the "sqlt-diff" schema examines two schemas and
319creates SQL schema modification statements that can be used to
320transform the first schema into the second. The flag syntax is
321somewhat quirky:
322
323 $ sqlt-diff foo-v1.sql=MySQL foo-v2.sql=Oracle > diff.sql
324
325As demonstrated, the schemas need not even be from the same vendor,
326though this is likely to produce some spurious results as
327datatypes are not currently viewed equivalent unless they match
328exactly, even if they would be converted to the same. For example,
329MySQL's "integer" data type would be converted to Oracle's "number,"
330but the differ isn't quite smart enough yet to figure this out. Also,
331as the SQL to ALTER a field definition varies from database vendor to
332vendor, these statements are made using just the keyword "CHANGE" and
333will likely need to be corrected for the target database.
334
335=head1 A UNIFIED GRAPHICAL INTERFACE
336
337Seeing all the above options and scripts, you may be pining for a
338single, graphical interface to handle all these transformations and
339choices. This is exactly what the "sqlt.cgi" script provides. Simply
340drop this script into your web server's CGI directory and enable the
341execute bit and you can point your web browser to an HTML form which
342provides a simple interface to all the SQLFairy parsers and producers.
343
cfdcb09e 344=head1 PLUGIN YOUR OWN PARSERS AND PRODUCERS
2e2fc2b4 345
cfdcb09e 346Now that you have seen how the parsers and producers interact via the
347Schema objects, you may wish to create your own versions to plugin.
2e2fc2b4 348
30bda799 349Producers are probably the easier concept to grok, so let's cover that
350first. By far the easiest way to create custom output is to use the
351TTSchema producer in conjunction with a Template Toolkit template as
352described earlier. However, you can also easily pass a reference to a
353subroutine that SQL::Translator can call for the production of the
354ouput. This subroutine will be passed a single argument of the
355SQL::Translator object which you can use to access the Schema objects.
356Please read the POD for SQL::Translator and SQL::Translator::Schema to
357learn the methods you can call. Here is a very simple example:
358
359 #!/usr/bin/perl
360
90f67010 361 use strict;
30bda799 362 use SQL::Translator;
363
364 my $input = q[
365 create table foo (
366 foo_id int not null default '0' primary key,
367 foo_name varchar(30) not null default ''
368 );
369
370 create table bar (
371 bar_id int not null default '0' primary key,
372 bar_value varchar(100) not null default ''
373 );
374 ];
375
376 my $t = SQL::Translator->new;
377 $t->parser('MySQL') or die $t->error;
378 $t->producer( \&produce ) or die $t->error;
379 my $output = $t->translate( \$input ) or die $t->error;
380 print $output;
381
382 sub produce {
383 my $tr = shift;
384 my $schema = $tr->schema;
385 my $output = '';
386 for my $t ( $schema->get_tables ) {
387 $output .= join('', "Table = ", $t->name, "\n");
388 }
389 return $output;
390 }
391
392Executing this script produces the following:
393
394 $ ./my-producer.pl
395 Table = foo
396 Table = bar
397
398A custom parser will be passed two arguments: the SQL::Translator
399object and the data to be parsed. In this example, the schema will be
400represented in a simple text format. Each line is a table definition
401where the fields are separated by colons. The first field is the
402table name and the following fields are column definitions where the
403column name, data type and size are separated by spaces. The
404specifics of the example are unimportant -- what is being demonstrated
405is that you have to decide how to parse the incoming data and then
406map the concepts in the data to the Schema object.
407
408 #!/usr/bin/perl
409
410 use strict;
411 use SQL::Translator;
412
413 my $input =
414 "foo:foo_id int 11:foo_name varchar 30\n" .
415 "bar:bar_id int 11:bar_value varchar 30"
416 ;
417
418 my $t = SQL::Translator->new;
419 $t->parser( \&parser ) or die $t->error;
420 $t->producer('Oracle') or die $t->error;
421 my $output = $t->translate( \$input ) or die $t->error;
422 print $output;
423
424 sub parser {
425 my ( $tr, $data ) = @_;
426 my $schema = $tr->schema;
427
428 for my $line ( split( /\n/, $data ) ) {
429 my ( $table_name, @fields ) = split( /:/, $line );
430 my $table = $schema->add_table( name => $table_name )
431 or die $schema->error;
432 for ( @fields ) {
433 my ( $f_name, $type, $size ) = split;
434 $table->add_field(
435 name => $f_name,
436 data_type => $type,
437 size => $size,
438 ) or die $table->error;
439 }
440 }
441
442 return 1;
443 }
444
445And here is the output produced by this script:
446
447 --
448 -- Created by SQL::Translator::Producer::Oracle
449 -- Created on Wed Mar 31 15:43:30 2004
450 --
451 --
452 -- Table: foo
453 --
454
455 CREATE TABLE foo (
456 foo_id number(11),
457 foo_name varchar2(30)
458 );
459
460 --
461 -- Table: bar
462 --
463
464 CREATE TABLE bar (
465 bar_id number(11),
466 bar_value varchar2(30)
467 );
468
90f67010 469If you create a useful parser or producer, you are encouraged to
470submit your work to the SQLFairy project!
471
08a6f829 472=head1 PLUGIN TEMPLATE TOOLKIT PRODUCERS
473
474You may find that the TTSchema producer doesn't give you enough control over
475templating and you want to play with the Template config or add you own
476variables. Or maybe you just have a really good template you want to submit to
477SQLFairy :) If so, the SQL::Translator::Producer::TT::Base producer may be
478just for you! Instead of working like a normal producer it provides a base
479class so you can cheaply build new producer modules based on templates.
480
481It's simplest use is when we just want to put a single template in its own
482module. So to create a Foo producer we create a F<Custom/Foo.pm> file as
483follows, putting our template in the __DATA__ section.
484
485 package Custom::Foo.pm;
486 use base qw/SQL::Translator::Producer::TT::Base/;
487 # Use our new class as the producer
488 sub produce { return __PACKAGE__->new( translator => shift )->run; };
489
490 __DATA__
491 [% FOREACH table IN schema.get_tables %]
492 Table: [% table.name %]
493 Fields:
494 [% FOREACH field IN table.get_fields -%]
495 [% field.name %]
496 [% END -%]
497 [% END %]
498
499For that we get a producer called Custom::Foo that we can now call like a
500normal producer (as long as the directory with F<Custom/Foo.pm> is in our @INC
501path):
502
503 $ sqlt -f YAML -t Custom-Foo foo.yaml
504
505The template gets variables of C<schema> and C<translator> to use in building
506its output. You also get a number of methods you can override to hook into the
507template generation.
508
509B<tt_config> Allows you to set the config options used by the Template object.
510The Template Toolkit provides a huge number of options which allow you to do all
511sorts of magic (See L<Template::Manual::Config> for details). This method
512provides a hook into them by returning a hash of options for the Template. e.g.
513Say you want to use the INTERPOLATE option to save some typing in your template;
514
515 sub tt_config { ( INTERPOLATE => 1 ); }
516
517A common use for this is to add you own filters to the template:
518
519 sub tt_config {(
520 INTERPOLATE => 1,
521 FILTERS => { foo_filter => \&foo_filter, }
522 );}
523
524Another common extension is adding your own template variables. This is done
525with B<tt_vars>:
526
527 sub tt_vars { ( foo => "bar" ); }
528
529What about using template files instead of DATA sections? You can already! if
530you give a template on the command line your new producer will use that instead
531of reading the DATA section:
532
533 $ sqlt -f YAML -t Custom-Foo --template foo.tt foo.yaml
534
535This is usefull as you can set up a producer that adds a set of filters and
536variables that you can then use in templates given on the command line. (There
537is also a tt_schema method to over ride if you need even finer control over the
538source of your template). Note that if you leave out the DATA section all
539together then your producer will require a template file name to be given.
540
541See L<SQL::Translator::Producer::TT::Base> for more details of what you can do.
542
2e2fc2b4 543=head1 AUTHOR
544
977651a5 545Ken Y. Clark E<lt>kclark@cpan.orgE<gt>.