lib/SQL/Translator/Manual.pod

   1 =head1 NAME
   2
   3 SQL::Translator::Manual - sqlfairy user manual
   4
   5 =head1 SYNOPSIS
   6
   7 SQL::Translator (AKA "SQLFairy") is a collection of modules for
   8 transforming (mainly) SQL DDL files into a variety of other formats,
   9 including other SQL dialects, documentation, images, and code.  In
  10 this manual, we will attempt to address how to use SQLFairy for common
  11 tasks.  For a lower-level discussion of how the code works, please
  12 read the documentation for L<SQL::Translator>.
  13
  14 It may prove helpful to have a general understanding of the SQLFairy
  15 code before continuing.  The code can be broken into three conceptual
  16 groupings:
  17
  18 =over 4
  19
  20 =item * Parsers
  21
  22 The parsers are responsible for reading the input files and describing
  23 them to the Schema object middleware.
  24
  25 =item * Producers
  26
  27 The producers create the output as described by the Schema middleware.
  28
  29 =item * Schema objects
  30
  31 The Schema objects bridge the communication between the Parsers and
  32 Producers by representing any parsed file through a standard set of
  33 generic objects to represent concepts like Tables, Fields (columns),
  34 Indices, Constraints, etc.
  35
  36 =back
  37
  38 It's not necessary to understand how to write or manipulate any
  39 of these for most common tasks, but you should aware of the concepts
  40 as they will be referenced later in this document.
  41
  42 =head1 SQLFAIRY SCRIPTS
  43
  44 Most common tasks can be accomplished through the use of the script
  45 interfaces to the SQL::Translator code.  All SQLFairy scripts begin
  46 with "sqlt."  Here are the scripts and a description of what they each
  47 do:
  48
  49 =over 4
  50
  51 =item * sqlt
  52
  53 This is the main interface for text-to-text translations, e.g.,
  54 converting a MySQL schema to Oracle.
  55
  56 =item * sqlt-diagram
  57
  58 This is a tailored interface for the Diagram producer and its many
  59 myriad options.
  60
  61 =item * sqlt-diff
  62
  63 This script will examine two schemas and report the SQL commands
  64 (ALTER, CREATE) needed to turn the first schema into the second.
  65
  66 =item * sqlt-dumper
  67
  68 This script generates a Perl script that can be used to connect to a
  69 database and dump the data in each table in different formats, similar
  70 to the "mysqldump" program.
  71
  72 =item * sqlt-graph
  73
  74 This is an interface to the GraphViz visualization tool and its myriad
  75 options.
  76
  77 =item * sqlt.cgi
  78
  79 This is a CGI script that presents an HTML form for uploading or
  80 pasting a schema and choosing an output and the output options.
  81
  82 =back
  83
  84 To read the full documentation for each script, use "perldoc" (or
  85 execute any of the command-line scripts with the "--help" flag).
  86
  87 =head1 CONVERTING SQL DIALECTS
  88
  89 Probably the most common task SQLFairy is used for is to convert one
  90 dialect of SQL to another.  If you have a text description of an SQL
  91 database (AKA a "DDL" -- "Data Definition Language"), then you should
  92 use the "sqlt" script with switches to indicate the parser and
  93 producer and the name of the text file as the final argument.  For
  94 example, to convert the "foo.sql" MySQL schema to a version suitable
  95 for PostgreSQL, you would do the following:
  96
  97   $ sqlt -f MySQL -t PostgreSQL foo.sql > foo-pg.sql
  98
  99 The "from" and "to" options are case-sensitive and must match exactly
 100 the names of the Parser and Producer classes in SQL::Translator.  For
 101 a complete listing of your options, execute "sqlt" with the "--list"
 102 flag.
 103
 104 =head1 EXTRACT SQL SCHEMAS DIRECTLY FROM DATABASE
 105
 106 It is possible to extract some schemas directly from the database
 107 without parsing a text file (the "foo.sql" in the above example).
 108 This can prove significantly faster than parsing a text file.  To
 109 do this, use the "DBI" parser and provide the necessary arguments to
 110 connect to the database and indicate the producer class, like so:
 111
 112   $ sqlt -f DBI --dsn dbi:mysql:FOO --db-user guest \
 113     --db-password p4ssw0rd -t PostgreSQL > foo
 114
 115 The "--list" option to "sqlt" will show the databases supported by
 116 DBI parsers.
 117
 118 =head1 HANDLING NON-SQL DATA
 119
 120 Certain structured document formats can be easily thought of as
 121 tables.  SQLFairy can parse Microsoft Excel spreadsheets and
 122 arbitrarily delimited text files just as if they were schemas which
 123 contained only one table definition.  The column names are normalized
 124 to something sane for most databases (whitespace is converted to
 125 underscores and non-word characters are removed), and the data in each
 126 field is scanned to determine the appropriate data type (character,
 127 integer, or float) and size.  For instance, to convert a
 128 comma-separated file to an SQLite database, do the following:
 129
 130   $ sqlt -f xSV --fs ',' -t SQLite foo.csv > foo-sqlite.sql
 131
 132 Additionally, there is a non-SQL representation of relational schemas namely
 133 XML.  Additionally, the only XML supported is our own version;  however, it
 134 would be fairly easy to add an XML parser for something like the TorqueDB
 135 (http://db.apache.org/torque/) project.  The actual parsing of XML should be
 136 trivial given the number of XML parsers available, so all that would be left
 137 would be to map the specific concepts in the source file to the Schema objects
 138 in SQLFairy.
 139
 140 To convert a schema in SQLFairy's XML dialect to Oracle, do the following:
 141
 142   $ sqlt -f XML-SQLFairy -t Oracle foo.xml > foo-oracle.sql
 143
 144 =head1 SERIALIZING SCHEMAS
 145
 146 Parsing a schema is generally the most computationally expensive
 147 operation performed by SQLFairy, so it may behoove you to serialize a
 148 parsed schema if you need to perform repeated conversions.  For
 149 example, as part of a build process the author converts a MySQL schema
 150 first to YAML, then to PostgreSQL, Oracle, SQLite and Sybase.
 151 Additionally, a variety of documentation in HTML and images is produced.
 152 This can be accomplished like so:
 153
 154   $ sqlt -f MySQL -t YAML schema-mysql.sql > schema.yaml
 155   $ sqlt -f YAML -t Oracle schema.yaml > schema-oracle.sql
 156   $ sqlt -f YAML -t PostgreSQL schema.yaml > schema-postgresql.sql
 157   $ ...
 158
 159 SQLFairy has three serialization producers, none of which is superior
 160 to the other in their description of a schema.
 161
 162 =over 4
 163
 164 =item * XML-SQLFairy
 165
 166 This is the aforementioned XML format.  It is essentially a direct
 167 mapping of the Schema objects into XML.  This can also provide a very
 168 convenient bridge to describing a schema to a non-Perl application.
 169 Providing a producer argument to "sqlt" of just "XML" will default to
 170 using "XML-SQLFairy."
 171
 172 =item * Storable
 173
 174 This producer stores the Schema object using Perl's Storable.pm module
 175 available on CPAN.
 176
 177 =item * YAML
 178
 179 This producer serialized the Schema object with the very readable
 180 structured data format of YAML (http://www.yaml.org/).  Earlier
 181 examples show serializing to YAML.
 182
 183 =back
 184
 185 =head1 VISUALIZING SQL SCHEMAS
 186
 187 The visualization tools in SQLFairy can graphically represent the
 188 tables, fields, datatypes and sizes, constraints, and foreign key
 189 relationships in a very compact and intuitive format.  This can be
 190 very beneficial in understanding and document large or small schemas.
 191 Two producers in SQLFairy will create pseudo-E/R (entity-relationship)
 192 diagrams:
 193
 194 =over 4
 195
 196 =item * Diagram
 197
 198 The first visualization tool in SQLFairy, this producer uses libgd to
 199 draw a picture of the schema.  The tables are evenly distributed in
 200 definition order running in columns (i.e., no graphing algorithms are
 201 used), so the many of the lines showing the foreign key relationships
 202 may cross over each other and the table boxes.  Please read the
 203 documentation of the "sqlt-diagram" script for all the options
 204 available to this producer.
 205
 206 =item * GraphViz
 207
 208 The layout of the GraphViz producer is far superior to the Diagram
 209 producer as it uses the Graphviz binary from Bell Labs to create very
 210 professional-looking graphs.  There are several different layout
 211 algorithms and node shapes available.  Please see the documentation of
 212 the "sqlt-graph" script for more information.
 213
 214 =back
 215
 216 =head1 AUTOMATED CODE-GENERATION
 217
 218 Given that so many applications interact with SQL databases, it's no
 219 wonder that people have automated code to deal with this interaction.
 220 Class::DBI from CPAN is one such module that allows a developer to
 221 describe the relationships between tables and fields in class
 222 declarations and then generates all the SQL to interact (SELECT,
 223 UPDATE, DELETE, INSERT statements) at runtime.  Obviously, the schema
 224 already describes itself, so it only makes sense that you should be
 225 able to generate this kind of code directly from the schema.  The
 226 "ClassDBI" producer in SQLFairy does just this, creating a Perl module
 227 that inherits from Class::DBI and sets up most of the code needed to
 228 interact with the database.  Here is an example of how to do this:
 229
 230   $ sqlt -f MySQL -t ClassDBI foo.sql > Foo.pm
 231
 232 Then simply edit Foo.pm as needed and include it in your code.
 233
 234 =head1 CREATING A DATA DUMPER SCRIPT
 235
 236 The Dumper producer creates a Perl script that can select the fields
 237 in each table and then create "INSERT" statements for each record in
 238 the database similar to the output generated by MySQL's "mysqldump"
 239 program:
 240
 241   $ sqlt -f YAML -t Dumper --dumper-db-user guest \
 242   > --dumper-db-pass p4ssw0rd --dumper-dsn dbi:mysql:FOO \
 243   > foo.yaml > foo-dumper.pl
 244
 245 And then execute the resulting script to dump the data:
 246
 247   $ chmod +x foo-dumper.pl
 248   $ ./foo-dumper.pl > foo-data.sql
 249
 250 The dumper script also has a number of options available.  Execute the
 251 script with the "--help" flag to read about them.
 252
 253 =head1 DOCUMENTING WITH SQL::TRANSLATOR
 254
 255 SQLFairy offers two producers to help document schemas:
 256
 257 =over 4
 258
 259 =item * HTML
 260
 261 This producer creates a single HTML document which uses HTML
 262 formatting to describe the Schema objects and to create hyperlinks on
 263 foreign key relationships.  This can be a surprisingly useful
 264 documentation aid as it creates a very readable format that allows one
 265 to jump easily to specific tables and fields.  It's also possible to
 266 plugin your own CSS to further control the presentation of the HTML.
 267
 268 =item * POD
 269
 270 This is arguably not that useful of a producer by itself, but the
 271 number of POD-conversion tools could be used to further transform the
 272 POD into something more interesting.  The schema is basically
 273 represented in POD sections where tables are broken down into fields,
 274 indices, constraints, foreign keys, etc.
 275
 276 =back
 277
 278 =head1 TEMPLATE-BASED MANIPULATION OF SCHEMA OBJECTS
 279
 280 All of the producers which create text output could have been coded
 281 using a templating system to mix in the dynamic output with static
 282 text.  CPAN offers several diverse templating systems, but few are as
 283 powerful as Template Toolkit (http://www.template-toolkit.org/).  You
 284 can easily create your own producer without writing any Perl code at
 285 all simply by writing a template using Template Toolkit's syntax.  The
 286 template will be passed a reference to the Schema object briefly
 287 described at the beginning of this document and mentioned many times
 288 throughout.  For example, you could create a template that simply
 289 prints the name of each table and field that looks like this:
 290
 291   # file: schema.tt
 292   [% FOREACH table IN schema.get_tables %]
 293   Table: [% table.name %]
 294   Fields:
 295   [% FOREACH field IN table.get_fields -%]
 296     [% field.name %]
 297   [% END -%]
 298   [% END %]
 299
 300 And then process it like so:
 301
 302   $ sqlt -f YAML -t TTSchema --template schema.tt foo.yaml
 303
 304 To create output like this:
 305
 306   Table: foo
 307   Fields:
 308     foo_id
 309     foo_name
 310
 311 For more information on Template Toolkit, please install the
 312 "Template" module and read the POD.
 313
 314 =head1 FINDING THE DIFFERENCES BETWEEN TWO SCHEMAS
 315
 316 As mentioned above, the "sqlt-diff" schema examines two schemas and
 317 creates SQL schema modification statements that can be used to
 318 transform the first schema into the second.  The flag syntax is
 319 somewhat quirky:
 320
 321   $ sqlt-diff foo-v1.sql=MySQL foo-v2.sql=Oracle > diff.sql
 322
 323 As demonstrated, the schemas need not even be from the same vendor,
 324 though this is likely to produce some spurious results as
 325 datatypes are not currently viewed equivalent unless they match
 326 exactly, even if they would be converted to the same.  For example,
 327 MySQL's "integer" data type would be converted to Oracle's "number,"
 328 but the differ isn't quite smart enough yet to figure this out.  Also,
 329 as the SQL to ALTER a field definition varies from database vendor to
 330 vendor, these statements are made using just the keyword "CHANGE" and
 331 will likely need to be corrected for the target database.
 332
 333 =head1 A UNIFIED GRAPHICAL INTERFACE
 334
 335 Seeing all the above options and scripts, you may be pining for a
 336 single, graphical interface to handle all these transformations and
 337 choices.  This is exactly what the "sqlt.cgi" script provides.  Simply
 338 drop this script into your web server's CGI directory and enable the
 339 execute bit and you can point your web browser to an HTML form which
 340 provides a simple interface to all the SQLFairy parsers and producers.
 341
 342 =head1 PLUGIN YOUR OWN PARSERS AND PRODUCERS
 343
 344 Now that you have seen how the parsers and producers interact via the
 345 Schema objects, you may wish to create your own versions to plugin.
 346
 347 Producers are probably the easier concept to grok, so let's cover that
 348 first.  By far the easiest way to create custom output is to use the
 349 TTSchema producer in conjunction with a Template Toolkit template as
 350 described earlier.  However, you can also easily pass a reference to a
 351 subroutine that SQL::Translator can call for the production of the
 352 output.  This subroutine will be passed a single argument of the
 353 SQL::Translator object which you can use to access the Schema objects.
 354 Please read the POD for SQL::Translator and SQL::Translator::Schema to
 355 learn the methods you can call.  Here is a very simple example:
 356
 357   #!/usr/bin/perl
 358
 359   use strict;
 360   use SQL::Translator;
 361
 362   my $input = q[
 363       create table foo (
 364           foo_id int not null default '0' primary key,
 365           foo_name varchar(30) not null default ''
 366       );
 367
 368       create table bar (
 369           bar_id int not null default '0' primary key,
 370           bar_value varchar(100) not null default ''
 371       );
 372   ];
 373
 374   my $t = SQL::Translator->new;
 375   $t->parser('MySQL') or die $t->error;
 376   $t->producer( \&produce ) or die $t->error;
 377   my $output = $t->translate( \$input ) or die $t->error;
 378   print $output;
 379
 380   sub produce {
 381       my $tr     = shift;
 382       my $schema = $tr->schema;
 383       my $output = '';
 384       for my $t ( $schema->get_tables ) {
 385           $output .= join('', "Table = ", $t->name, "\n");
 386       }
 387       return $output;
 388   }
 389
 390 Executing this script produces the following:
 391
 392   $ ./my-producer.pl
 393   Table = foo
 394   Table = bar
 395
 396 A custom parser will be passed two arguments:  the SQL::Translator
 397 object and the data to be parsed.  In this example, the schema will be
 398 represented in a simple text format.  Each line is a table definition
 399 where the fields are separated by colons.  The first field is the
 400 table name and the following fields are column definitions where the
 401 column name, data type and size are separated by spaces.  The
 402 specifics of the example are unimportant -- what is being demonstrated
 403 is that you have to decide how to parse the incoming data and then
 404 map the concepts in the data to the Schema object.
 405
 406   #!/usr/bin/perl
 407
 408   use strict;
 409   use SQL::Translator;
 410
 411   my $input =
 412       "foo:foo_id int 11:foo_name varchar 30\n" .
 413       "bar:bar_id int 11:bar_value varchar 30"
 414   ;
 415
 416   my $t = SQL::Translator->new;
 417   $t->parser( \&parser ) or die $t->error;
 418   $t->producer('Oracle') or die $t->error;
 419   my $output = $t->translate( \$input ) or die $t->error;
 420   print $output;
 421
 422   sub parser {
 423       my ( $tr, $data ) = @_;
 424       my $schema = $tr->schema;
 425
 426       for my $line ( split( /\n/, $data ) ) {
 427           my ( $table_name, @fields ) = split( /:/, $line );
 428           my $table = $schema->add_table( name => $table_name )
 429               or die $schema->error;
 430           for ( @fields ) {
 431               my ( $f_name, $type, $size ) = split;
 432               $table->add_field(
 433                   name      => $f_name,
 434                   data_type => $type,
 435                   size      => $size,
 436               ) or die $table->error;
 437           }
 438       }
 439
 440       return 1;
 441   }
 442
 443 And here is the output produced by this script:
 444
 445   --
 446   -- Created by SQL::Translator::Producer::Oracle
 447   -- Created on Wed Mar 31 15:43:30 2004
 448   --
 449   --
 450   -- Table: foo
 451   --
 452
 453   CREATE TABLE foo (
 454     foo_id number(11),
 455     foo_name varchar2(30)
 456   );
 457
 458   --
 459   -- Table: bar
 460   --
 461
 462   CREATE TABLE bar (
 463     bar_id number(11),
 464     bar_value varchar2(30)
 465   );
 466
 467 If you create a useful parser or producer, you are encouraged to
 468 submit your work to the SQLFairy project!
 469
 470 =head1 PLUGIN TEMPLATE TOOLKIT PRODUCERS
 471
 472 You may find that the TTSchema producer doesn't give you enough control over
 473 templating and you want to play with the Template config or add you own
 474 variables. Or maybe you just have a really good template you want to submit to
 475 SQLFairy :) If so, the SQL::Translator::Producer::TT::Base producer may be
 476 just for you! Instead of working like a normal producer it provides a base
 477 class so you can cheaply build new producer modules based on templates.
 478
 479 It's simplest use is when we just want to put a single template in its own
 480 module. So to create a Foo producer we create a F<Custom/Foo.pm> file as
 481 follows, putting our template in the __DATA__ section.
 482
 483  package Custom::Foo.pm;
 484  use base qw/SQL::Translator::Producer::TT::Base/;
 485  # Use our new class as the producer
 486  sub produce { return __PACKAGE__->new( translator => shift )->run; };
 487
 488  __DATA__
 489  [% FOREACH table IN schema.get_tables %]
 490  Table: [% table.name %]
 491  Fields:
 492  [% FOREACH field IN table.get_fields -%]
 493    [% field.name %]
 494  [% END -%]
 495  [% END %]
 496
 497 For that we get a producer called Custom::Foo that we can now call like a
 498 normal producer (as long as the directory with F<Custom/Foo.pm> is in our @INC
 499 path):
 500
 501  $ sqlt -f YAML -t Custom-Foo foo.yaml
 502
 503 The template gets variables of C<schema> and C<translator> to use in building
 504 its output. You also get a number of methods you can override to hook into the
 505 template generation.
 506
 507 B<tt_config> Allows you to set the config options used by the Template object.
 508 The Template Toolkit provides a huge number of options which allow you to do all
 509 sorts of magic (See L<Template::Manual::Config> for details). This method
 510 provides a hook into them by returning a hash of options for the Template. e.g.
 511 Say you want to use the INTERPOLATE option to save some typing in your template;
 512
 513  sub tt_config { ( INTERPOLATE => 1 ); }
 514
 515 Another common use for this is to add you own filters to the template:
 516
 517  sub tt_config {(
 518     INTERPOLATE => 1,
 519     FILTERS => { foo_filter => \&foo_filter, }
 520  );}
 521
 522 Another common extension is adding your own template variables. This is done
 523 with B<tt_vars>:
 524
 525  sub tt_vars { ( foo => "bar" ); }
 526
 527 What about using template files instead of DATA sections? You can already - if
 528 you give a template on the command line your new producer will use that instead
 529 of reading the DATA section:
 530
 531  $ sqlt -f YAML -t Custom-Foo --template foo.tt foo.yaml
 532
 533 This is useful as you can set up a producer that adds a set of filters and
 534 variables that you can then use in templates given on the command line. (There
 535 is also a tt_schema method to over ride if you need even finer control over the
 536 source of your template). Note that if you leave out the DATA section all
 537 together then your producer will require a template file name to be given.
 538
 539 See L<SQL::Translator::Producer::TT::Base> for more details.
 540
 541 =head1 AUTHOR
 542
 543 Ken Y. Clark E<lt>kclark@cpan.orgE<gt>.