[dbsrgits/SQL-Translator.git] / lib / SQL / Translator / Manual.pod

=head1 NAME

SQL::Translator::Manual

=head1 SYNOPSIS

SQL::Translator (AKA "SQLFairy") is a collection of modules for
transforming (mainly) SQL DDL files into a variety of other formats,
including other SQL dialects, documentation, images, and code.  In
this manual, we will attempt to address how to use SQLFairy for common
tasks.  For a lower-level discussion of how the code works, please
read the documentation for L<SQL::Translator>.

It may prove helpful to have a general understanding of the SQLFairy
code before continuing.  The code can be broken into three conceptual
groupings:

=over 4

=item * Parsers

The parsers are responsible for reading the input files and describing 
them to the Schema object middleware.

=item * Producers

The producers create the output as described by the Schema middleware.

=item * Schema objects

The Schema objects bridge the communication between the Parsers and
Producers by representing any parsed file through a standard set of
generic objects to represent concepts like Tables, Fields (columns),
Indices, Constraints, etc.

=back

It's not necessary to understand how to write or manipulate any
of these for most common tasks, but you should aware of the concepts
as they will be referenced later in this document. 

=head1 SQLFAIRY SCRIPTS

Most common tasks can be accomplished through the use of the script
interfaces to the SQL::Translator code.  All SQLFairy scripts begin
with "sqlt."  Here are the scripts and a description of what they each
do:

=over 4

=item * sqlt

This is the main interface for text-to-text translations, e.g.,
converting a MySQL schema to Oracle.  

=item * sqlt-diagram

This is a tailored interface for the Diagram producer and its many
myriad options.

=item * sqlt-diff

This script will examine two schemas and report the SQL commands
(ALTER, CREATE) needed to turn the first schema into the second. 

=item * sqlt-dumper

This script generates a Perl script that can be used to connect to a
database and dump the data in each table in different formats, similar
to the "mysqldump" program.

=item * sqlt-graph

This is an interface to the GraphViz visualization tool and its myriad
options.

=item * sqlt.cgi

This is a CGI script that presents an HTML form for uploading or
pasting a schema and choosing an output and the output options.

=back

To read the full documentation for each script, use "perldoc" (or
execute any of the command-line scripts with the "--help" flag).

=head1 CONVERTING SQL DIALECTS

Probably the most common task SQLFairy is used for is to convert one
dialect of SQL to another.  If you have a text description of an SQL
database (AKA a "DDL" -- "Data Definition Language"), then you should
use the "sqlt" script with switches to indicate the parser and
producer and the name of the text file as the final argument.  For
example, to convert the "foo.sql" MySQL schema to a version suitable
for PostgreSQL, you would do the following:

  $ sqlt -f MySQL -t PostgreSQL foo.sql > foo-pg.sql

The "from" and "to" options are case-sensitive and must match exactly
the names of the Parser and Producer classes in SQL::Translator.  For
a complete listing of your options, execute "sqlt" with the "--list"
flag.

=head1 EXTRACT SQL SCHEMAS DIRECTLY FROM DATABASE

It is possible to extract some schemas directly from the database
without parsing a text file (the "foo.sql" in the above example).
This can prove significantly faster than parsing a text file.  To
do this, use the "DBI" parser and provide the necessary arguments to
connect to the database and indicate the producer class, like so:

  $ sqlt -f DBI --dsn dbi:mysql:FOO --db-user guest \
    --db-password p4ssw0rd -t PostgreSQL > foo

The "--list" option to "sqlt" will show the databases supported by
DBI parsers.

=head1 HANDLING NON-SQL DATA

Certain structured document formats can be easily thought of as
tables.  SQLFairy can parse Microsoft Excel spreadsheets and
arbitrarily delimited text files just as if they were schemas which
contained only one table definition.  The column names are normalized
to something sane for most databases (whitespace is converted to
underscores and non-word characters are removed), and the data in each
field is scanned to determine the appropriate data type (character,
integer, or float) and size.  For instance, to convert a
comma-separated file to an SQLite database, do the following:

  $ sqlt -f xSV --fs ',' -t SQLite foo.csv > foo-sqlite.sql

Additionally, there are non-SQL represenations of relational schemas
such as XML and XMI.  Currently the XMI support in SQLFairy is
experimental and not released.  Additionally, the only XML supported
is our own version;  however, it would be fairly easy to add an XML
parser for something like the TorqueDB (http://db.apache.org/torque/)
project.  The actual parsing of XML should be trivial given the number
of XML parsers available, so all that would be left would be to map
the specific concepts in the source file to the Schema objects in
SQLFairy.  

To convert a schema in SQLFairy's XML dialect to Oracle, do the following:

  $ sqlt -f XML-SQLFairy -t Oracle foo.xml > foo-oracle.sql 

=head1 SERIALIZING SCHEMAS

via XML, YAML and Storable
Parsing a schema is generally the most computationally expensive
operation performed by SQLFairy, so it may behoove you to serialize a
parsed schema if you need to perform repeated conversions.  For
example, as part of a build process the author converts a MySQL schema
first to YAML, then to PostgreSQL, Oracle, SQLite and Sybase.
Additionally, a variety of documention in HTML and images is produced.
This can be accomplished like so:

  $ sqlt -f MySQL -t YAML schema-mysql.sql > schema.yaml
  $ sqlt -f YAML -t Oracle schema.yaml > schema-oracle.sql
  $ sqlt -f YAML -t PostgreSQL schema.yaml > schema-postgresql.sql
  $ ...

SQLFairy has three serialization producers, none of which is superior
to the other in their description of a schema.  

=over 4

=item * XML-SQLFairy

This is the aforementioned XML format.  It is essentially a direct
mapping of the Schema objects into XML.  This can also provide a very
convenient bridge to describing a schema to a non-Perl application.
Providing a producer argument to "sqlt" of just "XML" will default to
using "XML-SQLFairy."

=item * Storable

This producer stores the Schema object using Perl's Storable.pm module
available on CPAN.

=item * YAML

This producer serialized the Schema object with the very readable
structured data format of YAML (http://www.yaml.org/).  Earlier
examples show serializing to YAML.

=back

=head1 VISUALIZING SQL SCHEMAS

The visualization tools in SQLFairy can graphically represent the
tables, fields, datatypes and sizes, constraints, and foreign key
relationships in a very compact and intuitive format.  This can be
very beneficial in understanding and document large or small schemas.
Two producers in SQLFairy will create pseudo-E/R (entity-relationship)
diagrams:

=over 4

=item * Diagram

The first visualization tool in SQLFairy, this producer uses libgd to
draw a picture of the schema.  The tables are evenly distributed in
definition order running in columns (i.e., no graphing algorithms are
used), so the end result may result in many crossed lines showing the
foreign key relationships.  Please read the documentation of the
"sqlt-diagram" script for all the options available to this producer.

=item * GraphViz

The layout of the GraphViz producer is far superior to the Diagram
producer as it uses the Graphviz binary from Bell Labs to create very
professional-looking graphs.  There are several different layout
algorithms and node shapes available.  Please see the documentation of
the "sqlt-graph" script for more information.

=back

=head1 AUTOMATED CODE-GENERATION 

Given that so many applications interact with SQL databases, it's no
wonder that people have automated code to deal with this interaction.
Class::DBI from CPAN is one such module that allows a developer to
describe the relationships between tables and fields in class
declarations and then generates all the SQL to interact (SELECT,
UPDATE, DELETE, INSERT statements) at runtime.  Obviously, the schema
already describes itself, so it only makes sense that you should be
able to generate this kind of code directly from the schema.  The
"ClassDBI" producer in SQLFairy does just this, creating a Perl module
that inherits from Class::DBI and sets up most of the code needed to
interact with the database.  Here is an example of how to do this:

  $ sqlt -f MySQL -t ClassDBI foo.sql > Foo.pm

Then simply edit Foo.pm as needed and include it in your code.

=head1 DOCUMENTING WITH SQL::TRANSLATOR

SQLFairy offers two producers to help document schemas:

=over 4

=item * HTML

This producer creates a single HTML document which uses HTML
formatting to describe the Schema objects and to create hyperlinks on
foreign key relationships.  This can be a surprisingly useful
documentation aid as it creates a very readable format that allows one
to jump easily to specific tables and fields.  It's also possible to
plugin your own CSS to further control the presentation of the HTML.

=item * POD

This is arguably not that useful of a producer by itself, but the
number of POD-conversion tools could be used to further transform the
POD into something more interesting.  The schema is basically
represented in POD sections where tables are broken down into fields,
indices, constraints, foreign keys, etc.

=back

=head1 TEMPLATE-BASED MANIPULATION OF SCHEMA OBJECTS

All of the producers which create text output could have been coded
using a templating system to mix in the dynamic output with static
text.  CPAN offers several diverse templating systems, but few are as
powerful as Template Toolkit (http://www.template-toolkit.org/).  You
can easily create your own producer without writing any Perl code at
all simply by writing a template using Template Toolkit's syntax.  The
template will be passed a reference to the Schema object briefly
described at the beginning of this document and mentioned many times
throughout.  For example, you could create a template that simply
prints the name of each table and field that looks like this:

  # file: schema.tt
  [% FOREACH table IN schema.get_tables %]
  Table: [% table.name %]
  Fields:
  [% FOREACH field IN table.get_fields -%]
    [% field.name %]
  [% END -%]
  [% END %]

And the process it like so:

  $ sqlt -f YAML -t TTSchema --template schema.tt foo.yaml

To create output like this:

  Table: foo
  Fields:
    foo_id
    foo_name

For more information on Template Toolkit, please install the
"Template" module and read the POD.

=head1 PLUGIN YOUR OWN PARSERS AND PRODUCERS

Now that you have seen how the parsers and producers interact via the
Schema objects, you may wish to create your own versions to plugin.

=head1 AUTHOR

Ken Y. Clark E<lt>kclark@cpan.orgE<gt>.
Commit	Line	Data
2e2fc2b4	1	=head1 NAME
	2
	3	SQL::Translator::Manual
	4
	5	=head1 SYNOPSIS
	6
cfdcb09e	7	SQL::Translator (AKA "SQLFairy") is a collection of modules for
	8	transforming (mainly) SQL DDL files into a variety of other formats,
	9	including other SQL dialects, documentation, images, and code. In
	10	this manual, we will attempt to address how to use SQLFairy for common
	11	tasks. For a lower-level discussion of how the code works, please
	12	read the documentation for L<SQL::Translator>.
2e2fc2b4	13
cfdcb09e	14	It may prove helpful to have a general understanding of the SQLFairy
	15	code before continuing. The code can be broken into three conceptual
	16	groupings:
2e2fc2b4	17
cfdcb09e	18	=over 4
2e2fc2b4	19
cfdcb09e	20	=item * Parsers
2e2fc2b4	21
cfdcb09e	22	The parsers are responsible for reading the input files and describing
cfdcb09e	23	them to the Schema object middleware.
2e2fc2b4	24
cfdcb09e	25	=item * Producers
2e2fc2b4	26
cfdcb09e	27	The producers create the output as described by the Schema middleware.
2e2fc2b4	28
cfdcb09e	29	=item * Schema objects
	30
	31	The Schema objects bridge the communication between the Parsers and
	32	Producers by representing any parsed file through a standard set of
	33	generic objects to represent concepts like Tables, Fields (columns),
	34	Indices, Constraints, etc.
	35
	36	=back
	37
	38	It's not necessary to understand how to write or manipulate any
	39	of these for most common tasks, but you should aware of the concepts
	40	as they will be referenced later in this document.
	41
	42	=head1 SQLFAIRY SCRIPTS
	43
	44	Most common tasks can be accomplished through the use of the script
	45	interfaces to the SQL::Translator code. All SQLFairy scripts begin
	46	with "sqlt." Here are the scripts and a description of what they each
	47	do:
	48
	49	=over 4
	50
	51	=item * sqlt
	52
	53	This is the main interface for text-to-text translations, e.g.,
	54	converting a MySQL schema to Oracle.
	55
	56	=item * sqlt-diagram
	57
	58	This is a tailored interface for the Diagram producer and its many
	59	myriad options.
	60
	61	=item * sqlt-diff
	62
	63	This script will examine two schemas and report the SQL commands
	64	(ALTER, CREATE) needed to turn the first schema into the second.
	65
	66	=item * sqlt-dumper
	67
	68	This script generates a Perl script that can be used to connect to a
	69	database and dump the data in each table in different formats, similar
	70	to the "mysqldump" program.
	71
	72	=item * sqlt-graph
	73
	74	This is an interface to the GraphViz visualization tool and its myriad
	75	options.
	76
	77	=item * sqlt.cgi
	78
	79	This is a CGI script that presents an HTML form for uploading or
	80	pasting a schema and choosing an output and the output options.
	81
	82	=back
	83
	84	To read the full documentation for each script, use "perldoc" (or
	85	execute any of the command-line scripts with the "--help" flag).
	86
	87	=head1 CONVERTING SQL DIALECTS
	88
	89	Probably the most common task SQLFairy is used for is to convert one
	90	dialect of SQL to another. If you have a text description of an SQL
	91	database (AKA a "DDL" -- "Data Definition Language"), then you should
	92	use the "sqlt" script with switches to indicate the parser and
93	producer and the name of the text file as the final argument. For
94	example, to convert the "foo.sql" MySQL schema to a version suitable
95	for PostgreSQL, you would do the following:
96
97	$ sqlt -f MySQL -t PostgreSQL foo.sql > foo-pg.sql
98
99	The "from" and "to" options are case-sensitive and must match exactly
100	the names of the Parser and Producer classes in SQL::Translator. For
101	a complete listing of your options, execute "sqlt" with the "--list"
102	flag.
103
104	=head1 EXTRACT SQL SCHEMAS DIRECTLY FROM DATABASE
105
106	It is possible to extract some schemas directly from the database
107	without parsing a text file (the "foo.sql" in the above example).
108	This can prove significantly faster than parsing a text file. To
109	do this, use the "DBI" parser and provide the necessary arguments to
110	connect to the database and indicate the producer class, like so:
111
112	$ sqlt -f DBI --dsn dbi:mysql:FOO --db-user guest \
113	--db-password p4ssw0rd -t PostgreSQL > foo
114
115	The "--list" option to "sqlt" will show the databases supported by
116	DBI parsers.
117
118	=head1 HANDLING NON-SQL DATA
119
120	Certain structured document formats can be easily thought of as
121	tables. SQLFairy can parse Microsoft Excel spreadsheets and
122	arbitrarily delimited text files just as if they were schemas which
123	contained only one table definition. The column names are normalized
124	to something sane for most databases (whitespace is converted to
125	underscores and non-word characters are removed), and the data in each
126	field is scanned to determine the appropriate data type (character,
127	integer, or float) and size. For instance, to convert a
128	comma-separated file to an SQLite database, do the following:
129
130	$ sqlt -f xSV --fs ',' -t SQLite foo.csv > foo-sqlite.sql
131
132	Additionally, there are non-SQL represenations of relational schemas
133	such as XML and XMI. Currently the XMI support in SQLFairy is
134	experimental and not released. Additionally, the only XML supported
135	is our own version; however, it would be fairly easy to add an XML
136	parser for something like the TorqueDB (http://db.apache.org/torque/)
137	project. The actual parsing of XML should be trivial given the number
138	of XML parsers available, so all that would be left would be to map
139	the specific concepts in the source file to the Schema objects in
140	SQLFairy.
141
142	To convert a schema in SQLFairy's XML dialect to Oracle, do the following:
143
144	$ sqlt -f XML-SQLFairy -t Oracle foo.xml > foo-oracle.sql
145
146	=head1 SERIALIZING SCHEMAS
2e2fc2b4	147
2e2fc2b4	148	via XML, YAML and Storable
cfdcb09e	149	Parsing a schema is generally the most computationally expensive
	150	operation performed by SQLFairy, so it may behoove you to serialize a
	151	parsed schema if you need to perform repeated conversions. For
	152	example, as part of a build process the author converts a MySQL schema
	153	first to YAML, then to PostgreSQL, Oracle, SQLite and Sybase.
	154	Additionally, a variety of documention in HTML and images is produced.
	155	This can be accomplished like so:
	156
	157	$ sqlt -f MySQL -t YAML schema-mysql.sql > schema.yaml
	158	$ sqlt -f YAML -t Oracle schema.yaml > schema-oracle.sql
	159	$ sqlt -f YAML -t PostgreSQL schema.yaml > schema-postgresql.sql
	160	$ ...
	161
	162	SQLFairy has three serialization producers, none of which is superior
	163	to the other in their description of a schema.
	164
	165	=over 4
	166
	167	=item * XML-SQLFairy
	168
	169	This is the aforementioned XML format. It is essentially a direct
	170	mapping of the Schema objects into XML. This can also provide a very
	171	convenient bridge to describing a schema to a non-Perl application.
	172	Providing a producer argument to "sqlt" of just "XML" will default to
	173	using "XML-SQLFairy."
	174
	175	=item * Storable
	176
	177	This producer stores the Schema object using Perl's Storable.pm module
	178	available on CPAN.
	179
	180	=item * YAML
	181
	182	This producer serialized the Schema object with the very readable
	183	structured data format of YAML (http://www.yaml.org/). Earlier
	184	examples show serializing to YAML.
	185
	186	=back
	187
	188	=head1 VISUALIZING SQL SCHEMAS
	189
	190	The visualization tools in SQLFairy can graphically represent the
	191	tables, fields, datatypes and sizes, constraints, and foreign key
	192	relationships in a very compact and intuitive format. This can be
	193	very beneficial in understanding and document large or small schemas.
	194	Two producers in SQLFairy will create pseudo-E/R (entity-relationship)
	195	diagrams:
	196
	197	=over 4
	198
	199	=item * Diagram
	200
	201	The first visualization tool in SQLFairy, this producer uses libgd to
	202	draw a picture of the schema. The tables are evenly distributed in
	203	definition order running in columns (i.e., no graphing algorithms are
	204	used), so the end result may result in many crossed lines showing the
	205	foreign key relationships. Please read the documentation of the
	206	"sqlt-diagram" script for all the options available to this producer.
	207
	208	=item * GraphViz
	209
	210	The layout of the GraphViz producer is far superior to the Diagram
	211	producer as it uses the Graphviz binary from Bell Labs to create very
	212	professional-looking graphs. There are several different layout
213	algorithms and node shapes available. Please see the documentation of
214	the "sqlt-graph" script for more information.
215
216	=back
217
218	=head1 AUTOMATED CODE-GENERATION
219
220	Given that so many applications interact with SQL databases, it's no
221	wonder that people have automated code to deal with this interaction.
222	Class::DBI from CPAN is one such module that allows a developer to
223	describe the relationships between tables and fields in class
224	declarations and then generates all the SQL to interact (SELECT,
225	UPDATE, DELETE, INSERT statements) at runtime. Obviously, the schema
226	already describes itself, so it only makes sense that you should be
227	able to generate this kind of code directly from the schema. The
228	"ClassDBI" producer in SQLFairy does just this, creating a Perl module
229	that inherits from Class::DBI and sets up most of the code needed to
230	interact with the database. Here is an example of how to do this:
231
232	$ sqlt -f MySQL -t ClassDBI foo.sql > Foo.pm
233
234	Then simply edit Foo.pm as needed and include it in your code.
235
236	=head1 DOCUMENTING WITH SQL::TRANSLATOR
2e2fc2b4	237
cfdcb09e	238	SQLFairy offers two producers to help document schemas:
2e2fc2b4	239
cfdcb09e	240	=over 4
2e2fc2b4	241
cfdcb09e	242	=item * HTML
2e2fc2b4	243
cfdcb09e	244	This producer creates a single HTML document which uses HTML
	245	formatting to describe the Schema objects and to create hyperlinks on
	246	foreign key relationships. This can be a surprisingly useful
	247	documentation aid as it creates a very readable format that allows one
	248	to jump easily to specific tables and fields. It's also possible to
	249	plugin your own CSS to further control the presentation of the HTML.
2e2fc2b4	250
cfdcb09e	251	=item * POD
2e2fc2b4	252
cfdcb09e	253	This is arguably not that useful of a producer by itself, but the
	254	number of POD-conversion tools could be used to further transform the
	255	POD into something more interesting. The schema is basically
	256	represented in POD sections where tables are broken down into fields,
	257	indices, constraints, foreign keys, etc.
2e2fc2b4	258
cfdcb09e	259	=back
2e2fc2b4	260
cfdcb09e	261	=head1 TEMPLATE-BASED MANIPULATION OF SCHEMA OBJECTS
2e2fc2b4	262
cfdcb09e	263	All of the producers which create text output could have been coded
	264	using a templating system to mix in the dynamic output with static
	265	text. CPAN offers several diverse templating systems, but few are as
	266	powerful as Template Toolkit (http://www.template-toolkit.org/). You
	267	can easily create your own producer without writing any Perl code at
	268	all simply by writing a template using Template Toolkit's syntax. The
	269	template will be passed a reference to the Schema object briefly
	270	described at the beginning of this document and mentioned many times
	271	throughout. For example, you could create a template that simply
	272	prints the name of each table and field that looks like this:
2e2fc2b4	273
cfdcb09e	274	# file: schema.tt
	275	[% FOREACH table IN schema.get_tables %]
	276	Table: [% table.name %]
	277	Fields:
	278	[% FOREACH field IN table.get_fields -%]
	279	[% field.name %]
	280	[% END -%]
	281	[% END %]
2e2fc2b4	282
cfdcb09e	283	And the process it like so:
2e2fc2b4	284
cfdcb09e	285	$ sqlt -f YAML -t TTSchema --template schema.tt foo.yaml
2e2fc2b4	286
cfdcb09e	287	To create output like this:
2e2fc2b4	288
cfdcb09e	289	Table: foo
	290	Fields:
	291	foo_id
	292	foo_name
2e2fc2b4	293
cfdcb09e	294	For more information on Template Toolkit, please install the
cfdcb09e	295	"Template" module and read the POD.
2e2fc2b4	296
cfdcb09e	297	=head1 PLUGIN YOUR OWN PARSERS AND PRODUCERS
2e2fc2b4	298
cfdcb09e	299	Now that you have seen how the parsers and producers interact via the
cfdcb09e	300	Schema objects, you may wish to create your own versions to plugin.
2e2fc2b4	301
	302	=head1 AUTHOR
	303
977651a5	304	Ken Y. Clark E<lt>kclark@cpan.orgE<gt>.