A bunch of changes, primarily focused on adding the concept of a FooList and adding...
[dbsrgits/SQL-Abstract-2.0-ish.git] / lib / SQL / Abstract / Manual / Specification.pod
CommitLineData
d6e108eb 1=head1 NAME
2
3SQL::Abstract::Manual::Specification
4
5=head1 SYNOPSIS
6
7This discusses the specification for the AST provided by L<SQL::Abstract>. It is
8meant to describe how the AST is structured, various components provided by
9L<SQL::Abstract> for use with this AST, how to manipulate the AST, and various
10uses for the AST once it is generated.
11
12=head1 MOTIVATIONS
13
14L<SQL::Abstract> has been in use for many years. Originally created to handle
15the where-clause formation found in L<DBIx::Abstract>, it was generalized to
16manage the creation of any SQL statement through the use of Perl structures.
17Through the beating it received as the SQL generation syntax for L<DBIx::Class>,
18various deficiencies were found and a generalized SQL AST was designed. This
19document describes that AST.
20
21=head1 GOALS
22
23The goals for this AST are as follows:
24
25=head2 SQL-specific semantics
26
27Instead of attempting to be an AST to handle any form of query, this will
28instead be specialized to manage SQL queries (and queries that map to SQL
29queries). This means that there will be support for SQL-specific features, such
30as placeholders.
31
32=head2 Perl-specific semantics
33
34This AST is meant to be used from within Perl5 only. So, it will take advantage
35of as many Perl-specific features that make sense to use. No attempt whatosever
36will be made to make this AST work within any other language, including Perl6.
37
38=head2 Whole-lifecycle management
39
40Whether a query is built out of whole cloth in one shot or cobbled together from
41several snippets over the lifetime of a process, this AST will support any way
42to construct the query. Queries can also be built from other queries, so an
43UPDATE statement could be used as the basis for a SELECT statement, DELETE
44statement, or even a DDL statement of some kind.
45
46=head2 Dialect-agnostic usage
47
48Even though SQL itself has several ANSI specifications (SQL-92 and SQL-99 among
49them), this only serves as a basis for what a given RDBMS will expect. However,
50every engine has its own specific extensions and specific ways of handling
393a4eb8 51common features. The AST will provide ways of expressing common functionality in
52a common language. The emitters (objects that follow the Visitor pattern) will
53be responsible for converting that common language into RDBMS-specific SQL.
54
ad0f8fa6 55=head1 RESTRICTIONS
56
57The following are the restrictions upon the AST:
58
59=head2 DML-only
60
61The AST will only support DML (Data Modelling Language). It will not (currently)
62support DDL (Data Definition Language). Practically, this means that the only
63statements supported will be:
64
65=over 4
66
67=item * SELECT
68
69=item * INSERT INTO
70
71=item * UPDATE
72
73=item * DELETE
74
75=back
76
77Additional DML statements may be supported by specific Visitors (such as a
78MySQL visitor supporting REPLACE INTO). q.v. the relevant sections of this
79specification for details.
80
393a4eb8 81=head1 COMPONENTS
82
83There are two major components to SQL::Abstract v2.
84
85=over 4
86
87=item * AST
88
89This is the Abstract Syntax Tree. It is a data structure that represents
90everything necessary to construct the SQL statement in whatever dialect the
91user requires.
92
93=item * Visitor
94
95This object conforms to the Visitor pattern and is used to generate the SQL
96represented by the AST. Each dialect will have a different Visitor object. In
97addition, there will be visitors for at least one of the ANSI specifications.
98
99=back
d6e108eb 100
df35a525 101The division of duties between the two components will focus on what the AST
102can and cannot assume. For example, identifiers do not have 20 components in
103any dialect, so the AST can validate that. However, determining what
104constitutes a legal identifier can only be determined by the Visitor object
105enforcing that dialect's rules.
106
d6e108eb 107=head1 AST STRUCTURE
108
393a4eb8 109The AST will be a HoHo..oH (hash of hash of ... of hashes). The keys to the
110outermost hash will be the various clauses of a SQL statement, plus some
111metadata keys. All metadata keys will be identifiable as such by being prefixed
112with an underscore. All keys will be in lowercase.
d6e108eb 113
114=head2 Metadata keys
115
116These are the additional metadata keys that the AST provides for.
117
df35a525 118=head3 _query
119
120This denotes what kind of query this AST should be interpreted as. Different
121Visitors may accept additional values for _query. For example, a MySQL Visitor
7c66a0ab 122may choose to accept 'replace' for REPLACE INTO. If a _query value is
123unrecognized by the Visitor, the Visitor is expected to throw an error.
df35a525 124
125All Visitors are expected to handle the following values for _query:
126
d6e108eb 127=over 4
128
df35a525 129=item * select
130
131This is a SELECT statement.
d6e108eb 132
df35a525 133=item * insert
d6e108eb 134
df35a525 135This is an INSERT statement.
393a4eb8 136
df35a525 137=item * update
138
139This is an UPDATE statement.
140
141=item * delete
142
143This is a DELETE statement.
d6e108eb 144
145=back
146
df35a525 147=head3 _version
148
149This denotes the version of the AST. Different versions will indicate different
150capabilities provided. Visitors will choose to respect the _version as needed
151and desired.
152
d6e108eb 153=head2 Structural units
154
df35a525 155All structural units will be hashes. These hashes will have, at minimum, the
156following keys:
157
158=over 4
159
7c66a0ab 160=item * name
df35a525 161
162This indicates the structural unit that this hash is representing. While this
163specification provides for standard structural units, different Visitors may
164choose to accept additional units as desired. If a Visitor encounters a unit it
165doesn't know how to handle, it is expected to throw an exception.
166
167=back
168
d6e108eb 169Structural units in the AST are supported by loaded components. L<SQL::Abstract>
170provides for the following structural units by default:
171
172=head3 Identifier
173
df35a525 174This is a (potentially) fully canonicalized identifier for a elemnt in the
175query. This element could be a schema, table, or column. The Visitor will
176determine validity within the context of that SQL dialect. The AST is only
177responsible for validating that the elements are non-empty Strings.
178
179The hash will be structured as follows:
180
181 {
7c66a0ab 182 name => 'Identifier',
183 element1 => Scalar,
184 element2 => Scalar,
185 element3 => Scalar,
df35a525 186 }
d6e108eb 187
7c66a0ab 188If element3 exists, then element2 must exist. element1 must always exist. If a
189given element exists, then it must be defined and of non-zero length.
190
ad0f8fa6 191Visitors are expected to, by default, quote all identifiers according to the SQL
192dialect's quoting scheme.
d6e108eb 193
10000e9e 194=head3 Value
d6e108eb 195
7c66a0ab 196A Value is a Perl scalar. Depending on the type, a Visitor may be able to make
197certain decisions.
10000e9e 198
199=over 4
200
201=item * String
202
7c66a0ab 203A String is a quoted series of characters. The Visitor is expected to ensure
204that embedded quotes are properly handled per the SQL dialect's quoting scheme.
10000e9e 205
206=item * Number
207
7c66a0ab 208A Number is an unquoted number in some numeric format.
10000e9e 209
ad0f8fa6 210=item * Null
10000e9e 211
ad0f8fa6 212Null is SQL's NULL and corresponds to Perl's C<undef>.
10000e9e 213
214=item * BindParameter
215
216This corresponds to a value that will be passed in. This value is normally
217quoted in such a fashion so as to protect against SQL injection attacks. (q.v.
218L<DBI/quote()> for an example.)
219
7c66a0ab 220BindParameters are normally represented by a '?'.
221
10000e9e 222=back
223
a3872878 224The hash will be structured as follows:
225
226 {
7c66a0ab 227 name => 'Value'
228 subtype => [ 'String' | 'Number' | 'Null' | 'BindParameter' ]
229 value => Scalar
a3872878 230 }
231
232The provided subtypes are the ones that all Visitors are expected to support.
233Visitors may choose to support additional subtypes. Visitors are expected to
234throw an exception upon encountering an unknown subtype.
d6e108eb 235
236=head3 Function
237
7c66a0ab 238A Function is anything of the form C<< name( arglist ) >> where C<<name>> is a
239string and C<arglist> is an ExpressionList.
d6e108eb 240
81cd86f1 241Yes, a Subquery is legal as an argument for many functions. Some example
242functions are:
243
244=over 4
245
81cd86f1 246=item * C<< MAX >>
247
248=item * C<< MIN >>
249
250=item * C<< SUM >>
251
ad0f8fa6 252=item * C<< IF >>
253
81cd86f1 254=back
d6e108eb 255
7c66a0ab 256The hash will be structured as follows:
257
258 {
259 name => "Function",
260 function => String,
261 arglist => ExpressionList,
262 }
263
ad0f8fa6 264Functions have a cardinality, or expected number of arguments. Some functions,
265such as MAX(), have a cardinality of 1. Others, such as IF(), have a cardinality
266of N, meaning they can have any number of arguments greater than 0. Others, such
267as NOW(), have a cardinality of 0. Several functions with the same meaning may
268have a different cardinality in different SQL dialects as different engines may
269allow different behaviors.
270
271As cardinality may differ between dialects, enforcing cardinality is necessarily
272left to the Visitor.
273
d6e108eb 274=head3 Subquery
275
276A Subquery is another AST whose _query metadata parameter is set to "SELECT".
277
278Most places that a Subquery can be used would require a single value to be
279returned (single column, single row), but that is not something that the AST can
ad0f8fa6 280easily enforce. The single-column restriction may possibly be enforced, but the
d6e108eb 281single-row restriction is much more difficult and, in most cases, probably
282impossible.
283
7c66a0ab 284Subqueries, when expressed in SQL, must be bounded by parentheses.
81cd86f1 285
d6e108eb 286=head3 Unary Operator
287
ad0f8fa6 288A UnaryOperator takes a single argument on the RHS. The argument for a
289UnaryOperator is an Expression.
290
291Visitors are expected to support, at minimum, the following operators:
d6e108eb 292
293=over 4
294
ad0f8fa6 295=item * NOT X
296
297=item * ANY X
298
299=item * ALL X
300
301=item * SOME X
d6e108eb 302
303=back
304
ad0f8fa6 305The hash for a UnaryOperator is as follows:
306
307 {
7c66a0ab 308 name => 'UnaryOperator'
309 operator => [ .... ],
ad0f8fa6 310 argument1 => Expression,
311 }
312
313Visitors may choose to support additional operators. Visitors are expected to
314throw an exception upon encountering an unknown operator.
315
d6e108eb 316=head3 BinaryOperator
317
ad0f8fa6 318A BinaryOperator takes two arguments (one on the LHS and one on the RHS). The
319arguments for a BinaryOperator are all Expressions.
a3872878 320
ad0f8fa6 321Visitors are expected to support, at minimum, the following operators:
d6e108eb 322
323=over 4
324
a3872878 325=item * X = Y
326
327=item * X != Y
d6e108eb 328
a3872878 329=item * X > Y
d6e108eb 330
a3872878 331=item * X < Y
d6e108eb 332
a3872878 333=item * X >= Y
d6e108eb 334
a3872878 335=item * X <= Y
d6e108eb 336
a3872878 337=item * X IS Y
d6e108eb 338
a3872878 339=item * X IN Y
d6e108eb 340
ad0f8fa6 341=item * X NOT IN Y
342
343=item * X AND Y
344
345=item * X OR Y
346
d6e108eb 347=back
348
ad0f8fa6 349(Note that an operator can comprise of what would be multiple tokens in a normal
350parsing effort.)
351
352Visitors may choose to support additional operators. Visitors are expected to
353throw an exception upon encountering an unknown operator.
354
355The hash for a BinaryOperator is as follows:
356
357 {
7c66a0ab 358 name => 'BinaryOperator'
359 operator => [ .... ],
ad0f8fa6 360 argument1 => Expression,
361 argument2 => Expression,
362 }
d6e108eb 363
a3872878 364=head3 TrinaryOperator
365
366A TrinaryOperator takes three arguments. It generally is composed of two
367elements with one argument to the LHS, one to the RHS, and a third in the middle
ad0f8fa6 368of the elements. The arguments for a TrinaryOperator are all Expressions.
a3872878 369
ad0f8fa6 370Visitors are expected to support, at minimum, the following operators:
a3872878 371
372=over 4
373
374=item * X BETWEEN Y AND Z
375
376=back
377
ad0f8fa6 378Visitors may choose to support additional operators. Visitors are expected to
379throw an exception upon encountering an unknown operator.
380
381The hash for a TrinaryOperator is as follows:
382
383 {
7c66a0ab 384 name => 'TrinaryOperator'
385 operator => [ .... ],
ad0f8fa6 386 argument1 => Expression,
387 argument2 => Expression,
388 argument3 => Expression,
389 }
390
d6e108eb 391=head3 Expression
392
7c66a0ab 393An Expression can be any one of the following:
d6e108eb 394
395=over 4
396
10000e9e 397=item * Value
d6e108eb 398
399=item * Function
400
401=item * Subquery
402
ad0f8fa6 403=item * UnaryOperator
d6e108eb 404
ad0f8fa6 405=item * BinaryOperator
406
407=item * TrinaryOperator
d6e108eb 408
409=back
410
7c66a0ab 411An Expression is a meta-syntactic unit. An "Expression" unit will never appear
412within the AST. It acts as a junction.
413
414=head3 ExpressionList
415
416An ExpressionList is a list of Expressions, generally separated by commas
417(though other separators may be appropriate at times or for different SQL
418dialects).
81cd86f1 419
7c66a0ab 420The hash for an ExpressionList is as follows:
ad0f8fa6 421
422 {
7c66a0ab 423 name => 'ExpressionList',
424 separator => ',',
425 elements => Array of Expressions,
ad0f8fa6 426 }
427
7c66a0ab 428An ExpressionList is always rendered in SQL with parentheses around it.
429
d6e108eb 430=head2 SQL clauses
431
10000e9e 432These are all the legal and acceptable clauses within the AST that would
433correpsond to clauses in a SQL statement. Not all clauses are legal within a
434given RDBMS engine's SQL dialect and some clauses may be required in one and
435optional in another. Detecting and enforcing those engine-specific restrictions
436is the responsibility of the Visitor object.
437
438The clauses are defined with a yacc-like syntax. The various parts are:
439
440=over 4
441
442=item * :=
443
444This means "defined" and is used to create a new term to be used below.
445
446=item * []
447
448This means optional and indicates that the items within it are optional.
449
450=item * []*
451
452This means optional and repeating as many times as desired.
453
454=item * |
455
456This means alternation. It is a binary operator and indicates that either the
457left or right hand sides may be used, but not both.
458
459=item * C<< <> >>
460
461This is a grouping construct. It means that all elements within this construct
462are treated together for the purposes of optional, repeating, alternation, etc.
463
464=back
465
d6e108eb 466The expected clauses are (name and structure):
467
468=head3 select
469
81cd86f1 470This corresponds to the SELECT clause of a SELECT statement.
471
7c66a0ab 472A select clause unit is an array of one or more SelectComponent units.
81cd86f1 473
7c66a0ab 474The hash for a SelectComponent unit is composed as follows:
81cd86f1 475
7c66a0ab 476 {
477 name => 'SelectComponent',
478 value => Expression,
479 [ as => Identifier, ]
480 }
481
482The 'as' component is optional. Visitors may choose to make it required in
483certain situations.
d6e108eb 484
485=head3 tables
486
487This is a list of tables that this clause is affecting. It corresponds to the
81cd86f1 488FROM clause in a SELECT statement and the INSERT INTO/UPDATE/DELETE clauses in
489those respective statements. Depending on the _query metadata entry, the
490appropriate clause name will be used.
d6e108eb 491
7c66a0ab 492A tables clause unit is an array of one or more TableComponent units.
493
d6e108eb 494The tables clause has several RDBMS-specific variations. The AST will support
495all of them and it is up to the Visitor object constructing the actual SQL to
496validate and/or use what is provided as appropriate.
497
7c66a0ab 498The hash for a TableJoin will be composed as follows:
499
500 # TableJoin
501 {
502 name => 'TableJoin',
503 join => < LEFT|RIGHT [ OUTER ] > | INNER | CROSS | ',',
504 [ using => IdentifierList, ]
505 [ on => ExpressionList, ]
506 }
507
508A TableJoin may not have both a 'using' element and an 'on' element. It may
509have one of them if the 'join' element is not equal to ',' but doesn't have to.
510If the 'join' element is equal to ',', then it may not have either a 'using' or
511an 'on' element.
512
513The hash for a TableIdentifier will be composed as follows:
d6e108eb 514
7c66a0ab 515 # TableIdentifier
516 {
517 name => 'TableIdentifier',
518 value => Identifier | SubQuery
519 [ join => TableJoin, ]
520 [ as => Identifier, ]
521 }
522
523The first TableComponent in a tables clause may not have a join element. All
524other TableComponent elements that do not have a join element will have a
525default join element of:
526
527 {
528 name => 'TableJoin',
529 join => ',',
530 }
d6e108eb 531
7c66a0ab 532The 'as' component is optional. Visitors may choose to make it required in
533certain situations (such as MySQL requiring an alias for subqueries).
d6e108eb 534
535Additionally, where aliases are provided for in the TableIdentifier, those
536aliases must be used as the tablename in subsequent Identifiers that identify a
7c66a0ab 537column of that table. This may be enforceable by the AST or the Visitor. But, it
538is more likely that it will not be.
d6e108eb 539
540=head3 where
541
81cd86f1 542This corresponds to the WHERE clause in a SELECT, UPDATE, or DELETE statement.
543
544A where clause is composed as follows:
545
546 WhereOperator := AND | OR
547 WhereExpression := Expression | Expression WhereOperator Expression
548
549 WhereExpression
550
d6e108eb 551=head3 set
552
81cd86f1 553This corresponds to the SET clause in an INSERT or UPDATE statement.
554
555A set clause is composed as follows:
556
557 SetComponent := Identifier = Expression
558
559 SetComponent [ , SetComponent ]*
560
561=head3 columns
562
563This corresponds to the optional list of columns in an INSERT statement.
564
7c66a0ab 565A columns clause is an IdentifierList and the unit is composed as follows:
81cd86f1 566
7c66a0ab 567 columns => [
568 Identifier,
569 [ Identifier, ]*
570 ],
81cd86f1 571
d6e108eb 572=head3 values
573
81cd86f1 574This corresponds to the VALUES clause in an INSERT statement.
575
7c66a0ab 576A values clause is an ExpressionList and the unit is composed as follows.
81cd86f1 577
7c66a0ab 578 values => [
579 Expression,
580 [ Expression, ]*
581 ],
81cd86f1 582
583If there is a columns clause, the number of entries in the values clause must be
584equal to the number of entries in the columns clause.
585
d6e108eb 586=head3 orderby
587
81cd86f1 588This corresponds to the ORDER BY clause in a SELECT statement.
589
590An orderby clause is composed as follows:
591
10000e9e 592 OrderByComponent := XXX-TODO-XXX
81cd86f1 593 OrderByDirection := ASC | DESC
594
595 OrderByComponent [ OrderByDirection ]
596 [ , OrderByComponent [ OrderByDirection ] ]*
597
d6e108eb 598=head3 groupby
599
81cd86f1 600This corresponds to the GROUP BY clause in a SELECT statement.
601
602An groupby clause is composed as follows:
603
10000e9e 604 GroupByComponent := XXX-TODO-XXX
81cd86f1 605
606 GroupByComponent [ , GroupByComponent ]*
607
d6e108eb 608=head3 rows
609
81cd86f1 610This corresponds to the clause that is used in some RDBMS engines to limit the
611number of rows returned by a query. In MySQL, this would be the LIMIT clause.
612
613A rows clause is composed as follows:
614
615 Number [, Number ]
616
d6e108eb 617=head3 for
618
81cd86f1 619This corresponds to the clause that is used in some RDBMS engines to indicate
620what locks are to be taken by this SELECT statement.
621
622A for clause is composed as follows:
623
624 UPDATE | DELETE
625
626=head3 connectby
627
628This corresponds to the clause that is used in some RDBMS engines to provide for
629an adjacency-list query.
630
631A connectby clause is composed as follows:
632
633 Identifier, WhereExpression
634
7c66a0ab 635=head1 EXAMPLES
636
637The following are example SQL statements and a possible AST for each one.
638
639=over 4
640
641=item * SELECT 1
642
643 {
644 _query => 'select',
645 _ast_version => 0.0001,
646 select => [
647 {
648 name => 'SelectComponent',
649 value => {
650 name => 'Value',
651 subtype => 'number',
652 value => 1,
653 },
654 },
655 ],
656 }
657
658=item * SELECT NOW() AS time FROM dual AS duality
659
660 {
661 _query => 'select',
662 _ast_version => 0.0001,
663 select => [
664 {
665 name => 'SelectComponent',
666 value => {
667 name => 'Function',
668 function => 'NOW',
669 },
670 as => {
671 name => 'Identifier',
672 element1 => 'time',
673 },
674 },
675 ],
676 tables => [
677 {
678 name => 'TablesComponent',
679 value => {
680 name => 'Identifier',
681 element1 => 'dual',
682 },
683 as => {
684 name => 'Identifier',
685 element1 => 'duality',
686 },
687 },
688 ],
689 }
690
691=back
692
d6e108eb 693=head1 AUTHORS
694
81cd86f1 695robkinyon: Rob Kinyon C<< <rkinyon@cpan.org> >>
d6e108eb 696
697=head1 LICENSE
698
699You may distribute this code under the same terms as Perl itself.
700
701=cut