Commit | Line | Data |
d6e108eb |
1 | =head1 NAME |
2 | |
3 | SQL::Abstract::Manual::Specification |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | This document discusses the specification for the AST provided by L<SQL::Abstract>. It is |
8 | meant to describe how the AST is structured, various components provided by |
9 | L<SQL::Abstract> for use with this AST, how to manipulate the AST, and various |
10 | uses for the AST once it is generated. |
11 | |
12 | =head1 MOTIVATIONS |
13 | |
14 | L<SQL::Abstract> has been in use for many years. Originally created to handle |
15 | the where-clause formation found in L<DBIx::Abstract>, it was generalized to |
16 | manage the creation of any SQL statement through the use of Perl structures. |
17 | Through the beating it received as the SQL generation syntax for L<DBIx::Class>, |
18 | various deficiencies were found and a generalized SQL AST was designed. This |
19 | document describes that AST. |
20 | |
21 | =head1 GOALS |
22 | |
23 | The goals for this AST are as follows: |
24 | |
25 | =head2 SQL-specific semantics |
26 | |
27 | Instead of attempting to be an AST to handle any form of query, this will |
28 | instead be specialized to manage SQL queries (and queries that map to SQL |
29 | queries). This means that there will be support for SQL-specific features, such |
30 | as placeholders. |
31 | |
32 | =head2 Perl-specific semantics |
33 | |
34 | This AST is meant to be used from within Perl5 only. So, it will take advantage |
35 | of as many Perl-specific features as make sense to use. No attempt whatsoever |
36 | will be made to make this AST work within any other language, including Perl6. |
37 | |
38 | =head2 Whole-lifecycle management |
39 | |
40 | Whether a query is built out of whole cloth in one shot or cobbled together from |
41 | several snippets over the lifetime of a process, this AST will support any way |
42 | to construct the query. Queries can also be built from other queries, so an |
43 | UPDATE statement could be used as the basis for a SELECT statement, DELETE |
44 | statement, or even a DDL statement of some kind. |
45 | |
46 | =head2 Dialect-agnostic usage |
47 | |
48 | Although SQL itself has several ANSI specifications (SQL-92 and SQL-99 among |
49 | them), these serve only as a basis for what a given RDBMS will expect. |
50 | Every engine has its own specific extensions and specific ways of handling |
393a4eb8 |
51 | common features. The AST will provide ways of expressing common functionality in |
52 | a common language. The emitters (objects that follow the Visitor pattern) will |
53 | be responsible for converting that common language into RDBMS-specific SQL. |
54 | |
ad0f8fa6 |
55 | =head1 RESTRICTIONS |
56 | |
57 | The following are the restrictions upon the AST: |
58 | |
59 | =head2 DML-only |
60 | |
61 | The AST will only support DML (Data Manipulation Language). It will not (currently) |
62 | support DDL (Data Definition Language). Practically, this means that the only |
63 | statements supported will be: |
64 | |
65 | =over 4 |
66 | |
67 | =item * SELECT |
68 | |
69 | =item * INSERT INTO |
70 | |
71 | =item * UPDATE |
72 | |
73 | =item * DELETE |
74 | |
75 | =back |
76 | |
77 | Additional DML statements may be supported by specific Visitors (such as a |
78 | MySQL visitor supporting REPLACE INTO). See the relevant sections of this |
79 | specification for details. |
80 | |
393a4eb8 |
81 | =head1 COMPONENTS |
82 | |
83 | There are two major components to SQL::Abstract v2. |
84 | |
85 | =over 4 |
86 | |
87 | =item * AST |
88 | |
89 | This is the Abstract Syntax Tree. It is a data structure that represents |
90 | everything necessary to construct the SQL statement in whatever dialect the |
91 | user requires. |
92 | |
93 | =item * Visitor |
94 | |
95 | This object conforms to the Visitor pattern and is used to generate the SQL |
96 | represented by the AST. Each dialect will have a different Visitor object. In |
97 | addition, there will be visitors for at least one of the ANSI specifications. |
98 | |
99 | =back |
d6e108eb |
100 | |
df35a525 |
101 | The division of duties between the two components will focus on what the AST |
102 | can and cannot assume. For example, identifiers do not have 20 components in |
103 | any dialect, so the AST can validate that. However, what constitutes a legal |
104 | identifier can only be determined by the Visitor object enforcing that |
105 | dialect's rules. |
106 | |
d6e108eb |
107 | =head1 AST STRUCTURE |
108 | |
393a4eb8 |
109 | The AST will be a HoHo..oH (hash of hash of ... of hashes). The keys to the |
110 | outermost hash will be the various clauses of a SQL statement, plus some |
111 | metadata keys. All metadata keys will be identifiable as such by being prefixed |
112 | with an underscore. All keys will be in lowercase. |
d6e108eb |
113 | |
114 | =head2 Metadata keys |
115 | |
116 | These are the additional metadata keys that the AST provides for. |
117 | |
df35a525 |
118 | =head3 _query |
119 | |
120 | This denotes what kind of query this AST should be interpreted as. Different |
121 | Visitors may accept additional values for _query. For example, a MySQL Visitor |
122 | may choose to accept 'replace'. If a _query value is unrecognized by the |
123 | Visitor, the Visitor is expected to throw an error. |
124 | |
125 | All Visitors are expected to handle the following values for _query: |
126 | |
d6e108eb |
127 | =over 4 |
128 | |
df35a525 |
129 | =item * select |
130 | |
131 | This is a SELECT statement. |
d6e108eb |
132 | |
df35a525 |
133 | =item * insert |
d6e108eb |
134 | |
df35a525 |
135 | This is an INSERT statement. |
393a4eb8 |
136 | |
df35a525 |
137 | =item * update |
138 | |
139 | This is an UPDATE statement. |
140 | |
141 | =item * delete |
142 | |
143 | This is a DELETE statement. |
d6e108eb |
144 | |
145 | =back |
146 | |
df35a525 |
147 | =head3 _version |
148 | |
149 | This denotes the version of the AST. Different versions will indicate different |
150 | capabilities provided. Visitors will choose to respect the _version as needed |
151 | and desired. |
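The rules for the outermost hash can be illustrated with a minimal sketch. The helper name C<check_toplevel>, the C<_version> value, and the use of empty array references for clause contents are assumptions made for illustration; the specification does not define them.

```perl
use strict;
use warnings;

# Query types that every Visitor is expected to handle.
my %CORE_QUERY = map { $_ => 1 } qw( select insert update delete );

# Hypothetical helper: checks the outermost hash of an AST and returns
# the sorted list of clause (non-metadata) keys.
sub check_toplevel {
    my ($ast) = @_;
    die "AST must be a hash reference\n" unless ref $ast eq 'HASH';

    # All keys, metadata or not, must be lowercase.
    for my $key ( keys %$ast ) {
        die "Key '$key' is not lowercase\n" if $key ne lc $key;
    }

    my $query = $ast->{_query};
    die "Missing _query\n" unless defined $query;
    die "Unrecognized _query '$query'\n" unless $CORE_QUERY{$query};

    # Metadata keys are the ones prefixed with an underscore.
    my @clause_keys = grep { !/^_/ } keys %$ast;
    @clause_keys = sort @clause_keys;
    return @clause_keys;
}

my @clauses = check_toplevel({
    _query   => 'select',
    _version => 0.0001,    # assumed value; the specification fixes none
    select   => [],        # clause contents elided in this sketch
    tables   => [],
});
print "clauses: @clauses\n";    # clauses: select tables
```

A dialect-specific Visitor would extend C<%CORE_QUERY> with its own values (for example 'replace') rather than relax the check.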
152 | |
d6e108eb |
153 | =head2 Structural units |
154 | |
df35a525 |
155 | All structural units will be hashes. These hashes will have, at minimum, the |
156 | following keys: |
157 | |
158 | =over 4 |
159 | |
160 | =item * _name |
161 | |
162 | This indicates the structural unit that this hash is representing. While this |
163 | specification provides for standard structural units, different Visitors may |
164 | choose to accept additional units as desired. If a Visitor encounters a unit it |
165 | doesn't know how to handle, it is expected to throw an exception. |
166 | |
167 | =back |
168 | |
d6e108eb |
169 | Structural units in the AST are supported by loaded components. L<SQL::Abstract> |
170 | provides for the following structural units by default: |
171 | |
172 | =head3 Identifier |
173 | |
df35a525 |
174 | This is a (potentially) fully canonicalized identifier for an element in the |
175 | query. This element could be a schema, table, or column. The Visitor will |
176 | determine validity within the context of that SQL dialect. The AST is only |
177 | responsible for validating that the elements are non-empty Strings. |
178 | |
179 | The hash will be structured as follows: |
180 | |
181 | { |
a3872878 |
182 | _name => 'Identifier', |
ad0f8fa6 |
183 | items => [Scalar], |
df35a525 |
184 | } |
d6e108eb |
185 | |
ad0f8fa6 |
186 | Visitors are expected to, by default, quote all identifiers according to the SQL |
187 | dialect's quoting scheme. |
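As a sketch of the division of duties, the routine below performs the AST-level check (non-empty strings) and then quotes in the ANSI style with double quotes. The name C<emit_identifier>, the reading of C<items> as an array reference of scalars, and the choice of ANSI quoting are assumptions; a dialect-specific Visitor would substitute its own quoting scheme.

```perl
use strict;
use warnings;

# Hypothetical Visitor routine: renders an Identifier unit using
# ANSI-style double quotes, joining the elements with a dot.
sub emit_identifier {
    my ($unit) = @_;
    die "Not an Identifier\n"
        unless ref $unit eq 'HASH' && ( $unit->{_name} // '' ) eq 'Identifier';

    my @items = @{ $unit->{items} || [] };
    die "Identifier needs at least one element\n" unless @items;

    my @quoted;
    for my $item (@items) {
        # AST-level rule: every element is a non-empty string.
        die "Identifier elements must be non-empty strings\n"
            unless defined $item && !ref $item && length $item;
        ( my $escaped = $item ) =~ s/"/""/g;    # double any embedded quote
        push @quoted, qq{"$escaped"};
    }
    return join '.', @quoted;
}

my $sql = emit_identifier({
    _name => 'Identifier',
    items => [ 'accounts', 'user_id' ],
});
print "$sql\n";    # "accounts"."user_id"
```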
d6e108eb |
188 | |
10000e9e |
189 | =head3 Value |
d6e108eb |
190 | |
10000e9e |
191 | A Value is a Perl scalar. It may be one of the following: |
192 | |
193 | =over 4 |
194 | |
195 | =item * String |
196 | |
197 | A String is a quoted series of characters. |
198 | |
199 | =item * Number |
200 | |
201 | A Number is an unquoted number in some numeric format. |
202 | |
ad0f8fa6 |
203 | =item * Null |
10000e9e |
204 | |
ad0f8fa6 |
205 | Null is SQL's NULL and corresponds to Perl's C<undef>. |
10000e9e |
206 | |
207 | =item * BindParameter |
208 | |
209 | This corresponds to a value that will be passed in. This value is normally |
210 | quoted in such a fashion so as to protect against SQL injection attacks. (q.v. |
211 | L<DBI/quote()> for an example.) |
212 | |
213 | =back |
214 | |
a3872878 |
215 | The hash will be structured as follows: |
216 | |
217 | { |
218 | _name => 'Value', |
ad0f8fa6 |
219 | _subtype => [ 'String' | 'Number' | 'Null' | 'BindParameter' ], |
a3872878 |
220 | value => [Scalar], |
221 | } |
222 | |
223 | The provided subtypes are the ones that all Visitors are expected to support. |
224 | Visitors may choose to support additional subtypes. Visitors are expected to |
225 | throw an exception upon encountering an unknown subtype. |
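The four required subtypes might be handled along the following lines. This is a sketch only: C<emit_value> is a hypothetical name, strings are escaped in the ANSI style by doubling single quotes, and a BindParameter is rendered as a C<?> placeholder with its value collected separately, which is one common approach rather than something this specification mandates.

```perl
use strict;
use warnings;

# Hypothetical Visitor routine: returns the SQL fragment for a Value unit
# and appends to @$binds when the value is to be passed to the driver.
sub emit_value {
    my ( $unit, $binds ) = @_;
    my $subtype = $unit->{_subtype} // '';

    if ( $subtype eq 'String' ) {
        ( my $escaped = $unit->{value} ) =~ s/'/''/g;    # ANSI-style escaping
        return "'$escaped'";
    }
    if ( $subtype eq 'Number' ) {
        return $unit->{value};
    }
    if ( $subtype eq 'Null' ) {
        return 'NULL';
    }
    if ( $subtype eq 'BindParameter' ) {
        push @$binds, $unit->{value};
        return '?';
    }
    die "Unknown Value subtype '$subtype'\n";
}

my @binds;
my @sql = map { emit_value( $_, \@binds ) } (
    { _name => 'Value', _subtype => 'String',        value => "O'Brien" },
    { _name => 'Value', _subtype => 'Number',        value => 42 },
    { _name => 'Value', _subtype => 'Null',          value => undef },
    { _name => 'Value', _subtype => 'BindParameter', value => 'secret' },
);
print join( ', ', @sql ), "\n";    # 'O''Brien', 42, NULL, ?
```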
d6e108eb |
226 | |
227 | =head3 Function |
228 | |
229 | A Function is anything of the form C< name( arglist ) > where C<name> is a |
230 | string and C<arglist> is a comma-separated list of Expressions. |
231 | |
81cd86f1 |
232 | Yes, a Subquery is legal as an argument for many functions. Some example |
233 | functions are: |
234 | |
235 | =over 4 |
236 | |
81cd86f1 |
237 | =item * C<< MAX >> |
238 | |
239 | =item * C<< MIN >> |
240 | |
241 | =item * C<< SUM >> |
242 | |
ad0f8fa6 |
243 | =item * C<< IF >> |
244 | |
81cd86f1 |
245 | =back |
d6e108eb |
246 | |
ad0f8fa6 |
247 | Functions have a cardinality, or expected number of arguments. Some functions, |
248 | such as MAX(), have a cardinality of 1. Others, such as IF(), have a cardinality |
249 | of N, meaning they can have any number of arguments greater than 0. Others, such |
250 | as NOW(), have a cardinality of 0. Several functions with the same meaning may |
251 | have a different cardinality in different SQL dialects as different engines may |
252 | allow different behaviors. |
253 | |
254 | As cardinality may differ between dialects, enforcing cardinality is necessarily |
255 | left to the Visitor. |
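A Visitor could enforce cardinality with a per-dialect lookup table, as in this sketch. The table layout, the name C<check_cardinality>, and the decision to reject unknown function names are assumptions; the ranges simply restate the examples given above.

```perl
use strict;
use warnings;

# Hypothetical per-dialect table: [ minimum, maximum ] argument counts,
# with undef meaning "no upper bound".
my %CARDINALITY = (
    MAX => [ 1, 1 ],
    IF  => [ 1, undef ],
    NOW => [ 0, 0 ],
);

sub check_cardinality {
    my ( $name, $argc ) = @_;
    my $range = $CARDINALITY{ uc $name }
        or die "Unknown function '$name'\n";
    my ( $min, $max ) = @$range;
    die "$name() takes at least $min argument(s), got $argc\n"
        if $argc < $min;
    die "$name() takes at most $max argument(s), got $argc\n"
        if defined $max && $argc > $max;
    return 1;
}

check_cardinality( 'max', 1 );    # passes silently
my $ok = eval { check_cardinality( 'now', 2 ); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

Keeping the table inside the Visitor means two dialects can disagree about the same function name without the AST changing.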
256 | |
d6e108eb |
257 | =head3 Subquery |
258 | |
259 | A Subquery is another AST whose _query metadata parameter is set to 'select'. |
260 | |
261 | Most places that a Subquery can be used would require a single value to be |
262 | returned (single column, single row), but that is not something that the AST can |
ad0f8fa6 |
263 | easily enforce. The single-column restriction may possibly be enforced, but the |
d6e108eb |
264 | single-row restriction is much more difficult and, in most cases, probably |
265 | impossible. |
266 | |
81cd86f1 |
267 | Subqueries, when expressed in SQL, must be bounded by parentheses. |
268 | |
d6e108eb |
269 | =head3 UnaryOperator |
270 | |
ad0f8fa6 |
271 | A UnaryOperator takes a single argument on the RHS. The argument for a |
272 | UnaryOperator is an Expression. |
273 | |
274 | Visitors are expected to support, at minimum, the following operators: |
d6e108eb |
275 | |
276 | =over 4 |
277 | |
ad0f8fa6 |
278 | =item * NOT X |
279 | |
280 | =item * ANY X |
281 | |
282 | =item * ALL X |
283 | |
284 | =item * SOME X |
d6e108eb |
285 | |
286 | =back |
287 | |
ad0f8fa6 |
288 | The hash for a UnaryOperator is as follows: |
289 | |
290 | { |
291 | _name => 'UnaryOperator', |
292 | _operator => [ .... ], |
293 | argument1 => Expression, |
294 | } |
295 | |
296 | Visitors may choose to support additional operators. Visitors are expected to |
297 | throw an exception upon encountering an unknown operator. |
298 | |
d6e108eb |
299 | =head3 BinaryOperator |
300 | |
ad0f8fa6 |
301 | A BinaryOperator takes two arguments (one on the LHS and one on the RHS). The |
302 | arguments for a BinaryOperator are all Expressions. |
a3872878 |
303 | |
ad0f8fa6 |
304 | Visitors are expected to support, at minimum, the following operators: |
d6e108eb |
305 | |
306 | =over 4 |
307 | |
a3872878 |
308 | =item * X = Y |
309 | |
310 | =item * X != Y |
d6e108eb |
311 | |
a3872878 |
312 | =item * X > Y |
d6e108eb |
313 | |
a3872878 |
314 | =item * X < Y |
d6e108eb |
315 | |
a3872878 |
316 | =item * X >= Y |
d6e108eb |
317 | |
a3872878 |
318 | =item * X <= Y |
d6e108eb |
319 | |
a3872878 |
320 | =item * X IS Y |
d6e108eb |
321 | |
a3872878 |
322 | =item * X IN Y |
d6e108eb |
323 | |
ad0f8fa6 |
324 | =item * X NOT IN Y |
325 | |
326 | =item * X AND Y |
327 | |
328 | =item * X OR Y |
329 | |
d6e108eb |
330 | =back |
331 | |
ad0f8fa6 |
332 | (Note that an operator can consist of what would be multiple tokens in a normal |
333 | parsing effort.) |
334 | |
335 | Visitors may choose to support additional operators. Visitors are expected to |
336 | throw an exception upon encountering an unknown operator. |
337 | |
338 | The hash for a BinaryOperator is as follows: |
339 | |
340 | { |
341 | _name => 'BinaryOperator', |
342 | _operator => [ .... ], |
343 | argument1 => Expression, |
344 | argument2 => Expression, |
345 | } |
d6e108eb |
346 | |
a3872878 |
347 | =head3 TrinaryOperator |
348 | |
349 | A TrinaryOperator takes three arguments. It generally is composed of two |
350 | elements with one argument to the LHS, one to the RHS, and a third in the middle |
ad0f8fa6 |
351 | of the elements. The arguments for a TrinaryOperator are all Expressions. |
a3872878 |
352 | |
ad0f8fa6 |
353 | Visitors are expected to support, at minimum, the following operators: |
a3872878 |
354 | |
355 | =over 4 |
356 | |
357 | =item * X BETWEEN Y AND Z |
358 | |
359 | =back |
360 | |
ad0f8fa6 |
361 | Visitors may choose to support additional operators. Visitors are expected to |
362 | throw an exception upon encountering an unknown operator. |
363 | |
364 | The hash for a TrinaryOperator is as follows: |
365 | |
366 | { |
367 | _name => 'TrinaryOperator', |
368 | _operator => [ .... ], |
369 | argument1 => Expression, |
370 | argument2 => Expression, |
371 | argument3 => Expression, |
372 | } |
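The three operator hashes share a shape, so a Visitor can walk them with one recursive routine. The sketch below assumes that C<_operator> holds a plain string such as 'NOT' or 'BETWEEN' (the specification elides its exact form), restricts leaves to Value units, and parenthesizes every operator; C<emit_expr> is a hypothetical name.

```perl
use strict;
use warnings;

# Number of argumentN keys expected for each operator unit.
my %ARITY = ( UnaryOperator => 1, BinaryOperator => 2, TrinaryOperator => 3 );

# Hypothetical recursive emitter. Leaves are limited to Value units whose
# value is emitted verbatim, which keeps the sketch short.
sub emit_expr {
    my ($unit) = @_;
    my $name = $unit->{_name} // '';
    return $unit->{value} if $name eq 'Value';

    my $arity = $ARITY{$name} or die "Unknown unit '$name'\n";
    my @args = map {
        my $key = "argument$_";
        die "$name is missing $key\n" unless ref $unit->{$key} eq 'HASH';
        emit_expr( $unit->{$key} );
    } 1 .. $arity;

    my $op = $unit->{_operator};
    return "( $op $args[0] )"          if $arity == 1;
    return "( $args[0] $op $args[1] )" if $arity == 2;

    # Trinary: only BETWEEN is required, and its second keyword is AND.
    die "Unknown operator '$op'\n" unless $op eq 'BETWEEN';
    return "( $args[0] BETWEEN $args[1] AND $args[2] )";
}

# Small constructor for Number leaves, used only by this example.
my $num = sub { +{ _name => 'Value', _subtype => 'Number', value => $_[0] } };

my $sql = emit_expr({
    _name     => 'UnaryOperator',
    _operator => 'NOT',
    argument1 => {
        _name     => 'TrinaryOperator',
        _operator => 'BETWEEN',
        argument1 => $num->(5),
        argument2 => $num->(1),
        argument3 => $num->(10),
    },
});
print "$sql\n";    # ( NOT ( 5 BETWEEN 1 AND 10 ) )
```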
373 | |
d6e108eb |
374 | =head3 Expression |
375 | |
376 | An expression can be any one of the following: |
377 | |
378 | =over 4 |
379 | |
10000e9e |
380 | =item * Value |
d6e108eb |
381 | |
382 | =item * Function |
383 | |
384 | =item * Subquery |
385 | |
ad0f8fa6 |
386 | =item * UnaryOperator |
d6e108eb |
387 | |
ad0f8fa6 |
388 | =item * BinaryOperator |
389 | |
390 | =item * TrinaryOperator |
d6e108eb |
391 | |
81cd86f1 |
392 | =item * ( Expression ) |
393 | |
d6e108eb |
394 | =back |
395 | |
81cd86f1 |
396 | Parentheses indicate precedence and, in some situations, are necessary for |
397 | certain operators. |
398 | |
ad0f8fa6 |
399 | The hash for an Expression is as follows: |
400 | |
401 | { |
402 | _name => 'Expression', |
403 | _subtype => [ 'Value' | 'Function' | 'Subquery' | . . . ], |
404 | } |
405 | |
d6e108eb |
406 | =head2 SQL clauses |
407 | |
10000e9e |
408 | These are all the legal and acceptable clauses within the AST that would |
409 | correspond to clauses in a SQL statement. Not all clauses are legal within a |
410 | given RDBMS engine's SQL dialect and some clauses may be required in one and |
411 | optional in another. Detecting and enforcing those engine-specific restrictions |
412 | is the responsibility of the Visitor object. |
413 | |
414 | The clauses are defined with a yacc-like syntax. The various parts are: |
415 | |
416 | =over 4 |
417 | |
418 | =item * := |
419 | |
420 | This means "is defined as" and is used to create a new term to be used below. |
421 | |
422 | =item * [] |
423 | |
424 | This means optional and indicates that the items within it are optional. |
425 | |
426 | =item * []* |
427 | |
428 | This means optional and repeating as many times as desired. |
429 | |
430 | =item * | |
431 | |
432 | This means alternation. It is a binary operator and indicates that either the |
433 | left or right hand sides may be used, but not both. |
434 | |
435 | =item * C<< <> >> |
436 | |
437 | This is a grouping construct. It means that all elements within this construct |
438 | are treated together for the purposes of optional, repeating, alternation, etc. |
439 | |
440 | =back |
441 | |
d6e108eb |
442 | The expected clauses are (name and structure): |
443 | |
444 | =head3 select |
445 | |
81cd86f1 |
446 | This corresponds to the SELECT clause of a SELECT statement. |
447 | |
448 | A select clause is composed as follows: |
449 | |
450 | SelectComponent := Expression [ [ AS ] String ] |
451 | |
452 | SelectComponent |
453 | [ , SelectComponent ]* |
d6e108eb |
454 | |
455 | =head3 tables |
456 | |
457 | This is a list of tables that this statement is affecting. It corresponds to the |
81cd86f1 |
458 | FROM clause in a SELECT statement and the INSERT INTO/UPDATE/DELETE clauses in |
459 | those respective statements. Depending on the _query metadata entry, the |
460 | appropriate clause name will be used. |
d6e108eb |
461 | |
462 | The tables clause has several RDBMS-specific variations. The AST will support |
463 | all of them and it is up to the Visitor object constructing the actual SQL to |
464 | validate and/or use what is provided as appropriate. |
465 | |
466 | A tables clause is composed as follows: |
467 | |
468 | TableIdentifier := Identifier [ [ AS ] String ] |
81cd86f1 |
469 | JoinType := < LEFT|RIGHT [ OUTER ] > | INNER | CROSS |
d6e108eb |
470 | |
471 | TableIdentifier |
472 | [ |
473 | < , TableIdentifier > |
474 | | < |
475 | [ JoinType ] JOIN TableIdentifier |
476 | [ |
477 | < USING ( Identifier [ , Identifier ]* ) > |
478 | | < ON [ ( ] Expression [ , Expression ] [ ) ] > |
479 | ] |
480 | > |
481 | ]* |
482 | |
483 | Additionally, where aliases are provided for in the TableIdentifier, those |
484 | aliases must be used as the tablename in subsequent Identifiers that identify a |
485 | column of that table. |
486 | |
487 | =head3 where |
488 | |
81cd86f1 |
489 | This corresponds to the WHERE clause in a SELECT, UPDATE, or DELETE statement. |
490 | |
491 | A where clause is composed as follows: |
492 | |
493 | WhereOperator := AND | OR |
494 | WhereExpression := Expression | Expression WhereOperator Expression |
495 | |
496 | WhereExpression |
497 | |
d6e108eb |
498 | =head3 set |
499 | |
81cd86f1 |
500 | This corresponds to the SET clause in an INSERT or UPDATE statement. |
501 | |
502 | A set clause is composed as follows: |
503 | |
504 | SetComponent := Identifier = Expression |
505 | |
506 | SetComponent [ , SetComponent ]* |
507 | |
508 | =head3 columns |
509 | |
510 | This corresponds to the optional list of columns in an INSERT statement. |
511 | |
512 | A columns clause is composed as follows: |
513 | |
514 | ( Identifier [ , Identifier ]* ) |
515 | |
d6e108eb |
516 | =head3 values |
517 | |
81cd86f1 |
518 | This corresponds to the VALUES clause in an INSERT statement. |
519 | |
520 | A values clause is composed as follows: |
521 | |
522 | ( Expression [ , Expression ]* ) |
523 | |
524 | If there is a columns clause, the number of entries in the values clause must be |
525 | equal to the number of entries in the columns clause. |
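The matching-count rule is one of the few cross-clause checks that does not depend on the dialect. A minimal sketch follows, assuming both clauses are stored as array references (the specification gives their grammar but not their hash layout) and using the hypothetical name C<check_insert_arity>.

```perl
use strict;
use warnings;

# Hypothetical check over an INSERT AST. Both clauses are assumed to be
# array references of structural units.
sub check_insert_arity {
    my ($ast) = @_;
    my ( $columns, $values ) = @{$ast}{qw( columns values )};

    # The columns clause is optional; nothing to compare without both.
    return 1 unless defined $columns && defined $values;

    my ( $nc, $nv ) = ( scalar @$columns, scalar @$values );
    die "columns has $nc entries but values has $nv\n" if $nc != $nv;
    return 1;
}

my $ast = {
    _query  => 'insert',
    columns => [
        { _name => 'Identifier', items => ['name'] },
        { _name => 'Identifier', items => ['age'] },
    ],
    values => [
        { _name => 'Value', _subtype => 'String', value => 'Ada' },
    ],
};
my $ok = eval { check_insert_arity($ast); 1 };
print $ok ? "ok\n" : "error: $@";
```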
526 | |
d6e108eb |
527 | =head3 orderby |
528 | |
81cd86f1 |
529 | This corresponds to the ORDER BY clause in a SELECT statement. |
530 | |
531 | An orderby clause is composed as follows: |
532 | |
10000e9e |
533 | OrderByComponent := XXX-TODO-XXX |
81cd86f1 |
534 | OrderByDirection := ASC | DESC |
535 | |
536 | OrderByComponent [ OrderByDirection ] |
537 | [ , OrderByComponent [ OrderByDirection ] ]* |
538 | |
d6e108eb |
539 | =head3 groupby |
540 | |
81cd86f1 |
541 | This corresponds to the GROUP BY clause in a SELECT statement. |
542 | |
543 | A groupby clause is composed as follows: |
544 | |
10000e9e |
545 | GroupByComponent := XXX-TODO-XXX |
81cd86f1 |
546 | |
547 | GroupByComponent [ , GroupByComponent ]* |
548 | |
d6e108eb |
549 | =head3 rows |
550 | |
81cd86f1 |
551 | This corresponds to the clause that is used in some RDBMS engines to limit the |
552 | number of rows returned by a query. In MySQL, this would be the LIMIT clause. |
553 | |
554 | A rows clause is composed as follows: |
555 | |
556 | Number [, Number ] |
557 | |
d6e108eb |
558 | =head3 for |
559 | |
81cd86f1 |
560 | This corresponds to the clause that is used in some RDBMS engines to indicate |
561 | what locks are to be taken by this SELECT statement. |
562 | |
563 | A for clause is composed as follows: |
564 | |
565 | UPDATE | DELETE |
566 | |
567 | =head3 connectby |
568 | |
569 | This corresponds to the clause that is used in some RDBMS engines to provide for |
570 | an adjacency-list query. |
571 | |
572 | A connectby clause is composed as follows: |
573 | |
574 | Identifier, WhereExpression |
575 | |
d6e108eb |
576 | =head1 AUTHORS |
577 | |
81cd86f1 |
578 | robkinyon: Rob Kinyon C<< <rkinyon@cpan.org> >> |
d6e108eb |
579 | |
580 | =head1 LICENSE |
581 | |
582 | You may distribute this code under the same terms as Perl itself. |
583 | |
584 | =cut |