Commit | Line | Data |
d6e108eb |
1 | =head1 NAME |
2 | |
3 | SQL::Abstract::Manual::Specification |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | This discusses the specification for the AST provided by L<SQL::Abstract>. It is |
8 | meant to describe how the AST is structured, various components provided by |
9 | L<SQL::Abstract> for use with this AST, how to manipulate the AST, and various |
10 | uses for the AST once it is generated. |
11 | |
12 | =head1 MOTIVATIONS |
13 | |
14 | L<SQL::Abstract> has been in use for many years. Originally created to handle |
15 | the where-clause formation found in L<DBIx::Abstract>, it was generalized to |
16 | manage the creation of any SQL statement through the use of Perl structures. |
17 | Through the beating it received as the SQL generation syntax for L<DBIx::Class>, |
18 | various deficiencies were found and a generalized SQL AST was designed. This |
19 | document describes that AST. |
20 | |
21 | =head1 GOALS |
22 | |
23 | The goals for this AST are as follows: |
24 | |
25 | =head2 SQL-specific semantics |
26 | |
27 | Instead of attempting to be an AST to handle any form of query, this will |
28 | instead be specialized to manage SQL queries (and queries that map to SQL |
29 | queries). This means that there will be support for SQL-specific features, such |
30 | as placeholders. |
31 | |
32 | =head2 Perl-specific semantics |
33 | |
34 | This AST is meant to be used from within Perl5 only. So, it will take advantage |
35 | of as many Perl-specific features as make sense to use. No attempt whatsoever |
36 | will be made to make this AST work within any other language, including Perl6. |
37 | |
38 | =head2 Whole-lifecycle management |
39 | |
40 | Whether a query is built out of whole cloth in one shot or cobbled together from |
41 | several snippets over the lifetime of a process, this AST will support any way |
42 | to construct the query. Queries can also be built from other queries, so an |
43 | UPDATE statement could be used as the basis for a SELECT statement, DELETE |
44 | statement, or even a DDL statement of some kind. |
45 | |
46 | =head2 Dialect-agnostic usage |
47 | |
48 | SQL itself has several ANSI specifications (SQL-92 and SQL-99 among them), but |
49 | these serve only as a basis for what a given RDBMS will expect. Every engine |
50 | has its own specific extensions and specific ways of handling |
393a4eb8 |
51 | common features. The AST will provide ways of expressing common functionality in |
52 | a common language. The emitters (objects that follow the Visitor pattern) will |
53 | be responsible for converting that common language into RDBMS-specific SQL. |
54 | |
ad0f8fa6 |
55 | =head1 RESTRICTIONS |
56 | |
57 | The following are the restrictions upon the AST: |
58 | |
59 | =head2 DML-only |
60 | |
61 | The AST will only support DML (Data Manipulation Language). It will not (currently) |
62 | support DDL (Data Definition Language). Practically, this means that the only |
63 | statements supported will be: |
64 | |
65 | =over 4 |
66 | |
67 | =item * SELECT |
68 | |
69 | =item * INSERT INTO |
70 | |
71 | =item * UPDATE |
72 | |
73 | =item * DELETE |
74 | |
75 | =back |
76 | |
77 | Additional DML statements may be supported by specific Visitors (such as a |
78 | MySQL visitor supporting REPLACE INTO). See the relevant sections of this |
79 | specification for details. |
80 | |
804bd4ab |
81 | =head2 Dialect-agnostic construction |
82 | |
83 | The AST will not attempt to be immediately readable to a human as SQL. In fact, |
84 | due to the dialect differences, particularly in terms of which use operators and |
85 | which use functions for a given action, the AST will ... |
86 | |
87 | XXX FILL ME IN LATER XXX |
88 | |
393a4eb8 |
89 | =head1 COMPONENTS |
90 | |
91 | There are two major components to SQL::Abstract v2. |
92 | |
93 | =over 4 |
94 | |
95 | =item * AST |
96 | |
97 | This is the Abstract Syntax Tree. It is a data structure that represents |
98 | everything necessary to construct the SQL statement in whatever dialect the |
99 | user requires. |
100 | |
101 | =item * Visitor |
102 | |
103 | This object conforms to the Visitor pattern and is used to generate the SQL |
104 | represented by the AST. Each dialect will have a different Visitor object. In |
105 | addition, there will be visitors for at least one of the ANSI specifications. |
106 | |
107 | =back |
d6e108eb |
108 | |
df35a525 |
109 | The division of duties between the two components will focus on what the AST |
110 | can and cannot assume. For example, identifiers do not have 20 components in |
111 | any dialect, so the AST can validate that. However, what constitutes a legal |
112 | identifier can only be determined by the Visitor object enforcing that |
113 | dialect's rules. |
114 | |
d6e108eb |
115 | =head1 AST STRUCTURE |
116 | |
393a4eb8 |
117 | The AST will be a HoHo..oH (hash of hash of ... of hashes). The keys to the |
118 | outermost hash will be the various clauses of a SQL statement, plus some |
119 | metadata keys. All metadata keys will be identifiable as such by being prefixed |
120 | with an underscore. All keys will be in lowercase. |
d6e108eb |
121 | |
122 | =head2 Metadata keys |
123 | |
124 | These are the additional metadata keys that the AST provides for. |
125 | |
df35a525 |
126 | =head3 _query |
127 | |
128 | This denotes what kind of query this AST should be interpreted as. Different |
129 | Visitors may accept additional values for _query. For example, a MySQL Visitor |
7c66a0ab |
130 | may choose to accept 'replace' for REPLACE INTO. If a _query value is |
131 | unrecognized by the Visitor, the Visitor is expected to throw an error. |
df35a525 |
132 | |
133 | All Visitors are expected to handle the following values for _query: |
134 | |
d6e108eb |
135 | =over 4 |
136 | |
df35a525 |
137 | =item * select |
138 | |
139 | This is a SELECT statement. |
d6e108eb |
140 | |
df35a525 |
141 | =item * insert |
d6e108eb |
142 | |
df35a525 |
143 | This is an INSERT statement. |
393a4eb8 |
144 | |
df35a525 |
145 | =item * update |
146 | |
147 | This is an UPDATE statement. |
148 | |
149 | =item * delete |
150 | |
151 | This is a DELETE statement. |
d6e108eb |
152 | |
153 | =back |
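The dispatch rule above can be sketched as follows. This is only an illustration, not part of L<SQL::Abstract>: the handler table and the visit_query function are hypothetical names, and each handler returns just the leading keyword to keep the sketch short.

```perl
use strict;
use warnings;

# Hypothetical handlers: a real Visitor would build the full statement.
# Each one returns only the leading keyword here.
my %QUERY_HANDLERS = (
    select => sub { 'SELECT' },
    insert => sub { 'INSERT INTO' },
    update => sub { 'UPDATE' },
    delete => sub { 'DELETE' },
);

# Dispatch on the _query metadata key and throw on unrecognized values.
sub visit_query {
    my ($ast) = @_;
    my $kind = $ast->{_query};
    die "AST has no _query key\n" unless defined $kind;
    my $handler = $QUERY_HANDLERS{$kind}
        or die "Unrecognized _query value '$kind'\n";
    return $handler->($ast);
}
```

A dialect-specific Visitor could add further entries (such as 'replace') to its own table without changing the dispatch logic.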
154 | |
df35a525 |
155 | =head3 _version |
156 | |
157 | This denotes the version of the AST. Different versions will indicate different |
158 | capabilities provided. Visitors will choose to respect the _version as needed |
159 | and desired. |
160 | |
d6e108eb |
161 | =head2 Structural units |
162 | |
df35a525 |
163 | All structural units will be hashes. These hashes will have, at minimum, the |
164 | following keys: |
165 | |
166 | =over 4 |
167 | |
804bd4ab |
168 | =item * type |
df35a525 |
169 | |
170 | This indicates the structural unit that this hash is representing. While this |
171 | specification provides for standard structural units, different Visitors may |
172 | choose to accept additional units as desired. If a Visitor encounters a unit it |
173 | doesn't know how to handle, it is expected to throw an exception. |
174 | |
175 | =back |
176 | |
d6e108eb |
177 | Structural units in the AST are supported by loaded components. L<SQL::Abstract> |
178 | provides for the following structural units by default: |
179 | |
180 | =head3 Identifier |
181 | |
df35a525 |
182 | This is a (potentially) fully canonicalized identifier for an element in the |
183 | query. This element could be a schema, table, or column. The Visitor will |
184 | determine validity within the context of that SQL dialect. The AST is only |
185 | responsible for validating that the elements are non-empty Strings. |
186 | |
187 | The hash will be structured as follows: |
188 | |
189 | { |
804bd4ab |
190 | type => 'Identifier', |
7c66a0ab |
191 | element1 => Scalar, |
192 | element2 => Scalar, |
193 | element3 => Scalar, |
df35a525 |
194 | } |
d6e108eb |
195 | |
7c66a0ab |
196 | If element3 exists, then element2 must exist. element1 must always exist. If a |
197 | given element exists, then it must be defined and of non-zero length. |
198 | |
ad0f8fa6 |
199 | Visitors are expected to, by default, quote all identifiers according to the SQL |
200 | dialect's quoting scheme. |
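A minimal sketch of the Identifier rules above, under stated assumptions. The function names are hypothetical, the default double-quote character stands in for whatever the dialect requires, and joining the elements in element1.element2.element3 order is an assumption, since this specification does not say which element is the most specific. Escaping of embedded quote characters is left out.

```perl
use strict;
use warnings;

# AST side: check only the structural rules stated in the specification.
sub validate_identifier {
    my ($id) = @_;
    die "Not an Identifier unit\n"
        unless ref($id) eq 'HASH' && ($id->{type} // '') eq 'Identifier';
    die "element1 must always exist\n" unless exists $id->{element1};
    die "element3 requires element2\n"
        if exists $id->{element3} && !exists $id->{element2};
    for my $key (qw(element1 element2 element3)) {
        next unless exists $id->{$key};
        my $value = $id->{$key};
        die "$key must be a defined, non-empty string\n"
            unless defined($value) && !ref($value) && length($value);
    }
    return 1;
}

# Visitor side: quote every element that is present (assumed quoting style).
sub render_identifier {
    my ($id, $quote_char) = @_;
    $quote_char //= '"';
    validate_identifier($id);
    return join '.',
        map  { $quote_char . $id->{$_} . $quote_char }
        grep { exists $id->{$_} } qw(element1 element2 element3);
}
```

A MySQL-flavoured Visitor might call render_identifier with a backtick as the second argument instead.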
d6e108eb |
201 | |
10000e9e |
202 | =head3 Value |
d6e108eb |
203 | |
7c66a0ab |
204 | A Value is a Perl scalar. Depending on the type, a Visitor may be able to make |
205 | certain decisions. |
10000e9e |
206 | |
207 | =over 4 |
208 | |
209 | =item * String |
210 | |
7c66a0ab |
211 | A String is a quoted series of characters. The Visitor is expected to ensure |
212 | that embedded quotes are properly handled per the SQL dialect's quoting scheme. |
10000e9e |
213 | |
214 | =item * Number |
215 | |
7c66a0ab |
216 | A Number is an unquoted number in some numeric format. |
10000e9e |
217 | |
ad0f8fa6 |
218 | =item * Null |
10000e9e |
219 | |
ad0f8fa6 |
220 | Null is SQL's NULL and corresponds to Perl's C<undef>. |
10000e9e |
221 | |
222 | =item * BindParameter |
223 | |
224 | This corresponds to a value that will be passed in. This value is normally |
225 | quoted in such a fashion so as to protect against SQL injection attacks. (q.v. |
226 | L<DBI/quote()> for an example.) |
227 | |
7c66a0ab |
228 | BindParameters are normally represented by a '?'. |
229 | |
10000e9e |
230 | =back |
231 | |
a3872878 |
232 | The hash will be structured as follows: |
233 | |
234 | { |
804bd4ab |
235 | type => 'Value', |
7c66a0ab |
236 | subtype => [ 'String' | 'Number' | 'Null' | 'BindParameter' ], |
237 | value => Scalar, |
a3872878 |
238 | } |
239 | |
240 | The provided subtypes are the ones that all Visitors are expected to support. |
241 | Visitors may choose to support additional subtypes. Visitors are expected to |
242 | throw an exception upon encountering an unknown subtype. |
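One way a Visitor might handle the four required subtypes is sketched below. The render_value name and the bind-list argument are hypothetical, and doubling embedded single quotes is an assumed (ANSI-style) escaping scheme rather than something this specification mandates.

```perl
use strict;
use warnings;

# Render a Value unit to SQL text. Bind values are collected in $binds
# so they can be passed to the database handle separately.
sub render_value {
    my ($unit, $binds) = @_;
    my $subtype = $unit->{subtype} // '';

    if ($subtype eq 'String') {
        (my $text = $unit->{value}) =~ s/'/''/g;   # assumed escaping scheme
        return "'$text'";
    }
    return $unit->{value} if $subtype eq 'Number';
    return 'NULL'         if $subtype eq 'Null';
    if ($subtype eq 'BindParameter') {
        push @$binds, $unit->{value};
        return '?';
    }
    die "Unknown Value subtype '$subtype'\n";
}
```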
d6e108eb |
243 | |
804bd4ab |
244 | =head3 Operator |
81cd86f1 |
245 | |
804bd4ab |
246 | An Operator would be, in SQL dialect terms, a unary operator, a binary operator, |
247 | a ternary operator, or a function. Since different dialects may have a given |
248 | functionality as an operator or a function (such as CONCAT in MySQL vs. || in |
249 | Oracle for string concatenation), they will be represented in the AST as generic |
250 | operators. |
d6e108eb |
251 | |
7c66a0ab |
252 | The hash will be structured as follows: |
253 | |
254 | { |
804bd4ab |
255 | type => 'Operator', |
256 | op => String, |
257 | args => ExpressionList, |
7c66a0ab |
258 | } |
259 | |
804bd4ab |
260 | Operators have a cardinality, or expected number of arguments. Some operators, |
ad0f8fa6 |
261 | such as MAX(), have a cardinality of 1. Others, such as COALESCE(), have a cardinality |
262 | of N, meaning they can have any number of arguments greater than 0. Others, such |
804bd4ab |
263 | as NOW(), have a cardinality of 0. Several operators with the same meaning may |
ad0f8fa6 |
264 | have a different cardinality in different SQL dialects as different engines may |
804bd4ab |
265 | allow different behaviors. As cardinality may differ between dialects, enforcing |
266 | cardinality is necessarily left to the Visitor. |
ad0f8fa6 |
267 | |
804bd4ab |
268 | Operators also have restrictions on the types of arguments they will accept. The |
269 | first argument may or may not be restricted in the same fashion as the other |
270 | arguments. As with cardinality, this restriction will need to be managed by the |
271 | Visitor. |
272 | |
273 | The operator name needs to take into account the possibility that the RDBMS may |
274 | allow UDFs (User-Defined Functions) that have the same name as an operator, such |
275 | as 'AND'. This will have to be managed by the Visitor. |
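To illustrate why a generic operator is left to the Visitor, here is a sketch of one operator rendered for two dialects. The dialect keys, the render_concat name, and the minimum of two arguments are all assumptions made for this example. The arguments are taken to be SQL fragments that have already been rendered.

```perl
use strict;
use warnings;

# Hypothetical per-dialect renderers for a generic concatenation operator.
my %CONCAT_RENDERER = (
    mysql  => sub { 'CONCAT(' . join(', ', @_) . ')' },
    oracle => sub { '(' . join(' || ', @_) . ')' },
);

sub render_concat {
    my ($dialect, @args) = @_;
    my $renderer = $CONCAT_RENDERER{$dialect}
        or die "Unknown dialect '$dialect'\n";
    # Cardinality is the Visitor's job; "at least two" is an assumption here.
    die "concat needs at least two arguments\n" if @args < 2;
    return $renderer->(@args);
}
```

The same Operator unit in the AST therefore yields a function call for one engine and an infix expression for the other.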
ad0f8fa6 |
276 | |
d6e108eb |
277 | =head3 Subquery |
278 | |
279 | A Subquery is another AST whose _query metadata parameter is set to 'select'. |
280 | |
281 | Most places that a Subquery can be used would require a single value to be |
282 | returned (single column, single row), but that is not something that the AST can |
ad0f8fa6 |
283 | easily enforce. The single-column restriction may possibly be enforced, but the |
d6e108eb |
284 | single-row restriction is much more difficult and, in most cases, probably |
285 | impossible. |
286 | |
7c66a0ab |
287 | Subqueries, when expressed in SQL, must be bounded by parentheses. |
81cd86f1 |
288 | |
d6e108eb |
289 | =head3 Expression |
290 | |
7c66a0ab |
291 | An Expression can be any one of the following: |
d6e108eb |
292 | |
293 | =over 4 |
294 | |
804bd4ab |
295 | =item * Identifier |
296 | |
10000e9e |
297 | =item * Value |
d6e108eb |
298 | |
804bd4ab |
299 | =item * Operator |
d6e108eb |
300 | |
301 | =item * Subquery |
302 | |
d6e108eb |
303 | =back |
304 | |
7c66a0ab |
305 | An Expression is a meta-syntactic unit. An "Expression" unit will never appear |
306 | within the AST. It acts as a junction. |
307 | |
308 | =head3 ExpressionList |
309 | |
310 | An ExpressionList is a list of Expressions, generally separated by commas |
311 | (though other separators may be appropriate at times or for different SQL |
312 | dialects). |
81cd86f1 |
313 | |
7c66a0ab |
314 | The hash for an ExpressionList is as follows: |
ad0f8fa6 |
315 | |
316 | { |
804bd4ab |
317 | type => 'ExpressionList', |
7c66a0ab |
318 | separator => ',', |
319 | elements => Array of Expressions, |
ad0f8fa6 |
320 | } |
321 | |
7c66a0ab |
322 | An ExpressionList is always rendered in SQL with parentheses around it. |
323 | |
d6e108eb |
324 | =head2 SQL clauses |
325 | |
10000e9e |
326 | These are all the legal and acceptable clauses within the AST that would |
327 | correspond to clauses in a SQL statement. Not all clauses are legal within a |
328 | given RDBMS engine's SQL dialect and some clauses may be required in one and |
329 | optional in another. Detecting and enforcing those engine-specific restrictions |
330 | is the responsibility of the Visitor object. |
331 | |
332 | The clauses are defined with a yacc-like syntax. The various parts are: |
333 | |
334 | =over 4 |
335 | |
336 | =item * := |
337 | |
338 | This means "is defined as" and is used to create a new term to be used below. |
339 | |
340 | =item * [] |
341 | |
342 | This means optional and indicates that the items within it are optional. |
343 | |
344 | =item * []* |
345 | |
346 | This means optional and repeating as many times as desired. |
347 | |
348 | =item * | |
349 | |
350 | This means alternation. It is a binary operator and indicates that either the |
351 | left or right hand sides may be used, but not both. |
352 | |
353 | =item * C<< <> >> |
354 | |
355 | This is a grouping construct. It means that all elements within this construct |
356 | are treated together for the purposes of optional, repeating, alternation, etc. |
357 | |
358 | =back |
359 | |
d6e108eb |
360 | The expected clauses are (name and structure): |
361 | |
362 | =head3 select |
363 | |
81cd86f1 |
364 | This corresponds to the SELECT clause of a SELECT statement. |
365 | |
7c66a0ab |
366 | A select clause unit is an array of one or more SelectComponent units. |
81cd86f1 |
367 | |
7c66a0ab |
368 | The hash for a SelectComponent unit is composed as follows: |
81cd86f1 |
369 | |
7c66a0ab |
370 | { |
804bd4ab |
371 | type => 'SelectComponent', |
7c66a0ab |
372 | value => Expression, |
373 | [ as => Identifier, ] |
374 | } |
375 | |
376 | The 'as' component is optional. Visitors may choose to make it required in |
377 | certain situations. |
d6e108eb |
378 | |
379 | =head3 tables |
380 | |
381 | This is a list of tables that this clause is affecting. It corresponds to the |
81cd86f1 |
382 | FROM clause in a SELECT statement and the INSERT INTO/UPDATE/DELETE clauses in |
383 | those respective statements. Depending on the _query metadata entry, the |
384 | appropriate clause name will be used. |
d6e108eb |
385 | |
7c66a0ab |
386 | A tables clause unit is an array of one or more TableComponent units. |
387 | |
d6e108eb |
388 | The tables clause has several RDBMS-specific variations. The AST will support |
389 | all of them and it is up to the Visitor object constructing the actual SQL to |
390 | validate and/or use what is provided as appropriate. |
391 | |
7c66a0ab |
392 | The hash for a TableJoin will be composed as follows: |
393 | |
394 | # TableJoin |
395 | { |
804bd4ab |
396 | type => 'TableJoin', |
7c66a0ab |
397 | join => < LEFT|RIGHT [ OUTER ] > | INNER | CROSS | ',', |
398 | [ using => IdentifierList, ] |
399 | [ on => ExpressionList, ] |
400 | } |
401 | |
402 | A TableJoin may not have both a 'using' element and an 'on' element. It may |
403 | have one of them if the 'join' element is not equal to ',' but doesn't have to. |
404 | If the 'join' element is equal to ',', then it may not have either a 'using' or |
405 | an 'on' element. |
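The constraints on 'using' and 'on' can be checked mechanically. The following is a sketch only: validate_table_join is a hypothetical name, and the check relies on key existence rather than inspecting the contents of either element.

```perl
use strict;
use warnings;

# Enforce the 'using'/'on' rules for a TableJoin unit.
sub validate_table_join {
    my ($join) = @_;
    my $has_using = exists $join->{using};
    my $has_on    = exists $join->{on};

    die "TableJoin may not have both 'using' and 'on'\n"
        if $has_using && $has_on;
    die "A ',' join may not have 'using' or 'on'\n"
        if ($join->{join} // '') eq ',' && ($has_using || $has_on);
    return 1;
}
```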
406 | |
407 | The hash for a TableIdentifier will be composed as follows: |
d6e108eb |
408 | |
7c66a0ab |
409 | # TableIdentifier |
410 | { |
804bd4ab |
411 | type => 'TableIdentifier', |
7c66a0ab |
412 | value => Identifier | Subquery, |
413 | [ join => TableJoin, ] |
414 | [ as => Identifier, ] |
415 | } |
416 | |
417 | The first TableComponent in a tables clause may not have a join element. All |
418 | other TableComponent elements that do not have a join element will have a |
419 | default join element of: |
420 | |
421 | { |
804bd4ab |
422 | type => 'TableJoin', |
7c66a0ab |
423 | join => ',', |
424 | } |
d6e108eb |
425 | |
7c66a0ab |
426 | The 'as' component is optional. Visitors may choose to make it required in |
427 | certain situations (such as MySQL requiring an alias for subqueries). |
d6e108eb |
428 | |
429 | Additionally, where aliases are provided for in the TableIdentifier, those |
430 | aliases must be used as the tablename in subsequent Identifiers that identify a |
7c66a0ab |
431 | column of that table. This may be enforceable by the AST or the Visitor. But, it |
432 | is more likely that it will not be. |
d6e108eb |
433 | |
434 | =head3 where |
435 | |
81cd86f1 |
436 | This corresponds to the WHERE clause in a SELECT, UPDATE, or DELETE statement. |
437 | |
438 | A where clause is composed as follows: |
439 | |
440 | WhereOperator := AND | OR |
441 | WhereExpression := Expression | Expression WhereOperator Expression |
442 | |
443 | WhereExpression |
444 | |
d6e108eb |
445 | =head3 set |
446 | |
81cd86f1 |
447 | This corresponds to the SET clause in an INSERT or UPDATE statement. |
448 | |
449 | A set clause is composed as follows: |
450 | |
451 | SetComponent := Identifier = Expression |
452 | |
453 | SetComponent [ , SetComponent ]* |
454 | |
455 | =head3 columns |
456 | |
457 | This corresponds to the optional list of columns in an INSERT statement. |
458 | |
7c66a0ab |
459 | A columns clause is an IdentifierList and the unit is composed as follows: |
81cd86f1 |
460 | |
7c66a0ab |
461 | columns => [ |
462 | Identifier, |
463 | [ Identifier, ]* |
464 | ], |
81cd86f1 |
465 | |
d6e108eb |
466 | =head3 values |
467 | |
81cd86f1 |
468 | This corresponds to the VALUES clause in an INSERT statement. |
469 | |
7c66a0ab |
470 | A values clause is an ExpressionList and the unit is composed as follows: |
81cd86f1 |
471 | |
7c66a0ab |
472 | values => [ |
473 | Expression, |
474 | [ Expression, ]* |
475 | ], |
81cd86f1 |
476 | |
477 | If there is a columns clause, the number of entries in the values clause must be |
478 | equal to the number of entries in the columns clause. |
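The column/value count rule lends itself to a simple check. This sketch assumes both clauses are stored as array references, as in the layouts shown above. The function name is hypothetical.

```perl
use strict;
use warnings;

# Verify that an INSERT AST supplies one value per listed column.
sub validate_insert_arity {
    my ($ast) = @_;
    return 1 unless exists $ast->{columns};   # columns clause is optional

    my $n_columns = scalar @{ $ast->{columns} };
    my $n_values  = scalar @{ $ast->{values} // [] };
    die "columns has $n_columns entries but values has $n_values\n"
        unless $n_columns == $n_values;
    return 1;
}
```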
479 | |
d6e108eb |
480 | =head3 orderby |
481 | |
81cd86f1 |
482 | This corresponds to the ORDER BY clause in a SELECT statement. |
483 | |
484 | An orderby clause is composed as follows: |
485 | |
10000e9e |
486 | OrderByComponent := XXX-TODO-XXX |
81cd86f1 |
487 | OrderByDirection := ASC | DESC |
488 | |
489 | OrderByComponent [ OrderByDirection ] |
490 | [ , OrderByComponent [ OrderByDirection ] ]* |
491 | |
d6e108eb |
492 | =head3 groupby |
493 | |
81cd86f1 |
494 | This corresponds to the GROUP BY clause in a SELECT statement. |
495 | |
496 | A groupby clause is composed as follows: |
497 | |
10000e9e |
498 | GroupByComponent := XXX-TODO-XXX |
81cd86f1 |
499 | |
500 | GroupByComponent [ , GroupByComponent ]* |
501 | |
d6e108eb |
502 | =head3 rows |
503 | |
81cd86f1 |
504 | This corresponds to the clause that is used in some RDBMS engines to limit the |
505 | number of rows returned by a query. In MySQL, this would be the LIMIT clause. |
506 | |
507 | A rows clause is composed as follows: |
508 | |
509 | Number [, Number ] |
510 | |
d6e108eb |
511 | =head3 for |
512 | |
81cd86f1 |
513 | This corresponds to the clause that is used in some RDBMS engines to indicate |
514 | what locks are to be taken by this SELECT statement. |
515 | |
516 | A for clause is composed as follows: |
517 | |
518 | UPDATE | DELETE |
519 | |
520 | =head3 connectby |
521 | |
522 | This corresponds to the clause that is used in some RDBMS engines to provide for |
523 | an adjacency-list query. |
524 | |
525 | A connectby clause is composed as follows: |
526 | |
527 | Identifier, WhereExpression |
528 | |
7c66a0ab |
529 | =head1 EXAMPLES |
530 | |
531 | The following are example SQL statements and a possible AST for each one. |
532 | |
533 | =over 4 |
534 | |
535 | =item * SELECT 1 |
536 | |
537 | { |
538 | _query => 'select', |
539 | _version => 0.0001, |
540 | select => [ |
541 | { |
804bd4ab |
542 | type => 'SelectComponent', |
7c66a0ab |
543 | value => { |
804bd4ab |
544 | type => 'Value', |
7c66a0ab |
545 | subtype => 'Number', |
546 | value => 1, |
547 | }, |
548 | }, |
549 | ], |
550 | } |
551 | |
552 | =item * SELECT NOW() AS time FROM dual AS duality |
553 | |
554 | { |
555 | _query => 'select', |
556 | _version => 0.0001, |
557 | select => [ |
558 | { |
804bd4ab |
559 | type => 'SelectComponent', |
7c66a0ab |
560 | value => { |
804bd4ab |
561 | type => 'Operator', |
7c66a0ab |
562 | op => 'NOW', |
563 | }, |
564 | as => { |
804bd4ab |
565 | type => 'Identifier', |
7c66a0ab |
566 | element1 => 'time', |
567 | }, |
568 | }, |
569 | ], |
570 | tables => [ |
571 | { |
804bd4ab |
572 | type => 'TableIdentifier', |
7c66a0ab |
573 | value => { |
804bd4ab |
574 | type => 'Identifier', |
7c66a0ab |
575 | element1 => 'dual', |
576 | }, |
577 | as => { |
804bd4ab |
578 | type => 'Identifier', |
7c66a0ab |
579 | element1 => 'duality', |
580 | }, |
581 | }, |
582 | ], |
583 | } |
584 | |
585 | =back |
586 | |
d6e108eb |
587 | =head1 AUTHORS |
588 | |
81cd86f1 |
589 | robkinyon: Rob Kinyon C<< <rkinyon@cpan.org> >> |
d6e108eb |
590 | |
591 | =head1 LICENSE |
592 | |
593 | You may distribute this code under the same terms as Perl itself. |
594 | |
595 | =cut |