Commit | Line | Data |
d6e108eb |
1 | =head1 NAME |
2 | |
3 | SQL::Abstract::Manual::Specification |
4 | |
5 | =head1 SYNOPSIS |
6 | |
7 | This discusses the specification for the AST provided by L<SQL::Abstract>. It is |
8 | meant to describe how the AST is structured, various components provided by |
9 | L<SQL::Abstract> for use with this AST, how to manipulate the AST, and various |
10 | uses for the AST once it is generated. |
11 | |
12 | =head1 MOTIVATIONS |
13 | |
14 | L<SQL::Abstract> has been in use for many years. Originally created to handle |
15 | the where-clause formation found in L<DBIx::Abstract>, it was generalized to |
16 | manage the creation of any SQL statement through the use of Perl structures. |
17 | Through the beating it received as the SQL generation syntax for L<DBIx::Class>, |
18 | various deficiencies were found and a generalized SQL AST was designed. This |
19 | document describes that AST. |
20 | |
21 | =head1 GOALS |
22 | |
23 | The goals for this AST are as follows: |
24 | |
25 | =head2 SQL-specific semantics |
26 | |
27 | Instead of attempting to be an AST to handle any form of query, this will |
28 | instead be specialized to manage SQL queries (and queries that map to SQL |
29 | queries). This means that there will be support for SQL-specific features, such |
30 | as placeholders. |
31 | |
32 | =head2 Perl-specific semantics |
33 | |
34 | This AST is meant to be used from within Perl5 only. So, it will take advantage |
35 | of as many Perl-specific features as make sense to use. No attempt whatsoever |
36 | will be made to make this AST work within any other language, including Perl6. |
37 | |
38 | =head2 Whole-lifecycle management |
39 | |
40 | Whether a query is built out of whole cloth in one shot or cobbled together from |
41 | several snippets over the lifetime of a process, this AST will support any way |
42 | to construct the query. Queries can also be built from other queries, so an |
43 | UPDATE statement could be used as the basis for a SELECT statement, DELETE |
44 | statement, or even a DDL statement of some kind. |
45 | |
46 | =head2 Dialect-agnostic usage |
47 | |
48 | Although SQL has several ANSI specifications (SQL-92 and SQL-99 among them), |
49 | these only serve as a basis for what a given RDBMS will expect. Every engine |
50 | has its own specific extensions and specific ways of handling |
393a4eb8 |
51 | common features. The AST will provide ways of expressing common functionality in |
52 | a common language. The emitters (objects that follow the Visitor pattern) will |
53 | be responsible for converting that common language into RDBMS-specific SQL. |
54 | |
ad0f8fa6 |
55 | =head1 RESTRICTIONS |
56 | |
57 | The following are the restrictions upon the AST: |
58 | |
59 | =head2 DML-only |
60 | |
61 | The AST will only support DML (Data Manipulation Language). It will not (currently) |
62 | support DDL (Data Definition Language). Practically, this means that the only |
63 | statements supported will be: |
64 | |
65 | =over 4 |
66 | |
67 | =item * SELECT |
68 | |
69 | =item * INSERT INTO |
70 | |
71 | =item * UPDATE |
72 | |
73 | =item * DELETE |
74 | |
75 | =back |
76 | |
77 | Additional DML statements may be supported by specific Visitors (such as a |
78 | MySQL visitor supporting REPLACE INTO). See the relevant sections of this |
79 | specification for details. |
80 | |
804bd4ab |
81 | =head2 Dialect-agnostic construction |
82 | |
83 | The AST will not attempt to be immediately readable to a human as SQL. In fact, |
84 | due to the dialect differences, particularly in terms of which use operators and |
cca4daf5 |
85 | which use functions for a given action, the AST will provide simple units. It is |
86 | the responsibility of the Visitor to provide the appropriate SQL. Furthermore, |
87 | the AST will be very generic and only provide hints for a subset of SQL. If a |
88 | Visitor is sufficiently intelligent, pretty SQL may be emitted, but that is not |
89 | the goal of this AST. |
804bd4ab |
90 | |
393a4eb8 |
91 | =head1 COMPONENTS |
92 | |
93 | There are two major components to SQL::Abstract v2. |
94 | |
95 | =over 4 |
96 | |
97 | =item * AST |
98 | |
99 | This is the Abstract Syntax Tree. It is a data structure that represents |
100 | everything necessary to construct the SQL statement in whatever dialect the |
101 | user requires. |
102 | |
103 | =item * Visitor |
104 | |
105 | This object conforms to the Visitor pattern and is used to generate the SQL |
106 | represented by the AST. Each dialect will have a different Visitor object. In |
107 | addition, there will be visitors for at least one of the ANSI specifications. |
108 | |
109 | =back |
d6e108eb |
110 | |
df35a525 |
111 | The division of duties between the two components will focus on what the AST |
112 | can and cannot assume. For example, identifiers do not have 20 components in |
113 | any dialect, so the AST can validate that. However, what constitutes a |
114 | legal identifier can only be determined by the Visitor object |
115 | enforcing that dialect's rules. |
116 | |
d6e108eb |
117 | =head1 AST STRUCTURE |
118 | |
393a4eb8 |
119 | The AST will be a HoHo..oH (hash of hash of ... of hashes). The keys to the |
120 | outermost hash will be the various clauses of a SQL statement, plus some |
37f2cc3f |
121 | metadata keys. |
d6e108eb |
122 | |
123 | =head2 Metadata keys |
124 | |
125 | These are the additional metadata keys that the AST provides for. |
126 | |
37f2cc3f |
127 | =head3 type |
df35a525 |
128 | |
129 | This denotes what kind of query this AST should be interpreted as. Different |
37f2cc3f |
130 | Visitors may accept additional values for type. For example, a MySQL Visitor |
131 | may choose to accept 'replace' for REPLACE INTO. If a type value is |
7c66a0ab |
132 | unrecognized by the Visitor, the Visitor is expected to throw an error. |
df35a525 |
133 | |
37f2cc3f |
134 | All Visitors are expected to handle the following values for type: |
df35a525 |
135 | |
d6e108eb |
136 | =over 4 |
137 | |
df35a525 |
138 | =item * select |
139 | |
140 | This is a SELECT statement. |
d6e108eb |
141 | |
df35a525 |
142 | =item * insert |
d6e108eb |
143 | |
df35a525 |
144 | This is an INSERT statement. |
393a4eb8 |
145 | |
df35a525 |
146 | =item * update |
147 | |
148 | This is an UPDATE statement. |
149 | |
150 | =item * delete |
151 | |
152 | This is a DELETE statement. |
d6e108eb |
153 | |
154 | =back |
155 | |
37f2cc3f |
156 | =head3 ast_version |
df35a525 |
157 | |
158 | This denotes the version of the AST. Different versions will indicate different |
37f2cc3f |
159 | capabilities provided. Visitors will choose to respect the ast_version as needed |
df35a525 |
160 | and desired. |
161 | |
d6e108eb |
162 | =head2 Structural units |
163 | |
df35a525 |
164 | All structural units will be hashes. These hashes will have, at minimum, the |
165 | following keys: |
166 | |
167 | =over 4 |
168 | |
804bd4ab |
169 | =item * type |
df35a525 |
170 | |
171 | This indicates the structural unit that this hash is representing. While this |
172 | specification provides for standard structural units, different Visitors may |
173 | choose to accept additional units as desired. If a Visitor encounters a unit it |
174 | doesn't know how to handle, it is expected to throw an exception. |
175 | |
176 | =back |
177 | |
d6e108eb |
178 | Structural units in the AST are supported by loaded components. L<SQL::Abstract> |
179 | provides for the following structural units by default: |
180 | |
181 | =head3 Identifier |
182 | |
df35a525 |
183 | This is a (potentially) fully canonicalized identifier for an element in the |
184 | query. This element could be a schema, table, or column. The Visitor will |
185 | determine validity within the context of that SQL dialect. The AST is only |
186 | responsible for validating that the elements are non-empty Strings. |
187 | |
188 | The hash will be structured as follows: |
189 | |
190 | { |
804bd4ab |
191 | type => 'Identifier', |
7c66a0ab |
192 | element1 => Scalar, |
193 | element2 => Scalar, |
194 | element3 => Scalar, |
df35a525 |
195 | } |
d6e108eb |
196 | |
7c66a0ab |
197 | If element3 exists, then element2 must exist. element1 must always exist. If a |
198 | given element exists, then it must be defined and of non-zero length. |
199 | |
ad0f8fa6 |
200 | Visitors are expected to, by default, quote all identifiers according to the SQL |
201 | dialect's quoting scheme. |
d6e108eb |
202 | |
4f6e8987 |
203 | Any of the elements may be '*', as in SELECT * or SELECT COUNT(*). Visitors must |
204 | be careful to I<not> quote asterisks. |
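The Identifier rules above can be sketched in Perl. This is only an illustration under stated assumptions: C<validate_identifier> and C<quote_element> are hypothetical helper names, not part of L<SQL::Abstract>, and the double-quote style stands in for whatever quoting scheme a given dialect actually uses.

```perl
use strict;
use warnings;

# Hypothetical AST-side check: enforces only the structural rules
# stated above, never dialect-specific legality.
sub validate_identifier {
    my ($id) = @_;
    return 0 unless ref $id eq 'HASH' && ($id->{type} // '') eq 'Identifier';

    # element1 must always exist; element3 requires element2.
    return 0 unless exists $id->{element1};
    return 0 if exists $id->{element3} && !exists $id->{element2};

    # Any element that exists must be a defined, non-empty plain scalar.
    for my $key (qw(element1 element2 element3)) {
        next unless exists $id->{$key};
        my $value = $id->{$key};
        return 0 unless defined($value) && !ref($value) && length($value);
    }
    return 1;
}

# Hypothetical Visitor-side step: quote one element, leaving '*' bare.
# Double quotes with doubled embedded quotes are an assumed scheme.
sub quote_element {
    my ($element) = @_;
    return '*' if $element eq '*';
    (my $escaped = $element) =~ s/"/""/g;
    return qq{"$escaped"};
}
```

The split mirrors the division of duties described earlier: the structural check belongs to the AST, while quoting belongs to the Visitor.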
205 | |
10000e9e |
206 | =head3 Value |
d6e108eb |
207 | |
da93022e |
208 | A Value is a Perl scalar. Depending on the subtype, a Visitor may be able to |
209 | make certain decisions. The following are the minimally-valid subtypes: |
10000e9e |
210 | |
211 | =over 4 |
212 | |
213 | =item * String |
214 | |
7c66a0ab |
215 | A String is a quoted series of characters. The Visitor is expected to ensure |
216 | that embedded quotes are properly handled per the SQL dialect's quoting scheme. |
10000e9e |
217 | |
218 | =item * Number |
219 | |
7c66a0ab |
220 | A Number is an unquoted number in some numeric format. |
10000e9e |
221 | |
ad0f8fa6 |
222 | =item * Null |
10000e9e |
223 | |
ad0f8fa6 |
224 | Null is SQL's NULL and corresponds to Perl's C<undef>. |
10000e9e |
225 | |
226 | =item * BindParameter |
227 | |
228 | This corresponds to a value that will be passed in. This value is normally |
229 | quoted in such a fashion so as to protect against SQL injection attacks. (q.v. |
230 | L<DBI/quote()> for an example.) |
231 | |
7c66a0ab |
232 | BindParameters are normally represented by a '?'. |
233 | |
10000e9e |
234 | =back |
235 | |
a3872878 |
236 | The hash will be structured as follows: |
237 | |
238 | { |
804bd4ab |
239 | type => 'Value', |
7c66a0ab |
240 | subtype => [ 'String' | 'Number' | 'Null' | 'BindParameter' ], |
241 | value => Scalar, |
a3872878 |
242 | } |
243 | |
244 | The provided subtypes are the ones that all Visitors are expected to support. |
245 | Visitors may choose to support additional subtypes. Visitors are expected to |
246 | throw an exception upon encountering an unknown subtype. |
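As a rough sketch of how a Visitor might render Value units, consider the following. The function name C<emit_value> is hypothetical, the single-quote escaping is one common scheme rather than a universal one, and treating the C<value> key of a BindParameter as the value to bind is an assumption of this example.

```perl
use strict;
use warnings;

# Hypothetical Visitor step: render one Value node as SQL text.
# Bind values are collected into the array referenced by $binds.
sub emit_value {
    my ($node, $binds) = @_;
    my $subtype = $node->{subtype} // '';

    if ($subtype eq 'String') {
        # Assumed quoting scheme: double any embedded single quotes.
        (my $text = $node->{value}) =~ s/'/''/g;
        return "'$text'";
    }
    return $node->{value} if $subtype eq 'Number';
    return 'NULL'         if $subtype eq 'Null';

    if ($subtype eq 'BindParameter') {
        push @$binds, $node->{value};
        return '?';
    }

    # Unknown subtypes must raise an exception.
    die "Unknown Value subtype '$subtype'\n";
}
```

Collecting bind values separately lets the caller hand them to the database driver alongside the generated SQL.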
d6e108eb |
247 | |
804bd4ab |
248 | =head3 Operator |
81cd86f1 |
249 | |
804bd4ab |
250 | An Operator would be, in SQL dialect terms, a unary operator, a binary operator, |
251 | a ternary operator, or a function. Since different dialects may have a given |
252 | functionality as an operator or a function (such as CONCAT in MySQL vs. || in |
253 | Oracle for string concatenation), they will be represented in the AST as generic |
254 | operators. |
d6e108eb |
255 | |
7c66a0ab |
256 | The hash will be structured as follows: |
257 | |
258 | { |
804bd4ab |
259 | type => 'Operator', |
260 | op => String, |
f32d60b9 |
261 | args => [ |
262 | Expression, |
263 | ], |
7c66a0ab |
264 | } |
265 | |
804bd4ab |
266 | Operators have a cardinality, or expected number of arguments. Some operators, |
ad0f8fa6 |
267 | such as MAX(), have a cardinality of 1. Others, such as COALESCE(), have a cardinality |
268 | of N, meaning they can have any number of arguments greater than 0. Others, such |
804bd4ab |
269 | as NOW(), have a cardinality of 0. Several operators with the same meaning may |
ad0f8fa6 |
270 | have a different cardinality in different SQL dialects as different engines may |
804bd4ab |
271 | allow different behaviors. As cardinality may differ between dialects, enforcing |
272 | cardinality is necessarily left to the Visitor. |
ad0f8fa6 |
273 | |
804bd4ab |
274 | Operators also have restrictions on the types of arguments they will accept. The |
275 | first argument may or may not be restricted in the same fashion as the other |
276 | arguments. As with cardinality, this restriction will need to be managed by the |
277 | Visitor. |
278 | |
279 | The operator name needs to take into account the possibility that the RDBMS may |
280 | allow UDFs (User-Defined Functions) that have the same name as an operator, such |
281 | as 'AND'. This will have to be managed by the Visitor. |
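The string concatenation case mentioned above might look like the following sketch. The operator name C<'concat'>, the dialect labels, and the function C<emit_concat> are all assumptions made for illustration; the specification does not fix them.

```perl
use strict;
use warnings;

# One generic Operator node; the op name 'concat' is hypothetical.
my $concat = {
    type => 'Operator',
    op   => 'concat',
    args => [
        { type => 'Identifier', element1 => 'first_name' },
        { type => 'Value', subtype => 'String', value => ' ' },
        { type => 'Identifier', element1 => 'last_name' },
    ],
};

# Hypothetical emitter: takes already-rendered arguments and chooses
# function syntax or operator syntax depending on the dialect.
sub emit_concat {
    my ($dialect, @rendered_args) = @_;

    # Cardinality is enforced here, in the Visitor, not in the AST.
    die "concat needs at least one argument\n" unless @rendered_args;

    return 'CONCAT(' . join(', ', @rendered_args) . ')' if $dialect eq 'mysql';
    return join(' || ', @rendered_args)                 if $dialect eq 'oracle';
    die "Unsupported dialect '$dialect'\n";
}
```

The same node thus yields function syntax in one dialect and infix syntax in another without any change to the AST.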
ad0f8fa6 |
282 | |
d6e108eb |
283 | =head3 Subquery |
284 | |
37f2cc3f |
285 | A Subquery is another AST whose type metadata parameter is set to 'select'. |
d6e108eb |
286 | |
287 | Most places that a Subquery can be used would require a single value to be |
288 | returned (single column, single row), but that is not something that the AST can |
ad0f8fa6 |
289 | easily enforce. The single-column restriction may possibly be enforced, but the |
d6e108eb |
290 | single-row restriction is much more difficult and, in most cases, probably |
291 | impossible. |
292 | |
7c66a0ab |
293 | Subqueries, when expressed in SQL, must be bounded by parentheses. |
81cd86f1 |
294 | |
662b716d |
295 | =head3 Alias |
296 | |
297 | An Alias is any place where the construct "X as Y" appears. While the "as Y" is |
298 | often optional, the AST will make it required. |
299 | |
300 | The hash will be structured as follows: |
301 | |
302 | { |
303 | type => 'Alias', |
304 | value => Expression, |
305 | as => String, |
306 | } |
307 | |
d6e108eb |
308 | =head3 Expression |
309 | |
7c66a0ab |
310 | An Expression can be any one of the following: |
d6e108eb |
311 | |
312 | =over 4 |
313 | |
804bd4ab |
314 | =item * Identifier |
315 | |
10000e9e |
316 | =item * Value |
d6e108eb |
317 | |
804bd4ab |
318 | =item * Operator |
d6e108eb |
319 | |
320 | =item * Subquery |
321 | |
662b716d |
322 | =item * Alias |
323 | |
d6e108eb |
324 | =back |
325 | |
7c66a0ab |
326 | An Expression is a meta-syntactic unit. An "Expression" unit will never appear |
327 | within the AST. It acts as a junction. |
328 | |
4f6e8987 |
329 | =head3 Nesting |
330 | |
3d8ddf0b |
331 | There is no specific operator or nodetype for nesting. Instead, nesting is |
332 | explicitly specified by node descent in the AST. |
4f6e8987 |
333 | |
d6e108eb |
334 | =head2 SQL clauses |
335 | |
10000e9e |
336 | These are all the legal and acceptable clauses within the AST that would |
337 | correspond to clauses in a SQL statement. Not all clauses are legal within a |
338 | given RDBMS engine's SQL dialect and some clauses may be required in one and |
339 | optional in another. Detecting and enforcing those engine-specific restrictions |
340 | is the responsibility of the Visitor object. |
341 | |
bc06d3c1 |
342 | The following clauses are expected to be handled by Visitors for each statement: |
10000e9e |
343 | |
344 | =over 4 |
345 | |
bc06d3c1 |
346 | =item * select |
10000e9e |
347 | |
bc06d3c1 |
348 | =over 4 |
10000e9e |
349 | |
bc06d3c1 |
350 | =item * select |
10000e9e |
351 | |
bc06d3c1 |
352 | =item * tables |
10000e9e |
353 | |
bc06d3c1 |
354 | =item * where |
10000e9e |
355 | |
bc06d3c1 |
356 | =item * orderby |
10000e9e |
357 | |
bc06d3c1 |
358 | =item * groupby |
359 | |
360 | =back |
361 | |
362 | =item * insert |
363 | |
364 | =over 4 |
10000e9e |
365 | |
bc06d3c1 |
366 | =item * tables |
10000e9e |
367 | |
bc06d3c1 |
368 | =item * columns |
10000e9e |
369 | |
bc06d3c1 |
370 | =item * values |
371 | |
372 | =back |
373 | |
374 | There are RDBMS-specific variations of the INSERT statement, such as the one in |
375 | MySQL's |
376 | |
377 | =item * update |
378 | |
379 | =over 4 |
380 | |
381 | =item * tables |
382 | |
383 | =item * set |
384 | |
385 | =item * where |
386 | |
387 | =back |
388 | |
389 | =item * delete |
390 | |
391 | =over 4 |
392 | |
393 | =item * tables |
394 | |
395 | =item * where |
396 | |
397 | =back |
10000e9e |
398 | |
399 | =back |
400 | |
d6e108eb |
401 | The expected clauses are (name and structure): |
402 | |
403 | =head3 select |
404 | |
81cd86f1 |
405 | This corresponds to the SELECT clause of a SELECT statement. |
406 | |
662b716d |
407 | A select clause unit is an array of one or more Expressions. |
d6e108eb |
408 | |
409 | =head3 tables |
410 | |
411 | This is a list of tables that this clause is affecting. It corresponds to the |
81cd86f1 |
412 | FROM clause in a SELECT statement and the INSERT INTO/UPDATE/DELETE clauses in |
37f2cc3f |
413 | those respective statements. Depending on the type metadata entry, the |
81cd86f1 |
414 | appropriate clause name will be used. |
d6e108eb |
415 | |
416 | The tables clause has several RDBMS-specific variations. The AST will support |
417 | all of them and it is up to the Visitor object constructing the actual SQL to |
418 | validate and/or use what is provided as appropriate. |
419 | |
662b716d |
420 | A tables clause is an Expression. |
7c66a0ab |
421 | |
cca4daf5 |
422 | The hash for an Operator within a tables clause will be composed as follows: |
423 | |
424 | # Operator |
7c66a0ab |
425 | { |
cca4daf5 |
426 | type => 'Operator', |
427 | op => '< LEFT|RIGHT|FULL [ OUTER ] > | INNER | CROSS', |
428 | on => Expression, |
7c66a0ab |
429 | } |
d6e108eb |
430 | |
cca4daf5 |
431 | A USING clause is syntactic sugar for an ON clause and, as such, is not provided |
da74c1c8 |
432 | for by the AST. A comma join is identical to a CROSS JOIN and, as such, is |
433 | not provided for by the AST. The on clause is optional. |
d6e108eb |
434 | |
435 | =head3 where |
436 | |
81cd86f1 |
437 | This corresponds to the WHERE clause in a SELECT, UPDATE, or DELETE statement. |
438 | |
37f2cc3f |
439 | A where clause is composed of an Expression. |
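Putting the pieces described so far together, a complete AST for a simple SELECT statement might be written as below. This is a sketch only: the table and column names are invented, and the operator name C<< '>' >> is an assumption, since this specification does not enumerate operator names.

```perl
use strict;
use warnings;

# One possible rendering: SELECT "name" FROM "users" WHERE "age" > ?
my $ast = {
    type   => 'select',
    select => [
        { type => 'Identifier', element1 => 'name' },
    ],
    tables => { type => 'Identifier', element1 => 'users' },
    where  => {
        type => 'Operator',
        op   => '>',    # assumed operator name
        args => [
            { type => 'Identifier', element1 => 'age' },
            { type => 'Value', subtype => 'BindParameter', value => 21 },
        ],
    },
};
```

Note that nesting is expressed purely by descent through the hashes and arrays; no parenthesis or grouping node appears anywhere.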
81cd86f1 |
440 | |
d6e108eb |
441 | =head3 set |
442 | |
81cd86f1 |
443 | This corresponds to the SET clause in an INSERT or UPDATE statement. |
444 | |
753e226d |
445 | A set clause unit is an array of one or more SetComponent units. |
81cd86f1 |
446 | |
753e226d |
447 | The hash for a SetComponent unit is composed as follows: |
81cd86f1 |
448 | |
753e226d |
449 | { |
bc06d3c1 |
450 | type => 'SetComponent', |
451 | col => Identifier, |
753e226d |
452 | value => Expression, |
453 | } |
81cd86f1 |
454 | |
455 | =head3 columns |
456 | |
457 | This corresponds to the optional list of columns in an INSERT statement. |
458 | |
338df86b |
459 | A columns clause unit is an array of one or more Identifier units. |
81cd86f1 |
460 | |
d6e108eb |
461 | =head3 values |
462 | |
81cd86f1 |
463 | This corresponds to the VALUES clause in an INSERT statement. |
464 | |
338df86b |
465 | A values clause unit is an array of one or more Expression units. |
81cd86f1 |
466 | |
467 | If there is a columns clause, the number of entries in the values clause must be |
468 | equal to the number of entries in the columns clause. |
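The constraint between the columns and values clauses is simple enough to check mechanically. The following is a minimal sketch; C<check_insert_arity> is a hypothetical name, and whether the AST or the Visitor performs this check is left open here.

```perl
use strict;
use warnings;

# Hypothetical check: returns true when an insert AST satisfies the
# columns/values rule, false otherwise.
sub check_insert_arity {
    my ($ast) = @_;

    # The columns clause is optional; without it there is nothing to compare.
    return 1 unless exists $ast->{columns};

    my @columns = @{ $ast->{columns} };
    my @values  = @{ $ast->{values} // [] };
    return @columns == @values ? 1 : 0;
}
```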
469 | |
d6e108eb |
470 | =head3 orderby |
471 | |
81cd86f1 |
472 | This corresponds to the ORDER BY clause in a SELECT statement. |
473 | |
da74c1c8 |
474 | An orderby clause unit is an array of one or more OrderbyComponent units. |
81cd86f1 |
475 | |
da74c1c8 |
476 | The hash for an OrderbyComponent unit is composed as follows: |
81cd86f1 |
477 | |
da74c1c8 |
478 | { |
479 | type => 'OrderbyComponent', |
bc06d3c1 |
480 | value => Expression, |
da74c1c8 |
481 | dir => '< ASC | DESC >', |
482 | } |
483 | |
bc06d3c1 |
484 | The value should either be an Identifier or a Number. The dir element, if |
485 | omitted, will be defaulted to ASC by the AST. The number corresponds to a column |
486 | in the select clause. |
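The defaulting of dir can be illustrated with a small normalization routine. This is a sketch under assumptions: C<normalize_orderby> is a hypothetical name, and rejecting any direction other than ASC or DESC is this example's choice rather than something the specification states.

```perl
use strict;
use warnings;

# Hypothetical normalizer: returns a copy of an OrderbyComponent with
# dir filled in, leaving the caller's hash untouched.
sub normalize_orderby {
    my ($component) = @_;
    my %copy = %$component;

    # An omitted dir defaults to ASC.
    $copy{dir} //= 'ASC';

    die "Invalid dir '$copy{dir}'\n"
        unless $copy{dir} =~ /\A(?:ASC|DESC)\z/;
    return \%copy;
}
```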
81cd86f1 |
487 | |
d6e108eb |
488 | =head3 groupby |
489 | |
81cd86f1 |
490 | This corresponds to the GROUP BY clause in a SELECT statement. |
491 | |
da74c1c8 |
492 | A groupby clause unit is an array of one or more GroupbyComponent units. |
81cd86f1 |
493 | |
da74c1c8 |
494 | The hash for a GroupbyComponent unit is composed as follows: |
495 | |
496 | { |
497 | type => 'GroupbyComponent', |
bc06d3c1 |
498 | value => Expression, |
da74c1c8 |
499 | } |
81cd86f1 |
500 | |
bc06d3c1 |
501 | The value should either be an Identifier or a Number. The number corresponds to |
502 | a column in the select clause. |
503 | |
504 | =head2 Possible RDBMS-specific clauses |
505 | |
506 | The following clauses are provided as examples for RDBMS-specific elements. They |
507 | are B<not> expected to be supported by all Visitors. Visitors may choose whether |
508 | or not to throw on an unexpected clause, though throwing is strongly recommended. |
81cd86f1 |
509 | |
d6e108eb |
510 | =head3 rows |
511 | |
81cd86f1 |
512 | This corresponds to the clause that is used in some RDBMS engines to limit the |
bc06d3c1 |
513 | number of rows returned by a SELECT statement. In MySQL, this would be the LIMIT |
514 | clause. |
81cd86f1 |
515 | |
e4a310cb |
516 | The hash for a rows clause is composed as follows: |
81cd86f1 |
517 | |
e4a310cb |
518 | { |
e4a310cb |
519 | start => Number, |
520 | count => Number, |
521 | } |
522 | |
523 | The start attribute, if omitted, will default to 0. The count attribute is |
524 | optional. |
81cd86f1 |
525 | |
d6e108eb |
526 | =head3 for |
527 | |
81cd86f1 |
528 | This corresponds to the clause that is used in some RDBMS engines to indicate |
529 | what locks are to be taken by this SELECT statement. |
530 | |
e4a310cb |
531 | The hash for a for clause is composed as follows: |
81cd86f1 |
532 | |
e4a310cb |
533 | { |
534 | value => '< UPDATE | DELETE >', |
535 | } |
81cd86f1 |
536 | |
537 | =head3 connectby |
538 | |
539 | This corresponds to the clause that is used in some RDBMS engines to provide for |
540 | an adjacency-list query. |
541 | |
22033e85 |
542 | The hash for a connectby clause is composed as follows: |
543 | |
544 | { |
f32d60b9 |
545 | start_with => [ |
546 | Expression, |
547 | ], |
22033e85 |
548 | connect_by => { |
549 | option => '< PRIOR | NOCYCLE >', |
f32d60b9 |
550 | cond => [ |
551 | Expression, |
552 | ], |
22033e85 |
553 | }, |
554 | order_siblings => orderby-clause, |
555 | } |
81cd86f1 |
556 | |
22033e85 |
557 | Both the start_with and order_siblings clauses are optional. |
81cd86f1 |
558 | |
cca4daf5 |
559 | =head1 TODO |
560 | |
561 | =over 4 |
562 | |
563 | =item * sproc unit |
564 | |
662b716d |
565 | =item * UNION, UNION ALL, and MINUS |
566 | |
567 | =item * INSERT INTO <table> SELECT ... |
568 | |
569 | =item * INSERT INTO <table> SET ... |
570 | |
cca4daf5 |
571 | =back |
572 | |
d6e108eb |
573 | =head1 AUTHORS |
574 | |
81cd86f1 |
575 | robkinyon: Rob Kinyon C<< <rkinyon@cpan.org> >> |
d6e108eb |
576 | |
577 | =head1 LICENSE |
578 | |
579 | You may distribute this code under the same terms as Perl itself. |
580 | |
581 | =cut |