a common language. The emitters (objects that follow the Visitor pattern) will
be responsible for converting that common language into RDBMS-specific SQL.
+=head1 RESTRICTIONS
+
+The following are the restrictions upon the AST:
+
+=head2 DML-only
+
+The AST will only support DML (Data Modelling Language). It will not (currently)
+support DDL (Data Definition Language). Practically, this means that the only
+statements supported will be:
+
+=over 4
+
+=item * SELECT
+
+=item * INSERT INTO
+
+=item * UPDATE
+
+=item * DELETE
+
+=back
+
+Additional DML statements may be supported by specific Visitors (such as a
+MySQL visitor supporting REPLACE INTO). q.v. the relevant sections of this
+specification for details.
+
+=head2 Dialect-agnostic construction
+
+The AST will not attempt to be immediately readable to a human as SQL. In fact,
+due to the dialect differences, particularly in terms of which use operators and
+which use functions for a given action, the AST will provide simple units. It is
+the responsibility of the Visitor to provide the appropriate SQL. Furthermore,
+the AST will be very generic and only provide hints for a subset of SQL. If a
+Visitor is sufficiently intelligent, pretty SQL may be emitted, but that is not
+the goal of this AST.
+
=head1 COMPONENTS
There are two major components to SQL::Abstract v2.
The AST will be a HoHo..oH (hash of hash of ... of hashes). The keys to the
outermost hash will be the various clauses of a SQL statement, plus some
-metadata keys. All metadata keys will be identifiable as such by being prefixed
-with an underscore. All keys will be in lowercase.
+metadata keys.
=head2 Metadata keys
These are the additional metadata keys that the AST provides for.
-=head3 _query
+=head3 type
This denotes what kind of query this AST should be interpreted as. Different
-Visitors may accept additional values for _query. For example, a MySQL Visitor
-may choose to accept 'replace'. If a _query value is unrecognized by the
-Visitor, the Visitor is expected to throw an error.
+Visitors may accept additional values for type. For example, a MySQL Visitor
+may choose to accept 'replace' for REPLACE INTO. If a type value is
+unrecognized by the Visitor, the Visitor is expected to throw an error.
-All Visitors are expected to handle the following values for _query:
+All Visitors are expected to handle the following values for type:
=over 4
=back
-=head3 _version
+=head3 ast_version
This denotes the version of the AST. Different versions will indicate different
-capabilities provided. Visitors will choose to respect the _version as needed
+capabilities provided. Visitors will choose to respect the ast_version as needed
and desired.
=head2 Structural units
=over 4
-=item * _name
+=item * type
This indicates the structural unit that this hash is representing. While this
specification provides for standard structural units, different Visitors may
The hash will be structured as follows:
{
- _name => 'identifier',
- items => [String],
+ type => 'Identifier',
+ elements => [ Scalar ],
}
-The items will always be quoted per the SQL dialect's quoting scheme. It is the
-responsibility of the Visitor to do this.
+All values in elements must be defined.
+
+Visitors are expected to, by default, quote all identifiers according to the SQL
+dialect's quoting scheme.
+
+Any of the elements may be '*', as in SELECT * or SELECT COUNT(*). Visitors must
+be careful to I<not> quote asterisks.
=head3 Value
-A Value is a Perl scalar. It may either be a:
+A Value is a Perl scalar. Depending on the subtype, a Visitor may be able to
+make certain decisions. The following are the minimally-valid subtypes:
=over 4
=item * String
-A String is a quoted series of characters
+A String is a quoted series of characters. The Visitor is expected to ensure
+that embedded quotes are properly handled per the SQL dialect's quoting scheme.
=item * Number
-A Number is an unquoted number in some numeric format
+A Number is an unquoted number in some numeric format.
-=item * NULL
+=item * Null
-NULL is SQL's NULL and corresponds to Perl's C<undef>.
+Null is SQL's NULL and corresponds to Perl's C<undef>.
=item * BindParameter
quoted in such a fashion so as to protect against SQL injection attacks. (q.v.
L<DBI/quote()> for an example.)
+BindParameters are normally represented by a '?'.
+
=back
-=head3
+The hash will be structured as follows:
-=head3 Function
+ {
+ type => 'Value'
+ subtype => [ 'String' | 'Number' | 'Null' | 'BindParameter' ]
+ value => Scalar
+ }
-A Function is anything of the form C< name( arglist ) > where C<name> is a
-string and C<arglist> is a comma-separated list of Expressions.
+The provided subtypes are the ones that all Visitors are expected to support.
+Visitors may choose to support additional subtypes. Visitors are expected to
+throw an exception upon encountering an unknown subtype.
-Yes, a Subquery is legal as an argument for many functions. Some example
-functions are:
+=head3 Operator
-=over 4
+An Operator would be, in SQL dialect terms, a unary operator, a binary operator,
+a trinary operator, or a function. Since different dialects may have a given
+functionality as an operator or a function (such as CONCAT in MySQl vs. || in
+Oracle for string concatenation), they will be represented in the AST as generic
+operators.
-=item * C<< IN >>
+The hash will be structured as follows:
-=item * C<< MAX >>
+ {
+ type => 'Operator',
+ op => String,
+ args => [
+ Expression,
+ ],
+ }
-=item * C<< MIN >>
+Operators have a cardinality, or expected number of arguments. Some operators,
+such as MAX(), have a cardinality of 1. Others, such as IF(), have a cardinality
+of N, meaning they can have any number of arguments greater than 0. Others, such
+as NOW(), have a cardinality of 0. Several operators with the same meaning may
+have a different cardinality in different SQL dialects as different engines may
+allow different behaviors. As cardinality may differ between dialects, enforcing
+cardinality is necessarily left to the Visitor.
-=item * C<< SUM >>
+Operators also have restrictions on the types of arguments they will accept. The
+first argument may or may not restricted in the same fashion as the other
+arguments. As with cardinality, this restriction will need to be managed by the
+Visitor.
-=back
+The operator name needs to take into account the possibility that the RDBMS may
+allow UDFs (User-Defined Functions) that have the same name as an operator, such
+as 'AND'. This will have to be managed by the Visitor.
=head3 Subquery
-A Subquery is another AST whose _query metadata parameter is set to "SELECT".
+A Subquery is another AST whose type metadata parameter is set to "SELECT".
Most places that a Subquery can be used would require a single value to be
returned (single column, single row), but that is not something that the AST can
-easily enforce. The single-column restriction can possibly be enforced, but the
+easily enforce. The single-column restriction may possibly be enforced, but the
single-row restriction is much more difficult and, in most cases, probably
impossible.
-Subqueries, when expressed in SQL, must bounded by parentheses.
+Subqueries, when expressed in SQL, must be bounded by parentheses.
-=head3 Unary Operator
+=head3 Alias
-A UnaryOperator takes a single argument on the RHS and is one of the following:
+An Alias is any place where the construct "X as Y" appears. While the "as Y" is
+often optional, the AST will make it required.
-=over 4
-
-=item * C<< NOT >>
+The hash will be structured as follows:
-=back
+ {
+ type => 'Alias',
+ value => Expression,
+ as => Identifier,
+ }
-=head3 BinaryOperator
+=head3 Expression
-A BinaryOperator takes two arguments (one on the LHS and one on the RHS) and is
-one of the following:
+An Expression can be any one of the following:
=over 4
-=item * C<< = >>
+=item * Identifier
-=item * C<< != >>
+=item * Value
-=item * C<< > >>
+=item * Operator
-=item * C<< < >>
+=item * Subquery
-=item * C<< >= >>
+=item * Alias
-=item * C<< <= >>
+=back
-=item * C<< IS >>
+An Expression is a meta-syntactic unit. An "Expression" unit will never appear
+within the AST. It acts as a junction.
-=item * C<< IS NOT >>
+=head3 Nesting
-=back
+There is no specific operator or nodetype for nesting. Instead, nesting is
+explicitly specified by node descent in the AST.
-Note that an operator can comprise of what would be multiple tokens in a normal
-parsing effort.
+=head2 SQL clauses
-=head3 Expression
+These are all the legal and acceptable clauses within the AST that would
+correpsond to clauses in a SQL statement. Not all clauses are legal within a
+given RDBMS engine's SQL dialect and some clauses may be required in one and
+optional in another. Detecting and enforcing those engine-specific restrictions
+is the responsibility of the Visitor object.
-An expression can be any one of the following:
+The following clauses are expected to be handled by Visitors for each statement:
=over 4
-=item * Value
+=item * select
-=item * Function
+=over 4
-=item * Subquery
+=item * select
+
+=item * tables
-=item * UnaryOperator Expression
+=item * where
-=item * Expression BinaryOperator Expression
+=item * orderby
-=item * ( Expression )
+=item * groupby
=back
-Parentheses indicate precedence and, in some situations, are necessary for
-certain operators.
+=item * insert
-=head2 SQL clauses
+=over 4
-These are all the legal and acceptable clauses within the AST that would
-correpsond to clauses in a SQL statement. Not all clauses are legal within a
-given RDBMS engine's SQL dialect and some clauses may be required in one and
-optional in another. Detecting and enforcing those engine-specific restrictions
-is the responsibility of the Visitor object.
+=item * tables
-The clauses are defined with a yacc-like syntax. The various parts are:
+=item * set
-=over 4
+=back
-=item * :=
+There are RDBMS-specific variations of the INSERT statement, such the one in
+MySQL's
+
+=item * update
-This means "defined" and is used to create a new term to be used below.
+=over 4
-=item * []
+=item * tables
-This means optional and indicates that the items within it are optional.
+=item * set
-=item * []*
+=item * where
-This means optional and repeating as many times as desired.
+=back
-=item * |
+=item * delete
-This means alternation. It is a binary operator and indicates that either the
-left or right hand sides may be used, but not both.
+=over 4
-=item * C<< <> >>
+=item * tables
-This is a grouping construct. It means that all elements within this construct
-are treated together for the purposes of optional, repeating, alternation, etc.
+=item * where
+
+=back
=back
This corresponds to the SELECT clause of a SELECT statement.
-A select clause is composed as follows:
-
- SelectComponent := Expression [ [ AS ] String ]
-
- SelectComponent
- [ , SelectComponent ]*
+A select clause unit is an array of one or more Expressions.
=head3 tables
This is a list of tables that this clause is affecting. It corresponds to the
FROM clause in a SELECT statement and the INSERT INTO/UPDATE/DELETE clauses in
-those respective statements. Depending on the _query metadata entry, the
+those respective statements. Depending on the type metadata entry, the
appropriate clause name will be used.
The tables clause has several RDBMS-specific variations. The AST will support
all of them and it is up to the Visitor object constructing the actual SQL to
validate and/or use what is provided as appropriate.
-A table clause is composed as follows:
+A tables clause is an Expression.
- TableIdentifier := Identifier [ [ AS ] String ]
- JoinType := < LEFT|RIGHT [ OUTER ] > | INNER | CROSS
+The hash for an Operator within a tables clause will be composed as follows:
- TableIdentifier
- [
- < , TableIdentifier >
- | <
- [ JoinType ] JOIN TableIdentifier
- [
- < USING ( Identifier [ , Identifier ] ) >
- | < ON [ ( ] Expression [ , Expression ] [ ) ] >
- ]
- >
- ]*
+ # Operator
+ {
+ type => 'Operator',
+ op => '< LEFT|RIGHT|FULL [ OUTER ] > | INNER | CROSS',
+ on => Expression,
+ args => [ Expression ],
+ }
-Additionally, where aliases are provided for in the TableIdentifier, those
-aliases must be used as the tablename in subsequent Identifiers that identify a
-column of that table.
+A USING clause is syntactic sugar for an ON clause and, as such, is not provided
+for by the AST. A join of a comma is identical to a CROSS JOIN and, as such, is
+not provided for by the AST. The on clause is optional.
=head3 where
This corresponds to the WHERE clause in a SELECT, UPDATE, or DELETE statement.
-A where clause is composed as follows:
-
- WhereOperator := AND | OR
- WhereExpression := Expression | Expression WhereOperator Expression
-
- WhereExpression
+A where clause is composed of an Expression.
=head3 set
This corresponds to the SET clause in an INSERT or UPDATE statement.
-A set clause is composed as follows:
-
- SetComponent := Identifier = Expression
-
- SetComponent [ , SetComponent ]*
-
-=head3 columns
-
-This corresponds to the optional list of columns in an INSERT statement.
-
-A columns clause is composed as follows:
-
- ( Identifier [ , Identifier ]* )
+The hash for an set clause will be composed as follows:
-=head3 values
-
-This corresponds to the VALUES clause in an INSERT statement.
-
-A values clause is composed as follows:
-
- ( Expression [ , Expression ]* )
+ {
+ type => 'Set',
+ args => [
+ [ Identifier ],
+ [ Expresion ],
+ ],
+ }
-If there is a columns clause, the number of entries in the values clause must be
-equal to the number of entries in the columns clause.
+The args is an array that is organized as follows: The first element is an array of
+Identifiers for the columns being set. The following arrays are Expressions describing
+the values. The various arrays should be the same length. The array of Identifiers can
+be omitted.
=head3 orderby
This corresponds to the ORDER BY clause in a SELECT statement.
-An orderby clause is composed as follows:
+A orderby clause unit is an array of one or more OrderbyComponent units.
- OrderByComponent := XXX-TODO-XXX
- OrderByDirection := ASC | DESC
+The hash for a OrderbyComponent unit is composed as follows:
- OrderByComponent [ OrderByDirection ]
- [ , OrderByComponent [ OrderByDirection ] ]*
+ {
+ type => 'OrderbyComponent',
+ value => Expression,
+ dir => '< ASC | DESC >',
+ }
+
+The value should either be an Identifier or a Number. The dir element, if
+omitted, will be defaulted to ASC by the AST. The number corresponds to a column
+in the select clause.
=head3 groupby
This corresponds to the GROUP BY clause in a SELECT statement.
-An groupby clause is composed as follows:
+A groupby clause unit is an array of one or more GroupbyComponent units.
+
+The hash for a GroupbyComponent unit is composed as follows:
- GroupByComponent := XXX-TODO-XXX
+ {
+ type => 'GroupbyComponent',
+ value => Expression,
+ }
- GroupByComponent [ , GroupByComponent ]*
+The value should either be an Identifier or a Number. The number corresponds to
+a column in the select clause.
+
+=head2 Possible RDBMS-specific clauses
+
+The following clauses are provided as examples for RDBMS-specific elements. They
+are B<not> expected to be supported by all Visitors. Visitors may choose whether
+or not to throw on an unexpected clause, though it is strongly recommended.
=head3 rows
This corresponds to the clause that is used in some RDBMS engines to limit the
-number of rows returned by a query. In MySQL, this would be the LIMIT clause.
+number of rows returned by a SELECT statement. In MySQL, this would be the LIMIT
+clause.
+
+The hash for a rows clause is composed as follows:
-A rows clause is composed as follows:
+ {
+ start => Number,
+ count => Number,
+ }
- Number [, Number ]
+The start attribute, if ommitted, will default to 0. The count attribute is
+optional.
=head3 for
This corresponds to the clause that is used in some RDBMS engines to indicate
what locks are to be taken by this SELECT statement.
-A for clause is composed as follows:
+The hash for a for clause is composed as follows:
- UPDATE | DELETE
+ {
+ value => '< UPDATE | DELETE >',
+ }
=head3 connectby
This corresponds to the clause that is used in some RDBMS engines to provide for
an adjacency-list query.
-A connectby clause is composed as follows:
+The hash for a for clause is composed as follows:
+
+ {
+ start_with => [
+ Expression,
+ ],
+ connect_by => {
+ option => '< PRIOR | NOCYCLE >'
+ cond => [
+ Expression,
+ ],
+ },
+ order_siblings => orderby-clause,
+ }
+
+Both the start_with and order_siblings clauses are optional.
+
+=head1 TODO
+
+=over 4
+
+=item * sproc unit
+
+=item * UNION, UNION ALL, and MINUS
+
+=item * AS is NOT required
+
+=item * remove BindParameter as a subtype in Value
+
+=item * start the API/Visitor guidelines
+
+=item * JOIN is now its own type
+
+=item * add additional subtypes in Value
+
+=item * Add a is_bind flag to Value
+
+=back
- Identifier, WhereExpression
+Convert INSERT and UPDATE into ->populate form.
=head1 AUTHORS