MySQL visitor supporting REPLACE INTO). q.v. the relevant sections of this
specification for details.
+=head2 Dialect-agnostic construction
+
+The AST will not attempt to be immediately readable to a human as SQL. In fact,
+due to the dialect differences, particularly in terms of which use operators and
+which use functions for a given action, the AST will provide simple units. It is
+the responsibility of the Visitor to provide the appropriate SQL. Furthermore,
+the AST will be very generic and only provide hints for a subset of SQL. If a
+Visitor is sufficiently intelligent, pretty SQL may be emitted, but that is not
+the goal of this AST.
+
=head1 COMPONENTS
There are two major components to SQL::Abstract v2.
The AST will be a HoHo..oH (hash of hash of ... of hashes). The keys to the
outermost hash will be the various clauses of a SQL statement, plus some
-metadata keys. All metadata keys will be identifiable as such by being prefixed
-with an underscore. All keys will be in lowercase.
+metadata keys.
=head2 Metadata keys
These are the additional metadata keys that the AST provides for.
-=head3 _query
+=head3 type
This denotes what kind of query this AST should be interpreted as. Different
-Visitors may accept additional values for _query. For example, a MySQL Visitor
-may choose to accept 'replace' for REPLACE INTO. If a _query value is
+Visitors may accept additional values for type. For example, a MySQL Visitor
+may choose to accept 'replace' for REPLACE INTO. If a type value is
unrecognized by the Visitor, the Visitor is expected to throw an error.
-All Visitors are expected to handle the following values for _query:
+All Visitors are expected to handle the following values for type:
=over 4
=back
-=head3 _version
+=head3 ast_version
This denotes the version of the AST. Different versions will indicate different
-capabilities provided. Visitors will choose to respect the _version as needed
+capabilities provided. Visitors will choose to respect the ast_version as needed
and desired.
=head2 Structural units
=over 4
-=item * name
+=item * type
This indicates the structural unit that this hash is representing. While this
specification provides for standard structural units, different Visitors may
The hash will be structured as follows:
{
- name => 'Identifier',
+ type => 'Identifier',
element1 => Scalar,
element2 => Scalar,
element3 => Scalar,
Visitors are expected to, by default, quote all identifiers according to the SQL
dialect's quoting scheme.
+Any of the elements may be '*', as in SELECT * or SELECT COUNT(*). Visitors must
+be careful to I<not> quote asterisks.
+
=head3 Value
-A Value is a Perl scalar. Depending on the type, a Visitor may be able to make
-certain decisions.
+A Value is a Perl scalar. Depending on the subtype, a Visitor may be able to
+make certain decisions. The following are the minimally-valid subtypes:
=over 4
The hash will be structured as follows:
{
- name => 'Value'
+ type => 'Value'
subtype => [ 'String' | 'Number' | 'Null' | 'BindParameter' ]
value => Scalar
}
Visitors may choose to support additional subtypes. Visitors are expected to
throw an exception upon encountering an unknown subtype.
-=head3 Function
-
-A Function is anything of the form C<< name( arglist ) >> where C<<name>> is a
-string and C<arglist> is an ExpressionList.
-
-Yes, a Subquery is legal as an argument for many functions. Some example
-functions are:
-
-=over 4
-
-=item * C<< MAX >>
-
-=item * C<< MIN >>
+=head3 Operator
-=item * C<< SUM >>
-
-=item * C<< IF >>
-
-=back
+An Operator would be, in SQL dialect terms, a unary operator, a binary operator,
+a trinary operator, or a function. Since different dialects may have a given
+functionality as an operator or a function (such as CONCAT in MySQl vs. || in
+Oracle for string concatenation), they will be represented in the AST as generic
+operators.
The hash will be structured as follows:
{
- name => "Function",
- function => String,
- arglist => ExpressionList,
+ type => 'Operator',
+ op => String,
+ args => ExpressionList,
}
-Functions have a cardinality, or expected number of arguments. Some functions,
+Operators have a cardinality, or expected number of arguments. Some operators,
such as MAX(), have a cardinality of 1. Others, such as IF(), have a cardinality
of N, meaning they can have any number of arguments greater than 0. Others, such
-as NOW(), have a cardinality of 0. Several functions with the same meaning may
+as NOW(), have a cardinality of 0. Several operators with the same meaning may
have a different cardinality in different SQL dialects as different engines may
-allow different behaviors.
+allow different behaviors. As cardinality may differ between dialects, enforcing
+cardinality is necessarily left to the Visitor.
+
+Operators also have restrictions on the types of arguments they will accept. The
+first argument may or may not restricted in the same fashion as the other
+arguments. As with cardinality, this restriction will need to be managed by the
+Visitor.
-As cardinality may differ between dialects, enforcing cardinality is necessarily
-left to the Visitor.
+The operator name needs to take into account the possibility that the RDBMS may
+allow UDFs (User-Defined Functions) that have the same name as an operator, such
+as 'AND'. This will have to be managed by the Visitor.
=head3 Subquery
-A Subquery is another AST whose _query metadata parameter is set to "SELECT".
+A Subquery is another AST whose type metadata parameter is set to "SELECT".
Most places that a Subquery can be used would require a single value to be
returned (single column, single row), but that is not something that the AST can
Subqueries, when expressed in SQL, must be bounded by parentheses.
-=head3 Unary Operator
-
-A UnaryOperator takes a single argument on the RHS. The argument for a
-UnaryOperator is an Expression.
-
-Visitors are expected to support, at minimum, the following operators:
-
-=over 4
-
-=item * NOT X
-
-=item * ANY X
-
-=item * ALL X
-
-=item * SOME X
-
-=back
-
-The hash for a UnaryOperator is as follows:
-
- {
- name => 'UnaryOperator'
- operator => [ .... ],
- argument1 => Expression,
- }
-
-Visitors may choose to support additional operators. Visitors are expected to
-throw an exception upon encountering an unknown operator.
-
-=head3 BinaryOperator
-
-A BinaryOperator takes two arguments (one on the LHS and one on the RHS). The
-arguments for a BinaryOperator are all Expressions.
-
-Visitors are expected to support, at minimum, the following operators:
-
-=over 4
-
-=item * X = Y
-
-=item * X != Y
-
-=item * X > Y
-
-=item * X < Y
-
-=item * X >= Y
-
-=item * X <= Y
-
-=item * X IS Y
-
-=item * X IN Y
-
-=item * X NOT IN Y
-
-=item * X AND Y
-
-=item * X OR Y
-
-=back
-
-(Note that an operator can comprise of what would be multiple tokens in a normal
-parsing effort.)
-
-Visitors may choose to support additional operators. Visitors are expected to
-throw an exception upon encountering an unknown operator.
-
-The hash for a BinaryOperator is as follows:
-
- {
- name => 'BinaryOperator'
- operator => [ .... ],
- argument1 => Expression,
- argument2 => Expression,
- }
-
-=head3 TrinaryOperator
-
-A TrinaryOperator takes three arguments. It generally is composed of two
-elements with one argument to the LHS, one to the RHS, and a third in the middle
-of the elements. The arguments for a TrinaryOperator are all Expressions.
-
-Visitors are expected to support, at minimum, the following operators:
-
-=over 4
-
-=item * X BETWEEN Y AND Z
-
-=back
-
-Visitors may choose to support additional operators. Visitors are expected to
-throw an exception upon encountering an unknown operator.
-
-The hash for a TrinaryOperator is as follows:
-
- {
- name => 'TrinaryOperator'
- operator => [ .... ],
- argument1 => Expression,
- argument2 => Expression,
- argument3 => Expression,
- }
-
=head3 Expression
An Expression can be any one of the following:
=over 4
+=item * Identifier
+
=item * Value
-=item * Function
+=item * Operator
=item * Subquery
-=item * UnaryOperator
-
-=item * BinaryOperator
-
-=item * TrinaryOperator
-
=back
An Expression is a meta-syntactic unit. An "Expression" unit will never appear
An ExpressionList is a list of Expressions, generally separated by commas
(though other separators may be appropriate at times or for different SQL
-dialects).
+dialects). An null separator may also be used.
The hash for an ExpressionList is as follows:
{
- name => 'ExpressionList',
+ type => 'ExpressionList',
separator => ',',
elements => Array of Expressions,
}
An ExpressionList is always rendered in SQL with parentheses around it.
+=head3 Nesting
+
+There is no specific operator or nodetype for nesting. Instead, nesting is
+explicitly specified by node descent in the AST.
+
=head2 SQL clauses
These are all the legal and acceptable clauses within the AST that would
The hash for a SelectComponent unit is composed as follows:
{
- name => 'SelectComponent',
+ type => 'SelectComponent',
value => Expression,
- [ as => Identifier, ]
+ as => String,
}
The 'as' component is optional. Visitors may choose to make it required in
This is a list of tables that this clause is affecting. It corresponds to the
FROM clause in a SELECT statement and the INSERT INTO/UPDATE/DELETE clauses in
-those respective statements. Depending on the _query metadata entry, the
+those respective statements. Depending on the type metadata entry, the
appropriate clause name will be used.
-A tables clause unit is an array of one or more TableComponent units.
-
The tables clause has several RDBMS-specific variations. The AST will support
all of them and it is up to the Visitor object constructing the actual SQL to
validate and/or use what is provided as appropriate.
-The hash for a TableJoin will be composed as follows:
+A TableJoin is a junction of the following elements:
- # TableJoin
- {
- name => 'TableJoin',
- join => < LEFT|RIGHT [ OUTER ] > | INNER | CROSS | ',',
- [ using => IdentifierList, ]
- [ on => ExpressionList, ]
- }
+=over 4
-A TableJoin may not have both a 'using' element and an 'on' element. It may
-have one of them if the 'join' element is not equal to ',' but doesn't have to.
-If the 'join' element is equal to ',', then it may not have either a 'using' or
-an 'on' element.
+=item * TableIdentifier
+
+=item * Operator
+
+=back
The hash for a TableIdentifier will be composed as follows:
# TableIdentifier
{
- name => 'TableIdentifier',
- value => Identifier | SubQuery
- [ join => TableJoin, ]
- [ as => Identifier, ]
+ type => 'TableIdentifier',
+ value => Expression,
+ as => String,
}
-The first TableComponent in a tables clause may not have a join element. All
-other TableComponent elements that do not have a join element will have a
-default join element of:
+The value should be either an Identifier or a SubQuery.
+The hash for an Operator within a tables clause will be composed as follows:
+
+ # Operator
{
- name => 'TableJoin',
- join => ',',
+ type => 'Operator',
+ op => '< LEFT|RIGHT|FULL [ OUTER ] > | INNER | CROSS',
+ on => Expression,
}
-The 'as' component is optional. Visitors may choose to make it required in
-certain situations (such as MySQL requiring an alias for subqueries).
-
-Additionally, where aliases are provided for in the TableIdentifier, those
-aliases must be used as the tablename in subsequent Identifiers that identify a
-column of that table. This may be enforceable by the AST or the Visitor. But, it
-is more likely that it will not be.
+A USING clause is syntactic sugar for an ON clause and, as such, is not provided
+for by the AST. A join of a comma is identical to a CROSS JOIN and, as such, is
+not provided for by the AST. The on clause is optional.
=head3 where
This corresponds to the WHERE clause in a SELECT, UPDATE, or DELETE statement.
-A where clause is composed as follows:
-
- WhereOperator := AND | OR
- WhereExpression := Expression | Expression WhereOperator Expression
-
- WhereExpression
+A where clause is composed of an Expression.
=head3 set
This corresponds to the SET clause in an INSERT or UPDATE statement.
-A set clause is composed as follows:
+A set clause unit is an array of one or more SetComponent units.
- SetComponent := Identifier = Expression
+The hash for SetComponent unit is composed as follows:
- SetComponent [ , SetComponent ]*
+ {
+ type => 'SetComponent',
+ col => Identifier,
+ value => Expression,
+ }
=head3 columns
This corresponds to the optional list of columns in an INSERT statement.
-A columns clause is an IdentifierList and the unit is composed as follows:
-
- columns => [
- Identifier,
- [ Identifier, ]*
- ],
+A columns clause unit is an array of one or more Identifier units.
=head3 values
This corresponds to the VALUES clause in an INSERT statement.
-A values clause is an ExpressionList and the unit is composed as follows.
-
- values => [
- Expression,
- [ Expression, ]*
- ],
+A values clause unit is an array of one or more Expression units.
If there is a columns clause, the number of entries in the values clause must be
equal to the number of entries in the columns clause.
This corresponds to the ORDER BY clause in a SELECT statement.
-An orderby clause is composed as follows:
+A orderby clause unit is an array of one or more OrderbyComponent units.
- OrderByComponent := XXX-TODO-XXX
- OrderByDirection := ASC | DESC
+The hash for a OrderbyComponent unit is composed as follows:
- OrderByComponent [ OrderByDirection ]
- [ , OrderByComponent [ OrderByDirection ] ]*
+ {
+ type => 'OrderbyComponent',
+ value => < Identifier | Number >
+ dir => '< ASC | DESC >',
+ }
+
+The dir element, if omitted, will be defaulted to ASC by the AST. The number
+corresponds to a column in the select clause.
=head3 groupby
This corresponds to the GROUP BY clause in a SELECT statement.
-An groupby clause is composed as follows:
+A groupby clause unit is an array of one or more GroupbyComponent units.
- GroupByComponent := XXX-TODO-XXX
+The hash for a GroupbyComponent unit is composed as follows:
- GroupByComponent [ , GroupByComponent ]*
+ {
+ type => 'GroupbyComponent',
+ value => < Identifier | Number >
+ }
+
+The number corresponds to a column in the select clause.
=head3 rows
This corresponds to the clause that is used in some RDBMS engines to limit the
number of rows returned by a query. In MySQL, this would be the LIMIT clause.
-A rows clause is composed as follows:
+The hash for a rows clause is composed as follows:
- Number [, Number ]
+ {
+ start => Number,
+ count => Number,
+ }
+
+The start attribute, if ommitted, will default to 0. The count attribute is
+optional.
=head3 for
This corresponds to the clause that is used in some RDBMS engines to indicate
what locks are to be taken by this SELECT statement.
-A for clause is composed as follows:
+The hash for a for clause is composed as follows:
- UPDATE | DELETE
+ {
+ value => '< UPDATE | DELETE >',
+ }
=head3 connectby
This corresponds to the clause that is used in some RDBMS engines to provide for
an adjacency-list query.
-A connectby clause is composed as follows:
+The hash for a for clause is composed as follows:
- Identifier, WhereExpression
+ {
+ start_with => ExpressionList,
+ connect_by => {
+ option => '< PRIOR | NOCYCLE >'
+ cond => ExpressionList,
+ },
+ order_siblings => orderby-clause,
+ }
-=head1 EXAMPLES
+Both the start_with and order_siblings clauses are optional.
-The following are example SQL statements and a possible AST for each one.
+=head1 TODO
=over 4
-=item * SELECT 1
-
- {
- _query => 'select',
- _ast_version => 0.0001,
- select => [
- {
- name => 'SelectComponent',
- value => {
- name => 'Value',
- subtype => 'number',
- value => 1,
- },
- },
- ],
- }
-
-=item * SELECT NOW() AS time FROM dual AS duality
-
- {
- _query => 'select',
- _ast_version => 0.0001,
- select => [
- {
- name => 'SelectComponent',
- value => {
- name => 'Function',
- function => 'NOW',
- },
- as => {
- name => 'Identifier',
- element1 => 'time',
- },
- },
- ],
- tables => [
- {
- name => 'TablesComponent',
- value => {
- name => 'Identifier',
- element1 => 'dual',
- },
- as => {
- name => 'Identifier',
- element1 => 'duality',
- },
- },
- ],
- }
+=item * sproc unit
=back