=head1 NAME

DBIx::Class::Manual::Specification

=head1 SYNOPSIS

This discusses the specification for DBIx::Class v0.09. Along with discussing
new features, it will also include a proper specification for the features of
DBIx::Class 0.08 that will still be supported in DBIx::Class 0.09. Where
appropriate, it will also discuss specifications for Data::Query and any
other modules/distributions.

=head1 MOTIVATIONS

DBIx::Class has become one of the premier ORMs in the Perl community, if not
the premier ORM. However, its featureset has grown organically. As such, some
features that were considered a good idea at one time were implemented in ways
that prevented the proper implementation of other features down the road.
Examples include prefetching and searching.

This document means to provide a solid foundation for future development, so as
to avoid painting the distribution into a corner with respect to features that
have not been conceived of at this time.

=head1 GOALS

This specification will be driven by the following design goals.

=head2 Database-agnostic usage

One of the biggest selling points of an ORM is that a programmer should be able
to change the relational database used to back a given application without
having to change anything but configuration. And, to a point, this works
relatively well. Until it doesn't. DBIx::Class 0.09 will change that.

=head2 Database-specific optimization

As it turns out, the drive to be database-agnostic makes it very hard to
optimize for a given database's strengths. In order to support as many
databases as possible, the generated SQL generally ends up targeting the lowest
common denominator. This gives ORMs a bad name where performance is concerned.
But it doesn't have to be that way, and DBIx::Class 0.09 will demonstrate how.

=head2 Datastore-agnostic usage

Data doesn't just live in relational databases anymore. There are many reasons
for this, the primary one being that the relational model doesn't necessarily
reflect the structure of certain data models. But there are many times when
relational calculus represents the proper way of manipulating that data. Or,
more commonly, a given relational query needs to merge data from relational and
non-relational sources. DBIx::Class, along with Data::Query, will provide a
single API for manipulating data in a relational fashion, regardless of whether
that manipulation happens in a datastore, within Perl, or both.

=head2 Intuitive API

Every API has unavoidable gribbly bits. This falls out of the fact that the
problem space a given API solves can never be fully mapped to it (an extension
of Godel's Incompleteness Theorem). However, like Perl itself, the API should be
organized and optimized so that the user only needs to know a minimum of the
API's functionality in order to accomplish the most common tasks. Furthermore,
that minimum should be easily discoverable in simple synopses and examples.

=head2 Extensible API and backend

The functionality of DBIx::Class should be extensible without breaking (or, in
as many cases as possible, even I<knowing about>) existing backend
implementation details. Both the API (new contact points) and the backend (new
datastores, etc.) should be extensible through a simple API for doing so. This
implies that as many pieces as possible of the backend functionality should be
overridable separately from every other piece.

=head1 RESTRICTIONS

Fill this part in.

=head1 RELATIONAL THEORY

(B<Author's note:> To understand the fundamental underpinnings of DBIx::Class
0.09's choice of features, it is necessary to have a common understanding of
relational theory. I recommend that everyone at least read this section once in
order to be familiar with the terms I use throughout this document.)

Relational theory (calculus or algebra - the difference is esoteric) is, at its
roots, all about set theory and set manipulations. Each row (or tuple) in a
table is an element of a set. The various clauses (or operations) in a SQL
statement can be viewed as set manipulators that take one or more sets of tuples
as input and provide a set of tuples as output.

A SQL statement has many clauses, and they are evaluated in a very specific
order that is different from the order in which they are written. (As mutations
share all of their clauses with queries and evaluate them in the same order,
only queries will be discussed here.) The order of evaluation for a SELECT
statement is:

=over 4

=item * FROM (with JOINs evaluated in left-to-right order)

=item * WHERE (with AND/OR evaluated in left-to-right order within each
parenthetical block)

=item * GROUP BY

=item * HAVING

=item * SELECT

=item * ORDER BY

=back

(Expand here.)

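To make the ordering concrete, the clauses can be mirrored with plain Perl list
operations over hashref tuples. This is an illustration of the evaluation order
only, not any DBIx::Class API: FROM provides the tuples, WHERE filters them,
GROUP BY partitions, HAVING filters whole groups, SELECT projects, and ORDER BY
sorts last of all.

```perl
use strict;
use warnings;

# FROM: the source set of tuples.
my @cds = (
    { artist => 'A', year => 2004 },
    { artist => 'A', year => 2006 },
    { artist => 'B', year => 2006 },
    { artist => 'B', year => 1999 },
);

# WHERE: filter individual tuples.
my @filtered = grep { $_->{year} >= 2000 } @cds;

# GROUP BY artist: partition the filtered tuples.
my %groups;
push @{ $groups{ $_->{artist} } }, $_ for @filtered;

# HAVING: filter whole groups (it runs on groups, before projection).
my @kept = grep { scalar @{ $groups{$_} } >= 2 } keys %groups;

# SELECT: project each surviving group into an output tuple.
my @rows = map { { artist => $_, cd_count => scalar @{ $groups{$_} } } } @kept;

# ORDER BY: sort the final output, last of all.
my @sorted = sort { $a->{artist} cmp $b->{artist} } @rows;

printf "%s: %d\n", $_->{artist}, $_->{cd_count} for @sorted;   # prints "A: 2"
```

Note that HAVING can only reference grouped data precisely because it is
evaluated against whole groups, before the projection step runs.
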
=head1 FEATURES

=head2 Basic Features

An ORM must provide some very basic features. They are, in essence, the ability
to:

=over 4

=item * Retrieve data from a relational database (a.k.a., select)

=item * Insert, update, and delete in that relational database (a.k.a., mutate)

=back

These functions are generally achieved by providing some API to the SELECT
statement that returns a collection of objects. These objects then allow for
updating and deleting. The same API that allows for SELECT also, generally, will
allow for creation and, possibly, deletion. Most ORMs do this modelling by
mapping a class to a table and an instance of that class to a given row in that
table.

DBIx::Class takes a different approach entirely. Instead of treating the row as
the atomic unit, DBIx::Class 0.09 will treat the column as the atomic unit. This
opens up several possibilities:

=over 4

=item * The definition of a table falls out naturally as a collection of columns.

=item * Defining custom searches is a matter of creating a collection of columns on the fly.

=item * Validating values is easily handled regardless of where the column is used.

=item * Enumeration tables can be hidden behind a column definition, allowing optimization to ENUM as appropriate.

=back

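A minimal sketch of the column-as-atomic-unit idea in plain Perl. The C<Column>
class and its methods here are hypothetical illustrations invented for this
sketch, not the proposed 0.09 API; the point is that one column definition owns
its own validation and can be shared between a table and an on-the-fly
collection:

```perl
use strict;
use warnings;

package Column;
sub new {
    my ($class, %args) = @_;
    # A column knows its name and how to validate a value for itself.
    return bless { name => $args{name}, validate => $args{validate} }, $class;
}
sub name     { $_[0]{name} }
sub is_valid { my ($self, $value) = @_; $self->{validate}->($value) }

package main;

# One definition, reused everywhere the column appears.
my $year = Column->new(
    name     => 'year',
    validate => sub { defined $_[0] && $_[0] =~ /^\d+$/ && $_[0] > 0 && $_[0] <= 9999 },
);

# A "table" is just a collection of column definitions...
my @cd_columns = ($year, Column->new(name => 'title', validate => sub { defined $_[0] }));

# ...and a custom search result shape is simply another collection, built on
# the fly, with validation coming along for free.
my @custom = grep { $_->name ne 'title' } @cd_columns;

print $year->is_valid(1984) ? "valid\n" : "invalid\n";   # prints "valid"
```
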
=head3 Sources

A source, at its simplest, is the representation of a single table in a
relational database. For example, the table C<artists>. But, a source can be
much more than that. Within a relational database, C<artists JOIN cds> forms a
source containing the columns of both C<artists> and C<cds>. Taken to the
logical extreme, the FROM clause of a query forms a single source. A subquery
can also be viewed as a source. (In fact, any table or join can be viewed as the
subquery C<(SELECT * FROM table) AS table>.)

Under the hood, a source acts as a collection of column definitions. When a
source feeds into

=head3 Resultsets (RENAME ME)

The resultset is arguably the single most important breakthrough in DBIx::Class.
It allows for the gradual building of a SQL statement and the reuse of that
building for more than just SELECT statements. In addition to being able to
separate responsibilities by letting different pieces of an application (such as
security) decorate the resultset appropriately, a resultset can also be reused
for various needs. The same resultset can be used for a query, a mass update, a
mass delete, or anything else.

Under the hood, a resultset has a representation for each SQL clause. Each usage
of the resultset will take the clauses that make sense for that usage and leave
the others.

(Expand here, detailing each usage of a resultset and the clauses that each
uses.)

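The gradual building can be sketched as an immutable clause collector. This is a
toy illustration: C<ToyResultSet> and its methods are invented for this sketch,
not the real resultset implementation. Each C<search> returns a I<new> object
with the extra condition merged in, and each usage picks only the clauses it
needs:

```perl
use strict;
use warnings;

package ToyResultSet;
sub new {
    my ($class, %clauses) = @_;
    return bless { where => $clauses{where} || [], order_by => $clauses{order_by} }, $class;
}

# Chaining never mutates: a security layer can decorate a resultset
# without affecting other holders of the original.
sub search {
    my ($self, $cond) = @_;
    return (ref $self)->new(
        where    => [ @{ $self->{where} }, $cond ],
        order_by => $self->{order_by},
    );
}

# A SELECT would use WHERE and ORDER BY; a mass DELETE takes WHERE only.
# Both usages draw from the same accumulated clauses.
sub as_select_where { join ' AND ', @{ $_[0]{where} } }
sub as_delete_where { join ' AND ', @{ $_[0]{where} } }

package main;

my $rs      = ToyResultSet->new(where => ["name = 'Bon Jovi'"]);
my $guarded = $rs->search("owner_id = 42");    # e.g. added by a security layer

print $guarded->as_select_where, "\n";   # name = 'Bon Jovi' AND owner_id = 42
print $rs->as_select_where, "\n";        # name = 'Bon Jovi' (original untouched)
```
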
=head3 Objects (aka, Rows)

Traditionally, the row object acts as a hashref representing a row in a table.
This representation is simple to implement, meets the 80/20 case, and is
completely wrong. It fails when dealing with GROUP BY clauses and custom
queries, and is generally inflexible.

DBIx::Class 0.09 chooses instead to model a row as an object that is a
collection of columns that contain values. This means that the object is defined
by the resultset that generates the stream the object comes from. In the
standard case, the resultset's store is a single table and all columns are
selected, acting the same way as a DBIx::Class 0.08 row object would.

=head2 Queries as streams

A grouping of data can be viewed as either a collection or a stream. The main
difference is that a collection is eager and a stream is lazy. For large
datasets, collections can be very expensive in terms of memory and time (filling
that memory). Streams, on the other hand, defer loading anything into memory
until the last possible moment, but are easily convertible into a collection as
needed. (See I<Higher-Order Perl> for more information.)

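The lazy/eager distinction can be sketched with the closure-based iterator style
described in I<Higher-Order Perl>. The helper names here are illustrative, not
part of any DBIx::Class API:

```perl
use strict;
use warnings;

# A stream is just a coderef: each call yields the next element,
# or undef when exhausted. Nothing is computed until next() is called.
sub stream_from_range {
    my ($cur, $max) = @_;
    return sub { $cur <= $max ? $cur++ : undef };
}

# Lazily transform a stream without realizing it in memory.
sub stream_map {
    my ($f, $stream) = @_;
    return sub {
        my $x = $stream->();
        return defined $x ? $f->($x) : undef;
    };
}

# Converting to a collection is the only eager step.
sub stream_to_list {
    my ($stream) = @_;
    my @out;
    while (defined(my $x = $stream->())) { push @out, $x }
    return @out;
}

my $squares = stream_map(sub { $_[0] ** 2 }, stream_from_range(1, 5));
print join(',', stream_to_list($squares)), "\n";   # prints 1,4,9,16,25
```

The same shape fits a database cursor: the source coderef fetches one row per
call, and nothing is pulled from the datastore until the consumer asks for it.
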
=head1 EXAMPLES

  subtype Year as Num where { $_ > 0 && $_ <= 9999 };

  table Artist {
      column artist_id => ( isa => 'Num', auto_increment => 1, primary_key => 1 );
      column name => ( isa => 'String', max_length => 50, nullable => 0 );
      column rank => ( isa => 'Num', default => 13 );

      has_many 'CD';

      table 'artists';
  }

  table CD {
      column cdid => ( isa => 'Num', auto_increment => 1, primary_key => 1 );
      column artist => ( isa => 'Num' );
      column title => ( isa => 'String', max_length => 100 );
      column year => ( isa => 'Year' );

      belongs_to 'Artist';

      table 'cds';
  }

  # The following four searches are equivalent:
  my @artists = $schema->table('Artist')->search({
      name => 'Bon Jovi',
  })->all;

  my @artists = $schema->search({
      name => 'Bon Jovi',
  }, {
      from => 'Artist',
  });

  # If a class and table name would conflict and they don't refer to the same
  # thing, a compile-time error is thrown.
  my @artists = $schema->search({
      'Artist.name' => 'Bon Jovi',
  });

  my @artists = $schema->search({
      'artists.name' => 'Bon Jovi',
  });

  # The following searches are equivalent. Results will contain objects that
  # respond to artist_name(), cd_name(), and year() and nothing else.
  my @results = $schema->search({
      'Artist.name' => [ 'Bon Jovi', 'Metallica' ],
      'CD.year' => [ '2004', '2006' ],
  }, {
      select => [
          [ 'Artist.name' => 'artist_name' ],
          [ 'CD.title' => 'cd_name' ],
          'CD.year',
      ],
  });

  # Artist is the default table here.
  my @results = $schema->table('Artist')->search({
      'name' => [ 'Bon Jovi', 'Metallica' ],
      'CD.year' => [ '2004', '2006' ],
  }, {
      select => [
          [ 'name' => 'artist_name' ],
          [ 'CD.title' => 'cd_name' ],
          'CD.year',
      ],
  });

  # CD is the default table here.
  my @results = $schema->table('CD')->search({
      'Artist.name' => [ 'Bon Jovi', 'Metallica' ],
      'year' => [ '2004', '2006' ],
  }, {
      select => [
          [ 'Artist.name' => 'artist_name' ],
          [ 'title' => 'cd_name' ],
          'year',
      ],
  });

  # Note that joins are inferred from the existence of the relationships.

=head1 TODO

=over 4

=item * L</RESTRICTIONS> section needs to be filled in.

=back

=head1 AUTHOR(S)

robkinyon: Rob Kinyon C<< <rkinyon@cpan.org> >>

=head1 LICENSE

You may distribute this code under the same terms as Perl itself.

=cut