package DBIx::Class::Storage::DBI::Replicated::Introduction;

=head1 NAME

DBIx::Class::Storage::DBI::Replicated::Introduction - Minimum Need to Know

=head1 SYNOPSIS

This is an introductory document for L<DBIx::Class::Storage::DBI::Replicated>.

This document is not an overview of what replication is or why you should be
using it.  It is not a document explaining how to set up MySQL native
replication either.  Copious external resources are available for both.  This
document presumes you have the basics down.
  
=head1 DESCRIPTION

L<DBIx::Class> supports a framework for using database replication.  This system
is integrated completely, which means that once it's set up you should be able
to start using a replication cluster without additional work or changes to your
code.  Some caveats apply, primarily related to the proper use of transactions
(you are wrapping all your database modifying statements inside a transaction,
right ;) ).  However, in our experience properly written DBIC will work
transparently with Replicated storage.

Currently we have support for MySQL native replication, which is relatively
easy to install and configure.  We also currently support a single master with
one or more replicants (also called 'slaves' in some documentation).  However,
the framework is not specifically tied to MySQL, and supporting other
replication systems or topologies should be possible.  Please bring your
patches and ideas to the #dbix-class IRC channel or the mailing list.

For an easy way to start playing with MySQL native replication, see:
L<MySQL::Sandbox>.

If you are using this with a L<Catalyst> based application, you may also wish
to see more recent updates to L<Catalyst::Model::DBIC::Schema>, which has
support for replication configuration options as well.

=head1 REPLICATED STORAGE

By default, when you start L<DBIx::Class>, your Schema (L<DBIx::Class::Schema>)
is assigned a storage_type, which when fully connected will reflect your
underlying storage engine as defined by your chosen database driver.  For
example, if you connect to a MySQL database, your storage_type will be
L<DBIx::Class::Storage::DBI::mysql>.  Your storage type class will contain
database specific code to help smooth over the differences between databases
and let L<DBIx::Class> do its thing.

If you want to use replication, you will override this setting so that the
replicated storage engine will 'wrap' your underlying storages and present a
unified interface to the end programmer.  This wrapper storage class will
delegate method calls to either a master database or one or more replicated
databases, depending on whether they are read only (by default sent to the
replicants) or write (reserved for the master).  Additionally, the Replicated
storage will monitor the health of your replicants and automatically drop any
that exceed configurable parameters.  Later, it can automatically restore a
replicant when its health is restored.

This gives you a very robust system, since you can add or drop replicants
and DBIC will automatically adjust itself accordingly.

Additionally, if you need high data integrity, such as when you are executing
a transaction, replicated storage will automatically delegate all database
traffic to the master storage.  There are several ways to enable this high
integrity mode, but wrapping your statements inside a transaction is the easy
and canonical option. 
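
As a minimal sketch of the transaction approach, the following wraps a write
and a follow-up read in L<DBIx::Class::Schema/txn_do>, so that both are sent to
the master.  The 'Account' result source, the 'balance' column and the
debit_account helper are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 'Account' and 'balance' are illustrative names,
# not part of DBIx::Class.
sub debit_account {
    my ($schema, $account_id, $amount) = @_;

    # txn_do runs the coderef inside a transaction and returns its result.
    # Under replicated storage, statements inside it go to the master.
    return $schema->txn_do(sub {
        my $account = $schema->resultset('Account')->find($account_id)
            or die "No such account: $account_id\n";

        $account->update({ balance => $account->balance - $amount });

        # Reload the row.  This read is still inside the transaction, so it
        # is not served by a possibly lagging replicant.
        $account->discard_changes;
        return $account->balance;
    });
}
```

If the coderef dies, txn_do rolls the transaction back and rethrows the
exception, so the caller sees either the new balance or an error.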

=head1 PARTS OF REPLICATED STORAGE

A replicated storage contains several parts.  First, there is the replicated
storage itself (L<DBIx::Class::Storage::DBI::Replicated>).  A replicated storage
takes a pool of replicants (L<DBIx::Class::Storage::DBI::Replicated::Pool>)
and a software balancer (L<DBIx::Class::Storage::DBI::Replicated::Balancer>).
The balancer does the job of splitting up all the read traffic amongst the
replicants in the Pool.  Currently there are two types of balancers: a Random
one, which chooses a replicant in the Pool using a naive randomizer algorithm,
and a First one, which just uses the first replicant in the Pool (and obviously
is only of value when you have a single replicant).
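
To make the difference between the two balancers concrete, here is a toy
sketch in plain Perl.  It is not the code used by the balancer classes;
pick_first and pick_random are hypothetical helpers, and the replicants are
represented by plain strings.

```perl
use strict;
use warnings;

# Toy model only: real replicants are storage objects, not strings.

# "First" strategy: always use the first replicant in the pool.
sub pick_first {
    my (@pool) = @_;
    return $pool[0];
}

# "Random" strategy: choose any replicant with equal probability.
sub pick_random {
    my (@pool) = @_;
    return $pool[ int rand @pool ];
}

my @pool = qw(replicant1 replicant2 replicant3);

# Count how 3000 simulated reads would be spread under each strategy.
my (%first_hits, %random_hits);
$first_hits{ pick_first(@pool) }++   for 1 .. 3000;
$random_hits{ pick_random(@pool) }++ for 1 .. 3000;

printf "First:  %s\n",
    join ', ', map { "$_=$first_hits{$_}" } sort keys %first_hits;
printf "Random: %s\n",
    join ', ', map { "$_=$random_hits{$_}" } sort keys %random_hits;
```

With more than one replicant in the pool, the First strategy leaves all but
one of them idle, which is why it only suits a single-replicant setup.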

=head1 REPLICATED STORAGE CONFIGURATION

All the parts of replication can be altered dynamically at runtime, which makes
it possible to create a system that automatically scales under load by creating
more replicants as needed, perhaps using a cloud system such as Amazon EC2.
However, for common use you can set up your replicated storage to be enabled at
the time you connect the databases.  The following is a breakdown of how you
may wish to do this.  Again, if you are using L<Catalyst>, I strongly recommend
you use (or upgrade to) the latest L<Catalyst::Model::DBIC::Schema>, which makes
this job even easier.

First, you need to get a $schema object and set the storage_type.  Let's assume
you have a schema called "MyApp::Schema".  The storage_type has to be set
before the schema connects, so start from a clone of the schema class:

    use MyApp::Schema;
    my $schema = MyApp::Schema->clone;

    $schema->storage_type([
        '::DBI::Replicated' => {
            balancer_type => '::Random',
            balancer_args => {
                auto_validate_every => 5,
                master_read_weight  => 1,
            },
            pool_args => {
                maximum_lag => 2,
            },
        }
    ]);

Then, you need to connect your L<DBIx::Class::Schema>:

    $schema->connection($dsn, $user, $pass);

Let's break down the settings.  The method L<DBIx::Class::Schema/storage_type>
takes a single value.  When the storage needs configuration, that value is an
Array Reference holding the storage class name followed by a Hash Reference of
configuration options for that storage.  In this case, we are setting the
Replicated storage type using '::DBI::Replicated' as the first element.  You
will only use a different value if you are subclassing the replicated storage,
so for now just copy that first element.

The hash reference contains the options that get passed to the replicated
storage.  L<DBIx::Class::Storage::DBI::Replicated/balancer_type> is the type of
software load balancer you will use to split up traffic among all your
replicants.  Right now we have two options, "::Random" and "::First".  You can
review documentation for both at:

L<DBIx::Class::Storage::DBI::Replicated::Balancer::First>,
L<DBIx::Class::Storage::DBI::Replicated::Balancer::Random>.

In this case we will have three replicants, so the ::Random option is the only
one that makes sense.

'balancer_args' get passed to the balancer when it's instantiated.  All
balancers have the 'auto_validate_every' option.  This is the number of seconds
we allow to pass between validation checks on a load-balanced replicant.  So
the higher the number, the greater the chance that your reads from the
replicant may be inconsistent with what's on the master.  Setting this number
too low will result in increased database load, so choose a number with care.
Our experience is that setting the number around 5 seconds results in a good
performance / integrity balance.

'master_read_weight' is an option associated with the ::Random balancer.  It
allows the master to serve reads alongside the replicants.  I usually leave
this off (the default), although the example above turns it on.

The 'pool_args' are configuration options associated with the replicant pool.
This object (L<DBIx::Class::Storage::DBI::Replicated::Pool>) manages all the
declared replicants.  'maximum_lag' is the number of seconds a replicant is
allowed to lag behind the master before being temporarily removed from the pool.
Keep in mind that the Balancer option 'auto_validate_every' determines how often
a replicant is tested against this condition, so the true possible lag can be
higher than the number you set.  The default for 'maximum_lag' is zero.
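
As a rough worked example of how the two settings combine: if a replicant
passes a validation check while just within 'maximum_lag' and then stops
applying changes altogether, it keeps serving reads until the next check,
'auto_validate_every' seconds later.  The sketch below computes that rough
upper bound.  worst_case_staleness is a hypothetical helper, and the model is
an assumption that ignores the time the check itself takes.

```perl
use strict;
use warnings;

# Hypothetical helper: a rough bound on how stale a read could be, assuming
# a replicant passes a check at exactly maximum_lag and then falls further
# behind by one second per second until the next check.
sub worst_case_staleness {
    my (%args) = @_;
    return $args{maximum_lag} + $args{auto_validate_every};
}

# The values used in the configuration example in this document.
my $seconds = worst_case_staleness(
    maximum_lag         => 2,
    auto_validate_every => 5,
);

print "Reads may be up to roughly $seconds seconds stale\n";
```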

No matter how low you set the maximum_lag or the auto_validate_every settings,
there is always the chance that your replicants will lag a bit behind the
master for the supported replication system built into MySQL.  You can ensure
reliable reads by using a transaction, which will force both read and write
activity to the master.  However, this will increase the load on your master
database.

After you've configured the replicated storage, you need to add the connection
information for the replicants:

    $schema->storage->connect_replicants(
        [$dsn1, $user, $pass, \%opts],
        [$dsn2, $user, $pass, \%opts],
        [$dsn3, $user, $pass, \%opts],
    );

These replicants should be configured as slaves to the master using the
instructions for MySQL native replication, or if you are just learning, you
will find L<MySQL::Sandbox> an easy way to set up a replication cluster.
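
If your replicants share credentials, the list of connect-info array
references can be built rather than typed out.  This is a small sketch in
plain Perl; build_replicant_info is a hypothetical helper, and the DSNs, user
name, password and options shown are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of DSNs into the list of array
# references that connect_replicants expects, one per replicant.
# Each replicant gets its own copy of the options hash.
sub build_replicant_info {
    my ($user, $pass, $opts, @dsns) = @_;
    return map { [ $_, $user, $pass, +{ %$opts } ] } @dsns;
}

# Placeholder connection details.
my @replicants = build_replicant_info(
    'dbuser', 'secret', { AutoCommit => 1 },
    'dbi:mysql:database=myapp;host=replica1',
    'dbi:mysql:database=myapp;host=replica2',
    'dbi:mysql:database=myapp;host=replica3',
);

# With a configured schema you would then call:
#   $schema->storage->connect_replicants(@replicants);
printf "%d replicants configured\n", scalar @replicants;
```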

And now your $schema object is properly configured!  Enjoy!

=head1 AUTHOR

John Napiorkowski <jjnapiork@cpan.org>

=head1 LICENSE

You may distribute this code under the same terms as Perl itself.

=cut

1;