package DBIx::Class::Storage::DBI::Replicated::Introduction;

=head1 NAME

DBIx::Class::Storage::DBI::Replicated::Introduction - Minimum Need to Know

=head1 SYNOPSIS

This is an introductory document for L<DBIx::Class::Storage::DBI::Replicated>.

This document is not an overview of what replication is or why you should be
using it.  It is not a document explaining how to set up MySQL native
replication either.  Copious external resources are available for both.  This
document presumes you have the basics down.
  
=head1 DESCRIPTION

L<DBIx::Class> supports a framework for using database replication.  This system
is integrated completely, which means that once it is set up you should be able
to start using a replication cluster without additional work or changes to your
code.  Some caveats apply, primarily related to the proper use of transactions
(you are wrapping all your database modifying statements inside a transaction,
right ;) ), but in our experience properly written DBIC code will work
transparently with Replicated storage.

Currently we support MySQL native replication, which is relatively easy to
install and configure, in a topology of a single master with one or more
replicants (also called 'slaves' in some documentation).  However, the
framework is not specifically tied to MySQL, and supporting other replication
systems or topologies should be possible.  Please bring your patches and ideas
to the #dbix-class IRC channel or the mailing list.

For an easy way to start playing with MySQL native replication, see:
L<MySQL::Sandbox>.

If you are using this with a L<Catalyst> based application, you may also wish
to see the more recent updates to L<Catalyst::Model::DBIC::Schema>, which has
support for replication configuration options as well.

=head1 REPLICATED STORAGE

By default, when you start L<DBIx::Class>, your Schema (L<DBIx::Class::Schema>)
is assigned a storage_type, which when fully connected will reflect your
underlying storage engine as defined by your chosen database driver.  For
example, if you connect to a MySQL database, your storage_type will be
L<DBIx::Class::Storage::DBI::mysql>.  Your storage type class will contain
database specific code to help smooth over the differences between databases
and let L<DBIx::Class> do its thing.

If you want to use replication, you will override this setting so that the
replicated storage engine will 'wrap' your underlying storages and present a
unified interface to the end programmer.  This wrapper storage class will
delegate method calls to either a master database or one or more replicated
databases, depending on whether they are read-only (by default sent to the
replicants) or write (reserved for the master).  Additionally, the Replicated
storage will monitor the health of your replicants and automatically drop any
that exceed configurable limits.  Later, it can automatically restore a
replicant when its health is restored.

This gives you a very robust system, since you can add or drop replicants
and DBIC will automatically adjust itself accordingly.

Additionally, if you need high data integrity, such as when you are executing
a transaction, replicated storage will automatically delegate all database
traffic to the master storage.  There are several ways to enable this high
integrity mode, but wrapping your statements inside a transaction is the easy
and canonical option. 
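
The following is a minimal sketch of three ways to send reads to the master.
It assumes a connected C<$schema> that uses replicated storage and a
hypothetical C<User> result source with a C<name> column; both names are
placeholders and are not part of this distribution:

  # 1. A transaction: everything inside the coderef uses the master.
  my $user = $schema->txn_do(sub {
    $schema->resultset('User')->create({ name => 'John' });
    return $schema->resultset('User')->search({ name => 'John' })->first;
  });

  # 2. execute_reliably: run a coderef against the master without
  #    wrapping it in a transaction.
  my $same_user = $schema->storage->execute_reliably(sub {
    return $schema->resultset('User')->search({ name => 'John' })->first;
  });

  # 3. The force_pool search attribute: pin one resultset to the master.
  my $rs = $schema->resultset('User')->search(undef, { force_pool => 'master' });

See L<DBIx::Class::Storage::DBI::Replicated> for the details of
C<execute_reliably> and C<force_pool>.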

=head1 PARTS OF REPLICATED STORAGE

A replicated storage contains several parts.  First, there is the replicated
storage itself (L<DBIx::Class::Storage::DBI::Replicated>).  A replicated storage
takes a pool of replicants (L<DBIx::Class::Storage::DBI::Replicated::Pool>)
and a software balancer (L<DBIx::Class::Storage::DBI::Replicated::Balancer>).
The balancer does the job of splitting up all the read traffic among the
replicants in the Pool.  Currently there are two types of balancers: a Random
one, which chooses a replicant in the Pool using a naive randomizer algorithm,
and a First one, which just uses the first replicant in the Pool (and obviously
is only of value when you have a single replicant).
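
To illustrate the difference between the two strategies, here is a standalone
sketch in plain Perl.  It is not the code used by the balancer classes; the
replicant names and the C<pick_first> and C<pick_random> helpers are invented
for illustration only:

  use strict;
  use warnings;

  # Hypothetical replicant names; the real Pool holds storage objects.
  my @pool = ('replicant1', 'replicant2', 'replicant3');

  # "First" strategy: always return the first replicant in the pool.
  sub pick_first {
    my @replicants = @_;
    return $replicants[0];
  }

  # "Random" strategy: return a uniformly random replicant from the pool.
  sub pick_random {
    my @replicants = @_;
    return $replicants[ int rand @replicants ];
  }

  print pick_first(@pool),  "\n";   # always the first entry
  print pick_random(@pool), "\n";   # any one of the three entries

With a single replicant both strategies return the same entry, which is why
the First balancer only makes sense in that case.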

=head1 REPLICATED STORAGE CONFIGURATION

All the parts of replication can be altered dynamically at runtime, which makes
it possible to create a system that automatically scales under load by creating
more replicants as needed, perhaps using a cloud system such as Amazon EC2.
However, for common use you can set up your replicated storage to be enabled at
the time you connect the databases.  The following is a breakdown of how you
may wish to do this.  Again, if you are using L<Catalyst>, I strongly recommend
you use (or upgrade to) the latest L<Catalyst::Model::DBIC::Schema>, which makes
this job even easier.

First, you need to get a C<$schema> object and set the storage_type:

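  my $schema = MyApp::Schema->clone;
  $schema->storage_type([
    '::DBI::Replicated' => {
      # Balancer used to spread read traffic over the replicants.
      balancer_type => '::Random',
      balancer_args => {
        auto_validate_every => 5,   # seconds between replicant checks
        master_read_weight  => 1,   # let the master serve reads too
      },
      pool_args => {
        maximum_lag => 2,           # seconds a replicant may lag
      },
    },
  ]);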

Then, you need to connect your L<DBIx::Class::Schema>.

  $schema->connection($dsn, $user, $pass);

Let's break down the settings.  The method L<DBIx::Class::Schema/storage_type>
takes one mandatory value, a scalar naming the storage class, and an optional
second value, which is a hash reference of configuration options for that
storage.  When you pass both, they are wrapped together in an array reference,
as in the call above.  In this case, we are setting the Replicated storage type
using '::DBI::Replicated' as the first value.  You will only use a different
value if you are subclassing the replicated storage, so for now just copy that
first parameter.

The second parameter contains a hash reference of stuff that gets passed to the
replicated storage.  L<DBIx::Class::Storage::DBI::Replicated/balancer_type> is
the type of software load balancer you will use to split up traffic among all
your replicants.  Right now we have two options, "::Random" and "::First". You
can review documentation for both at:

L<DBIx::Class::Storage::DBI::Replicated::Balancer::First>,
L<DBIx::Class::Storage::DBI::Replicated::Balancer::Random>.

In this case we will have three replicants, so the ::Random option is the only
one that makes sense.

'balancer_args' get passed to the balancer when it's instantiated.  All
balancers have the 'auto_validate_every' option.  This is the number of seconds
we allow to pass between validation checks on a load balanced replicant.  The
higher the number, the greater the chance that your reads from the replicant
are inconsistent with what's on the master.  Setting this number too low
will result in increased database load, so choose a number with care.  Our
experience is that setting the number around 5 seconds results in a good
performance / integrity balance.

'master_read_weight' is an option associated with the ::Random balancer.  It
lets the master serve a share of the read traffic as well.  I usually leave
this off (default is off).

The 'pool_args' are configuration options associated with the replicant pool.
This object (L<DBIx::Class::Storage::DBI::Replicated::Pool>) manages all the
declared replicants.  'maximum_lag' is the number of seconds a replicant is
allowed to lag behind the master before being temporarily removed from the pool.
Keep in mind that the Balancer option 'auto_validate_every' determines how often
a replicant is tested against this condition, so the true possible lag can be
higher than the number you set.  The default is zero.

No matter how low you set the maximum_lag or the auto_validate_every settings,
there is always the chance that your replicants will lag a bit behind the
master for the supported replication system built into MySQL.  You can ensure
reliable reads by using a transaction, which will force both read and write
activity to the master.  However, this will increase the load on your master
database.

After you've configured the replicated storage, you need to add the connection
information for the replicants:

  $schema->storage->connect_replicants(
    [$dsn1, $user, $pass, \%opts],
    [$dsn2, $user, $pass, \%opts],
    [$dsn3, $user, $pass, \%opts],
  );

These replicants should be configured as slaves to the master using the
instructions for MySQL native replication, or if you are just learning, you
will find L<MySQL::Sandbox> an easy way to set up a replication cluster.

And now your $schema object is properly configured!  Enjoy!
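
Because the pool can be changed at runtime, further replicants should be
attachable later with the same C<connect_replicants> call.  A minimal sketch,
assuming C<$dsn4> is a hypothetical fourth replicant that was not part of the
original setup:

  # Later, for example when read load grows, attach one more replicant
  # to the already running pool.
  $schema->storage->connect_replicants(
    [$dsn4, $user, $pass, \%opts],
  );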

=head1 AUTHOR

John Napiorkowski <jjnapiork@cpan.org>

=head1 LICENSE

You may distribute this code under the same terms as Perl itself.

=cut

1;