lib/DBIx/Class/Storage/DBI/Replicated/Introduction.pod

package DBIx::Class::Storage::DBI::Replicated::Introduction;

=head1 NAME

DBIx::Class::Storage::DBI::Replicated::Introduction - Minimum Need to Know

=head1 SYNOPSIS

This is an introductory document for L<DBIx::Class::Storage::DBI::Replicated>.

This document is not an overview of what replication is or why you should be
using it.  Nor is it a guide to setting up MySQL native replication.  Copious
external resources are available for both.  This document presumes you have
the basics down.

=head1 DESCRIPTION

L<DBIx::Class> supports a framework for using database replication.  This system
is fully integrated, which means that once it's set up you should be able to
start using a replication cluster without additional work or changes to your
code.  Some caveats apply, primarily related to the proper use of transactions
(you are wrapping all your database-modifying statements inside a transaction,
right? ;) ).  In our experience, however, properly written DBIC code works
transparently with Replicated storage.

Currently we support MySQL native replication, which is relatively easy to
install and configure, with a single master and one or more replicants (also
called 'slaves' in some documentation).  However, the framework is not tied
specifically to MySQL, and supporting other replication systems or topologies
should be possible.  Please bring your patches and ideas to the #dbix-class IRC
channel or the mailing list.

For an easy way to start playing with MySQL native replication, see
L<MySQL::Sandbox>.

If you are using this with a L<Catalyst> based application, you may also wish
to look at recent versions of L<Catalyst::Model::DBIC::Schema>, which support
replication configuration options as well.

=head1 REPLICATED STORAGE

By default, when you start L<DBIx::Class>, your Schema (L<DBIx::Class::Schema>)
is assigned a storage_type, which, once connected, reflects the underlying
storage engine of your chosen database driver.  For example, if you connect to
a MySQL database, your storage_type will be L<DBIx::Class::Storage::DBI::mysql>.
Your storage type class contains database-specific code that smooths over the
differences between databases and lets L<DBIx::Class> do its thing.

If you want to use replication, you override this setting so that the
replicated storage engine 'wraps' your underlying storages and presents a
unified interface to the programmer.  This wrapper storage class delegates each
method call either to the master database or to one of the replicated
databases, depending on whether the call is a read (by default sent to the
replicants) or a write (reserved for the master).  Additionally, the Replicated
storage monitors the health of your replicants and automatically drops any
replicant that exceeds configurable parameters.  Later, it can automatically
restore that replicant once its health recovers.
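
To make the read/write split concrete, here is a toy sketch in plain Perl that
does not use DBIC at all.  The %READ_METHODS set and the route_call helper are
hypothetical: the real Replicated storage keeps its own list of read methods
and its own delegation code.  The sketch only models the decision itself,
sending reads to whichever replicant a balancer callback picks and everything
else to the master.

```perl

  use strict;
  use warnings;

  # Hypothetical set of read-only methods; any other method is
  # treated as a write.
  my %READ_METHODS = map { $_ => 1 } qw(select select_single);

  # Reads go to the replicant chosen by the balancer callback;
  # everything else goes to the master.
  sub route_call {
      my ($method, $master, $pick_replicant) = @_;
      return $READ_METHODS{$method} ? $pick_replicant->() : $master;
  }

  my $target = route_call('insert', 'master', sub { 'replicant1' });
  print "insert -> $target\n";    # prints "insert -> master"

```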

This gives you a very robust system, since you can add or drop replicants
and DBIC will automatically adjust itself accordingly.

Additionally, if you need high data integrity, such as when you are executing
a transaction, replicated storage will automatically delegate all database
traffic to the master storage.  There are several ways to enable this high
integrity mode, but wrapping your statements inside a transaction is the easy
and canonical option.
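
In application code the canonical approach is simply to wrap the work in
$schema->txn_do(sub { ... }).  The Replicated storage also provides an
execute_reliably method that temporarily points reads at the master while a
code reference runs and then restores the previous state.  The sketch below
illustrates that save, force and restore pattern with Perl's local; ToyStorage
and its fields are hypothetical stand-ins, not DBIC's classes.

```perl

  use strict;
  use warnings;

  package ToyStorage;    # hypothetical stand-in, not DBIC's storage class

  sub new {
      my ($class, %args) = @_;
      # Reads normally go through the balancer; writes always use the master.
      return bless {
          master       => $args{master},
          balancer     => $args{balancer},
          read_handler => $args{balancer},
      }, $class;
  }

  sub read_target { return $_[0]{read_handler} }

  # Run $code with all reads forced to the master; `local` restores the
  # previous read handler when this sub exits, even if $code dies.
  sub execute_reliably {
      my ($self, $code, @args) = @_;
      local $self->{read_handler} = $self->{master};
      return $code->(@args);
  }

  package main;

  my $storage = ToyStorage->new(master => 'master', balancer => 'balancer');
  my $inside  = $storage->execute_reliably(sub { $storage->read_target });
  print "inside: $inside, after: ", $storage->read_target, "\n";
  # prints "inside: master, after: balancer"

```

Because local restores the saved value when the enclosing scope exits, the
read handler comes back even if the code reference dies.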

=head1 PARTS OF REPLICATED STORAGE

A replicated storage contains several parts.  First, there is the replicated
storage itself (L<DBIx::Class::Storage::DBI::Replicated>).  A replicated storage
takes a pool of replicants (L<DBIx::Class::Storage::DBI::Replicated::Pool>)
and a software balancer (L<DBIx::Class::Storage::DBI::Replicated::Balancer>).
The balancer does the job of splitting up all the read traffic amongst the
replicants in the Pool.  Currently there are two types of balancers: a Random
one, which chooses a replicant in the Pool using a naive randomizer algorithm,
and a First one, which just uses the first replicant in the Pool (and so is
only of value when you have a single replicant).
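
The two strategies can be sketched in a few lines of plain Perl.  pick_first
and pick_random are hypothetical helpers for illustration only; the real
balancer classes have their own interface and also take options such as
'auto_validate_every', described below.  The random-number source is passed in
as an optional code reference so that the choice can be made deterministic in
tests.

```perl

  use strict;
  use warnings;

  # "First" strategy: always use the first replicant in the pool.
  sub pick_first {
      my (@replicants) = @_;
      return $replicants[0];
  }

  # "Random" strategy: a naive uniform choice among the replicants.
  # $rand is an optional code ref returning a number in [0, N); it
  # defaults to Perl's built-in rand.
  sub pick_random {
      my ($rand, @replicants) = @_;
      $rand //= sub { rand $_[0] };
      return $replicants[ int $rand->(scalar @replicants) ];
  }

  my @pool  = qw(replicant1 replicant2 replicant3);
  my $first = pick_first(@pool);
  my $any   = pick_random(undef, @pool);
  print "first: $first, random: $any\n";

```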

=head1 REPLICATED STORAGE CONFIGURATION

All the parts of replication can be altered dynamically at runtime, which makes
it possible to create a system that automatically scales under load by creating
more replicants as needed, perhaps using a cloud system such as Amazon EC2.
However, for common use you can set up your replicated storage to be enabled at
the time you connect the databases.  The following is a breakdown of how you
may wish to do this.  Again, if you are using L<Catalyst>, I strongly recommend
you use (or upgrade to) the latest L<Catalyst::Model::DBIC::Schema>, which makes
this job even easier.

First, you need a L<DBIx::Class::Schema> object.  Let's assume you have such a
schema class called "MyApp::Schema".  The storage object is created when the
schema connects, so the storage_type must be set before you connect.  Start
with an unconnected copy of the schema:

  use MyApp::Schema;
  my $schema = MyApp::Schema->clone;

Next, set the storage_type and then connect to your master database:

  $schema->storage_type([
    '::DBI::Replicated' => {
      balancer_type => '::Random',
      balancer_args => {
        auto_validate_every => 5,
        master_read_weight  => 1,
      },
      pool_args => {
        maximum_lag => 2,
      },
    },
  ]);

  $schema->connection($dsn, $user, $pass);

Let's break down the settings.  The method L<DBIx::Class::Schema/storage_type>
takes a single value: either a storage class name, or, as here, an array
reference holding the class name followed by a hash reference of configuration
options for that storage.  In this case, we are selecting the Replicated
storage type with '::DBI::Replicated' as the class name.  You would only use a
different value if you were subclassing the replicated storage, so for now just
copy that first element.

The hash reference contains options that get passed to the replicated
storage.  L<DBIx::Class::Storage::DBI::Replicated/balancer_type> is the type of
software load balancer you will use to split up traffic among your replicants.
Right now there are two options, "::Random" and "::First".  You can review the
documentation for both at:

L<DBIx::Class::Storage::DBI::Replicated::Balancer::First>,
L<DBIx::Class::Storage::DBI::Replicated::Balancer::Random>.

In this case we will have three replicants, so the ::Random option is the only
one that makes sense.
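
The leading '::' in these values is shorthand.
L<DBIx::Class::Schema/storage_type> treats a class name starting with '::' as
relative to DBIx::Class::Storage, and the two balancer_type values correspond
to the Balancer classes linked above.  The hypothetical expand_class helper
below is a sketch of that naming rule, not DBIC's own code.

```perl

  use strict;
  use warnings;

  # Hypothetical helper: expand a short class name that starts with '::'
  # by prepending a namespace; other names are returned unchanged.
  sub expand_class {
      my ($name, $namespace) = @_;
      return $name =~ /^::/ ? $namespace . $name : $name;
  }

  my $storage  = expand_class('::DBI::Replicated', 'DBIx::Class::Storage');
  my $balancer = expand_class('::Random',
      'DBIx::Class::Storage::DBI::Replicated::Balancer');
  print "$storage\n";     # prints "DBIx::Class::Storage::DBI::Replicated"
  print "$balancer\n";    # prints "DBIx::Class::Storage::DBI::Replicated::Balancer::Random"

```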

The 'balancer_args' are passed to the balancer when it is instantiated.  All
balancers have the 'auto_validate_every' option.  This is the number of seconds
allowed to pass between validation checks on a load-balanced replicant, so the
higher the number, the greater the chance that your reads from a replicant are
inconsistent with what's on the master.  Setting this number too low will
result in increased database load, so choose a number with care.  Our
experience is that a value of around 5 seconds gives a good balance between
performance and integrity.

'master_read_weight' is an option of the ::Random balancer.  It allows the
master to be read from as well.  I usually leave this off (the default is off).
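
One way to picture 'master_read_weight' is as extra weight added to the random
draw, as in the sketch below.  This is an illustration under an assumption,
not the balancer's documented algorithm: pick_with_master is hypothetical, and
it treats the weight as relative to a single replicant, so 1 makes the master
as likely as any one replicant and 0 never reads from it.

```perl

  use strict;
  use warnings;

  # Hypothetical: choose a read target, giving the master a share of reads.
  # ASSUMPTION: $weight is relative to one replicant, so 1 makes the master
  # as likely as any single replicant and 0 never picks the master unless
  # there are no replicants at all.
  sub pick_with_master {
      my ($master, $weight, $rand, @replicants) = @_;
      $rand //= sub { rand $_[0] };
      # Draw from [0, replicants + weight); the slice past the last
      # replicant belongs to the master.
      my $roll = $rand->(@replicants + $weight);
      return $roll >= @replicants ? $master : $replicants[ int $roll ];
  }

  my $target = pick_with_master('master', 1, undef, qw(r1 r2 r3));
  print "read from: $target\n";

```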

The 'pool_args' are configuration options for the replicant pool.  This object
(L<DBIx::Class::Storage::DBI::Replicated::Pool>) manages all the declared
replicants.  'maximum_lag' is the number of seconds a replicant is allowed to
lag behind the master before it is temporarily removed from the pool.  Keep in
mind that the balancer option 'auto_validate_every' determines how often a
replicant is tested against this condition, so the actual lag can be higher
than the number you set.  The default is zero.
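
A small simulation shows why the effective lag can exceed 'maximum_lag'.  The
max_lag_served helper is hypothetical and rests on simplifying assumptions:
lag is sampled once per second, a replicant passes a check when its lag is at
most maximum_lag, and it keeps its status until the next check,
auto_validate_every seconds later.  The example models a replicant whose
replication stalls just after passing a check, using the settings from the
configuration above.

```perl

  use strict;
  use warnings;

  # Hypothetical simulation.  @lag_by_second holds the replicant's lag
  # (in seconds) at each one-second tick.  The pool re-checks the lag
  # only every $every seconds; between checks the replicant keeps its
  # last status.  Returns the largest lag seen while it was in the pool.
  sub max_lag_served {
      my ($max_lag, $every, @lag_by_second) = @_;
      my ($active, $worst) = (1, 0);
      for my $t (0 .. $#lag_by_second) {
          my $lag = $lag_by_second[$t];
          $active = ($lag <= $max_lag) if $t % $every == 0;    # validation
          $worst  = $lag if $active && $lag > $worst;
      }
      return $worst;
  }

  # Replication stalls at t=0 with a 2 second lag that then grows by
  # one second per second; maximum_lag => 2, auto_validate_every => 5.
  my @lag    = map { $_ + 2 } 0 .. 10;
  my $served = max_lag_served(2, 5, @lag);
  print "worst lag served: ${served}s\n";    # prints "worst lag served: 6s"

```

Re-running it with a check every second (max_lag_served(2, 1, @lag)) returns
2.  That is the trade-off described above: shorter validation intervals keep
stale replicants out sooner, at the cost of more database load.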

No matter how low you set the maximum_lag or auto_validate_every settings,
there is always a chance that your replicants will lag a bit behind the master
with MySQL's built-in replication.  You can ensure reliable reads by using a
transaction, which forces both read and write activity to the master; however,
this increases the load on your master database.

After you've configured the replicated storage, you need to add the connection
information for the replicants:

  $schema->storage->connect_replicants(
    [$dsn1, $user, $pass, \%opts],
    [$dsn2, $user, $pass, \%opts],
    [$dsn3, $user, $pass, \%opts],
  );
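
With more than a few replicants, you may prefer to build these argument lists
from a list of hosts.  The host names, database name and credentials below are
placeholders, and the DSN assumes DBD::mysql.  The sketch only builds the
[$dsn, $user, $pass, \%opts] array references, so it runs without a database;
the connect_replicants call is left as a comment.

```perl

  use strict;
  use warnings;

  # Placeholder hosts and credentials; replace with your own.
  my @hosts = qw(replica1.example.com replica2.example.com replica3.example.com);
  my ($user, $pass) = ('app_user', 'secret');
  my %opts = (AutoCommit => 1);    # DBI connection attributes

  # One [ $dsn, $user, $pass, \%opts ] array reference per replicant, the
  # same shape passed to connect_replicants above.
  my @replicants = map {
      [ "dbi:mysql:database=myapp;host=$_", $user, $pass, \%opts ]
  } @hosts;

  # With a real, configured schema you would now call:
  #   $schema->storage->connect_replicants(@replicants);
  print scalar(@replicants), " replicants, first DSN: $replicants[0][0]\n";
  # prints "3 replicants, first DSN: dbi:mysql:database=myapp;host=replica1.example.com"

```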

These replicants should be configured as slaves of the master, following the
instructions for MySQL native replication.  If you are just learning,
L<MySQL::Sandbox> offers an easy way to set up a replication cluster.

And now your $schema object is properly configured!  Enjoy!

=head1 AUTHOR

John Napiorkowski <jjnapiork@cpan.org>

=head1 LICENSE

You may distribute this code under the same terms as Perl itself.

=cut

1;