[dbsrgits/DBM-Deep.git] / lib / DBM / Deep / Cookbook.pod

=head1 NAME

DBM::Deep::Cookbook

=head1 DESCRIPTION

This is the Cookbook for L<DBM::Deep>. It contains useful tips and tricks,
plus some examples of how to do common tasks.

=head1 RECIPES

=head2 UTF8 data

When you're using UTF8 data, you may run into the "Wide character in print"
warning. To fix that in 5.8+, do the following:

  my $db = DBM::Deep->new( ... );
  binmode $db->_fh, ":utf8";

In 5.6, you will have to do the following:

  my $db = DBM::Deep->new( ... );
  $db->set_filter( 'store_value' => sub { pack "U0C*", unpack "C*", $_[0] } );
  $db->set_filter( 'retrieve_value' => sub { pack "C*", unpack "U0C*", $_[0] } );

In a future version, you will be able to specify C<utf8 =E<gt> 1> and
L<DBM::Deep> will do these things for you.

=head2 Real-time Encryption Example

B<NOTE>: This is just an example of how to write a filter. This most
definitely should B<NOT> be taken as a proper way to write a filter that does
encryption.

Here is a working example that uses the I<Crypt::Blowfish> module to
do real-time encryption / decryption of keys & values with DBM::Deep Filters.
Please visit L<http://search.cpan.org/search?module=Crypt::Blowfish> for more
on I<Crypt::Blowfish>. You'll also need the I<Crypt::CBC> module.

  use DBM::Deep;
  use Crypt::Blowfish;
  use Crypt::CBC;

  my $cipher = Crypt::CBC->new({
      'key'             => 'my secret key',
      'cipher'          => 'Blowfish',
      'iv'              => '$KJh#(}q',
      'regenerate_key'  => 0,
      'padding'         => 'space',
      'prepend_iv'      => 0
  });

  my $db = DBM::Deep->new(
      file => "foo-encrypt.db",
      filter_store_key => \&my_encrypt,
      filter_store_value => \&my_encrypt,
      filter_fetch_key => \&my_decrypt,
      filter_fetch_value => \&my_decrypt,
  );

  $db->{key1} = "value1";
  $db->{key2} = "value2";
  print "key1: " . $db->{key1} . "\n";
  print "key2: " . $db->{key2} . "\n";

  undef $db;
  exit;

  sub my_encrypt {
      return $cipher->encrypt( $_[0] );
  }
  sub my_decrypt {
      return $cipher->decrypt( $_[0] );
  }

=head2 Real-time Compression Example

Here is a working example that uses the I<Compress::Zlib> module to do real-time
compression / decompression of keys & values with DBM::Deep Filters.
Please visit L<http://search.cpan.org/search?module=Compress::Zlib> for
more on I<Compress::Zlib>.

  use DBM::Deep;
  use Compress::Zlib;

  my $db = DBM::Deep->new(
      file => "foo-compress.db",
      filter_store_key => \&my_compress,
      filter_store_value => \&my_compress,
      filter_fetch_key => \&my_decompress,
      filter_fetch_value => \&my_decompress,
  );

  $db->{key1} = "value1";
  $db->{key2} = "value2";
  print "key1: " . $db->{key1} . "\n";
  print "key2: " . $db->{key2} . "\n";

  undef $db;
  exit;

  sub my_compress {
      return Compress::Zlib::memGzip( $_[0] ) ;
  }
  sub my_decompress {
      return Compress::Zlib::memGunzip( $_[0] ) ;
  }

B<Note:> Filtering of keys only applies to hashes. Array "keys" are
actually numerical index numbers, and are not filtered.

=head1 Custom Digest Algorithm

DBM::Deep by default uses the I<Message Digest 5> (MD5) algorithm for hashing
keys. However you can override this, and use another algorithm (such as SHA-256)
or even write your own. But please note that DBM::Deep currently expects zero
collisions, so your algorithm has to be I<perfect>, so to speak. Collision
detection may be introduced in a later version.

You can specify a custom digest algorithm by passing it into the parameter
list for new(), passing a reference to a subroutine as the 'digest' parameter,
and the length of the algorithm's hashes (in bytes) as the 'hash_size'
parameter. Here is a working example that uses a 256-bit hash from the
I<Digest::SHA256> module. Please see
L<http://search.cpan.org/search?module=Digest::SHA256> for more information.

  use DBM::Deep;
  use Digest::SHA256;

  my $context = Digest::SHA256::new(256);

  my $db = DBM::Deep->new(
      filename => "foo-sha.db",
      digest => \&my_digest,
      hash_size => 32,
  );

  $db->{key1} = "value1";
  $db->{key2} = "value2";
  print "key1: " . $db->{key1} . "\n";
  print "key2: " . $db->{key2} . "\n";

  undef $db;
  exit;

  sub my_digest {
      return substr( $context->hash($_[0]), 0, 32 );
  }

B<Note:> Your returned digest strings must be B<EXACTLY> the number
of bytes you specify in the hash_size parameter (in this case 32). Undefined
behavior will occur otherwise.

B<Note:> If you do choose to use a custom digest algorithm, you must set it
every time you access this file. Otherwise, the default (MD5) will be used.

=head1 PERFORMANCE

Because DBM::Deep is a conncurrent datastore, every change is flushed to disk
immediately and every read goes to disk. This means that DBM::Deep functions
at the speed of disk (generally 10-20ms) vs. the speed of RAM (generally
50-70ns), or at least 150-200x slower than the comparable in-memory
datastructure in Perl.

There are several techniques you can use to speed up how DBM::Deep functions.

=over 4

=item * Put it on a ramdisk

The easiest and quickest mechanism to making DBM::Deep run faster is to create
a ramdisk and locate the DBM::Deep file there. Doing this as an option may
become a feature of DBM::Deep, assuming there is a good ramdisk wrapper on CPAN.

=item * Work at the tightest level possible

It is much faster to assign the level of your db that you are working with to
an intermediate variable than to re-look it up every time. Thus

  # BAD
  while ( my ($k, $v) = each %{$db->{foo}{bar}{baz}} ) {
    ...
  }

  # GOOD
  my $x = $db->{foo}{bar}{baz};
  while ( my ($k, $v) = each %$x ) {
    ...
  }

=item * Make your file as tight as possible

If you know that you are not going to use more than 65K in your database,
consider using the C<pack_size =E<gt> 'small'> option. This will instruct
DBM::Deep to use 16bit addresses, meaning that the seek times will be less.

=back

=cut
Commit	Line	Data
d8db2929	1	=head1 NAME
	2
	3	DBM::Deep::Cookbook
	4
	5	=head1 DESCRIPTION
	6
b8370759	7	This is the Cookbook for L<DBM::Deep>. It contains useful tips and tricks,
d8db2929	8	plus some examples of how to do common tasks.
	9
	10	=head1 RECIPES
	11
	12	=head2 UTF8 data
	13
	14	When you're using UTF8 data, you may run into the "Wide character in print"
	15	warning. To fix that in 5.8+, do the following:
	16
	17	my $db = DBM::Deep->new( ... );
	18	binmode $db->_fh, ":utf8";
	19
	20	In 5.6, you will have to do the following:
	21
	22	my $db = DBM::Deep->new( ... );
	23	$db->set_filter( 'store_value' => sub { pack "U0C", unpack "C", $_[0] } );
	24	$db->set_filter( 'retrieve_value' => sub { pack "C", unpack "U0C", $_[0] } );
	25
	26	In a future version, you will be able to specify C<utf8 =E<gt> 1> and
b8370759	27	L<DBM::Deep> will do these things for you.
d8db2929	28
1cff45d7	29	=head2 Real-time Encryption Example
	30
	31	B<NOTE>: This is just an example of how to write a filter. This most
	32	definitely should B<NOT> be taken as a proper way to write a filter that does
	33	encryption.
	34
	35	Here is a working example that uses the I<Crypt::Blowfish> module to
	36	do real-time encryption / decryption of keys & values with DBM::Deep Filters.
	37	Please visit L<http://search.cpan.org/search?module=Crypt::Blowfish> for more
	38	on I<Crypt::Blowfish>. You'll also need the I<Crypt::CBC> module.
	39
	40	use DBM::Deep;
	41	use Crypt::Blowfish;
	42	use Crypt::CBC;
	43
	44	my $cipher = Crypt::CBC->new({
	45	'key' => 'my secret key',
	46	'cipher' => 'Blowfish',
	47	'iv' => '$KJh#(}q',
	48	'regenerate_key' => 0,
	49	'padding' => 'space',
	50	'prepend_iv' => 0
	51	});
	52
	53	my $db = DBM::Deep->new(
	54	file => "foo-encrypt.db",
	55	filter_store_key => \&my_encrypt,
	56	filter_store_value => \&my_encrypt,
	57	filter_fetch_key => \&my_decrypt,
	58	filter_fetch_value => \&my_decrypt,
	59	);
	60
	61	$db->{key1} = "value1";
	62	$db->{key2} = "value2";
	63	print "key1: " . $db->{key1} . "\n";
	64	print "key2: " . $db->{key2} . "\n";
	65
	66	undef $db;
	67	exit;
	68
	69	sub my_encrypt {
	70	return $cipher->encrypt( $_[0] );
	71	}
	72	sub my_decrypt {
	73	return $cipher->decrypt( $_[0] );
	74	}
	75
	76	=head2 Real-time Compression Example
	77
	78	Here is a working example that uses the I<Compress::Zlib> module to do real-time
	79	compression / decompression of keys & values with DBM::Deep Filters.
	80	Please visit L<http://search.cpan.org/search?module=Compress::Zlib> for
	81	more on I<Compress::Zlib>.
	82
	83	use DBM::Deep;
	84	use Compress::Zlib;
	85
	86	my $db = DBM::Deep->new(
	87	file => "foo-compress.db",
	88	filter_store_key => \&my_compress,
	89	filter_store_value => \&my_compress,
	90	filter_fetch_key => \&my_decompress,
	91	filter_fetch_value => \&my_decompress,
	92	);
93
94	$db->{key1} = "value1";
95	$db->{key2} = "value2";
96	print "key1: " . $db->{key1} . "\n";
97	print "key2: " . $db->{key2} . "\n";
98
99	undef $db;
100	exit;
101
102	sub my_compress {
103	return Compress::Zlib::memGzip( $_[0] ) ;
104	}
105	sub my_decompress {
106	return Compress::Zlib::memGunzip( $_[0] ) ;
107	}
108
109	B<Note:> Filtering of keys only applies to hashes. Array "keys" are
110	actually numerical index numbers, and are not filtered.
111
112	=head1 Custom Digest Algorithm
113
114	DBM::Deep by default uses the I<Message Digest 5> (MD5) algorithm for hashing
115	keys. However you can override this, and use another algorithm (such as SHA-256)
116	or even write your own. But please note that DBM::Deep currently expects zero
117	collisions, so your algorithm has to be I<perfect>, so to speak. Collision
118	detection may be introduced in a later version.
119
120	You can specify a custom digest algorithm by passing it into the parameter
121	list for new(), passing a reference to a subroutine as the 'digest' parameter,
122	and the length of the algorithm's hashes (in bytes) as the 'hash_size'
123	parameter. Here is a working example that uses a 256-bit hash from the
124	I<Digest::SHA256> module. Please see
125	L<http://search.cpan.org/search?module=Digest::SHA256> for more information.
126
127	use DBM::Deep;
128	use Digest::SHA256;
129
130	my $context = Digest::SHA256::new(256);
131
132	my $db = DBM::Deep->new(
133	filename => "foo-sha.db",
134	digest => \&my_digest,
135	hash_size => 32,
136	);
137
138	$db->{key1} = "value1";
139	$db->{key2} = "value2";
140	print "key1: " . $db->{key1} . "\n";
141	print "key2: " . $db->{key2} . "\n";
142
143	undef $db;
144	exit;
145
146	sub my_digest {
147	return substr( $context->hash($_[0]), 0, 32 );
148	}
149
150	B<Note:> Your returned digest strings must be B<EXACTLY> the number
151	of bytes you specify in the hash_size parameter (in this case 32). Undefined
152	behavior will occur otherwise.
153
154	B<Note:> If you do choose to use a custom digest algorithm, you must set it
155	every time you access this file. Otherwise, the default (MD5) will be used.
156
157	=head1 PERFORMANCE
158
159	Because DBM::Deep is a conncurrent datastore, every change is flushed to disk
160	immediately and every read goes to disk. This means that DBM::Deep functions
161	at the speed of disk (generally 10-20ms) vs. the speed of RAM (generally
162	50-70ns), or at least 150-200x slower than the comparable in-memory
163	datastructure in Perl.
164
165	There are several techniques you can use to speed up how DBM::Deep functions.
166
167	=over 4
168
169	=item * Put it on a ramdisk
170
171	The easiest and quickest mechanism to making DBM::Deep run faster is to create
172	a ramdisk and locate the DBM::Deep file there. Doing this as an option may
173	become a feature of DBM::Deep, assuming there is a good ramdisk wrapper on CPAN.
174
175	=item * Work at the tightest level possible
176
177	It is much faster to assign the level of your db that you are working with to
178	an intermediate variable than to re-look it up every time. Thus
179
180	# BAD
181	while ( my ($k, $v) = each %{$db->{foo}{bar}{baz}} ) {
182	...
183	}
184
185	# GOOD
186	my $x = $db->{foo}{bar}{baz};
187	while ( my ($k, $v) = each %$x ) {
188	...
189	}
190
191	=item * Make your file as tight as possible
192
193	If you know that you are not going to use more than 65K in your database,
194	consider using the C<pack_size =E<gt> 'small'> option. This will instruct
195	DBM::Deep to use 16bit addresses, meaning that the seek times will be less.
196
197	=back
198
199	=cut