XML::LibXML::InputCallback - XML::LibXML Class for Input Callbacks

You may get unexpected results when loading external documents during libxml2
parsing if the location of the resource is not an HTTP, FTP or relative
location but, for example, an absolute path. To get around this limitation,
you may add your own input handlers to open, read and close particular types
of locations or URI classes. Using these input callback handlers, you can also
handle your own custom URI schemes.

The input callbacks are used whenever LibXML has to get something other than
externally parsed entities from somewhere. They are implemented using a
callback stack on the Perl layer in analogy to libxml2's native callback stack.

The XML::LibXML::InputCallback class transparently registers the input
callbacks for libxml2's parser processes.

=head2 How does XML::LibXML::InputCallback work?

The libxml2 library offers a callback implementation as global functions only.
To work around the problems that result from having only global callbacks (for
example, when the same global callback stack is manipulated by different
applications running together in a single Apache web server environment),
XML::LibXML::InputCallback comes with an object-oriented and a
function-oriented part.

Using the function-oriented part, the global callback stack of libxml2 can be
manipulated. Those functions can be used as an interface to the callbacks on
the C and XS layer. In the object-oriented part, operations for working with
the "pseudo-localized" callback stack are implemented. Currently, you can
register and de-register callbacks on the Perl layer and initialize them on a
per parser basis.

=head3 Callback Groups

The libxml2 input callbacks come in groups. One group contains a URI matcher
(I<<<<<< match >>>>>>), a data stream constructor (I<<<<<< open >>>>>>), a
data stream reader (I<<<<<< read >>>>>>), and a data stream destructor
(I<<<<<< close >>>>>>). The callbacks can be manipulated on a per group basis
only.

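As a rough illustration of such a group, the following sketch serves one
document from a Perl hash. The C<myscheme:> URI, the C<%store> hash and all
variable names are assumptions made up for this example; it is a minimal
sketch, not the only way to write the four callbacks.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical in-memory resource store; the 'myscheme:' URI is made up.
my %store = ( 'myscheme:greeting' => '<greeting>hello</greeting>' );

# match: claim the URI by returning 1, decline it with 0.
my $match_cb = sub { my ($uri) = @_; return exists $store{$uri} ? 1 : 0 };

# open: return a reference, here to a private copy of the document text.
my $open_cb = sub { my ($uri) = @_; my $data = $store{$uri}; return \$data };

# read: hand back at most $length bytes and consume them; '' means "done".
my $read_cb = sub {
    my ( $ref, $length ) = @_;
    return substr( $$ref, 0, $length, '' );
};

# close: nothing to release for an in-memory buffer.
my $close_cb = sub { return 1 };

# The group is one array reference holding the four callbacks in order.
my $group = [ $match_cb, $open_cb, $read_cb, $close_cb ];

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks($group);

my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);

my $doc = $parser->parse_file('myscheme:greeting');
print $doc->documentElement->nodeName, "\n";
```
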
=head3 The Parser Process

The parser process works on an XML data stream in which links to other
resources can be embedded. These can be links to external DTDs or XIncludes,
for example. Those resources are identified by URIs. The callback
implementation of libxml2 assumes that one callback group can handle a certain
number of URIs and a certain URI scheme. By default, callback handlers for
I<<<<<< file://* >>>>>>, I<<<<<< file://*.gz >>>>>>, I<<<<<< http://* >>>>>>
and I<<<<<< ftp://* >>>>>> are registered.

Callback groups in the callback stack are processed from top to bottom, meaning
that callback groups registered later will be processed before the earlier
ones.

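The ordering can be observed with two groups that both claim the same URI. The
sketch below is only an illustration: C<make_group()>, the C<myscheme:> URI
and the two tiny documents are hypothetical names invented here.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: build a group that answers every 'myscheme:' URI
# with the given XML text.
sub make_group {
    my ($xml) = @_;
    return [
        sub { $_[0] =~ /^myscheme:/ ? 1 : 0 },                         # match
        sub { my $copy = $xml; return \$copy },                        # open
        sub { my ( $ref, $len ) = @_; substr( $$ref, 0, $len, '' ) },  # read
        sub { return 1 },                                              # close
    ];
}

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks( make_group('<first/>') );
$input_callbacks->register_callbacks( make_group('<second/>') );

my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);

# Both groups match; the root element name tells which one was asked first.
my $root_name = $parser->parse_file('myscheme:doc')->documentElement->nodeName;
print "served by: $root_name\n";
```

Because the second group was registered later, it sits on top of the stack and
is the one expected to serve the document.
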
While parsing the data stream, the libxml2 parser checks whether a registered
callback group will handle a URI; if none will, the URI is interpreted as
I<<<<<< file://URI >>>>>>. To handle a URI, the I<<<<<< match >>>>>> callback
has to return '1'. If that happens, the handling of the URI is passed to that
callback group. Next, the URI is passed to the I<<<<<< open >>>>>> callback,
which should return a I<<<<<< reference >>>>>> to the data stream if it
successfully opened the file, and '0' otherwise. If opening the stream was
successful, the I<<<<<< read >>>>>> callback is called repeatedly until it
returns an empty string. After the last read, the I<<<<<< close >>>>>>
callback is called to close the stream.

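To make the sequence described above visible, the following sketch records the
name of each callback as it is invoked. The C<@calls> log, the C<myscheme:>
URI and the sample document are assumptions introduced for this illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my @calls;                      # callback names, in the order they were called
my $xml = '<doc>text</doc>';    # made-up document served for 'myscheme:' URIs

my $group = [
    sub {                       # match: return 1 to claim the URI
        my ($uri) = @_;
        return 0 unless $uri =~ /^myscheme:/;
        push @calls, 'match';
        return 1;
    },
    sub {                       # open: return a reference to the stream
        push @calls, 'open';
        my $copy = $xml;
        return \$copy;
    },
    sub {                       # read: an empty string ends the reading
        my ( $stream, $length ) = @_;
        push @calls, 'read';
        return substr( $$stream, 0, $length, '' );
    },
    sub {                       # close: release the stream
        push @calls, 'close';
        return 1;
    },
];

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks($group);

my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);
my $doc = $parser->parse_file('myscheme:doc');

print join( ' ', @calls ), "\n";
```

The log should start with C<match>, continue with C<open> and one or more
C<read> entries, and end with C<close>; how many reads occur depends on
libxml2.
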
=head3 Organisation of callback groups in XML::LibXML::InputCallback

Callback groups are implemented as a stack (an array); each entry holds a
reference to an array of the callbacks. To the libxml2 library, the
XML::LibXML::InputCallback callback implementation appears as one single
callback group. The Perl implementation, however, allows managing different
callback stacks on a per libxml2-parser basis.

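A minimal sketch of per-parser stacks, assuming two parser instances that
resolve the same URI differently. C<make_callbacks()>, the C<myscheme:> URI
and the element names are hypothetical and exist only for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: an InputCallback object with one group that answers
# every 'myscheme:' URI with the given XML text.
sub make_callbacks {
    my ($xml) = @_;
    my $input_callbacks = XML::LibXML::InputCallback->new();
    $input_callbacks->register_callbacks( [
        sub { $_[0] =~ /^myscheme:/ ? 1 : 0 },                         # match
        sub { my $copy = $xml; return \$copy },                        # open
        sub { my ( $ref, $len ) = @_; substr( $$ref, 0, $len, '' ) },  # read
        sub { return 1 },                                              # close
    ] );
    return $input_callbacks;
}

# Each parser gets its own callback stack.
my $parser_a = XML::LibXML->new();
$parser_a->input_callbacks( make_callbacks('<from-a/>') );

my $parser_b = XML::LibXML->new();
$parser_b->input_callbacks( make_callbacks('<from-b/>') );

# The same URI is served by whichever stack belongs to the parser in use.
my $name_a = $parser_a->parse_file('myscheme:doc')->documentElement->nodeName;
my $name_b = $parser_b->parse_file('myscheme:doc')->documentElement->nodeName;
print "$name_a $name_b\n";
```
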
=head2 Using XML::LibXML::InputCallback

After object instantiation using the parameter-less constructor, you can
register callback groups.

  my $input_callbacks = XML::LibXML::InputCallback->new();
  $input_callbacks->register_callbacks([ $match_cb1, $open_cb1,
                                         $read_cb1, $close_cb1 ] );
  $input_callbacks->register_callbacks([ $match_cb2, $open_cb2,
                                         $read_cb2, $close_cb2 ] );
  $input_callbacks->register_callbacks([ $match_cb3, $open_cb3,
                                         $read_cb3, $close_cb3 ] );

  $parser->input_callbacks( $input_callbacks );
  $parser->parse_file( $some_xml_file );

=head2 What about the old callback system prior to XML::LibXML::InputCallback?

In XML::LibXML versions prior to 1.59, i.e. without the
XML::LibXML::InputCallback module, you could define your callbacks either
globally or locally. You can still do that using XML::LibXML::InputCallback,
and in addition you can define the callbacks on a per parser basis!

If you use the old callback interface through global callbacks,
XML::LibXML::InputCallback will treat them with a lower priority than the ones
registered using the new interface. The global callbacks will not override the
callback groups registered using the new interface. Local callbacks are
attached to a specific parser instance and are therefore treated with the
highest priority. If the I<<<<<< match >>>>>> callback of the callback group
registered as local variable is identical to one of the callback groups
registered using the new interface, that callback group

Users of the old callback implementation whose I<<<<<< open >>>>>> callback
returned a plain string will have to adapt their code to return a reference to
that string after upgrading to version >= 1.59. The new callback system can
only deal with an I<<<<<< open >>>>>> callback that returns a reference!

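One way to satisfy the reference requirement is to let the I<<<<<< open >>>>>>
callback return a file handle, which is itself a reference. The sketch below
opens an in-memory handle on a string; the C<%store> hash and the C<myscheme:>
URI are assumptions made up for this example.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical store of plain strings keyed by a made-up 'myscheme:' URI.
my %store = ( 'myscheme:note' => '<note>kept in memory</note>' );

my $match_cb = sub { my ($uri) = @_; return exists $store{$uri} ? 1 : 0 };

# open: return a handle (a reference) instead of the plain string itself.
my $open_cb = sub {
    my ($uri) = @_;
    open( my $fh, '<', \$store{$uri} ) or return 0;
    return $fh;
};

# read: the core read() leaves $buffer empty once the handle is exhausted.
my $read_cb = sub {
    my ( $fh, $length ) = @_;
    my $buffer = '';
    read( $fh, $buffer, $length );
    return $buffer;
};

my $close_cb = sub { my ($fh) = @_; close($fh); return 1 };

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks(
    [ $match_cb, $open_cb, $read_cb, $close_cb ] );

my $parser = XML::LibXML->new();
$parser->input_callbacks($input_callbacks);

my $doc = $parser->parse_file('myscheme:note');
print $doc->documentElement->textContent, "\n";
```
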
=head1 INTERFACE DESCRIPTION

=head2 Global Variables

Stores the current callback and can be used as shortcut to access the callback
stack.

=item @_GLOBAL_CALLBACKS

Stores all callback groups for the current parser process.

Stores the currently used callback group. Used to prevent parser errors when
dealing with nested XML data.

=head2 Global Callbacks

=item _callback_match

Implements the interface for the I<<<<<< match >>>>>> callback at C-level and
for the selection of the callback group from the callbacks defined at the
Perl-level.

=item _callback_open

Forwards the I<<<<<< open >>>>>> callback from libxml2 to the corresponding
callback function at the Perl-level.

=item _callback_read

Forwards the read request to the corresponding callback function at the
Perl-level and returns the result to libxml2.

=item _callback_close

Forwards the I<<<<<< close >>>>>> callback from libxml2 to the corresponding
callback function at the Perl-level.

=item new()

A simple constructor.

=item register_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])

The four callbacks I<<<<<< have >>>>>> to be given as an array reference in
the above order: I<<<<<< match >>>>>>, I<<<<<< open >>>>>>,
I<<<<<< read >>>>>>, I<<<<<< close >>>>>>!

=item unregister_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])

With no arguments given, C<<<<<< unregister_callbacks() >>>>>> will delete the
last registered callback group from the stack. If four callbacks are passed as
an array reference, the callback group to unregister will be identified by the
I<<<<<< match >>>>>> callback and deleted from the callback stack. Note that
if several identical I<<<<<< match >>>>>> callbacks are defined in different
callback groups, ALL of them will be deleted from the stack.

=item init_callbacks()

Initializes the callback system before a parsing process.

=item cleanup_callbacks()

Resets global variables and the libxml2 callback stack.

=item lib_init_callbacks()

Used internally for callback registration at C-level.

=item lib_cleanup_callbacks()

Used internally for callback resetting at the C-level.

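The C<register_callbacks()> and C<unregister_callbacks()> methods listed above
can be exercised together as in the following sketch. C<make_group()>,
C<root_name()>, the C<myscheme:> URI and the sample documents are hypothetical
names invented for this illustration. Each I<<<<<< match >>>>>> callback
captures its own C<$prefix>, so the two groups have distinct match callbacks
and can be told apart when unregistering.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: a group that serves $xml for URIs starting with $prefix.
sub make_group {
    my ( $prefix, $xml ) = @_;
    return [
        sub { index( $_[0], $prefix ) == 0 ? 1 : 0 },                  # match
        sub { my $copy = $xml; return \$copy },                        # open
        sub { my ( $ref, $len ) = @_; substr( $$ref, 0, $len, '' ) },  # read
        sub { return 1 },                                              # close
    ];
}

my $old_group = make_group( 'myscheme:', '<old/>' );
my $new_group = make_group( 'myscheme:', '<new/>' );

my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks($old_group);
$input_callbacks->register_callbacks($new_group);

my $parser = XML::LibXML->new();

# Parse the made-up URI; return the root element name, or undef on failure.
sub root_name {
    $parser->input_callbacks($input_callbacks);
    my $doc = eval { $parser->parse_file('myscheme:doc') };
    return $doc ? $doc->documentElement->nodeName : undef;
}

my $before = root_name();                              # both groups present

$input_callbacks->unregister_callbacks($new_group);    # found via its match
my $after_named = root_name();

$input_callbacks->unregister_callbacks();              # drop the remaining one
my $after_all = root_name();

print join( ' ', map { defined $_ ? $_ : 'undef' }
        $before, $after_named, $after_all ), "\n";
```

Once no group is left, nothing claims the C<myscheme:> URI, so the parse is
expected to fail and C<root_name()> returns C<undef>.
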
=head1 EXAMPLE CALLBACKS

The following example is purely fictitious. It uses a MyScheme::Handler object
that responds to methods similar to an IO::Handle.

  # Define the four callback functions
  sub match_uri {
      my $uri = shift;
      return $uri =~ /^myscheme:/; # trigger our callback group for 'myscheme' URIs
  }
  sub open_uri {
      my $uri = shift;
      my $handler = MyScheme::Handler->new($uri);
      return $handler;
  }
  # The returned $buffer will be parsed by the libxml2 parser
  sub read_uri {
      my ($handler, $length) = @_;
      my $buffer;
      read($handler, $buffer, $length);
      return $buffer; # $buffer will be an empty string '' if read() is done
  }
  # Close the handle associated with the resource.
  sub close_uri {
      my $handler = shift;
      close($handler);
  }

  # Register them with an instance of XML::LibXML::InputCallback
  my $input_callbacks = XML::LibXML::InputCallback->new();
  $input_callbacks->register_callbacks([ \&match_uri, \&open_uri,
                                         \&read_uri, \&close_uri ] );
  # Register the callback group at a parser instance
  $parser->input_callbacks( $input_callbacks );
  # $some_xml_file will be parsed using our callbacks
  $parser->parse_file( $some_xml_file );

2001-2007, AxKit.com Ltd.

2002-2006, Christian Glahn.

2006-2009, Petr Pajas.