Commit | Line | Data |
3fea05b9 |
1 | package XML::Simple::FAQ; |
2 | 1; |
3 | |
4 | __END__ |
5 | |
6 | =head1 Frequently Asked Questions about XML::Simple |
7 | |
8 | |
9 | =head1 Basics |
10 | |
11 | |
12 | =head2 What is XML::Simple designed to be used for? |
13 | |
14 | XML::Simple is a Perl module that was originally developed as a tool for |
15 | reading and writing configuration data in XML format. You can use it for |
16 | many other purposes that involve storing and retrieving structured data in |
17 | XML. |
18 | |
19 | You might also find XML::Simple a good starting point for playing with XML |
20 | from Perl. It doesn't have a steep learning curve and if you outgrow its |
21 | capabilities there are plenty of other Perl/XML modules to 'step up' to. |
22 | |
23 | |
24 | =head2 Why store configuration data in XML anyway? |
25 | |
26 | The many advantages of using XML format for configuration data include: |
27 | |
28 | =over 4 |
29 | |
30 | =item * |
31 | |
32 | Using existing XML parsing tools takes less development time and is easier |
33 | and more robust than developing your own config file parsing code |
34 | |
35 | =item * |
36 | |
37 | XML can represent relationships between pieces of data, such as nesting of |
38 | sections to arbitrary levels (not easily done with .INI files for example) |
39 | |
40 | =item * |
41 | |
42 | XML is basically just text, so you can easily edit a config file (easier than |
43 | editing a Win32 registry) |
44 | |
45 | =item * |
46 | |
47 | XML provides standard solutions for handling character sets and encoding |
48 | beyond basic ASCII (important for internationalization) |
49 | |
50 | =item * |
51 | |
52 | If it becomes necessary to change your configuration file format, there are |
53 | many tools available for performing transformations on XML files |
54 | |
55 | =item * |
56 | |
57 | XML is an open standard (the world does not need more proprietary binary |
58 | file formats) |
59 | |
60 | =item * |
61 | |
62 | Taking the extra step of developing a DTD allows the format of configuration |
63 | files to be validated before your program reads them (not directly supported |
64 | by XML::Simple) |
65 | |
66 | =item * |
67 | |
68 | Combining a DTD with a good XML editor can give you a GUI config editor for |
69 | minimal coding effort |
70 | |
71 | =back |
72 | |
73 | |
74 | =head2 What isn't XML::Simple good for? |
75 | |
76 | The main limitation of XML::Simple is that it does not work with 'mixed |
77 | content' (see the next question). If you think of your XML files as |
78 | marked-up text rather than structured data, you should probably use another |
79 | module. |
80 | |
81 | If you are working with very large XML files, XML::Simple's approach of |
82 | representing the whole file in memory as a 'tree' data structure may not be |
83 | suitable. |
84 | |
85 | |
86 | =head2 What is mixed content? |
87 | |
88 | Consider this example XML: |
89 | |
90 | <document> |
91 | <para>This is <em>mixed</em> content.</para> |
92 | </document> |
93 | |
94 | This is said to be mixed content, because the E<lt>paraE<gt> element contains |
95 | both character data (text content) and nested elements. |
96 | |
97 | Here's some more XML: |
98 | |
99 | <person> |
100 | <first_name>Joe</first_name> |
101 | <last_name>Bloggs</last_name> |
102 | <dob>25-April-1969</dob> |
103 | </person> |
104 | |
105 | This second example is not generally considered to be mixed content. The |
106 | E<lt>first_nameE<gt>, E<lt>last_nameE<gt> and E<lt>dobE<gt> elements contain |
107 | only character data and the E<lt>personE<gt> element contains only nested |
108 | elements. (Note: Strictly speaking, the whitespace between the nested |
109 | elements is character data, but it is ignored by XML::Simple). |
110 | |
111 | |
112 | =head2 Why doesn't XML::Simple handle mixed content? |
113 | |
114 | Because if it did, it would no longer be simple :-) |
115 | |
116 | Seriously though, there are plenty of excellent modules that allow you to |
117 | work with mixed content in a variety of ways. Handling mixed content |
118 | correctly is not easy and by ignoring these issues, XML::Simple is able to |
119 | present an API without a steep learning curve. |
120 | |
121 | |
122 | =head2 Which Perl modules do handle mixed content? |
123 | |
124 | Every one of them except XML::Simple :-) |
125 | |
126 | If you're looking for a recommendation, I'd suggest you look at the Perl-XML |
127 | FAQ at: |
128 | |
129 | http://perl-xml.sourceforge.net/faq/ |
130 | |
131 | |
132 | =head1 Installation |
133 | |
134 | |
135 | =head2 How do I install XML::Simple? |
136 | |
137 | If you're running ActiveState Perl, you've probably already got XML::Simple |
138 | (although you may want to upgrade to version 1.09 or better for SAX support). |
139 | |
140 | If you do need to install XML::Simple, you'll need to install an XML parser |
141 | module first. Install either XML::Parser (which you may have already) or |
142 | XML::SAX. If you install both, XML::SAX will be used by default. |
143 | |
144 | Once you have a parser installed ... |
145 | |
146 | On Unix systems, try: |
147 | |
148 | perl -MCPAN -e 'install XML::Simple' |
149 | |
150 | If that doesn't work, download the latest distribution from |
151 | ftp://ftp.cpan.org/pub/CPAN/authors/id/G/GR/GRANTM , unpack it and run these |
152 | commands: |
153 | |
154 | perl Makefile.PL |
155 | make |
156 | make test |
157 | make install |
158 | |
159 | On Win32, if you have a recent build of ActiveState Perl (618 or better) try |
160 | this command: |
161 | |
162 | ppm install XML::Simple |
163 | |
164 | If that doesn't work, you really only need the Simple.pm file, so extract it |
165 | from the .tar.gz file (eg: using WinZIP) and save it in the \site\lib\XML |
166 | directory under your Perl installation (typically C:\Perl). |
167 | |
168 | |
169 | =head2 I'm trying to install XML::Simple and 'make test' fails |
170 | |
171 | Is the directory where you've unpacked XML::Simple mounted from a file server |
172 | using NFS, SMB or some other network file sharing? If so, that may cause |
173 | errors in the following test scripts: |
174 | |
175 | 3_Storable.t |
176 | 4_MemShare.t |
177 | 5_MemCopy.t |
178 | |
179 | The test suite is designed to exercise the boundary conditions of all of |
180 | XML::Simple's functionality and these three scripts exercise the caching |
181 | functions. If XML::Simple is asked to parse a file for which it has a cached |
182 | copy of a previous parse, then it compares the timestamp on the XML file with |
183 | the timestamp on the cached copy. If the cached copy is *newer* then it will |
184 | be used. If the cached copy is older or the same age then the file is |
185 | re-parsed. The test scripts will get confused by networked filesystems if |
186 | the workstation and server system clocks are not synchronised (to the |
187 | second). |
188 | |
189 | If you get an error in one of these three test scripts but you don't plan to |
190 | use the caching options (they're not enabled by default), then go right ahead |
191 | and run 'make install'. If you do plan to use caching, then try unpacking |
192 | the distribution on local disk and doing the build/test there. |
193 | |
194 | It's probably not a good idea to use the caching options with networked |
195 | filesystems in production. If the file server's clock is ahead of the local |
196 | clock, XML::Simple will re-parse files when it could have used the cached |
197 | copy. However if the local clock is ahead of the file server clock and a |
198 | file is changed immediately after it is cached, the old cached copy will be |
199 | used. |
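The timestamp rule described above can be sketched in a few lines of plain Perl. This is not XML::Simple's actual code: the cache_is_usable() helper is hypothetical and exists only to illustrate why the cached copy must be strictly newer than the XML file, and why clock skew between two machines can upset the comparison.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Simple): returns true only if both
# files exist and the cache file's modification time is strictly newer
# than the modification time of the XML file.
sub cache_is_usable {
    my ($xml_file, $cache_file) = @_;

    my $xml_mtime   = (stat($xml_file))[9];    # element 9 of stat() is mtime
    my $cache_mtime = (stat($cache_file))[9];

    # A missing file means there is nothing usable to compare
    return 0 unless defined $xml_mtime && defined $cache_mtime;

    # Same age or older means the XML file must be re-parsed
    return $cache_mtime > $xml_mtime ? 1 : 0;
}
```

If the two timestamps come from different clocks (as they can on a networked filesystem), the comparison may give the wrong answer even though the logic itself is sound.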
200 | |
201 | Is one of the three test scripts (above) failing but you're not running on |
202 | a network filesystem? Are you running Win32? If so, you may be seeing a bug |
203 | in Win32 where writes to a file do not affect its modification timestamp. |
204 | |
205 | If none of these scenarios match your situation, please confirm you're |
206 | running the latest version of XML::Simple and then email the output of |
207 | 'make test' to me at grantm@cpan.org |
208 | |
209 | =head2 Why is XML::Simple so slow? |
210 | |
211 | If you find that XML::Simple is very slow reading XML, the most likely reason |
212 | is that you have XML::SAX installed but no additional SAX parser module. The |
213 | XML::SAX distribution includes an XML parser written entirely in Perl. This is |
214 | very portable but not very fast. For better performance install either |
215 | XML::SAX::Expat or XML::LibXML. |
216 | |
217 | |
218 | =head1 Usage |
219 | |
220 | =head2 How do I use XML::Simple? |
221 | |
222 | If you had an XML document called /etc/appconfig/foo.xml you could 'slurp' it |
223 | into a simple data structure (typically a hashref) with these lines of code: |
224 | |
225 | use XML::Simple; |
226 | |
227 | my $config = XMLin('/etc/appconfig/foo.xml'); |
228 | |
229 | The XMLin() function accepts options after the filename. |
230 | |
231 | |
232 | =head2 There are so many options, which ones do I really need to know about? |
233 | |
234 | Although you can get by without using any options, you shouldn't even |
235 | consider using XML::Simple in production until you know what these two |
236 | options do: |
237 | |
238 | =over 4 |
239 | |
240 | =item * |
241 | |
242 | forcearray |
243 | |
244 | =item * |
245 | |
246 | keyattr |
247 | |
248 | =back |
249 | |
250 | The reason you really need to read about them is because the default values |
251 | for these options will trip you up if you don't. Although everyone agrees |
252 | that these defaults are not ideal, there is not wide agreement on what they |
253 | should be changed to. The answer therefore is to read about them (see below) |
254 | and select values which are right for you. |
255 | |
256 | |
257 | =head2 What is the forcearray option all about? |
258 | |
259 | Consider this XML in a file called ./person.xml: |
260 | |
261 | <person> |
262 | <first_name>Joe</first_name> |
263 | <last_name>Bloggs</last_name> |
264 | <hobbie>bungy jumping</hobbie> |
265 | <hobbie>sky diving</hobbie> |
266 | <hobbie>knitting</hobbie> |
267 | </person> |
268 | |
269 | You could read it in with this line: |
270 | |
271 | my $person = XMLin('./person.xml'); |
272 | |
273 | Which would give you a data structure like this: |
274 | |
275 | $person = { |
276 | 'first_name' => 'Joe', |
277 | 'last_name' => 'Bloggs', |
278 | 'hobbie' => [ 'bungy jumping', 'sky diving', 'knitting' ] |
279 | }; |
280 | |
281 | The E<lt>first_nameE<gt> and E<lt>last_nameE<gt> elements are represented as |
282 | simple scalar values which you could refer to like this: |
283 | |
284 | print "$person->{first_name} $person->{last_name}\n"; |
285 | |
286 | The E<lt>hobbieE<gt> elements are represented as an array - since there is |
287 | more than one. You could refer to the first one like this: |
288 | |
289 | print $person->{hobbie}->[0], "\n"; |
290 | |
291 | Or the whole lot like this: |
292 | |
293 | print join(', ', @{$person->{hobbie}} ), "\n"; |
294 | |
295 | The catch is that these last two lines of code will only work for people |
296 | who have more than one hobby. If there is only one E<lt>hobbieE<gt> |
297 | element, it will be represented as a simple scalar (just like |
298 | E<lt>first_nameE<gt> and E<lt>last_nameE<gt>). Which might lead you to write |
299 | code like this: |
300 | |
301 | if(ref($person->{hobbie})) { |
302 | print join(', ', @{$person->{hobbie}} ), "\n"; |
303 | } |
304 | else { |
305 | print $person->{hobbie}, "\n"; |
306 | } |
307 | |
308 | Don't do that. |
309 | |
310 | One alternative approach is to set the forcearray option to a true value: |
311 | |
312 | my $person = XMLin('./person.xml', forcearray => 1); |
313 | |
314 | Which will give you a data structure like this: |
315 | |
316 | $person = { |
317 | 'first_name' => [ 'Joe' ], |
318 | 'last_name' => [ 'Bloggs' ], |
319 | 'hobbie' => [ 'bungy jumping', 'sky diving', 'knitting' ] |
320 | }; |
321 | |
322 | Then you can use this line to refer to the whole list of hobbies even if there |
323 | was only one: |
324 | |
325 | print join(', ', @{$person->{hobbie}} ), "\n"; |
326 | |
327 | The downside of this approach is that the E<lt>first_nameE<gt> and |
328 | E<lt>last_nameE<gt> elements will also always be represented as arrays even |
329 | though there will never be more than one: |
330 | |
331 | print "$person->{first_name}->[0] $person->{last_name}->[0]\n"; |
332 | |
333 | This might be OK if you change the XML to use attributes for things that |
334 | will always be singular and nested elements for things that may be plural: |
335 | |
336 | <person first_name="Jane" last_name="Bloggs"> |
337 | <hobbie>motorcycle maintenance</hobbie> |
338 | </person> |
339 | |
340 | On the other hand, if you prefer not to use attributes, then you could |
341 | specify that any E<lt>hobbieE<gt> elements should always be represented as |
342 | arrays and all other nested elements should be simple scalar values unless |
343 | there is more than one: |
344 | |
345 | my $person = XMLin('./person.xml', forcearray => [ 'hobbie' ]); |
346 | |
347 | The forcearray option accepts a list of element names which should always |
348 | be forced to an array representation: |
349 | |
350 | forcearray => [ qw(hobbie qualification childs_name) ] |
351 | |
352 | See the XML::Simple manual page for more information. |
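As a small self-contained sketch of the forcearray behaviour described above, the following parses a person with a single E<lt>hobbieE<gt> element from a string rather than a file. The sample data is made up for illustration, and keyattr is set to an empty list here purely to keep array folding out of the picture.

```perl
use strict;
use warnings;
use XML::Simple;

# A person with only one <hobbie> element (illustrative data)
my $xml = <<'XML';
<person>
  <first_name>Jane</first_name>
  <last_name>Bloggs</last_name>
  <hobbie>motorcycle maintenance</hobbie>
</person>
XML

# Without forcearray, the lone <hobbie> comes back as a plain scalar
my $loose = XMLin($xml, keyattr => []);

# With forcearray => ['hobbie'], <hobbie> is always an arrayref while
# <first_name> and <last_name> remain simple scalars
my $person = XMLin($xml, forcearray => ['hobbie'], keyattr => []);

print "$person->{first_name} $person->{last_name}\n";
print join(', ', @{ $person->{hobbie} }), "\n";
```

The same join() line then works whether the file lists one hobby or ten.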
353 | |
354 | |
355 | =head2 What is the keyattr option all about? |
356 | |
357 | Consider this sample XML: |
358 | |
359 | <catalog> |
360 | <part partnum="1842334" desc="High pressure flange" price="24.50" /> |
361 | <part partnum="9344675" desc="Threaded gasket" price="9.25" /> |
362 | <part partnum="5634896" desc="Low voltage washer" price="12.00" /> |
363 | </catalog> |
364 | |
365 | You could slurp it in with this code: |
366 | |
367 | my $catalog = XMLin('./catalog.xml'); |
368 | |
369 | Which would return a data structure like this: |
370 | |
371 | $catalog = { |
372 | 'part' => [ |
373 | { |
374 | 'partnum' => '1842334', |
375 | 'desc' => 'High pressure flange', |
376 | 'price' => '24.50' |
377 | }, |
378 | { |
379 | 'partnum' => '9344675', |
380 | 'desc' => 'Threaded gasket', |
381 | 'price' => '9.25' |
382 | }, |
383 | { |
384 | 'partnum' => '5634896', |
385 | 'desc' => 'Low voltage washer', |
386 | 'price' => '12.00' |
387 | } |
388 | ] |
389 | }; |
390 | |
391 | Then you could access the description of the first part in the catalog |
392 | with this code: |
393 | |
394 | print $catalog->{part}->[0]->{desc}, "\n"; |
395 | |
396 | However, if you wanted to access the description of the part with the |
397 | part number of "9344675" then you'd have to code a loop like this: |
398 | |
399 | foreach my $part (@{$catalog->{part}}) { |
400 | if($part->{partnum} eq '9344675') { |
401 | print $part->{desc}, "\n"; |
402 | last; |
403 | } |
404 | } |
405 | |
406 | The knowledge that each E<lt>partE<gt> element has a unique partnum attribute |
407 | allows you to eliminate this search. You can pass this knowledge on to |
408 | XML::Simple like this: |
409 | |
410 | my $catalog = XMLin($xml, keyattr => ['partnum']); |
411 | |
412 | Which will return a data structure like this: |
413 | |
414 | $catalog = { |
415 | 'part' => { |
416 | '5634896' => { 'desc' => 'Low voltage washer', 'price' => '12.00' }, |
417 | '1842334' => { 'desc' => 'High pressure flange', 'price' => '24.50' }, |
418 | '9344675' => { 'desc' => 'Threaded gasket', 'price' => '9.25' } |
419 | } |
420 | }; |
421 | |
422 | XML::Simple has been able to transform $catalog->{part} from an arrayref to |
423 | a hashref (keyed on partnum). This transformation is called 'array folding'. |
424 | |
425 | Through the use of array folding, you can now index directly to the |
426 | description of the part you want: |
427 | |
428 | print $catalog->{part}->{9344675}->{desc}, "\n"; |
429 | |
430 | The 'keyattr' option also enables array folding when the unique key is in a |
431 | nested element rather than an attribute. eg: |
432 | |
433 | <catalog> |
434 | <part> |
435 | <partnum>1842334</partnum> |
436 | <desc>High pressure flange</desc> |
437 | <price>24.50</price> |
438 | </part> |
439 | <part> |
440 | <partnum>9344675</partnum> |
441 | <desc>Threaded gasket</desc> |
442 | <price>9.25</price> |
443 | </part> |
444 | <part> |
445 | <partnum>5634896</partnum> |
446 | <desc>Low voltage washer</desc> |
447 | <price>12.00</price> |
448 | </part> |
449 | </catalog> |
450 | |
451 | See the XML::Simple manual page for more information. |
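To illustrate folding on a nested element, here is a minimal sketch that parses a shortened version of the catalog above from a string. It assumes the same keyattr => ['partnum'] setting used earlier; only two parts are included to keep the example brief.

```perl
use strict;
use warnings;
use XML::Simple;

# The unique key lives in a nested <partnum> element, not an attribute
my $xml = <<'XML';
<catalog>
  <part>
    <partnum>1842334</partnum>
    <desc>High pressure flange</desc>
    <price>24.50</price>
  </part>
  <part>
    <partnum>9344675</partnum>
    <desc>Threaded gasket</desc>
    <price>9.25</price>
  </part>
</catalog>
XML

# The two <part> elements form an array, which is folded into a hash
# keyed on the text content of each <partnum> element
my $catalog = XMLin($xml, keyattr => ['partnum']);

print $catalog->{part}->{9344675}->{desc}, "\n";
```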
452 | |
453 | |
454 | =head2 So what's the catch with 'keyattr'? |
455 | |
456 | One thing to watch out for is that you might get array folding even if you |
457 | don't supply the keyattr option. The default value for this option is: |
458 | |
459 | [ 'name', 'key', 'id'] |
460 | |
461 | Which means if your XML elements have a 'name', 'key' or 'id' attribute (or |
462 | nested element) then they may get folded on those values. This means that |
463 | you can take advantage of array folding simply through careful choice of |
464 | attribute names. On the other hand, if you really don't want array folding at |
465 | all, you'll need to set 'keyattr' to an empty list: |
466 | |
467 | my $ref = XMLin($xml, keyattr => []); |
468 | |
469 | A second 'gotcha' is that array folding only works on arrays. That might |
470 | seem obvious, but if there's only one record in your XML and you didn't set |
471 | the 'forcearray' option then it won't be represented as an array and |
472 | consequently won't get folded into a hash. The moral is that if you're |
473 | using array folding, you should always turn on the forcearray option. |
474 | |
475 | You probably want to be as specific as you can be too. For instance, the |
476 | safest way to parse the E<lt>catalogE<gt> example above would be: |
477 | |
478 | my $catalog = XMLin($xml, keyattr => { part => 'partnum'}, |
479 | forcearray => ['part']); |
480 | |
481 | By using the hashref for keyattr, you can specify that only E<lt>partE<gt> |
482 | elements should be folded on the 'partnum' attribute (and that the |
483 | E<lt>partE<gt> elements should not be folded on any other attribute). |
484 | |
485 | By supplying a list of element names for forcearray, you're ensuring that |
486 | folding will work even if there's only one E<lt>partE<gt>. You're also |
487 | ensuring that if the 'partnum' unique key is supplied in a nested element |
488 | then that element won't get forced to an array too. |
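The single-record 'gotcha' can be seen in a short sketch. The catalog here is cut down to one E<lt>partE<gt> for illustration; the first call omits forcearray to show the unfolded result, and the second uses the recommended combination of options.

```perl
use strict;
use warnings;
use XML::Simple;

# A catalog that happens to contain only one <part>
my $xml = <<'XML';
<catalog>
  <part partnum="1842334" desc="High pressure flange" price="24.50" />
</catalog>
XML

# Without forcearray there is no array to fold, so 'partnum' stays
# behind as an ordinary key inside the part hash
my $unfolded = XMLin($xml, keyattr => { part => 'partnum' });

# With forcearray => ['part'] the single part is folded on 'partnum'
my $catalog = XMLin(
    $xml,
    keyattr    => { part => 'partnum' },
    forcearray => ['part'],
);

print $catalog->{part}->{1842334}->{desc}, "\n";
```

Code that looks up parts by number therefore keeps working regardless of how many parts the file contains.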
489 | |
490 | |
491 | =head2 How do I know what my data structure should look like? |
492 | |
493 | The rules are fairly straightforward: |
494 | |
495 | =over 4 |
496 | |
497 | =item * |
498 | |
499 | each element gets represented as a hash |
500 | |
501 | =item * |
502 | |
503 | unless it contains only text, in which case it'll be a simple scalar value |
504 | |
505 | =item * |
506 | |
507 | or unless there's more than one element with the same name, in which case |
508 | they'll be represented as an array |
509 | |
510 | =item * |
511 | |
512 | unless you've got array folding enabled, in which case they'll be folded into |
513 | a hash |
514 | |
515 | =item * |
516 | |
517 | empty elements (no text contents B<and> no attributes) will either be |
518 | represented as an empty hash, an empty string or undef - depending on the value |
519 | of the 'suppressempty' option. |
520 | |
521 | =back |
522 | |
523 | If you're in any doubt, use Data::Dumper, eg: |
524 | |
525 | use XML::Simple; |
526 | use Data::Dumper; |
527 | |
528 | my $ref = XMLin($xml); |
529 | |
530 | print Dumper($ref); |
531 | |
532 | |
533 | =head2 I'm getting 'Use of uninitialized value' warnings |
534 | |
535 | You're probably trying to index into a non-existent hash key - try |
536 | Data::Dumper. |
537 | |
538 | |
539 | =head2 I'm getting a 'Not an ARRAY reference' error |
540 | |
541 | Something that you expect to be an array is not. The two most likely causes |
542 | are that you forgot to use 'forcearray' or that the array got folded into a |
543 | hash - try Data::Dumper. |
544 | |
545 | |
546 | =head2 I'm getting a 'No such array field' error |
547 | |
548 | Something that you expect to be a hash is actually an array. Perhaps array |
549 | folding failed because one element was missing the key attribute - try |
550 | Data::Dumper. |
551 | |
552 | |
553 | =head2 I'm getting an 'Out of memory' error |
554 | |
555 | Something in the data structure is not as you expect and Perl may be trying |
556 | unsuccessfully to autovivify things - try Data::Dumper. |
557 | |
558 | If you're already using Data::Dumper, try calling Dumper() immediately after |
559 | XMLin() - ie: before you attempt to access anything in the data structure. |
560 | |
561 | |
562 | =head2 My element order is getting jumbled up |
563 | |
564 | If you read an XML file with XMLin() and then write it back out with |
565 | XMLout(), the order of the elements will likely be different. (However, if |
566 | you read the file back in with XMLin() you'll get the same Perl data |
567 | structure). |
568 | |
569 | The reordering happens because XML::Simple uses hashrefs to store your data |
570 | and Perl hashes do not really have any order. |
571 | |
572 | It is possible that a future version of XML::Simple will use Tie::IxHash |
573 | to store the data in hashrefs which do retain the order. However this will |
574 | not fix all cases of element order being lost. |
575 | |
576 | If your application really is sensitive to element order, don't use |
577 | XML::Simple (and don't put order-sensitive values in attributes). |
578 | |
579 | |
580 | =head2 XML::Simple turns nested elements into attributes |
581 | |
582 | If you read an XML file with XMLin() and then write it back out with |
583 | XMLout(), some data which was originally stored in nested elements may end up |
584 | in attributes. (However, if you read the file back in with XMLin() you'll |
585 | get the same Perl data structure). |
586 | |
587 | There are a number of ways you might handle this: |
588 | |
589 | =over 4 |
590 | |
591 | =item * |
592 | |
593 | use the 'forcearray' option with XMLin() |
594 | |
595 | =item * |
596 | |
597 | use the 'noattr' option with XMLout() |
598 | |
599 | =item * |
600 | |
601 | live with it |
602 | |
603 | =item * |
604 | |
605 | don't use XML::Simple |
606 | |
607 | =back |
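As a rough illustration of the 'noattr' approach, the sketch below reads a small document and writes it back out twice, once with default attribute handling and once with noattr enabled. The 'rootname' option is used so the output keeps the original root element name; the sample data is made up.

```perl
use strict;
use warnings;
use XML::Simple;

my $xml = '<person><first_name>Joe</first_name>'
        . '<last_name>Bloggs</last_name></person>';

my $ref = XMLin($xml, keyattr => []);

# By default, simple scalar values are written out as attributes
my $as_attrs = XMLout($ref, rootname => 'person', keyattr => []);

# With noattr, the same values are written out as nested elements
my $as_elems = XMLout($ref, rootname => 'person', keyattr => [],
                      noattr => 1);

print $as_attrs, "\n", $as_elems;
```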
608 | |
609 | |
610 | =head2 Why does XMLout() insert E<lt>nameE<gt> elements (or attributes)? |
611 | |
612 | Try setting keyattr => []. |
613 | |
614 | When you call XMLin() to read XML, the 'keyattr' option controls whether arrays |
615 | get 'folded' into hashes. Similarly, when you call XMLout(), the 'keyattr' |
616 | option controls whether hashes get 'unfolded' into arrays. As described above, |
617 | 'keyattr' is enabled by default. |
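The unfolding described above can be seen in a short sketch. The data structure and its keys ('flange', 'gasket') are made up for illustration; the first call relies on the default keyattr setting and the second turns unfolding off.

```perl
use strict;
use warnings;
use XML::Simple;

# A hash of hashes, as you might get back from XMLin() after folding
my $catalog = {
    part => {
        flange => { price => '24.50' },
        gasket => { price => '9.25'  },
    },
};

# Default keyattr: each key of the 'part' hash is unfolded into a
# 'name' attribute on a repeated <part> element
my $unfolded = XMLout($catalog);

# keyattr => []: no unfolding, so each key becomes an element name
my $plain = XMLout($catalog, keyattr => []);

print $unfolded, "\n", $plain;
```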
618 | |
619 | =head2 Why are empty elements represented as empty hashes? |
620 | |
621 | An element is always represented as a hash unless it contains only text, in |
622 | which case it is represented as a scalar string. |
623 | |
624 | If you would prefer empty elements to be represented as empty strings or the |
625 | undefined value, set the 'suppressempty' option to '' or undef respectively. |
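A brief sketch of the three representations follows. The element names are invented for illustration; the same document is parsed with the default settings, with suppressempty set to '' and with suppressempty set to undef.

```perl
use strict;
use warnings;
use XML::Simple;

# <note> has no text content and no attributes
my $xml = '<opt><note></note><title>Hello</title></opt>';

my $default  = XMLin($xml);                          # note is {}
my $as_empty = XMLin($xml, suppressempty => '');     # note is ''
my $as_undef = XMLin($xml, suppressempty => undef);  # note is undef

print "default:  ", ref($default->{note}), "\n";
print "empty:    '$as_empty->{note}'\n";
print "undef:    ", (defined $as_undef->{note} ? 'defined' : 'undef'), "\n";
```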
626 | |
627 | =head2 Why is ParserOpts deprecated? |
628 | |
629 | The C<ParserOpts> option is a remnant of the time when XML::Simple only worked |
630 | with the XML::Parser API. Its value is completely ignored if you're using a |
631 | SAX parser, so writing code which relied on it would bar you from taking |
632 | advantage of SAX. |
633 | |
634 | Even if you are using XML::Parser, it is seldom necessary to pass options to |
635 | the parser object. A number of people have written to say they use this option |
636 | to set XML::Parser's C<ProtocolEncoding> option. Don't do that, it's wrong, |
637 | Wrong, WRONG! Fix the XML document so that it's well-formed and you won't have |
638 | a problem. |
639 | |
640 | Having said all of that, as long as XML::Simple continues to support the |
641 | XML::Parser API, this option will not be removed. There are currently no plans |
642 | to remove support for the XML::Parser API. |
643 | |
644 | =cut |
645 | |
646 | |