XML::Parser::PerlSAX

XML::Parser::PerlSAX - Perl SAX parser using XML::Parser


 use XML::Parser::PerlSAX;

 $parser = XML::Parser::PerlSAX->new( [OPTIONS] );
 $result = $parser->parse( [OPTIONS] );

 $result = $parser->parse($string);

"XML::Parser::PerlSAX" is a PerlSAX parser using the
XML::Parser module.  This man page summarizes the specific
options, handlers, and properties supported by
"XML::Parser::PerlSAX"; please refer to the PerlSAX standard
in "PerlSAX.pod" for general usage information.



new
    Creates a new parser object.  Default options for
    parsing, described below, are passed as key-value pairs
    or as a single hash reference.  Options may be changed
    directly in the parser object unless stated otherwise.
    Options passed to "parse()" override the default options
    in the parser object for the duration of the parse.

parse
    Parses a document.  Options, described below, are passed
    as key-value pairs or as a single hash reference.
    Options passed to "parse()" override default options in
    the parser object.
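
    As a minimal sketch of the two methods above: the handler
    class "ElementCounter" below is hypothetical and written
    only for this example.  It assumes, following the PerlSAX
    convention, that "parse()" returns whatever the handler's
    "end_document()" returns.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# ElementCounter is a hypothetical handler class for this example.
package ElementCounter;

sub new {
    my ($class) = @_;
    return bless { Count => 0 }, $class;
}

# Called once per start tag; the argument is a hash of properties.
sub start_element {
    my ($self, $element) = @_;
    $self->{Count}++;
}

# The value returned here is assumed to become the result of parse().
sub end_document {
    my ($self) = @_;
    return $self->{Count};
}

package main;

# Default options are given to new() as key-value pairs.
my $parser = XML::Parser::PerlSAX->new( Handler => ElementCounter->new );

# A single string argument is treated as the document to parse.
my $count = $parser->parse('<doc><item/><item/></doc>');
print "elements seen: $count\n";
```

    The handler only implements the events it cares about;
    the remaining events are simply not delivered to it.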

location
    Returns the location as a hash:

      ColumnNumber    The column number of the parse.
      LineNumber      The line number of the parse.
      BytePosition    The current byte position of the parse.
      PublicId        A string containing the public identifier, or undef
                      if none is available.
      SystemId        A string containing the system identifier, or undef
                      if none is available.
      Base            The current value of the base for resolving relative
                      URIs.

    ALPHA WARNING: The "SystemId" and "PublicId"
    properties returned are the system and public
    identifiers of the document passed to "parse()", not
    the identifiers of the external entity currently being
    parsed.  The column, line, and byte positions are those
    of the entity currently being parsed.
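
    The following sketch shows one way a handler might call
    "location()" while a parse is in progress.  The
    "LineTracker" class and its "Parser" and "Seen" fields are
    hypothetical; storing the parser object in the handler is
    an assumption made for this example, not something the
    module requires.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);
use XML::Parser::PerlSAX;

# LineTracker is a hypothetical handler class for this example.
package LineTracker;

sub new {
    my ($class) = @_;
    return bless { Parser => undef, Seen => [] }, $class;
}

# An empty start_document is defined so the handler receives the
# full set of document events.
sub start_document { }

# Record the element name and the line reported by location().
sub start_element {
    my ($self, $element) = @_;
    my $location = $self->{Parser}->location;
    push @{ $self->{Seen} }, "$element->{Name}:$location->{LineNumber}";
}

sub end_document {
    my ($self) = @_;
    return $self->{Seen};
}

package main;

my $handler = LineTracker->new;
my $parser  = XML::Parser::PerlSAX->new( Handler => $handler );

# Give the handler access to the parser; weaken the reference so the
# two objects do not keep each other alive.
$handler->{Parser} = $parser;
weaken( $handler->{Parser} );

my $lines = $parser->parse("<doc>\n  <item/>\n</doc>");
print join(', ', @$lines), "\n";
```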

The following options are supported by
"XML::Parser::PerlSAX":

 Handler            default handler to receive events
 DocumentHandler    handler to receive document events
 DTDHandler         handler to receive DTD events
 ErrorHandler       handler to receive error events
 EntityResolver     handler to resolve entities
 Locale             locale to provide localisation for errors
 Source             hash containing the input source for parsing
 UseAttributeOrder  set to true to provide AttributeOrder and Defaulted
                    properties in start_element()

If no handlers are provided, all events are silently
ignored, except for "fatal_error()", which causes
"die()" to be called after "end_document()" is called.
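
Because a fatal error ends in a "die()", callers that want to
continue after a malformed document can wrap the parse in an
"eval" block.  The sketch below assumes no handlers at all and
a deliberately malformed document; the variable names are
illustrative only.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# No handlers are supplied, so ordinary events are ignored.
my $parser = XML::Parser::PerlSAX->new;

# The <unclosed> element is never closed, so the document is not
# well-formed and the parse is expected to die.
my $ok = eval {
    $parser->parse('<doc><unclosed></doc>');
    1;
};
my $error = $ok ? '' : $@;

print $ok ? "parsed without error\n" : "parse failed: $error";
```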

If a single string argument is passed to the "parse()"
method, it is treated as if a "Source" option had been given
with a "String" parameter.

The "Source" hash may contain the following parameters:

 ByteStream       The raw byte stream (file handle) containing the
                  document.
 String           A string containing the document.
 SystemId         The system identifier (URI) of the document.
 PublicId         The public identifier.
 Encoding         A string describing the character encoding.

If more than one of "ByteStream", "String", or "SystemId"
is specified, preference is given first to "ByteStream",
then to "String", then to "SystemId".
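
As an illustration of the "Source" option, the sketch below
parses the same document twice: once as a bare string and once
through an explicit "Source" hash with a "String" parameter.
The "ElementCounter" class and the "count_elements" helper are
hypothetical names introduced for this example.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# ElementCounter is a hypothetical handler class for this example.
package ElementCounter;

sub new           { my ($class) = @_; return bless { Count => 0 }, $class }
sub start_element { my ($self) = @_; $self->{Count}++ }
sub end_document  { my ($self) = @_; return $self->{Count} }

package main;

my $xml = '<doc><item/><item/></doc>';

# Hypothetical helper: builds a fresh parser and forwards its
# arguments unchanged to parse().
sub count_elements {
    my @parse_args = @_;
    my $parser = XML::Parser::PerlSAX->new( Handler => ElementCounter->new );
    return $parser->parse(@parse_args);
}

# Short form: a single string argument.
my $from_string = count_elements($xml);

# Long form: the same document named explicitly in a Source hash.
my $from_source = count_elements( Source => { String => $xml } );

print "string: $from_string, source: $from_source\n";
```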

The following handlers and properties are supported by
"XML::Parser::PerlSAX":

     DocumentHandler methods


     start_document
         Receive notification of the beginning of a
         document.

         No properties defined.

     end_document
         Receive notification of the end of a document.

         No properties defined.

     start_element
         Receive notification of the beginning of an
         element.

          Name             The element type name.
          Attributes       A hash containing the attributes attached to the
                           element, if any.

         The "Attributes" hash contains only string
         values.

         If the "UseAttributeOrder" parser option is true,
         the following properties are also passed to
         "start_element":

          AttributeOrder   An array of attribute names in the order they were
                           specified, followed by the defaulted attribute
                           names.
          Defaulted        The index number of the first defaulted attribute in
                           "AttributeOrder".  If this index is equal to the
                           length of "AttributeOrder", there were no defaulted
                           values.

         Note to "XML::Parser" users:  "Defaulted" will be
         half the value of "XML::Parser::Expat"’s
         "specified_attr()" function because only
         attribute names are provided, not their values.
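
         A sketch of how "AttributeOrder" and "Defaulted"
         might be used together to separate specified from
         defaulted attributes.  The "AttributeRecorder" class
         and the sample document are hypothetical; the
         document declares one defaulted attribute in its
         internal subset.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# AttributeRecorder is a hypothetical handler class for this example.
package AttributeRecorder;

sub new {
    my ($class) = @_;
    return bless { Order => [], Defaulted => undef }, $class;
}

# Copy the ordering information supplied with the element.
sub start_element {
    my ($self, $element) = @_;
    $self->{Order}     = [ @{ $element->{AttributeOrder} } ];
    $self->{Defaulted} = $element->{Defaulted};
}

sub end_document {
    my ($self) = @_;
    return $self;
}

package main;

# "kind" is not written on the element; it comes from the ATTLIST default.
my $xml = <<'XML';
<!DOCTYPE doc [
  <!ATTLIST doc kind CDATA "plain">
]>
<doc zeta="1" alpha="2"/>
XML

my $parser = XML::Parser::PerlSAX->new(
    Handler           => AttributeRecorder->new,
    UseAttributeOrder => 1,
);
my $seen = $parser->parse($xml);

# Names before the Defaulted index were specified in the document;
# names from that index onward were supplied by default.
my @order     = @{ $seen->{Order} };
my @specified = @order[ 0 .. $seen->{Defaulted} - 1 ];
my @defaulted = @order[ $seen->{Defaulted} .. $#order ];

print "specified: @specified\n";
print "defaulted: @defaulted\n";
```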

     end_element
         Receive notification of the end of an element.

          Name             The element type name.

     characters
         Receive notification of character data.

          Data             The characters from the XML document.
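
         A sketch of collecting character data.  The text of
         a document may arrive in more than one
         "characters()" call, so this hypothetical
         "TextCollector" class appends each "Data" value
         rather than assuming a single call.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# TextCollector is a hypothetical handler class for this example.
package TextCollector;

sub new {
    my ($class) = @_;
    return bless { Text => '' }, $class;
}

# Append every chunk of character data as it arrives.
sub characters {
    my ($self, $characters) = @_;
    $self->{Text} .= $characters->{Data};
}

# Hand the accumulated text back as the result of parse().
sub end_document {
    my ($self) = @_;
    return $self->{Text};
}

package main;

my $parser = XML::Parser::PerlSAX->new( Handler => TextCollector->new );
my $text   = $parser->parse('<greeting>Hello, <b>world</b>!</greeting>');

print "$text\n";
```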

     processing_instruction
         Receive notification of a processing instruction.

          Target           The processing instruction target.
          Data             The processing instruction data, if any.

     comment
         Receive notification of a comment.

          Data             The comment data, if any.

     start_cdata
         Receive notification of the start of a CDATA
         section.

         No properties defined.

     end_cdata
         Receive notification of the end of a CDATA section.

         No properties defined.

     entity_reference
         Receive notification of an internal entity
         reference.  If this handler is defined, internal
         entities are neither expanded nor passed to the
         "characters()" handler.  If this handler is not
         defined, internal entities are expanded if
         possible and passed to the "characters()"
         handler.

          Name             The entity reference name.
          Value            The entity reference value.

     DTDHandler methods


     notation_decl
         Receive notification of a notation declaration
         event.

          Name             The notation name.
          PublicId         The notation’s public identifier, if any.
          SystemId         The notation’s system identifier, if any.
          Base             The base for resolving a relative URI, if any.

     unparsed_entity_decl
         Receive notification of an unparsed entity
         declaration event.

          Name             The unparsed entity’s name.
          SystemId         The entity’s system identifier.
          PublicId         The entity’s public identifier, if any.
          Base             The base for resolving a relative URI, if any.

     entity_decl
         Receive notification of an entity declaration
         event.

          Name             The entity name.
          Value            The entity value, if any.
          PublicId         The entity’s public identifier, if any.
          SystemId         The entity’s system identifier, if any.
          Notation         The notation declared for this entity, if any.

         For internal entities, the "Value" parameter
         contains the value, and "PublicId", "SystemId",
         and "Notation" are undefined.  For external
         entities, "Value" is undefined, "SystemId"
         contains the system identifier, "PublicId" contains
         the public identifier if one was provided (it is
         undefined otherwise), and "Notation" contains the
         notation name for unparsed entities.  If this is a
         parameter entity declaration, a '%' is prefixed to
         the entity name.

         Note that "entity_decl()" and
         "unparsed_entity_decl()" overlap.  If both
         methods are implemented by a handler, then this
         handler will not be called for unparsed entities.
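
         A sketch of a DTD handler that records internal
         entity declarations.  The "EntityLister" class and
         the sample document are hypothetical; only
         "entity_decl()" is implemented, and entries are kept
         only when "Value" is defined, which per the
         description above marks an internal entity.

```perl
use strict;
use warnings;
use XML::Parser::PerlSAX;

# EntityLister is a hypothetical handler class for this example.
package EntityLister;

sub new {
    my ($class) = @_;
    return bless { Internal => {} }, $class;
}

# Keep the name and value of each internal entity declaration.
sub entity_decl {
    my ($self, $declaration) = @_;
    return unless defined $declaration->{Value};
    $self->{Internal}{ $declaration->{Name} } = $declaration->{Value};
}

package main;

my $handler = EntityLister->new;
my $parser  = XML::Parser::PerlSAX->new( DTDHandler => $handler );

$parser->parse(<<'XML');
<!DOCTYPE doc [
  <!ENTITY product "Widget">
]>
<doc/>
XML

for my $name ( sort keys %{ $handler->{Internal} } ) {
    print "$name = $handler->{Internal}{$name}\n";
}
```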

     element_decl
         Receive notification of an element declaration
         event.

          Name             The element type name.
          Model            The content model as a string.

     attlist_decl
         Receive notification of an attribute list
         declaration event.

         This handler is called for each attribute in an
         ATTLIST declaration found in the internal subset.
         So an ATTLIST declaration that has multiple
         attributes will generate multiple calls to this
         handler.

          ElementName      The element type name.
          AttributeName    The attribute name.
          Type             The attribute type.
          Fixed            True if this is a fixed attribute.

         The attribute’s default value is either
         "#REQUIRED", "#IMPLIED", or a quoted string (i.e.
         the returned string begins and ends with a quote
         character).

     doctype_decl
         Receive notification of a DOCTYPE declaration
         event.

          Name             The document type name.
          SystemId         The document’s system identifier.
          PublicId         The document’s public identifier, if any.
          Internal         The internal subset as a string, if any.

         Internal will contain all whitespace, comments,
         processing instructions, and declarations seen in
         the internal subset. The declarations will be there
         whether or not they have been processed by another
         handler (except for unparsed entities processed by
         the Unparsed handler).  However, comments and
         processing instructions will not appear if they’ve
         been processed by their respective handlers.

     xml_decl
         Receive notification of an XML declaration event.

          Version          The version.
          Encoding         The encoding string, if any.
          Standalone       True, false, or undefined if not declared.

     EntityResolver


     resolve_entity
         Allow the handler to resolve external entities.

          Name             The entity name.
          SystemId         The entity’s system identifier.
          PublicId         The entity’s public identifier, if any.
          Base             The base for resolving a relative URI, if any.

         "resolve_entity()" should return undef to request
         that the parser open a regular URI connection to
         the system identifier, or return a hash describing
         the new input source.  This hash has the same
         properties as the "Source" parameter to "parse()":

           PublicId    The public identifier of the external entity being
                       referenced, or undef if none was supplied.
           SystemId    The system identifier of the external entity being
                       referenced.
           String      String containing XML text
           ByteStream  An open file handle.
           CharacterStream
                       An open file handle.
           Encoding    The character encoding, if known.

     Ken MacLeod, ken@bitsko.slc.ut.us

     perl(1), PerlSAX.pod(3)

      Extensible Markup Language (XML) <http://www.w3c.org/XML/>
      SAX 1.0: The Simple API for XML <http://www.megginson.com/SAX/>