uri(3tcl)         Tcl Uniform Resource Identifier Management         uri(3tcl)



______________________________________________________________________________

NAME
       uri - URI utilities

SYNOPSIS
       package require Tcl  8.2

       package require uri  ?1.2.7?

       uri::setQuirkOption option ?value?

       uri::split url ?defaultscheme?

       uri::join ?key value?...

       uri::resolve base url

       uri::isrelative url

       uri::geturl url ?options...?

       uri::canonicalize uri

       uri::register schemeList script

______________________________________________________________________________

DESCRIPTION
       This package does two things.

       First, it provides a number of commands for manipulating URLs/URIs and
       fetching data specified by them. For fetching data this package
       analyses the requested URL/URI and then dispatches it to the
       appropriate package (http, ftp, ...) for actual retrieval.  Currently
       these commands are defined for the schemes http, https, ftp, mailto,
       news, ldap, ldaps and file.  The package uri::urn adds scheme urn.

       Second, it provides regular expressions for a number of registered
       URL/URI schemes. Registered schemes are currently ftp, ldap, ldaps,
       file, http, https, gopher, mailto, news, wais and prospero.  The
       package uri::urn adds scheme urn.

       The commands of the package conform to RFC 3986
       (https://www.rfc-editor.org/rfc/rfc3986.txt), with the exception of a
       loophole arising from RFC 1630 and described in RFC 3986 Sections 5.2.2
       and 5.4.2. The loophole allows a relative URI to include a scheme if it
       is the same as the scheme of the base URI against which it is resolved.
       RFC 3986 recommends avoiding this usage.

COMMANDS
       uri::setQuirkOption option ?value?
              uri::setQuirkOption is an accessor command for a number of
              "quirk options".  The command has the same semantics as the
              command set: when called with one argument it reads an existing
              value; with two arguments it writes a new value.  The value of a
              "quirk option" is boolean: the value false requests conformance
              with RFC 3986, while true requests use of the quirk.  See
              section QUIRK OPTIONS for discussion of the different options
              and their purpose.
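
              A minimal sketch of the accessor in use, assuming package uri
              1.2.7 is installed. The option name is one of those listed in
              QUIRK OPTIONS; the variable name saved is illustrative only.

```tcl
package require uri

# Read the current value (one argument), as with the command set.
set saved [uri::setQuirkOption NoInitialSlash]

# Write a new value (two arguments): false requests RFC 3986 behavior.
uri::setQuirkOption NoInitialSlash 0

# ... work with RFC 3986 style paths here ...

# Restore the value that was in effect before.
uri::setQuirkOption NoInitialSlash $saved
```

              Saving and restoring the value keeps the change local, which
              matters because quirk options apply to the whole package.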

       uri::split url ?defaultscheme?
              uri::split takes a url, parses it, and returns a list of
              key/value pairs, suitable for array set, containing the
              constituents of the url. If the scheme is missing from the url,
              it defaults to defaultscheme if that argument was specified, and
              to http otherwise. Currently the schemes http, https, ftp,
              mailto, news, ldap, ldaps and file are supported by the package
              itself. See section EXTENDING on how to expand that range.

              The set of constituents of a URL (= the set of keys in the
              returned dictionary) depends on the scheme of the URL. The only
              key that is always present is therefore scheme. For the
              following schemes the constituents and their keys are known:

              ftp    user, pwd, host, port, path, type, pbare.  The pbare is
                     optional.

              http(s)
                     user, pwd, host, port, path, query, fragment, pbare.  The
                     pbare is optional.

              file   path, host. The host is optional.

              mailto user, host. The host is optional.

              ldap(s)
                     host, port, dn, attrs, scope, filter, extensions

              news   Either message-id or newsgroup-name.

              For discussion of the boolean pbare see options NoInitialSlash
              and NoExtraKeys in QUIRK OPTIONS.

              The constituents are returned as slices of the argument url,
              without removal of percent-encoding ("url-encoding") or other
              adaptations.  Notably, on Windows® the path in scheme file is
              not a valid local filename.  See EXAMPLES for more information.
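
              The following sketch shows one way to consume the result of
              uri::split. The URLs are hypothetical, and the printed values
              depend on the quirk options in effect, so none are shown here.

```tcl
package require uri

# Hypothetical URL, used only for illustration.
set url http://www.example.com/docs/index.html?lang=en#top

# The result is a key/value list, so it can be loaded with array set.
array set parts [uri::split $url]

# scheme is always present; the other keys depend on the scheme.
puts "scheme:   $parts(scheme)"
puts "host:     $parts(host)"
puts "path:     $parts(path)"
puts "query:    $parts(query)"
puts "fragment: $parts(fragment)"

# A url without a scheme falls back to defaultscheme (here ftp).
array set ftpParts [uri::split //ftp.example.com/pub/readme.txt ftp]
puts "scheme:   $ftpParts(scheme)"
```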


       uri::join ?key value?...
              uri::join takes a list of key/value pairs (generated by
              uri::split, for example) and returns the canonical URL they
              represent. Currently the schemes http, https, ftp, mailto, news,
              ldap, ldaps and file are supported by the package itself. See
              section EXTENDING on how to expand that range.

              The arguments are expected to be slices of a valid URL, with
              percent-encoding ("url-encoding") and any other necessary
              adaptations.  Notably, on Windows the path in scheme file is not
              a valid local filename.  See EXAMPLES for more information.
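
              A short sketch of uri::join, assuming the default quirk options
              of version 1.2.7 (so that path is given without its initial
              "/"). The host, path and query are hypothetical.

```tcl
package require uri

# Keys as documented for scheme http; the values are already
# percent-encoded, as uri::join expects.
set url [uri::join scheme http host www.example.com \
             path "docs/my%20file.html" query lang=en]
puts $url

# Round trip: feed the output of uri::split straight back to uri::join.
set original http://www.example.com/docs/my%20file.html?lang=en
set rebuilt  [eval [linsert [uri::split $original] 0 uri::join]]
puts $rebuilt
```

              The round trip uses eval with linsert rather than argument
              expansion so that the sketch also runs on older Tcl releases.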

       uri::resolve base url
              uri::resolve resolves the specified url relative to base, in
              conformance with RFC 3986. In other words: a non-relative url is
              returned unchanged, whereas for a relative url the missing parts
              are taken from base and prepended to it. The result of this
              operation is returned. For an empty url the result is base,
              without its URI fragment (if any).  The command is available for
              schemes http, https, ftp, and file.
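
              As an illustration, the sketch below resolves two urls against
              a hypothetical base. The outputs are not listed here; for the
              treatment of ".." segments see RFC 3986 Section 5.2.

```tcl
package require uri

# Hypothetical base URL, used only for illustration.
set base http://www.example.com/a/b/index.html

# Relative urls: the missing parts are taken from base.
puts [uri::resolve $base other.html]
puts [uri::resolve $base ../c/page.html]

# A non-relative url is returned unchanged.
puts [uri::resolve $base ftp://ftp.example.com/pub/readme.txt]
```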

       uri::isrelative url
              uri::isrelative determines whether the specified url is absolute
              or relative.  The command is available for a url of any scheme.
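
              A minimal usage sketch; the urls in the list are hypothetical.

```tcl
package require uri

foreach url {
    http://www.example.com/index.html
    ../images/logo.png
    mailto:someone@example.com
} {
    # uri::isrelative returns a boolean.
    if {[uri::isrelative $url]} {
        puts "$url is relative"
    } else {
        puts "$url is absolute"
    }
}
```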

       uri::geturl url ?options...?
              uri::geturl decodes the specified url and then dispatches the
              request to the package appropriate for the scheme found in the
              URL. The command assumes that the package to handle the given
              scheme either has the same name as the scheme itself (including
              possible capitalization) followed by ::geturl, or, in case of
              this failing, has the same name as the scheme itself (including
              possible capitalization). It further assumes that whatever
              package was loaded provides a geturl-command in the namespace of
              the same name as the package itself. This command is called with
              the given url and all given options. Currently geturl does not
              handle any options itself.

              Note: file-URLs are an exception to the rule described above.
              They are handled internally.

              The result of the command cannot be specified here: it depends
              on the geturl-command for the scheme to which the request was
              dispatched.
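
              A sketch for scheme http only, assuming the standard Tcl http
              package is available. In that case the result is an http token
              and the options are those of http::geturl. The helper name
              fetchBody and the address are hypothetical.

```tcl
package require uri

# Hypothetical helper: fetch an http url and return its body.
proc fetchBody {url} {
    # For scheme http the request ends up in http::geturl, so
    # -timeout (milliseconds) is an option of that command.
    set token [uri::geturl $url -timeout 10000]
    set body  [http::data $token]
    http::cleanup $token
    return $body
}

# Usage (needs network access; the address is only an example):
#   puts [fetchBody http://www.example.com/]
```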

       uri::canonicalize uri
              uri::canonicalize returns the canonical form of a URI.  The
              canonical form of a URI is one where relative path
              specifications, i.e. "." and "..", have been resolved.  The
              command is available for all URI schemes that have uri::split
              and uri::join commands. The command returns a canonicalized URI
              if the URI scheme has a path component (i.e. http, https, ftp,
              and file).  For schemes that have uri::split and uri::join
              commands but no path component (i.e. mailto, news, ldap, and
              ldaps), the command returns the uri unchanged.
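
              For illustration, a sketch with a hypothetical URL whose path
              contains "." and ".." segments:

```tcl
package require uri

# The segments "b/.." and "." are resolved away.
set url http://www.example.com/a/b/../c/./index.html
puts [uri::canonicalize $url]

# Schemes without a path component come back unchanged.
puts [uri::canonicalize mailto:someone@example.com]
```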

       uri::register schemeList script
              uri::register registers the first element of schemeList as a new
              scheme and the remaining elements as aliases for this scheme. It
              creates the namespace for the scheme and executes the script in
              the new namespace. The script has to declare variables
              containing regular expressions relevant to the scheme. At least
              the variable schemepart has to be declared as that one is used
              to extend the variables keeping track of the registered schemes.
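
              A sketch of a registration, using a hypothetical scheme tel
              with the alias TEL. The patterns are deliberately simple and
              are not taken from any RFC; the variable digits is illustrative,
              while schemepart and url follow the conventions described in
              this manual.

```tcl
package require uri

uri::register {tel TEL} {
    variable digits     {[0-9]+}
    # schemepart is mandatory: it matches what follows the scheme name
    # and its colon.
    variable schemepart "\\+?$digits"
    # url holds the pattern for a complete URL of this scheme.
    variable url        "tel:$schemepart"
}

# The new scheme and its alias are now listed in uri::schemes.
puts $uri::schemes

# The pattern can be used like those of the built-in schemes.
puts [regexp -- "^$uri::tel::url\$" tel:+15551234567]
```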

SCHEMES
       In addition to the commands mentioned above, this package provides
       regular expressions to recognize URLs for a number of URL schemes.

       For each supported scheme, the namespace uri contains a child
       namespace with the same name as the scheme. That namespace contains
       the variable url, whose value is a regular expression that recognizes
       URLs of that scheme. Additional variables may contain regular
       expressions for parts of URLs for that scheme.

       The variable uri::schemes contains a list of all registered schemes.
       Currently these are ftp, ldap, ldaps, file, http, https, gopher,
       mailto, news, wais and prospero.
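
       A sketch of how these variables might be used. The sample text is
       hypothetical, and the exact extent of the match depends on the pattern
       shipped with the installed version, so no output is shown.

```tcl
package require uri

# List the registered schemes.
puts $uri::schemes

# Use the pattern for scheme http to look for a URL inside free text.
set text "See http://www.example.com/index.html for details."
if {[regexp -- $uri::http::url $text match]} {
    puts "found: $match"
}
```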

EXTENDING
       Extending the range of schemes supported by uri::split and uri::join is
       easy because neither command handles the request itself: each
       dispatches it to another command in the uri namespace, chosen by the
       scheme of the URL.

       uri::split and uri::join call Split[string totitle <scheme>] and
       Join[string totitle <scheme>] respectively.

       The provision of split and join commands is sufficient to extend the
       commands uri::canonicalize and uri::geturl (the latter subject to the
       availability of a suitable package with a geturl command).  In
       contrast, to extend the command uri::resolve to a new scheme, the
       command itself must be modified.
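
       The dispatch rule can be sketched with a hypothetical scheme tel that
       has a single constituent, number. Both the scheme and the key name are
       assumptions made for this example; only the command naming rule comes
       from the text above.

```tcl
package require uri

# "tel" becomes "Tel" through string totitle, hence SplitTel and JoinTel.
proc ::uri::SplitTel {url} {
    # Drop a leading "tel:" in case the dispatcher passes it along.
    regsub -- {^tel:} $url {} url
    return [list number $url]
}

proc ::uri::JoinTel {args} {
    # Default for a missing key, then the caller's key/value pairs.
    array set parts {number {}}
    array set parts $args
    return tel:$parts(number)
}

array set tel [uri::split tel:+15551234567]
puts $tel(number)
puts [uri::join scheme tel number +15551234567]
```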

       To extend the range of schemes for which pattern information is
       available, use the command uri::register.

       An example of a package that provides both commands and pattern
       information for a new scheme is uri::urn, which adds scheme urn.

QUIRK OPTIONS
       The value of a "quirk option" is boolean: the value false requests
       conformance with RFC 3986, while true requests use of the quirk.  Use
       command uri::setQuirkOption to access the values of quirk options.

       Quirk options are useful both for allowing backwards compatibility when
       a command specification changes, and for adding useful features that
       are not included in RFC specifications.  The following quirk options
       are currently defined:

       NoInitialSlash
              This quirk option concerns the leading character of path (if
              non-empty) in the schemes http, https, and ftp.

              RFC 3986 defines path in an absolute URI to have an initial "/",
              unless the value of path is the empty string. For the scheme
              file, all versions of package uri follow this rule.  The quirk
              option NoInitialSlash does not apply to scheme file.

              For the schemes http, https, and ftp, versions of uri before
              1.2.7 define the path NOT to include an initial "/".  When the
              quirk option NoInitialSlash is true (the default), this behavior
              is also used in version 1.2.7.  To use instead values of path as
              defined by RFC 3986, set this quirk option to false.

              This setting does not affect RFC 3986 conformance.  If
              NoInitialSlash is true, then the value of path in the schemes
              http, https, or ftp, cannot distinguish between URIs in which
              the full "RFC 3986 path" is the empty string "" or a single
              slash "/" respectively.  The missing information is recorded in
              an additional uri::split key pbare.

              The boolean pbare is defined when quirk options NoInitialSlash
              and NoExtraKeys have values true and false respectively.  In
              this case, if the value of path is the empty string "", pbare is
              true if the full "RFC 3986 path" is "", and pbare is false if
              the full "RFC 3986 path" is "/".

              Using the quirk option NoInitialSlash is a matter of
              preference.
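
              The effect on path and pbare can be inspected with a sketch
              like the following. The URLs are hypothetical; because pbare is
              only reported in some configurations, the code tests for the
              key instead of assuming it.

```tcl
package require uri

set saved [uri::setQuirkOption NoInitialSlash]

foreach setting {1 0} {
    uri::setQuirkOption NoInitialSlash $setting
    foreach url {
        http://www.example.com
        http://www.example.com/
        http://www.example.com/a/b
    } {
        # Start from an empty array for every url.
        catch {unset parts}
        array set parts [uri::split $url]
        set line "NoInitialSlash=$setting  $url  path=<$parts(path)>"
        if {[info exists parts(pbare)]} {
            append line "  pbare=$parts(pbare)"
        }
        puts $line
    }
}

# Restore the value that was in effect before.
uri::setQuirkOption NoInitialSlash $saved
```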

       NoExtraKeys
              This quirk option permits full backward compatibility with
              versions of uri before 1.2.7, by omitting the uri::split key
              pbare described above (see quirk option NoInitialSlash).  The
              outcome is greater backward compatibility of the uri::split
              command, but an inability to distinguish between URIs in which
              the full "RFC 3986 path" is the empty string "" or a single
              slash "/" respectively - i.e. a minor non-conformance with RFC
              3986.

              If the quirk option NoExtraKeys is false (the default), command
              uri::split returns an additional key pbare, and the commands
              comply with RFC 3986. If the quirk option NoExtraKeys is true,
              the key pbare is not defined and there is not full conformance
              with RFC 3986.

              Using the quirk option NoExtraKeys is NOT recommended, because
              if set to true it will reduce conformance with RFC 3986.  The
              option is included only for compatibility with code, written for
              earlier versions of uri, that needs values of path without a
              leading "/", AND ALSO cannot tolerate unexpected keys in the
              results of uri::split.

       HostAsDriveLetter
              When handling the scheme file on the Windows platform, versions
              of uri before 1.2.7 use the host field to represent a Windows
              drive letter and the colon that follows it, and the path field
              to represent the filename path after the colon.  Such URIs are
              invalid, and are not recognized by any RFC. When the quirk
              option HostAsDriveLetter is true, this behavior is also used in
              version 1.2.7.  To use file URIs on Windows that conform to RFC
              3986, set this quirk option to false (the default).

              Using this quirk is NOT recommended, because if set to true it
              will cause the uri commands to expect and produce invalid URIs.
              The option is included only for compatibility with legacy code.

       RemoveDoubleSlashes
              When a URI is canonicalized by uri::canonicalize, its path is
              normalized by removal of segments "." and "..".  RFC 3986 does
              not mandate the removal of empty segments "" (i.e. the merger of
              double slashes, which is a feature of filename normalization but
              not of URI path normalization): it treats URIs with excess
              slashes as referring to different resources.  When the quirk
              option RemoveDoubleSlashes is true (the default), empty segments
              will be removed from path.  To prevent removal, and thereby
              conform to RFC 3986, set this quirk option to false.

              Using this quirk is a matter of preference.  A URI with double
              slashes in its path was most likely generated by error,
              certainly so if it has a straightforward mapping to a file on a
              server.  In some cases it may be better to sanitize the URI; in
              others, to keep the URI and let the server handle the possible
              error.

   BACKWARD COMPATIBILITY
       To behave as similarly as possible to versions of uri earlier than
       1.2.7, set the following quirk options:

       •      uri::setQuirkOption NoInitialSlash 1

       •      uri::setQuirkOption NoExtraKeys 1

       •      uri::setQuirkOption HostAsDriveLetter 1

       •      uri::setQuirkOption RemoveDoubleSlashes 0

       In code that can tolerate the return by uri::split of an additional key
       pbare, set

       •      uri::setQuirkOption NoExtraKeys 0

       in order to achieve greater compliance with RFC 3986.

   NEW DESIGNS
       For new projects, the following settings are recommended:

       •      uri::setQuirkOption NoInitialSlash 0

       •      uri::setQuirkOption NoExtraKeys 0

       •      uri::setQuirkOption HostAsDriveLetter 0

       •      uri::setQuirkOption RemoveDoubleSlashes 0|1
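
       These settings could be applied in one place at program start, for
       example as sketched below. RemoveDoubleSlashes is a matter of
       preference; the value 0 is chosen here only as an example.

```tcl
package require uri

foreach {option value} {
    NoInitialSlash      0
    NoExtraKeys         0
    HostAsDriveLetter   0
    RemoveDoubleSlashes 0
} {
    uri::setQuirkOption $option $value
}
```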

   DEFAULT VALUES
       The default values for package uri version 1.2.7 are intended to be a
       compromise between backwards compatibility and improved features.
       Different default values may be chosen in future versions of package
       uri.

       •      uri::setQuirkOption NoInitialSlash 1

       •      uri::setQuirkOption NoExtraKeys 0

       •      uri::setQuirkOption HostAsDriveLetter 0

       •      uri::setQuirkOption RemoveDoubleSlashes 1

EXAMPLES
       A Windows® local filename such as "C:\Other Files\startup.txt" is not
       suitable for use as the path element of a URI in the scheme file.

       The Tcl command file normalize will convert the backslashes to forward
       slashes.  To generate a valid path for the scheme file, "/" must be
       prepended to the normalized filename, and then any characters that do
       not match the regexp bracket expression


                  [a-zA-Z0-9$_.+!*'(,)?:@&=-]

       must be percent-encoded.

       The result in this example is "/C:/Other%20Files/startup.txt" which is
       a valid value for path.


              % uri::join path /C:/Other%20Files/startup.txt scheme file

              file:///C:/Other%20Files/startup.txt

              % uri::split file:///C:/Other%20Files/startup.txt

              path /C:/Other%20Files/startup.txt scheme file


       On UNIX® systems filenames begin with "/" which is also used as the
       directory separator.  The only action needed to convert a filename to a
       valid path is percent-encoding.
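
       The conversion described above can be sketched in plain Tcl. The
       helper filenameToUriPath is hypothetical and not part of package uri.
       It assumes its argument has already been through file normalize, and
       it keeps "/" unencoded because "/" separates the path segments.

```tcl
proc filenameToUriPath {name} {
    # A Windows name such as C:/dir/file lacks the initial "/".
    if {![string match /* $name]} {
        set name /$name
    }
    set path {}
    # Work on UTF-8 bytes so that non-ASCII characters are encoded too.
    foreach char [split [encoding convertto utf-8 $name] {}] {
        # The bracket expression from the text, plus the separator "/".
        if {[regexp {^[a-zA-Z0-9$_.+!*'(,)?:@&=/-]$} $char]} {
            append path $char
        } else {
            append path [format %%%02X [scan $char %c]]
        }
    }
    return $path
}

puts [filenameToUriPath "C:/Other Files/startup.txt"]
# prints /C:/Other%20Files/startup.txt
puts [filenameToUriPath "/home/user/my notes.txt"]
# prints /home/user/my%20notes.txt
```

       The result can be passed to uri::join as the value of path, together
       with scheme file.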

CREDITS
       Original code (regular expressions) by Andreas Kupries.  Modularisation
       and the split/join/resolve functionality by Steve Ball.  RFC 3986
       conformance by Keith Nash.

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain
       bugs and other problems.  Please report them in the category uri of the
       Tcllib Trackers [http://core.tcl.tk/tcllib/reportlist].  Please also
       report any ideas you may have for enhancing either the package or its
       documentation.

       When proposing code changes, please provide unified diffs, i.e. the
       output of diff -u.

       Note further that attachments are strongly preferred over inlined
       patches. Attachments can be made by going to the Edit form of the
       ticket immediately after its creation, and then using the left-most
       button in the secondary navigation bar.

KEYWORDS
       fetching information, file, ftp, gopher, http, https, ldap, mailto,
       news, prospero, rfc 1630, rfc 2255, rfc 2396, rfc 3986, uri, url, wais,
       www

CATEGORY
       Networking



tcllib                               1.2.7                           uri(3tcl)