BIBCLEAN(1)                 General Commands Manual                BIBCLEAN(1)



NAME
       bibclean - prettyprint and syntax check BibTeX and Scribe bibliography
       data base files

SYNOPSIS
       bibclean [ -author ] [ -error-log filename ] [ -help ] [ -? ]
                [ -init-file filename ] [ -long-field fieldname ]
                [ -max-width nnn ] [ -[no-]align-equals ]
                [ -[no-]check-values ] [ -[no-]delete-empty-values ]
                [ -[no-]file-position ] [ -[no-]fix-font-changes ]
                [ -[no-]fix-initials ] [ -[no-]fix-names ]
                [ -[no-]German-style ] [ -[no-]keep-linebreaks ]
                [ -[no-]keep-parbreaks ] [ -[no-]keep-preamble-spaces ]
                [ -[no-]keep-spaces ] [ -[no-]keep-string-spaces ]
                [ -[no-]parbreaks ] [ -[no-]prettyprint ]
                [ -[no-]print-patterns ] [ -[no-]read-init-files ]
                [ -[no-]remove-OPT-prefixes ] [ -[no-]scribe ]
                [ -[no-]trace-file-opening ] [ -[no-]warnings ] [ -version ]
                ( <infile | bibfile1 bibfile2 bibfile3 ... ) >outfile

       All options can be abbreviated to a unique leading prefix.

       An explicit file name of ``-'' represents standard input; it is assumed
       if no input files are specified.

DESCRIPTION
       bibclean prettyprints input BibTeX files to stdout, and checks the
       brace balance and bibliography entry syntax as well.  It can be used to
       detect problems in BibTeX files that sometimes confuse even BibTeX
       itself, and importantly, can be used to normalize the appearance of
       collections of BibTeX files.

       Here is a summary of the formatting actions:

       ·  BibTeX items are formatted into a consistent structure with one
          field = "value" pair per line, and the initial @ and trailing right
          brace in column 1.

       ·  Tabs are expanded into blank strings; their use is discouraged
          because they inhibit portability, and can suffer corruption in
          electronic mail.

       ·  Long string values are split at a blank and continued onto the next
          line with leading indentation.

       ·  A single blank line separates adjacent bibliography entries.

       ·  Text outside BibTeX entries is passed through verbatim.

       ·  Outer parentheses around entries are converted to braces.

       ·  Personal names in author and editor field values are normalized to
          the form ``P. D. Q. Bach'', from ``P.D.Q. Bach'' and ``Bach,
          P.D.Q.''.

       ·  Hyphen sequences in page numbers are converted to en-dashes.

       ·  Month values are converted to standard BibTeX string abbreviations.

       ·  In titles, sequences of upper-case characters at brace level zero
          are braced to protect them from being converted to lower-case
          letters by some bibliography styles.

       ·  CODEN, ISBN (International Standard Book Number) and ISSN
          (International Standard Serial Number) entry values are examined to
          verify the checksums of each listed number, and correct ISBN
          hyphenation is automatically supplied.
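
       The checksum verification mentioned in the last item can be
       illustrated with a minimal Python sketch.  It is not bibclean's own
       code: it assumes 10-digit ISBNs and 8-character ISSNs with the usual
       modulo-11 weighting, omits CODEN, and the function names isbn10_ok and
       issn_ok are made up for this illustration.

```python
def isbn10_ok(isbn: str) -> bool:
    """Return True if a 10-digit ISBN has a valid check digit."""
    chars = [c for c in isbn.upper() if c.isdigit() or c == "X"]
    if len(chars) != 10 or "X" in chars[:9]:
        return False
    # Weights run 10, 9, ..., 1; a final X stands for the value 10.
    total = sum((10 - i) * (10 if c == "X" else int(c))
                for i, c in enumerate(chars))
    return total % 11 == 0


def issn_ok(issn: str) -> bool:
    """Return True if an 8-character ISSN has a valid check digit."""
    chars = [c for c in issn.upper() if c.isdigit() or c == "X"]
    if len(chars) != 8 or "X" in chars[:7]:
        return False
    # Weights run 8, 7, ..., 1; a final X stands for the value 10.
    total = sum((8 - i) * (10 if c == "X" else int(c))
                for i, c in enumerate(chars))
    return total % 11 == 0


print(isbn10_ok("0-306-40615-2"), issn_ok("0378-5955"))
```

       Hyphens and spaces are ignored before the weighted sum is formed, so
       the same check works on hyphenated and unhyphenated numbers.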

       The standardized format of the output of bibclean facilitates the later
       application of simple filters, such as bibcheck(1), bibdup(1),
       bibextract(1), bibindex(1), bibjoin(1), biblabel(1), biblook(1),
       biborder(1), bibsort(1), citefind(1), and citetags(1), to process the
       text, and also is the one expected by the GNU Emacs BibTeX support
       functions.

OPTIONS
       Command-line switches may be abbreviated to a unique leading prefix,
       and letter case is not significant.  All options are parsed before any
       input bibliography files are read, no matter what their order on the
       command line.  Options that correspond to a yes/no setting of a flag
       have a form with a prefix "no-" to set the flag to no.  For such
       options, the last setting determines the flag value used.  This is
       significant when options are also specified in initialization files
       (see the INITIALIZATION FILES manual section).

       The leading hyphen that distinguishes an option from a filename may be
       doubled, for compatibility with GNU and POSIX conventions.  Thus,
       -author and --author are equivalent.

       To avoid confusion with options, if a filename begins with a hyphen, it
       must be disguised by a leading absolute or relative directory path,
       e.g., /tmp/-foo.bib or ./-foo.bib.
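
       The rules above (unique leading prefix, case insensitivity, optional
       doubled hyphen, "no-" forms, last setting wins) can be sketched in
       Python as follows.  This is only an illustration under stated
       assumptions: the FLAGS list is a small subset chosen for the example,
       and resolve and settings are hypothetical names, not part of bibclean.

```python
# A small, illustrative subset of the yes/no options.
FLAGS = ["align-equals", "check-values", "fix-initials", "fix-names", "warnings"]


def resolve(arg, flags=FLAGS):
    """Map one switch to (flag, value); raise ValueError if unknown or ambiguous."""
    if not arg.startswith("-"):
        raise ValueError(f"not an option: {arg}")
    # -name and --name are equivalent; letter case is not significant.
    name = (arg[2:] if arg.startswith("--") else arg[1:]).lower()
    spellings = {}
    for flag in flags:
        spellings[flag.lower()] = (flag, True)
        spellings["no-" + flag.lower()] = (flag, False)
    if name in spellings:
        return spellings[name]
    hits = [value for spelling, value in spellings.items()
            if spelling.startswith(name)]
    if len(hits) != 1:
        raise ValueError(f"unknown or ambiguous option: {arg}")
    return hits[0]


def settings(args):
    """Apply switches in order, so the last setting of a flag wins."""
    result = {}
    for arg in args:
        flag, value = resolve(arg)
        result[flag] = value
    return result


print(settings(["-fix-n", "--NO-FIX-NAMES", "-no-w", "-W"]))
```

       In this sketch -fix is rejected because it is a prefix of both
       fix-initials and fix-names, while -fix-n is accepted.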

       -author   Display an author credit on the standard error unit, stderr,
                 and then exit with a success return code.  Sometimes an
                 executable program is separated from its documentation and
                 source code; this option provides a way to recover from that.

       -error-log filename
                 Redirect stderr to the indicated file, which will then
                 contain all of the error and warning messages.  This option
                 is provided for those systems that have difficulty
                 redirecting stderr.

       -help or -?
                 Display a help message on stderr, giving a usage description,
                 similar to this section of the manual pages, and then exit
                 with a success return code.

       -init-file filename
                 Provide an explicit value pattern initialization file.  It
                 will be processed after any system-wide and job-wide
                 initialization files, and may override them.  It in turn may
                 be overridden by a subsequent file-specific initialization
                 file.  For further details, see the INITIALIZATION FILES
                 manual section.

       -long-field fieldname
                 Suppress warnings that values of the field named fieldname
                 exceed the standard BibTeX length limits.  Note: this is a
                 Debian-specific extension.

       -max-width nnn
                 bibclean normally limits output line widths to 72 characters,
                 and in the interests of consistency, that value should not be
                 changed.  Occasionally, special-purpose applications may
                 require different maximum line widths, so this option
                 provides that capability.  The number following the option
                 name can be specified in decimal, octal (starting with 0), or
                 hexadecimal (starting with 0x).  A zero or negative value is
                 interpreted to mean unlimited, so -max-width 0 can be used to
                 ensure that each field/value pair appears on a single line.

                 When -no-prettyprint requests bibclean to act as a lexical
                 analyzer, the default line width is unlimited, unless
                 overridden by this option.

                 When bibclean is prettyprinting, line wrapping will be done
                 only at a space. Consequently, a long non-blank character
                 sequence may result in the output exceeding the requested
                 line width.

                 When bibclean is lexing, line wrapping is done by inserting a
                 backslash-newline pair when the specified maximum is reached,
                 so no line length will ever exceed the maximum.
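
                 As an illustration only, the number syntax and the
                 prettyprinter's wrap-at-spaces rule can be sketched in
                 Python.  The names parse_width and wrap_at_spaces are
                 hypothetical, and the standard textwrap module stands in
                 for bibclean's own line breaking, which may differ in
                 detail (for example, in continuation-line indentation).

```python
import textwrap


def parse_width(text: str) -> int:
    """Parse decimal, octal (leading 0), or hexadecimal (leading 0x) text."""
    s = text.strip().lower()
    sign = -1 if s.startswith("-") else 1
    if s[:1] in ("+", "-"):
        s = s[1:]
    if s.startswith("0x"):
        return sign * int(s[2:], 16)
    if s.startswith("0") and len(s) > 1:
        return sign * int(s[1:], 8)
    return sign * int(s, 10)


def wrap_at_spaces(text: str, width: int):
    """Break only at spaces; a zero or negative width means unlimited."""
    if width <= 0:
        return [text]
    return textwrap.wrap(text, width=width,
                         break_long_words=False, break_on_hyphens=False)


print(parse_width("72"), parse_width("0110"), parse_width("0x48"))
print(wrap_at_spaces("supercalifragilistic word", 10))
```

                 The second call shows the consequence noted above: a long
                 non-blank sequence is kept whole, so its line exceeds the
                 requested width.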

       -[no-]align-equals
                 With the positive form, align the equals sign in key/value
                 assignments at the same column, separated by a single space
                 from the value string.  Otherwise, the equals sign follows
                 the key, separated by a single space.  Default: no.

       -[no-]check-values
                 With the positive form, apply heuristic pattern matching to
                 field values in order to detect possible errors (e.g., ``year
                 = "192"'' instead of ``year = "1992"''), and issue warnings
                 when unexpected patterns are found.

                 This checking is usually beneficial, but if it produces too
                 many bogus warnings for a particular bibliography file, you
                 can disable it with the negative form of this option.
                 Default: yes.

       -[no-]delete-empty-values
                 With the positive form, remove all field/value pairs for
                 which the value is an empty string.  This is helpful in
                 cleaning up bibliographies generated from text editor
                 templates. Compare this option with -[no-]remove-OPT-prefixes
                 described below.  Default: no.

       -[no-]file-position
                 With the positive form, give detailed file position
                 information in warning and error messages.  Default: no.

       -[no-]fix-font-changes
                 With the positive form, supply an additional brace level
                 around font changes in titles to protect against downcasing
                 by some BibTeX styles.  Font changes that already have more
                 than one level of braces are not modified.

                 For example, if a title contains the Latin phrase {\em
                 Dictyostelium Discoideum} or {\em {D}ictyostelium
                 {D}iscoideum}, then downcasing will incorrectly convert the
                 phrase to lower-case letters.  Most BibTeX users are
                 surprised that bracing the initial letters does not prevent
                 the downcase action.  The correct coding is {{\em
                 Dictyostelium Discoideum}}.  However, there are also
                 legitimate cases where an extra level of bracing wrongly
                 protects from downcasing.  Consequently, bibclean will
                 normally not supply an extra level of braces, but if you have
                 a bibliography where the extra braces are routinely missing,
                 you can use this option to supply them.

                 If you think that you need this option, it is strongly
                 recommended that you apply bibclean to your bibliography file
                 with and without -fix-font-changes, then compare the two
                 output files to ensure that extra braces are not being
                 supplied in titles where they should not be present.  You
                 will have to decide which of the two output files is the
                 better choice, then repair the incorrect title bracing by
                 hand.

                 Since font changes in titles are uncommon, except for cases
                 of the type which this option is designed to correct, it
                 should do more good than harm.  Default: no.

       -[no-]fix-initials
                 With the positive form, insert a space after a period
                 following author initials.  Default: yes.

       -[no-]fix-names
                 With the positive form, reorder author and editor name lists
                 to remove commas at brace level zero, placing first names or
                 initials before last names.  Default: yes.
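
                 The combined effect of -fix-initials and -fix-names can be
                 approximated by the Python sketch below.  It is a
                 simplification, not bibclean's algorithm: it ignores brace
                 levels, handles only names with a single comma, and the
                 function names are invented for the example.

```python
import re


def fix_initials(name: str) -> str:
    """Insert a space after a period that is directly followed by a letter."""
    return re.sub(r"\.(?=[A-Za-z])", ". ", name)


def fix_name(name: str) -> str:
    """Turn "Last, First" into "First Last" when there is exactly one comma."""
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    return name


def normalize_authors(value: str) -> str:
    """Apply both fixes to each name in an " and "-separated list."""
    names = value.split(" and ")
    return " and ".join(fix_initials(fix_name(n.strip())) for n in names)


print(normalize_authors("Bach, P.D.Q. and Knuth, D.E."))
```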

       -[no-]German-style
                 With the positive form, interpret quote characters ["] inside
                 braced value strings at brace level 1 according to the
                 conventions of the TeX style file german.sty, which overloads
                 quote to simplify input and representation of German umlaut
                 accents, sharp-s (es-zet), ligature separators, invisible
                 hyphens, raised/lowered quotes, French guillemets, and
                 discretionary hyphens.  Recognized character combinations
                 will be braced to prevent BibTeX from interpreting the quote
                 as a string delimiter.

                 Quoted strings receive no special handling from this option,
                 and since German nouns in titles must anyway be protected
                 from the downcasing operation of most BibTeX bibliography
                 styles, German value strings that use the overloaded quote
                 character can always be entered in the form "{...}", without
                 the need to specify this option at all.

                 Default: no.

       -[no-]keep-linebreaks
                 Normally, line breaks inside value strings are collapsed into
                 a single space, so that long value strings can later be
                 broken to provide lines of reasonable length.

                 With the positive form, linebreaks are preserved in value
                 strings.  If -max-width is set to zero, this preserves the
                 original line breaks.  Spacing outside value strings remains
                 under bibclean's control, and is not affected by this option.

                 Default: no.

       -[no-]keep-parbreaks
                 With the positive form, preserve paragraph breaks (either
                 formfeeds, or lines containing only spaces) in value strings.
                 Normally, paragraph breaks are collapsed into a single space.
                 Spacing outside value strings remains under bibclean's
                 control, and is not affected by this option.  Default: no.

       -[no-]keep-preamble-spaces
                 With the positive form, preserve all whitespace in
                 @Preamble{...} entries.  Default: no.

       -[no-]keep-spaces
                 With the positive form, preserve all spaces in value strings.
                 Normally, multiple spaces are collapsed into a single space.
                 This option can be used together with -keep-linebreaks,
                 -keep-parbreaks, and -max-width 0 to preserve the form of
                 value strings while still providing syntax and value
                 checking.  Spacing outside value strings remains under
                 bibclean's control, and is not affected by this option.
                 Default: no.

       -[no-]keep-string-spaces
                 With the positive form, preserve all whitespace in
                 @String{...} entries.  Default: no.

       -[no-]parbreaks
                 With the negative form, a paragraph break (either a formfeed,
                 or a line containing only spaces) is not permitted in value
                 strings, or between field/value pairs.  This may be useful to
                 quickly trap runaway strings arising from mismatched
                 delimiters.  Default: yes.

       -[no-]prettyprint
                 Normally, bibclean functions as a prettyprinter.  However,
                 with the negative form of this option, it acts as a lexical
                 analyzer instead, producing a stream of lexical tokens.  See
                 the LEXICAL ANALYSIS manual section for further details.
                 Default: yes.

       -[no-]print-patterns
                 With the positive form, print the value patterns read from
                 initialization files as they are added to internal tables.
                 Use this option to check newly-added patterns, or to see what
                 patterns are being used.

                 These patterns are the ones that will be used in checking
                 value strings for valid syntax, and all of them are specified
                 in initialization files, rather than hard-coded into the
                 program.  For further details, see the INITIALIZATION FILES
                 manual section.  Default: no.

       -[no-]read-init-files
                 With the negative form, suppress loading of system-, user-,
                 and file-specific initialization files.  Initializations will
                 come only from those files explicitly given by -init-file
                 filename options.  Default: yes.

       -[no-]remove-OPT-prefixes
                 With the positive form, remove the ``OPT'' prefix from each
                 field name where the corresponding value is not an empty
                 string.  The prefix ``OPT'' must be entirely in upper-case to
                 be recognized.

                 This option is for bibliographies generated with the help of
                 the GNU Emacs BibTeX editing support, which generates
                 templates with optional fields identified by the ``OPT''
                 prefix.  Although the function M-x bibtex-remove-OPT normally
                 bound to the keystrokes C-c C-o does the job, users often
                 forget, with the result that BibTeX does not recognize the
                 field name, and ignores the value string.  Compare this
                 option with -[no-]delete-empty-values described above.
                 Default: no.
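
                 The interaction of this option with -delete-empty-values
                 can be sketched in Python.  The representation of an entry
                 as a list of (name, value) pairs with unquoted values, and
                 the name clean_fields, are assumptions made only for this
                 illustration.

```python
def clean_fields(fields, remove_opt=False, delete_empty=False):
    """Return a cleaned copy of a list of (field name, value) pairs."""
    cleaned = []
    for name, value in fields:
        if delete_empty and value == "":
            continue                      # drop pairs with empty values
        # Only an upper-case OPT prefix on a non-empty value is removed.
        if remove_opt and name.startswith("OPT") and value != "":
            name = name[3:]
        cleaned.append((name, value))
    return cleaned


entry = [("author", "D. E. Knuth"), ("OPTnote", "Reprint"),
         ("OPTmonth", ""), ("optkey", "x")]
print(clean_fields(entry, remove_opt=True))
print(clean_fields(entry, remove_opt=True, delete_empty=True))
```

                 With only the prefix removal enabled, the empty OPTmonth
                 field keeps its prefix; adding the empty-value deletion
                 removes it altogether.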

       -[no-]scribe
                 With the positive form, accept input syntax conforming to the
                 Scribe document system.  The output will be converted to
                 conform to BibTeX syntax.  See the SCRIBE BIBLIOGRAPHY FORMAT
                 manual section for further details.  Default: no.

       -[no-]trace-file-opening
                 With the positive form, record in the error log file the
                 names of all files which bibclean attempts to open.  Use this
                 option to identify where initialization files are located.
                 Default: no.

       -[no-]warnings
                 With the positive form, allow all warning messages.  The
                 negative form is not recommended since it may mask problems
                 that should be repaired.  Default: yes.

       -version  Display the program version number on stderr, and then exit
                 with a success return code.  This will also include an
                 indication of who compiled the program, the host name on
                 which it was compiled, the time of compilation, and the type
                 of string-value matching code selected, when that information
                 is available to the compiler.

ERROR RECOVERY AND WARNINGS
       When bibclean detects an error, it issues an error message to both
       stderr and stdout.  That way, the user is clearly notified, and the
       output bibliography also contains the message at the point of error.

       Error messages begin with a distinctive pair of queries, ??, beginning
       in column 1, followed by the input file name and line number.  If the
       -file-position option was specified, they also contain the input and
       output positions of the current file, entry, and value.  Each position
       includes the file byte number, the line number, and the column number.
       In the event of a runaway string argument, the entry and value
       positions should precisely pinpoint the erroneous bibliography entry,
       and the file positions will indicate where it was detected, which may
       be rather later in the files.

       Warning messages identify possible problems, and are therefore sent
       only to stderr, and not to stdout, so they never appear in the output
       file.  They are identified by a distinctive pair of percents, %%,
       beginning in column 1, and as with error messages, may be followed by
       file position messages if the -file-position option was specified.
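
       Because the two message kinds carry distinct prefixes in column 1, a
       saved error log can be tallied with a few lines of Python.  This is a
       sketch only; the sample log lines are invented placeholders and do
       not reproduce bibclean's exact message wording.

```python
from collections import Counter


def classify(line: str) -> str:
    """Classify one log line by its two-character prefix in column 1."""
    if line.startswith("??"):
        return "error"
    if line.startswith("%%"):
        return "warning"
    return "other"


def summarize(log_lines):
    """Count errors, warnings, and other lines in a log."""
    return Counter(classify(line) for line in log_lines)


sample = [
    "%% (placeholder warning text)",
    "?? (placeholder error text)",
    "   (placeholder continuation line)",
    "%% (another placeholder warning)",
]
print(summarize(sample))
```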

       For convenience, the first line of each error and warning message sent
       to stderr is formatted according to the expectations of the GNU Emacs
       next-error command.  You can invoke bibclean with the Emacs M-x
       compile<RET>bibclean filename.bib >filename.new command, then use the
       next-error command, normally bound to C-x ` (that's a grave, or back,
       accent), to move to the location of the error in the input file.

       If error messages are ignored, and left in the output bibliography
       file, they will precipitate an error when the bibliography is next
       processed with BibTeX.

       After issuing an error message, bibclean then resynchronizes its input
       by copying it verbatim to stdout until a new bibliography entry is
       recognized on a line in which the first non-blank character is an at-
       sign (@).  This ensures that nothing is lost from the input file(s),
       allowing corrections to be made in either the input or the output
       files.  However, if bibclean detects an internal error in its data
       structures, it will terminate abruptly without further input or output
       processing; this kind of error should never happen, and if it does, it
       should be reported immediately to the author of the program.  Errors in
       initialization files, and running out of dynamic memory, will also
       immediately terminate bibclean.
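
       The resynchronization rule can be pictured with the following Python
       sketch, which splits the remaining input at the next line whose first
       non-blank character is an at-sign.  The function name and the list-
       of-lines representation are assumptions for illustration; bibclean
       itself works on a character stream.

```python
def resynchronize(lines):
    """Return (lines copied verbatim, lines starting at the next entry)."""
    for i, line in enumerate(lines):
        if line.lstrip(" \t").startswith("@"):
            return lines[:i], lines[i:]
    return lines, []          # no further entry: everything is copied


rest = ['  title = "Runaway,', "}", "", "  @Book{next,", "}"]
copied, remaining = resynchronize(rest)
print(copied)
print(remaining)
```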

INITIALIZATION FILES
       bibclean can be compiled with one of three different types of pattern
       matching; the choice is made by the installer at compile time:

              ·  The original version uses explicit hand-coded tests of value-
                 string syntax.

              ·  The second version uses regular-expression pattern-matching
                 host library routines together with regular-expression
                 patterns that come entirely from initialization files.

              ·  The third version uses special patterns that come entirely
                 from initialization files.

       This Debianized version of bibclean uses the third version.  However,
       command-line options can also be specified in initialization files, no
       matter which pattern matching choice was selected.

       When bibclean starts, it searches for initialization files, using the
       first one of $(HOME)/.bibcleanrc, /usr/share/bibcleanrc, and
       /etc/bibcleanrc that exists.  Afterwards, it reads the first
       .bibcleanrc found in the BIBINPUTS search path.  The name .bibcleanrc
       can be changed at run time through a setting of the environment
       variable BIBCLEANINI.  If the name starts with a dot, it will be
       stripped when looking in /usr/share and /etc.

       Then, when command-line arguments are processed, any additional files
       specified by -init-file filename options are also processed.  Finally,
       immediately before each named bibliography file is processed, an
       attempt is made to process an initialization file with the same name,
       but with the extension changed to .ini.  The default extension can be
       changed by a setting of the environment variable BIBCLEANEXT.  This
       scheme permits system-wide, user-wide, session-wide, and file-specific
       initialization files to be supported.

       When input is taken from stdin, there is no file-specific
       initialization.
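
       The search order described above can be summarized by a Python
       sketch.  It makes several assumptions that the text does not settle:
       BIBINPUTS is taken to be a colon-separated list, BIBCLEANEXT is taken
       to include the leading dot, and the existence test is passed in as a
       function so that the example does not touch the file system.  The
       name init_file_candidates is hypothetical.

```python
import posixpath


def init_file_candidates(bibfile, env, exists, explicit=(), read_init_files=True):
    """Return initialization files in the order they would be processed."""
    if not read_init_files:               # -no-read-init-files
        return list(explicit)
    name = env.get("BIBCLEANINI", ".bibcleanrc")
    ext = env.get("BIBCLEANEXT", ".ini")  # assumed to include the dot
    undotted = name[1:] if name.startswith(".") else name
    found = []

    # 1. The first existing file among the user and system locations.
    startup = [posixpath.join(env.get("HOME", ""), name),
               posixpath.join("/usr/share", undotted),
               posixpath.join("/etc", undotted)]
    found.extend([p for p in startup if exists(p)][:1])

    # 2. The first match on the BIBINPUTS search path (assumed ':'-separated).
    for directory in env.get("BIBINPUTS", "").split(":"):
        path = posixpath.join(directory, name)
        if directory and exists(path):
            found.append(path)
            break

    # 3. Files named by -init-file options, in command-line order.
    found.extend(explicit)

    # 4. The file-specific file, unless input comes from stdin ("-").
    if bibfile != "-":
        ini = posixpath.splitext(bibfile)[0] + ext
        if exists(ini):
            found.append(ini)
    return found


present = {"/usr/share/bibcleanrc", "/etc/bibcleanrc", "/b/.bibcleanrc", "refs.ini"}
env = {"HOME": "/home/u", "BIBINPUTS": "/a:/b"}
print(init_file_candidates("refs.bib", env, present.__contains__, ["dept.rc"]))
```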

       For precise control, the -no-read-init-files option suppresses all
       initialization files except those explicitly named by -init-file
       filename options, either on the command line, or in requested
       initialization files.

       Recursive execution of initialization files with nested -init-file
       options is permitted; if the recursion is circular, bibclean will
       finally get a non-fatal initialization file open failure after opening
       too many files.  This terminates further initialization file
       processing.  As the recursion unwinds, the files are all closed, then
       execution proceeds normally.

       An initialization file may contain empty lines, comments from percent
       to end of line (just like TeX), option switches, and field/pattern or
       field/pattern/message assignments.  Leading and trailing spaces are
       ignored.  This is best illustrated by a short example:

       % This is a small bibclean initialization file

       -init-file /u/math/bib/.bibcleanrc %% departmental patterns

       chapter = "\"D\""                 %% 23

       pages   = "\"D--D\""              %% 23--27

       volume  = "\"D \\an\\d D\""       %% 11 and 12

       year    = \
          "\"dddd, dddd, dddd\"" \
          "Multiple years specified."      %% 1989, 1990, 1991

       -no-fix-names   %% do not modify author/editor lists

       Long logical lines can be split into multiple physical lines by
       breaking at a backslash-newline pair; the backslash-newline pair is
       discarded.  This processing happens while characters are being read,
       before any further interpretation of the input stream.

       Each logical line must contain a complete option (and its value, if
       any), or a complete field/pattern pair, or a field/pattern/message
       triple.

       Comments are stripped during the parsing of the field, pattern, and
       message values.  The comment start symbol is not recognized inside
       quoted strings, so it can be freely used in such strings.

       Comments on logical lines that were input as multiple physical lines
       via the backslash-newline convention must appear on the last physical
       line; otherwise, the remaining physical lines will become part of the
       comment.
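
       The two-stage processing described above (continuation lines are
       joined first, comments are stripped afterwards) can be sketched in
       Python.  The function names are invented, and the sketch assumes that
       backslash escapes matter only inside quoted strings.

```python
def logical_lines(text: str):
    """Discard backslash-newline pairs, then split into logical lines."""
    return text.replace("\\\n", "").split("\n")


def strip_comment(line: str) -> str:
    """Remove a % comment; a % inside a quoted string is left alone."""
    in_string = False
    i = 0
    while i < len(line):
        c = line[i]
        if in_string and c == "\\" and i + 1 < len(line):
            i += 2                        # skip an escaped character such as \"
            continue
        if c == '"':
            in_string = not in_string
        elif c == "%" and not in_string:
            return line[:i].strip()
        i += 1
    return line.strip()


good = 'year = \\\n  "\\"dddd\\"" \\\n  "100% sure"   %% note'
bad = 'year = \\\n  "\\"dddd\\""  %% note \\\n  "100% sure"'
print(strip_comment(logical_lines(good)[0]))
print(strip_comment(logical_lines(bad)[0]))
```

       In the second input the comment sits on a middle physical line, so
       the message string that follows it is swallowed by the comment.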

       Pattern strings must be enclosed in quotation marks; within such
       strings, a backslash starts an escape mechanism that is commonly used
       in UNIX software.  The recognized escape sequences are:

              \a     alarm bell (octal 007)

              \b     backspace (octal 010)

              \f     formfeed (octal 014)

              \n     newline (octal 012)

              \r     carriage return (octal 015)

              \t     horizontal tab (octal 011)

              \v     vertical tab (octal 013)

              \ooo   character number octal ooo (e.g., \012 is linefeed).  Up
                     to 3 octal digits may be used.

              \0xhh  character number hexadecimal hh (e.g., \0x0a is
                     linefeed).  xhh may be in either letter case.  Any number
                     of hexadecimal digits may be used.

       Backslash followed by any other character produces just that character.
       Thus, \% gets a literal percent into a string (preventing its
       interpretation as a comment), \" produces a quotation mark, and \\
       produces a single backslash.

       An ASCII NUL (\0) in a string will terminate it; this is a feature of
       the C programming language in which bibclean is implemented.
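
       A decoder for these escape sequences might look like the Python
       sketch below.  It is an approximation, not bibclean's code: the name
       decode is made up, and reducing numeric values modulo 256 (to mimic a
       single C character) is an assumption.

```python
SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
          "r": "\r", "t": "\t", "v": "\v"}
OCTAL = "01234567"
HEX = "0123456789abcdefABCDEF"


def decode(body: str) -> str:
    """Decode the escapes in the text between a string's quotation marks."""
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c != "\\" or i + 1 == len(body):
            out.append(c)
            i += 1
            continue
        nxt = body[i + 1]
        if body[i + 1:i + 3].lower() == "0x":
            j = i + 3                     # any number of hexadecimal digits
            while j < len(body) and body[j] in HEX:
                j += 1
            code = int(body[i + 3:j], 16) if j > i + 3 else 0
            out.append(chr(code % 256))   # assumption: keep one byte
            i = j
        elif nxt in OCTAL:
            j = i + 1                     # up to 3 octal digits
            while j < len(body) and j < i + 4 and body[j] in OCTAL:
                j += 1
            out.append(chr(int(body[i + 1:j], 8) % 256))
            i = j
        elif nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        else:
            out.append(nxt)               # any other character stands for itself
            i += 2
    # An ASCII NUL terminates the string, as in C.
    return "".join(out).split("\0", 1)[0]


print(decode(r'\"D\\.D\"'))
```

       The printed result is the pattern text "D\.D" (including the
       quotation marks), which is what the pattern matcher would then see.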

       Field/pattern pairs can be separated by arbitrary space, and
       optionally, either an equals sign or colon functioning as an assignment
       operator.  Thus, the following are equivalent:

       pages="\"D--D\""
       pages:"\"D--D\""
       pages "\"D--D\""
         pages = "\"D--D\""
         pages : "\"D--D\""
       pages   "\"D--D\""

       Each field name can have an arbitrary number of patterns associated
       with it; however, they must be specified in separate field/pattern
       assignments.

       An empty pattern string causes previously-loaded patterns for that
       field name to be forgotten.  This feature permits an initialization
       file to completely discard patterns from earlier initialization files.

       Patterns for value strings are represented in a tiny special-purpose
       language that is both convenient and suitable for bibliography value-
       string syntax checking.  While not as powerful as the language of
       regular-expression patterns, its parsing can be portably implemented in
       less than 3% of the code in a widely-used regular-expression parser
       (the GNU regexp package).

       The patterns are represented by the following special characters:

              <space>  one or more spaces

              a        exactly one letter

              A        one or more letters

              d        exactly one digit

              D        one or more digits

              r        exactly one Roman numeral

              R        one or more Roman numerals (i.e., a Roman number)

              w        exactly one word (one or more letters and digits)

              W        one or more space-separated words, beginning and ending
                       with a word

              .        one `special' character, one of the characters
                       <space>!#()*+,-./:;?[]~, a subset of punctuation
                       characters that are typically used in string values

              :        one or more `special' characters

              X        one or more `special'-separated words, beginning and
                       ending with a word

              \x       exactly one x (x is any character), possibly with an
                       escape sequence interpretation given earlier

              x        exactly the character x (x is anything but one of these
                       pattern characters: aAdDrRwW.:<space>\)

       The X pattern character is very powerful, but generally inadvisable,
       since it will match almost anything likely to be found in a BibTeX
       value string.  The reason for providing pattern matching on the value
       strings is to uncover possible errors, not mask them.
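
       One way to experiment with this pattern language is to translate a
       pattern into a regular expression, as in the Python sketch below.
       This is not how bibclean is implemented (its matcher is hand-written
       and may differ where backtracking matters).  The sketch assumes that
       Roman numerals may be in either letter case, that the pattern has
       already had its string escapes decoded, and that a match only needs
       to cover a leading part of the value; to_regex and matches are
       hypothetical names.

```python
import re

LETTER = "[A-Za-z]"
ROMAN = "[IVXLCDMivxlcdm]"                # assumption: either letter case
WORD = "[A-Za-z0-9]+"
SPECIAL = "[" + re.escape(" !#()*+,-./:;?[]~") + "]"

TOKENS = {
    " ": " +",
    "a": LETTER, "A": LETTER + "+",
    "d": "[0-9]", "D": "[0-9]+",
    "r": ROMAN, "R": ROMAN + "+",
    "w": WORD, "W": f"{WORD}(?: +{WORD})*",
    ".": SPECIAL, ":": SPECIAL + "+",
    "X": f"{WORD}(?:{SPECIAL}+{WORD})*",
}


def to_regex(pattern: str) -> str:
    """Translate a value pattern into an equivalent regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))   # \x: exactly one x
            i += 2
        else:
            parts.append(TOKENS.get(c, re.escape(c)))
            i += 1
    return "".join(parts)


def matches(pattern: str, value: str) -> bool:
    """True if the pattern matches a leading part of the value string."""
    return re.match(to_regex(pattern), value) is not None


print(matches('"D--D"', '"23--27"'), matches('"dddd"', '"192"'))
```

       Both the pattern and the value include their surrounding quotation
       marks here, so a closing quotation mark in the pattern forces the
       whole value to be matched.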

       There is no provision for specifying ranges or repetitions of
       characters, but this can usually be done with separate patterns.  It is
       a good idea to accompany the pattern with a comment showing the kind of
       thing it is expected to match.  Here is a portion of an initialization
       file giving a few of the patterns used to match number value strings:

       number  =       "\"D\""         %% 23
       number  =       "\"A AD\""      %% PN LPS5001
       number  =       "\"A D(D)\""    %% RJ 34(49)
       number  =       "\"A D\""       %% XNSS 288811
       number  =       "\"A D\\.D\""   %% Version 3.20
       number  =       "\"A-A-D-D\""   %% UMIAC-TR-89-11
       number  =       "\"A-A-D\""     %% CS-TR-2189
       number  =       "\"A-A-D\\.D\"" %% CS-TR-21.7

       For a bibliography that contains only article entries, this list should
       probably be reduced to just the first pattern, so that anything other
       than a digit string fails the pattern-match test.  This is easily done
       by keeping bibliography-specific patterns in a corresponding file with
       extension .ini, since that file is read automatically.

       You should be sure to use empty pattern strings in this pattern file to
       discard patterns from earlier initialization files.

       The value strings passed to the pattern matcher contain surrounding
       quotes, so the patterns should also.  However, you could use a pattern
       specification like "\"D" to match an initial digit string followed by
       anything else; the omission of the final quotation mark \" in the
       pattern allows the match to succeed without checking that the next
       character in the value string is a quotation mark.

       Because the value strings are intended to be processed by TeX, the
       pattern matching ignores braces, and TeX control sequences, together
       with any space following those control sequences.  Spaces around braces
       are preserved.  This convention allows the pattern fragment A-AD-D to
       match the value string TN-K\slash 27-70, because the value is
       implicitly collapsed to TN-K27-70 during the matching operation.
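
       The implicit collapsing step can be imitated in Python as shown
       below.  The sketch handles only control words (a backslash followed
       by letters) and braces; how bibclean treats other control sequences
       is not covered, and collapse_tex is a made-up name.

```python
import re


def collapse_tex(value: str) -> str:
    """Drop TeX control words, the spaces after them, and all braces."""
    value = re.sub(r"\\[A-Za-z]+ *", "", value)
    return value.replace("{", "").replace("}", "")


print(collapse_tex(r"TN-K\slash 27-70"))
```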

       bibclean's normal action when a string value fails to match any of the
       corresponding patterns is to issue a warning message something like
       this: Unexpected value in ``year = "192"''.  In most cases, that is
       sufficient to alert the user to a problem.  In some cases, however, it
       may be desirable to associate a different message with a particular
       pattern.  This can be done by supplying a message string following the
       pattern string.  Format items %% (single percent), %e (entry name), %f
       (field name), %k (citation key), and %v (string value) are available to
       get current values expanded in the messages.  Here is an example:

       chapter = "\"D:D\"" "Colon found in ``%f = %v''" %% 23:2

       To be consistent with other messages output by bibclean, the message
       string should not end with punctuation.

       If you wish to make the message an error, rather than just a warning,
       begin it with a query (?), like this:

       chapter = "\"D:D\"" "?Colon found in ``%f = %v''" %% 23:2

       The query will not be included in the output message.
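
       As an illustration of how the format items and the leading query might
       be interpreted, here is a minimal Python sketch.  It is not taken from
       bibclean; the function name expand_message and the returned
       (severity, text) pair are hypothetical, and passing the value with its
       surrounding quotes is an assumption based on the sample warning shown
       earlier.

```python
import re


def expand_message(template, entry, field, key, value):
    """Return (severity, text) for a message string from a pattern file."""
    severity = "warning"
    if template.startswith("?"):
        # A leading query marks an error and is not part of the output.
        severity = "error"
        template = template[1:]
    items = {"%": "%", "e": entry, "f": field, "k": key, "v": value}
    # Expand each known %x item in a single pass; anything else is kept.
    text = re.sub(r"%([%efkv])", lambda m: items[m.group(1)], template)
    return severity, text


severity, text = expand_message(
    "?Colon found in ``%f = %v''", "Book", "chapter", "Knuth:TB86", '"23:2"'
)
print(severity, text)
```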

       Escape sequences are supported in message strings, just as they are in
       pattern strings.  You can use this to advantage for fancy things, such
       as terminal display mode control.  If you rewrite the previous example
       as

       chapter = "\"D:D\"" \
                 "?\033[7mColon found in ``%f = %v''\033[0m" %% 23:2

       the error message will appear in inverse video on display screens that
       support ANSI terminal control sequences.  Such practice is not normally
       recommended, since it may have undesirable effects on some output
       devices.  Nevertheless, you may find it useful for restricted
       applications.

       For some types of bibliography fields, bibclean contains special-
       purpose code to supplement or replace the pattern matching:

              ·  CODEN, ISBN and ISSN field values are handled this way
                 because their validation requires evaluation of checksums
                 that cannot be expressed by simple patterns; no patterns are
                 even used in these three cases.

              ·  chapter, number, pages, and volume values are checked only by
                 pattern matching.

              ·  month values are first checked against the standard BibTeX
                 month abbreviations, and only if no match is found are
                 patterns then used.

              ·  year values are first checked against patterns, then if no
                 match is found, the year numbers are found and converted to
                 integer values for testing against reasonable bounds.
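
       To show why checksums are beyond simple patterns, here is a Python
       sketch of the standard ISBN-10 and ISSN check-digit rules.  It follows
       the published rules for those identifiers rather than bibclean's
       source; the function names are hypothetical, and CODEN checking and
       13-digit ISBNs are not covered.

```python
def _digits(text, length):
    """Return the numeric values of an identifier, or None if malformed."""
    chars = [c for c in text.upper() if c in "0123456789X"]
    # X (value 10) is only legal as the final check character.
    if len(chars) != length or "X" in chars[:-1]:
        return None
    return [10 if c == "X" else int(c) for c in chars]


def isbn10_is_valid(isbn):
    """Weights 10 down to 1; the weighted sum must be divisible by 11."""
    digits = _digits(isbn, 10)
    if digits is None:
        return False
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0


def issn_is_valid(issn):
    """Weights 8 down to 1; the weighted sum must be divisible by 11."""
    digits = _digits(issn, 8)
    if digits is None:
        return False
    return sum(w * d for w, d in zip(range(8, 0, -1), digits)) % 11 == 0


print(isbn10_is_valid("0-201-13448-9"), issn_is_valid("0378-5955"))
```

       Hyphens and spaces are ignored by this sketch, so it checks only the
       digits, not where the hyphens are placed.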

       Values for other fields are checked only against patterns.  You can
       provide patterns for any field you like, even ones bibclean does not
       already know about.  New ones are simply added to an internal table
       that is searched for each string to be validated.

       The special field, key, represents the bibliographic citation key.  It
       can be given patterns, like any other field.  Here is an initialization
       file pattern assignment that will match an author name, a colon, an
       alphabetic string, and a two-digit year:

       key = "A:Add"                     %% Knuth:TB86

       Notice that no quotation marks are included in the pattern, because the
       citation keys are not quoted.  You can use such patterns to help
       enforce uniform naming conventions for citation keys, which is
       increasingly important as your bibliography data base grows.
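
       A rough way to experiment with such patterns outside bibclean is to
       translate them into regular expressions.  The Python sketch below
       assumes that a means one letter, A a run of letters, d one digit, and
       D a run of digits, and it treats every other character as a literal,
       which simplifies the full pattern language.  Following the remark about
       omitted trailing quotation marks, the match is anchored only at the
       start of the string.  The names PATTERN_CLASSES, pattern_to_regex, and
       pattern_matches are hypothetical.

```python
import re

# Assumed meanings of four pattern characters; all others are literals here.
PATTERN_CLASSES = {
    "a": "[A-Za-z]",
    "A": "[A-Za-z]+",
    "d": "[0-9]",
    "D": "[0-9]+",
}


def pattern_to_regex(pattern):
    """Translate a simplified bibclean-style pattern to a compiled regex."""
    parts = [PATTERN_CLASSES.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))


def pattern_matches(pattern, text):
    """True if the pattern matches a leading portion of the text."""
    return pattern_to_regex(pattern).match(text) is not None


print(pattern_matches("A:Add", "Knuth:TB86"))
print(pattern_matches("A-AD-D", "TN-K27-70"))
```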

LEXICAL ANALYSIS
       When -no-prettyprint is specified, bibclean acts as a lexical analyzer
       instead of a prettyprinter, producing output in lines of the form

              <token-number><tab><token-name><tab>"<token-value>"

       Each output line contains a single complete token, identified by a
       small integer number for use by a computer program, a token type name
       for human readers, and a string value in quotes.

       Special characters in the token value string are represented with
       ANSI/ISO Standard C escape sequences, so all characters other than NUL
       are representable, and multi-line values can be represented in a single
       line.
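
       A program that consumes this output might split each token line and
       decode the escape sequences along the following lines.  This Python
       sketch is an illustration only; the helper names are hypothetical, and
       the set of escapes handled (simple escapes, octal, and hexadecimal) is
       an assumption about which Standard C forms appear in practice.

```python
import re

SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    "v": "\v", "\\": "\\", '"': '"', "'": "'", "?": "?",
}
ESCAPE = re.compile(r"\\([abfnrtv\\\"'?]|[0-7]{1,3}|x[0-9A-Fa-f]+)")


def decode_c_escapes(text):
    """Replace Standard C escape sequences by the characters they denote."""
    def replace(match):
        body = match.group(1)
        if body in SIMPLE_ESCAPES:
            return SIMPLE_ESCAPES[body]
        if body[0] == "x":
            return chr(int(body[1:], 16))   # hexadecimal escape
        return chr(int(body, 8))            # octal escape
    return ESCAPE.sub(replace, text)


def parse_token_line(line):
    """Split <number><tab><name><tab>"<value>" into (int, str, str)."""
    number, name, quoted = line.rstrip("\n").split("\t", 2)
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("token value is not quoted: %r" % line)
    return int(number), name, decode_c_escapes(quoted[1:-1])


print(parse_token_line('10\tKEY\t"Knuth:TB86"'))
```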

       Here are the token numbers and token type names that can appear in the
       output when -no-prettyprint is specified:

               0   UNKNOWN
               1   ABBREV
               2   AT
               3   COMMA
               4   COMMENT
               5   ENTRY
               6   EQUALS
               7   FIELD
               8   INCLUDE
               9   INLINE
              10   KEY
              11   LBRACE
              12   LITERAL
              13   NEWLINE
              14   PREAMBLE
              15   RBRACE
              16   SHARP
              17   SPACE
              18   STRING
              19   VALUE

       Programs that parse such output should also be prepared for lines
       beginning with the warning prefix, %%, or the error prefix, ??, and for
       ANSI/ISO Standard C line number directives of the form
              # line 273 "texbook1.bib"
       which record the line number and file name of the current input file.

       If a -max-width nnn command-line option was specified, long output
       lines will be wrapped at a backslash-newline pair, and consequently,
       software that processes the lexical token stream should be prepared to
       collapse such wrapped lines back into single lines.
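
       The preprocessing just described can be sketched in Python as follows.
       The function names are hypothetical, and the sketch assumes that a
       continuation line carries no extra indentation and that only wrapped
       lines end in a backslash.

```python
def unwrap_lines(lines):
    """Join lines that were wrapped at a backslash-newline pair."""
    pending = ""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\\"):
            pending += line[:-1]        # drop the backslash, keep reading
        else:
            yield pending + line
            pending = ""
    if pending:
        yield pending                   # input ended inside a wrapped line


def classify(line):
    """Name the kind of line found in the lexical token stream."""
    if line.startswith("%%"):
        return "warning"
    if line.startswith("??"):
        return "error"
    if line.startswith("#"):
        return "line-directive"
    return "token"


sample = ['19\tVALUE\t"A long \\\n', 'value"\n', '# line 273 "texbook1.bib"\n']
for logical_line in unwrap_lines(sample):
    print(classify(logical_line), repr(logical_line))
```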

       As an example of the use of -no-prettyprint, the UNIX command pipeline
              bibclean -no-prettyprint mylib.bib | \
                  awk '$2 == "KEY" {print $3}' | \
                  sed -e 's/"//g' | \
                  sort
       will extract a sorted list of all citation keys in the file mylib.bib.

       A certain amount of processing will have been done on the tokens.  In
       particular, delimiters equivalent to braces will have been replaced by
       braces, and braced strings will have become quoted strings.

       The LITERAL token type is used for arbitrary text that bibclean does
       not examine further, such as the contents of a @Preamble{...} or a
       @Comment{...}.

       The UNKNOWN token type should never appear in the output stream.  It is
       used internally to initialize token type variables.

SCRIBE BIBLIOGRAPHY FORMAT
       bibclean's support for the Scribe bibliography format is based on the
       syntax description in the Scribe Introductory User's Manual, 3rd
       Edition, May 1980.  Scribe was originally developed by Brian Reid at
       Carnegie-Mellon University, and is now marketed by Unilogic, Ltd.

       The BibTeX bibliography format was strongly influenced by Scribe, and
       indeed, with care, it is possible to share bibliography files between
       the two systems.  Nevertheless, there are some differences, so here is
       a summary of features of the Scribe bibliography file format:

       (1)   Letter case is not significant in field names and entry names,
             but case is preserved in value strings.

       (2)   In field/value pairs, the field and value may be separated by one
             of three characters: =, /, or space.  Space may optionally
             surround these separators.

       (3)   Value delimiters are any of these seven pairs: { }   [ ]   ( )
             < >   ' '   " "   ` `

       (4)   Value delimiters may not be nested, even though with the first
             four delimiter pairs, nested balanced delimiters would be
             unambiguous.

       (5)   Delimiters can be omitted around values that contain only
             letters, digits, sharp (#), ampersand (&), period (.), and
             percent (%).

       (6)   Outside of delimited values, a literal at-sign (@) is represented
             by doubled at-signs (@@).

       (7)   Bibliography entries begin with @name, as for BibTeX, but any of
             the seven Scribe value delimiter pairs may be used to surround
             the values in field/value pairs.  As in (4), nested delimiters
             are forbidden.

       (8)   Arbitrary space may separate entry names from the following
             delimiters.

       (9)   @Comment is a special command whose delimited value is discarded.
             As in (4), nested delimiters are forbidden.

       (10)  The special form

             @Begin{comment}
              ...
             @End{comment}

             permits encapsulating arbitrary text containing any characters or
             delimiters, other than ``@End{comment}''.  Any of the seven
             delimiter pairs may be used around the word ``comment'' following
             the ``@Begin'' or ``@End''; the delimiters in the two cases need
             not be the same, and consequently,
             ``@Begin{comment}''/``@End{comment}'' pairs may not be nested.

       (11)  The key field is required in each bibliography entry.

       (12)  A backslashed quote in a string will be assumed to be a TeX
             accent, and braced appropriately.  While such accents do not
             conform to Scribe syntax, Scribe-format bibliographies have been
             found that appear to be intended for TeX processing.

       Because of this loose syntax, bibclean's normal error detection
       heuristics are less effective, and consequently, Scribe mode input is
       not the default; it must be explicitly requested.
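
       Items (2) through (5) can be illustrated with a small Python sketch
       that splits one Scribe field/value pair.  It is not bibclean's parser;
       the names DELIMITERS, PAIR, BARE_VALUE, and parse_scribe_pair are
       hypothetical, and restricting field names to letters and digits is an
       assumption.

```python
import re

# The seven Scribe delimiter pairs from item (3), as opener -> closer.
DELIMITERS = {"{": "}", "[": "]", "(": ")", "<": ">",
              "'": "'", '"': '"', "`": "`"}

# Field name, then '=' or '/' with optional space, or space alone (item 2).
PAIR = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*)(?:\s*[=/]\s*|\s+)(.*)", re.DOTALL)

# Characters permitted in an undelimited value (item 5).
BARE_VALUE = re.compile(r"[A-Za-z0-9#&.%]+")


def parse_scribe_pair(text):
    """Return (field, value) for one pair, or raise ValueError."""
    match = PAIR.match(text)
    if match is None:
        raise ValueError("no field name and separator found")
    field, rest = match.group(1), match.group(2)
    closer = DELIMITERS.get(rest[:1])
    if closer is not None:
        # Delimiters do not nest (item 4): the first closer ends the value.
        end = rest.find(closer, 1)
        if end < 0:
            raise ValueError("unterminated value")
        return field, rest[1:end]
    bare = BARE_VALUE.match(rest)
    if bare is None:
        raise ValueError("value needs delimiters")
    return field, bare.group(0)


print(parse_scribe_pair('Author = "Donald E. Knuth"'))
print(parse_scribe_pair("year/1986"))
```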

ENVIRONMENT VARIABLES
       BIBCLEANEXT  File extension of bibliography-specific initialization
                    files.  Default: .ini.

       BIBCLEANINI  Name of bibclean initialization files.  Default:
                    .bibcleanrc.

       BIBINPUTS    Search path for bibclean and BibTeX input files.  This is
                    a colon-separated list of directories that are searched in
                    order from first to last.  It is not an error for a
                    specified directory to not exist.
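
       The search order described for BIBINPUTS can be pictured with this
       Python sketch.  The function name find_in_search_path is hypothetical,
       and skipping empty path components is an assumption; the sketch does
       not model anything bibclean may do when the variable is unset.

```python
import os


def find_in_search_path(filename, search_path):
    """Return the first directory/filename that exists, else None."""
    for directory in search_path.split(":"):
        if not directory:
            continue                    # assumption: ignore empty components
        candidate = os.path.join(directory, filename)
        # A missing directory simply yields no file; it is not an error.
        if os.path.isfile(candidate):
            return candidate
    return None


print(find_in_search_path("mylib.bib", os.environ.get("BIBINPUTS", "")))
```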

FILES
       *.bib          BibTeX and Scribe bibliography data base files.

       *.ini          File-specific initialization files.

       /usr/share/bibcleanrc, /etc/bibcleanrc
                      System-wide initialization files.

       .bibcleanrc    User-specific initialization files.

SEE ALSO
       bibcheck(1), bibdup(1), bibextract(1), bibindex(1), bibjoin(1),
       biblabel(1), biblex(1), biblook(1), biborder(1), bibparse(1),
       bibsort(1), bibtex(1), bibunlex(1), citefind(1), citesub(1),
       citetags(1), latex(1), scribe(1), tex(1).

AUTHOR
       Nelson H. F. Beebe
       Center for Scientific Computing
       University of Utah
       Department of Mathematics, 322 INSCC
       155 S 1400 E RM 233
       Salt Lake City, UT 84112-0090
       USA
       Tel: +1 801 581 5254
       FAX: +1 801 585 1640, +1 801 581 4148
       Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet)
       URL: http://www.math.utah.edu/~beebe

       This Debianization of bibclean was done by Henning Makholm
       <henning@makholm.net>.  It differs from the upstream source in where it
       looks for the system-wide initialization file (vanilla bibclean expects
       to find it in $PATH), and it has also been patched to ignore the
       built-in BibTeX field-length limit for abstract fields.



Version 2.11.4                    09 May 1998                      BIBCLEAN(1)