texexpand − expand \input and \include statements in a TeX file

General translation mechanism:

     The main program latex2html calls texexpand with the
document name in order to expand some of its \input and
\include statements (here also called ’merging’) and to
write a list of sensitized style, class, input, or include
file names.  When texexpand has finished, everything is
contained in one file, TMP_foo (assuming foo.tex is the name
of the document to translate).

     In this version, texexpand takes care of the following
environments, which may span include files / section
boundaries:
 a) \begin{comment}
 b) %begin{comment}
 c) \begin{any}  introduced with \excludecomment
 d) %begin{any}
 e) \begin{verbatim}
 f) \begin{latexonly}
 g)

     e) − g) prevent texexpand from expanding input files,
but the environment content goes fully into the output file.

     Each merging of \input etc. is accompanied by so‐called
%%%texexpand markers at the boundary.

     When latex2html reads in the output file, it uses these
markers to write each part to a separate file and processes
the parts further.

     Detailed technical notes:

     1. %begin{latexonly} and %end{latexonly} have to be on
a separate line.  Anything between these tags (including the
tags) is discarded.

     2. \begin{latexonly} and \end{latexonly} have to be on
a separate line.  Anything between these tags (including the
tags) is not expanded.

     3. [%\]begin{"to exclude"} and [%\]end{"to exclude"}
have to be on a separate line.  Anything between these tags
(including the tags) is discarded.

     4. \begin{verbatim/verbatim*} and
\end{verbatim/verbatim*} have to be on a separate line.
Anything between these tags (including the tags) is not
expanded.

     5. The scope of any such tags may extend over several
files.  The opening tag for latexonly may occur on a
different include level than the closing tag.  The opening
tag for verbatim/"to exclude" must occur within the same
file as the closing tag.


     6. Warnings are printed when the document has been
parsed and open tags remain.

     7. When in a "to exclude"/verbatim environment,
texexpand won’t recognize ANY command except the
corresponding closing tag.  There cannot be any nested
constructions.  This behaviour is identical to that of
LaTeX.

     8. \begin{latexonly} / \end{latexonly} may be nested,
whereas %begin{latexonly} / %end{latexonly} may not be nested.

     9. A "%" tag cannot close a "\" tag, and vice versa.

     10. Every \document(class|style), \usepackage, \input
and \include command has to be on a separate line.

     11. Everything after a ‘%’ that isn’t preceded by a
‘\’ is regarded as a comment, i.e. it is printed but not
interpreted.

     12. If any command listed in 10. is preceded by an
occurrence of ‘\verb’ or ‘\latex’ then it is NOT interpreted.
This fails on lines like this:
   blah blah \verb+foo foo+ \input{bar} % bar won’t be loaded!

     13. Packages provided via \usepackage are handled the
same way as ‘options’ in \document(class|style), i.e. they
are included when −auto_exclude is off, the package isn’t in
@dont_include *OR* the package is in @do_include (new). They
are added to the style file together with their options if
the file itself hasn’t been merged.
\documentclass[options]{class} searches for every
option.clo, \documentstyle[options]{style} searches for
every option.sty.  \usepackage[options]{packages} searches
for every package.sty.

     14. Each texinputs directory is searched for input
files/styles. If it ends in ‘//’, the whole subdirectory
tree is searched.

     15. \input / \include merge the given file (if found
under the given name or with .tex extension) if its basename
is in @do_include or if it isn’t in @dont_include or if the
given filename doesn’t end in .sty/.clo/.cls when
−auto_exclude is set.
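
     The file lookup described in notes 14 and 15 can be
sketched as follows.  This is a minimal illustration in
Python, not texexpand's own Perl code: the function name
find_input_file and the order in which candidates are tried
are assumptions made for this example.

```python
import os

def find_input_file(name, texinputs):
    """Return the first existing path for `name` or `name + '.tex'`.

    `texinputs` is a list of directories.  An entry ending in '//'
    means that its whole subdirectory tree is searched (note 14).
    Returns None if nothing is found.
    """
    candidates = [name, name + ".tex"]          # note 15: given name or .tex
    for entry in texinputs:
        recursive = entry.endswith("//")
        root = entry.rstrip("/") or "/"
        if recursive:
            # os.walk yields the root itself and every directory below it
            dirs = [dirpath for dirpath, _subdirs, _files in os.walk(root)]
        else:
            dirs = [root]
        for directory in dirs:
            for candidate in candidates:
                path = os.path.join(directory, candidate)
                if os.path.isfile(path):
                    return path
    return None
```

With a hypothetical call such as
find_input_file("chap", ["/home/me/doc//"]), a file chap.tex
anywhere below /home/me/doc would be found, whereas the entry
"/home/me/doc" without the trailing '//' only matches files
directly in that directory.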


     Recognizes \documentclass, \documentstyle, \usepackage,
\RequirePackage, \begin{verbatim}...\end{verbatim},
\begin{latexonly}...\end{latexonly}, \input, \include,
\verb, \latex, \endinput, \end{document}, \includecomment,
\excludecomment, \begin{"to exclude"}, \end{"to exclude"},
%begin{"to exclude"}, %end{"to exclude"}.
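
     How a recognized command such as \input is guarded by
comments and by a preceding \verb or \latex (notes 11 and 12)
can be sketched as follows.  This is an illustrative Python
sketch, not the Perl code of texexpand; the names
split_comment and find_input are hypothetical, and only the
braced form \input{file} is handled.

```python
import re

# '%' starts a comment unless the character right before it is '\'.
COMMENT_RE = re.compile(r"(?<!\\)%")
# Simplification: only \input{file} and \include{file} are matched.
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]*)\}")
# \verb or \latex as complete command names (\latexonly does not count).
GUARD_RE = re.compile(r"\\(?:verb|latex)(?![A-Za-z])")

def split_comment(line):
    """Split a line into (code, comment); the comment keeps its '%'."""
    m = COMMENT_RE.search(line)
    if m is None:
        return line, ""
    return line[:m.start()], line[m.start():]

def find_input(line):
    """Return the file name of an \\input/\\include that would be
    interpreted on this line, or None if there is none."""
    code, _comment = split_comment(line)      # note 11: comments are skipped
    m = INPUT_RE.search(code)
    if m is None:
        return None
    if GUARD_RE.search(code[:m.start()]):     # note 12: \verb/\latex before it
        return None
    return m.group(1)
```

In this sketch the problematic line from note 12 yields None,
so bar is not loaded, which mirrors the limitation described
there.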

Include and parse a file.  This routine is recursive, see
also &process_input_include_file, &process_document_header,
and &process_package_cmd.

     Two global flags control the states of texexpand.
 o $active is true if we should interpret the lines to
expand files, check for packages, etc.
 o $mute is true if we should prevent the lines from going
into the out file.

     We have three general states of texexpand:
 1) Interpret the lines and pass them to the out file.
This is the normal case.
=> $active true, $mute false

 2) Interpret minimally and suppress the lines.
This is the case when parsing inside a comment environment,
whose body would also be withheld from LaTeX.
=> $active false, $mute true

 3) Interpret minimally and pass the lines to the out file.
This is the case inside a verbatim or latexonly environment.
The line must of course be interpreted at least far enough
to find the closing tag.
=> $active false, $mute false
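
     The three states can be illustrated with a small
sketch.  It is written in Python for illustration only and is
not texexpand's implementation: the function filter_lines is
hypothetical, it handles only "\" tags on their own line, and
it ignores "%" tags, \excludecomment, and nested latexonly.

```python
import re

BEGIN_RE = re.compile(r"\\begin\{([^}]+)\}")
MUTED_ENVS = {"comment"}                                 # state 2
PASSIVE_ENVS = {"verbatim", "verbatim*", "latexonly"}    # state 3

def filter_lines(lines):
    """Return (out, interpreted): the lines written to the out file and
    the lines that would be fully interpreted (state 1)."""
    active, mute = True, False
    closing = None                  # closing tag we are waiting for
    out, interpreted = [], []
    for line in lines:
        stripped = line.strip()
        if not active:
            # States 2 and 3: only look for the matching closing tag.
            if not mute:
                out.append(line)
            if stripped == closing:
                active, mute, closing = True, False, None
            continue
        m = BEGIN_RE.fullmatch(stripped)
        env = m.group(1) if m else None
        if env in MUTED_ENVS or env in PASSIVE_ENVS:
            active = False
            mute = env in MUTED_ENVS
            closing = "\\end{" + env + "}"
            if not mute:
                out.append(line)    # verbatim/latexonly tags are kept
            continue
        # State 1: normal case, interpret and pass through.
        interpreted.append(line)
        out.append(line)
    return out, interpreted
```

In this sketch a comment environment disappears from the
output together with its tags, while a verbatim environment
is copied unchanged and an \input inside it is never
interpreted.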

Any environment may extend over several include files.  Any
environment except verbatim and latexonly may have its
opening and closing tags on different input levels.  The
comment and verbatim environments cannot be nested, as in
LaTeX.  We must at least parse verbatim/comment
environments inside latexonly environments, to catch fake
latexonly tags.

     The work scheme: Five functions influence texexpand’s
behavior.

     o &process_file opens the given file and parses the
non‐comment part in order to set $active and $mute (see
above).  It calls &interprete to interpret the non‐comment
content and either continues with the next line of its file
or terminates if &interprete detected \end{document} or
\endinput.

     o &interprete handles some LaTeX tags with respect to
the three states controlled by $active and $mute.  For
\input|include, \document(class|style), and
\(use|Require)package, the functions
&process_input_include_file, &process_document_header, and
&process_package_cmd are called, respectively.

     o These three functions check whether the file name or
option files are enabled or disabled for merging (via
TEXE_DO_INCLUDE or TEXE_DONT_INCLUDE).  Any file that is to
be included is ’merged’ into the current file, i.e. the
function &process_file is called at this point
(recursively).  This stops interpretation at the current
line of the current file, starts processing the new file,
and continues with the next line as soon as the new file has
been interpreted to its end.
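
     The recursive merging described above can be sketched as
follows.  This is a simplified Python illustration and not
the actual Perl code: files are represented by a hypothetical
in‐memory dictionary, the dont_include set stands in for
TEXE_DONT_INCLUDE, and the %%%texexpand markers, \endinput,
and \end{document} handling are left out.

```python
import re

# Simplification: the command must stand alone on its line (see note 10).
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")

def process_file(name, files, out, dont_include=frozenset()):
    """Append the lines of files[name] to `out`, merging \\input and
    \\include lines recursively.

    `files` maps file names to lists of lines.  Names listed in
    `dont_include` are not merged; their command line is copied as is.
    Cyclic inclusion is not detected in this sketch.
    """
    for line in files[name]:
        m = INPUT_RE.fullmatch(line.strip())
        if m:
            target = m.group(1)
            # Try the given name first, then the name with .tex appended.
            if target not in files and target + ".tex" in files:
                target += ".tex"
            if target in files and m.group(1) not in dont_include:
                # Interpretation of the current file pauses here and
                # resumes with the next line once the merge is finished.
                process_file(target, files, out, dont_include)
                continue
        out.append(line)
```

A hypothetical document with main.tex containing \input{ch1}
would thus have the lines of ch1.tex placed exactly where the
\input command stood, followed by the rest of main.tex.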

     The call tree (noweb+xy.sty would be handy here):

 |    |
 |    v
 |  interprete (with respect to the current line, one of these three)
 |    |                           |                        |
 |    v                           v                        v
 |  process_input_include_file  process_document_header  process_package_cmd
 |    |                           |                        |
 |    v                           v                        v

Bugs:

     o Since the latexonly environment is not parsed, its
contents might introduce environments which are not

     o The closing tag for latexonly is not found if hidden
inside an input file.

     o Only one environment tag per line so far!

     o If I had to design test cases for this beast I
would immediately disintegrate into a logic cloud.


     o Ok, I designed test cases for it.  Please refer to
test ’expand’ of the regression test suite in the
developers’ module of the l2h repository.

     o −unsegment feature: In this (rare) case, the user
wants to translate a segmented document not in segments but
as a whole (for testing, say).  We enable this by
recognizing the \segment command in &interprete, causing the
segment file to be treated like \input but losing the first
lines up to and including \startdocument, as controlled via
$segmentfile.  For how to segment a document, see section
‘‘Document Segmentation’’ of the LaTeX2HTML manual.

This utility is automatically configured and built to work
on the local setup. If this setup changes (e.g. some of the
external commands are moved), the script has to be


      Based on texexpand by Robert Thau, MIT AI lab, including modifications by
 Franz Vojik <vojik@de.tu‐muenchen.informatik>
 Nikos Drakos <nikos@cbl.leeds.ac.uk>
 Sebastian Rahtz <spqr@uk.ac.tex.ftp>
 Maximilian Ott <max@com.nec.nj.ccrl>
 Martin Boyer
 Herbert Swan
 Jens Lippmann