KONWERT(1)                     Linux User's Manual                    KONWERT(1)

       konwert - interface for various character encoding conversions

       konwert FILTER [FILE]... [-o DEST | -O]

       Konwert allows filtering multiple files through multiple filters.  It
       filters the specified FILEs, or stdin if none are given.

       A simple FILTER is the name of an executable file from the directory
       ~/.konwert/filters or the system-wide one, normally
       /usr/share/konwert/filters.  Such a program itself filters stdin to
       stdout.

       The filtering rule can be more complex:

       konwert FILTER1+FILTER2 means konwert FILTER1 | konwert FILTER2.

       konwert FORMAT1-FORMAT2, unless such a filter exists, tries to find a
       common FORMAT3 such that both filters FORMAT1-FORMAT3 and
       FORMAT3-FORMAT2 exist.
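
       The search for an intermediate format can be pictured with the
       following Python sketch. It is not konwert's actual implementation;
       the function name find_chain and the sample filter set are
       hypothetical and serve only to illustrate the lookup.

```python
def find_chain(src, dst, available):
    """Return a list of filter names converting src to dst, or None.

    `available` is a set of filter names such as {"iso2-utf8", "utf8-cp1250"}.
    """
    direct = f"{src}-{dst}"
    if direct in available:
        return [direct]
    # Try each filter that starts at src and check whether a second
    # filter exists that goes from its output format to dst.
    for name in sorted(available):
        first, sep, middle = name.partition("-")
        if sep and first == src and f"{middle}-{dst}" in available:
            return [name, f"{middle}-{dst}"]
    return None


# Hypothetical set of installed filters.
filters = {"iso2-utf8", "utf8-cp1250", "utf8-ascii"}
print(find_chain("iso2", "cp1250", filters))
```

       With this sample set, no direct iso2-cp1250 filter exists, so the
       sketch falls back to a two-step chain through utf8.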

       konwert FILTER/ARG/... passes arguments to the filter. Arguments can also
       be specified here: FORMAT1/ARGS-FORMAT2.  The meaning of arguments
       depends on the particular filter.

       konwert '(COMMAND ARGS...)' executes this arbitrary shell command. This
       is useful with -o or -O options. The command cannot contain the string
       )+, which will terminate this filter's specification.
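
       The rule syntax above can be illustrated by a small parser. This is a
       sketch under simplifying assumptions, not the parser konwert uses: the
       name split_rule is hypothetical, and the FORMAT1/ARGS-FORMAT2 form is
       not handled.

```python
def split_rule(rule):
    """Split a filtering rule into pipeline stages.

    Returns a list of ("shell", command) or ("filter", name, args) tuples.
    """
    stages = []
    rest = rule
    while rest:
        if rest.startswith("("):
            # A shell command runs up to the first ")+" or, failing that,
            # up to a closing ")" at the very end of the rule.
            end = rest.find(")+")
            if end == -1:
                if not rest.endswith(")"):
                    raise ValueError("unterminated shell command")
                command, rest = rest[1:-1], ""
            else:
                command, rest = rest[1:end], rest[end + 2:]
            stages.append(("shell", command))
        else:
            # An ordinary filter: NAME/ARG/... up to the next "+".
            part, _, rest = rest.partition("+")
            name, *args = part.split("/")
            stages.append(("filter", name, args))
    return stages


for stage in split_rule("iso2-ascii/1+(tr a-z A-Z)+rot13"):
    print(stage)
```

       The sketch also shows why a shell command cannot contain the string
       )+: the first occurrence is taken as the end of the command.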

       -o DEST   output goes to this file/directory instead of stdout

       -O        every input file is replaced with its translation

       --help    display help and exit

       --version output version information and exit

       Redirecting output to one of the source files with either -o or > instead
       of -O will corrupt it! Option -O creates a temporary file in /tmp and
       later copies it back onto the source.
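
       The temporary-file approach used by -O can be sketched in Python as
       follows. The helper replace_in_place is hypothetical, and the sketch
       only mirrors the idea described above (write the result elsewhere,
       then copy it over the source), not konwert's own code.

```python
import os
import shutil
import tempfile


def replace_in_place(path, convert):
    """Apply convert(bytes) -> bytes to the file at `path`.

    The result is written to a temporary file first, so the source is
    never read and overwritten at the same time.
    """
    fd, tmp_path = tempfile.mkstemp()  # created in the system temp directory
    try:
        with os.fdopen(fd, "wb") as tmp, open(path, "rb") as source:
            tmp.write(convert(source.read()))
        shutil.copyfile(tmp_path, path)  # copy the result over the source
    finally:
        os.remove(tmp_path)
```

       By contrast, a shell redirection such as > truncates the target
       before the filter has read it, which is why redirecting onto a source
       file destroys its contents.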

       You can convert text between any two charsets, for example konwert

       Characters unavailable in the target charset will be substituted with
       approximations with available ones. The approximations need not be single

       The following character sets are currently supported:

       ascii  7bit ASCII

       utf8 = unicode  Unicode UTF-8

       iso1 = isolatin1
              ISO-8859-1 aka ISO Latin 1 (Western European)
       iso2 = isolatin2
              ISO-8859-2 aka ISO Latin 2 (Central European)
       iso3 = isolatin3
              ISO-8859-3 aka ISO Latin 3 (Esperanto)
       iso4 = isolatin4
              ISO-8859-4 aka ISO Latin 4 (Baltic)
       iso5 = isolatincyr
              ISO-8859-5 (Cyrillic)
       iso6 = isolatinarabic
              ISO-8859-6 (Arabic)
       iso7 = isolatingreek
              ISO-8859-7 (Greek)
       iso8 = isolatinhebrew
              ISO-8859-8 (Hebrew)
       iso9 = isolatin5 = isolatintur
              ISO-8859-9 aka ISO Latin 5 (Turkish)
       iso10 = isolatin6 = isolatinnordic
              ISO-8859-10 aka ISO Latin 6 (Nordic)
       iso12 = isolatin7 = isolatinceltic
              ISO-8859-12 aka ISO Latin 7 (Celtic) - Draft
       iso13 = isolatin8 = isolatinbaltic
              ISO-8859-13 aka ISO Latin 8 (Baltic) - Draft
       iso14 = isolatin9 = isolatinsami
              ISO-8859-14 aka ISO Latin 9 (Sámi) - Draft
       iso15  ISO-8859-15 - Draft

       koi8r    KOI8-R (Russian)
       koi8u    KOI8-U (Ukrainian, Byelorussian)
       koi8uni  KOI8-Uni (Cyrillic)

       cp1250 = wince = winlatin2    Windows CP-1250 aka Win Latin 2 (Central
                                     European)
       cp1251 = wincyr               Windows CP-1251 (Cyrillic)
       cp1252 = winwest = winlatin1  Windows CP-1252 aka Win Latin 1 (Western
                                     European)
       cp1253 = wingr                Windows CP-1253 (Greek)
       cp1254 = wintur               Windows CP-1254 (Turkish)
       cp1255 = winhebrew            Windows CP-1255 (Hebrew)
       cp1256 = winarabic            Windows CP-1256 (Arabic)
       cp1257 = winbaltic            Windows CP-1257 (Baltic)
       cp1258 = winviet              Windows CP-1258 (Vietnamese)

       cp437 = icmeng               DOS CP-437 (English)
       cp737 = dosgreek             DOS CP-737 (Greek)
       cp775 = dosbaltic            DOS CP-775 (Baltic)
       cp850 = doswest = doslatin1  DOS CP-850 aka DOS Latin 1 (Western
                                    European)
       cp852 = dosce = doslatin2    DOS CP-852 aka DOS Latin 2 (Central
                                    European)
       cp855 = doscyr               DOS CP-855 (Cyrillic)
       cp857 = dostur               DOS CP-857 (Turkish)
       cp860 = dosportugal          DOS CP-860 (Portuguese)
       cp861 = dosiceland           DOS CP-861 (Icelandic)
       cp862 = doshebrew            DOS CP-862 (Hebrew)
       cp863 = doscanadfr           DOS CP-863 (Canadian French)
       cp864 = dosarabic            DOS CP-864 (Arabic)
       cp865 = dosnordic            DOS CP-865 (Nordic)
       cp866 = dosrussian           DOS CP-866 (Russian)
       cp869 = dosgreek2            DOS CP-869 (Greek2)
       cp874 = dosthai              DOS CP-874 (Thai)

       mac         Macintosh Roman (Western European)
       macce       Macintosh Central European
       maccyr      Macintosh Cyrillic
       macgreek    Macintosh Greek
       maciceland  Macintosh Icelandic
       mactur      Macintosh Turkish

       microvex     DOS charsets for Polish

       xjp      Amiga charsets for Polish

       kamenicky  DOS charset for Czech and Slovak

       wingreek  WinGreek (Windows font-based encoding for ancient Greek)

       babelpl  TeX [polish]{babel}: "a"c"e"l"n"o"s"z"r
       ciachy   TeX \prefixing: /a/c/e/l/n/o/s/x/z

       xmetodo        Esperanto: cx gx hx jx sx ux (vx w)
       hmetodo        Esperanto: ch gh hh jh sh u
       antauxcxap     Esperanto: ^c ^g ^h ^j ^s ^u (~u)
       postcxap       Esperanto: c^ g^ h^ j^ s^ u^ (u~)
       apostrofoj     Esperanto: c' g' h' j' s' u'
       malapostrofoj  Esperanto: c` g` h` j` s` u`

       viscii  VISCII (Vietnamese)
       viqri   Vietnamese Quoted Readable Implicit

       htmldec  SGML/HTML character references (decimal): &#198; &#283; &#8594;
       htmlhex  SGML/HTML character references (hexadecimal): &#xC6; &#x11B;
       htmlent  SGML/HTML character entities (names): &AElig; &ecaron; &rarr;
       html     All three above (only as input format)

       tex    TeX with some LaTeX or AMS-TeX extensions. There is no distinction
              between normal and math mode - you will probably have to insert
              some $'s manually.

       mnemonic   RFC 1345 mnemonics preceded by &
       mnemonic1  RFC 1345 mnemonics preceded by `

       any/LANGUAGE (e.g. any/pl-iso2)
              This special input format will detect the encoding automatically,
              based on the frequencies of characters found in the text. Every
              language is associated with a set of possible encodings used for
              it and average frequencies of its letters (excluding ASCII
              letters). The best fitting encoding is used for conversion.
              Currently supported languages are cs (Czech), de (German), el
              (Greek), eo (Esperanto), es (Spanish), fr (French), he (Hebrew),
              it (Italian), pl (Polish), pt (Portuguese), ru (Russian), and sv

       varpl  Mixed Polish ISO-8859-2, CP-1250, and UTF-8. If you are reading
              Polish newsgroups I suggest putting it as a filter in your
              newsreader (for speed improvement it's better to call it directly,
              rather than through konwert).

       vareo  Mixed various Esperanto encodings.

       /1 (e.g. konwert iso2-ascii/1)
              Each unavailable character will be replaced only with a single
              approximate char, not string. This is useful with the filterm
              program or with preformatted text. This option is automatically
              turned on when a filter is used as output for filterm.
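
       The effect of /1 can be shown with a toy example. The approximation
       table below is hypothetical and much smaller than anything a real
       filter would use; to_ascii is likewise an illustrative name, not part
       of konwert.

```python
# Hypothetical table: character -> (multi-character form, single-character form).
APPROX = {
    "©": ("(c)", "c"),
    "ß": ("ss", "s"),
    "ł": ("l", "l"),
}


def to_ascii(text, single=False):
    """Approximate non-ASCII characters; with single=True act like /1."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            long_form, short_form = APPROX.get(ch, ("?", "?"))
            out.append(short_form if single else long_form)
    return "".join(out)


print(to_ascii("© Straße"))               # string approximations
print(to_ascii("© Straße", single=True))  # one character per character
```

       With single=True the output has exactly as many characters as the
       input, which is what keeps columns aligned in preformatted text.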

       /html  Text is assumed to be HTML. The characters " & < > resulting from
              other characters' approximations will be properly escaped as
              &quot; &amp; &lt; &gt;.  The <META http-equiv="content-type"
              content="text/html; charset=..."> header will be fixed if present.

              Convert META as above. Unavailable characters will be encoded in

              Convert META as above. Unavailable characters will be encoded in
              hexadecimal &#xUnicode;.

       /tex   Unavailable characters will be described in TeX. Characters # $ %
              & \ ^ _ { | } ~ resulting from some characters' approximations
              will be properly escaped into \# \$ \% \& $\backslash$ \^{} \_ \{
              $|$ \} \~{}.
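
       The escaping table above translates directly into a lookup. The
       Python sketch below uses the hypothetical name tex_escape and is meant
       to be applied to an approximation string before it is inserted into
       the output, not to the whole document.

```python
# Mapping taken from the list of escapes given for the /tex option.
TEX_ESCAPES = {
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "\\": r"$\backslash$",
    "^": r"\^{}",
    "_": r"\_",
    "{": r"\{",
    "|": "$|$",
    "}": r"\}",
    "~": r"\~{}",
}


def tex_escape(text):
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in text)


print(tex_escape("50% & a_b"))
```

       Escaping character by character avoids re-escaping the backslashes
       that earlier replacements have introduced.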

              Recognizes some ASCII representations of characters, e.g. (c) ...
              1/2 >=.

              Russian text will be replaced with its Polish phonetic

       Some output filters can use the language information to choose better
       approximations of unavailable letters. For example, with /de (German)
       ä is approximated as ae instead of a.

              Detects the encoding, but instead of text conversion only shows
              the encoding's name. The additional option /all shows all possible
              encodings, sorted from better to worse ones.
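
       The frequency-based detection described for any/LANGUAGE, and the
       ranking from better to worse encodings, could work roughly as in the
       following Python sketch. Everything in it is an assumption made for
       illustration: the weights in PL_FREQ are made up, the candidate list
       is arbitrary, and konwert's real scoring may differ.

```python
# Made-up weights for Polish non-ASCII letters (illustration only).
PL_FREQ = {"ą": 0.99, "ę": 1.11, "ł": 1.82, "ó": 0.85, "ś": 0.66,
           "ż": 0.83, "ć": 0.40, "ń": 0.20, "ź": 0.06}
CANDIDATES = ["iso8859_2", "cp1250", "utf_8"]


def rank_encodings(data, freq=PL_FREQ, candidates=CANDIDATES):
    """Return candidate encodings for `data`, best fit first."""
    scores = []
    for enc in candidates:
        try:
            text = data.decode(enc).lower()
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding at all
        non_ascii = [ch for ch in text if ord(ch) > 127]
        if non_ascii:
            # Expected letters score positively, unexpected ones negatively.
            score = sum(freq.get(ch, -1.0) for ch in non_ascii) / len(non_ascii)
        else:
            score = 0.0
        scores.append((score, enc))
    scores.sort(key=lambda pair: -pair[0])
    return [enc for _, enc in scores]


sample = "zażółć gęślą jaźń".encode("cp1250")
print(rank_encodings(sample))
```

       Only non-ASCII characters are scored, matching the note above that
       ASCII letters are excluded from the frequency tables.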

       crlf   Forces a specific end-of-line marker convention.  cr = Macintosh,
              lf = Unix and Amiga, crlf = Windows and DOS.  The input convention
              is detected automatically.
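
       A minimal Python sketch of such an end-of-line filter follows. The
       name force_eol is hypothetical; the point is that accepting all three
       input conventions at once makes explicit detection unnecessary.

```python
import re

EOL = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}


def force_eol(text, convention):
    """Rewrite every line ending in `text` to the given convention."""
    # CRLF is matched first so it is not treated as a CR followed by an LF.
    return re.sub(r"\r\n|\r|\n", lambda match: EOL[convention], text)


print(repr(force_eol("one\r\ntwo\rthree\n", "lf")))
```

       When reading from files in Python, open them with newline="" so that
       the original line endings reach the function unchanged.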

       expand Expands tabs into spaces (uses the textutils program expand).

              Compresses spaces into tabs (uses the textutils program unexpand).

              Removes spaces and tabs at end of line.

              MIME Quoted Printable encoding: =A3=F3d=BC.
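
              The sample above can be reproduced with Python's standard
              quopri module. Read as ISO-8859-2, the bytes decode to the
              Polish word Łódź; the choice of that charset is an assumption
              about how the example was produced.

```python
import quopri

# Encode the ISO-8859-2 bytes of the word as Quoted Printable.
raw = "Łódź".encode("iso8859_2")
encoded = quopri.encodestring(raw)
print(encoded)

# Decode the sample from the text back into characters.
print(quopri.decodestring(b"=A3=F3d=BC").decode("iso8859_2"))
```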

              Rich Text Format: \'a3\'f3d\'9f.

              Escapes " & < > into SGML/HTML entities &quot; &amp; &lt; &gt;.
              Useful for including a text file inside HTML <PRE> </PRE> tags.
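
              The same four replacements are easy to express in Python. The
              function name escape_sgml is hypothetical; the standard
              html.escape is not used here because it also escapes the
              apostrophe, which this filter is not described as doing.

```python
def escape_sgml(text):
    """Escape the characters " & < > as SGML/HTML entities."""
    # "&" is replaced first so the entities added below stay intact.
    for char, entity in (("&", "&amp;"), ('"', "&quot;"),
                         ("<", "&lt;"), (">", "&gt;")):
        text = text.replace(char, entity)
    return text


print("<PRE>" + escape_sgml('if a < b && s == "x"') + "</PRE>")
```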


       rot13  Guvf vf n qrzbafgengvba bs ebg13.
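
              For comparison, the same transformation is available in
              Python through the standard rot13 text codec. This only
              reproduces the sample above and says nothing about how the
              konwert filter is implemented.

```python
import codecs

plain = "This is a demonstration of rot13."
scrambled = codecs.encode(plain, "rot13")
print(scrambled)

# Applying rot13 twice gives back the original text.
print(codecs.encode(scrambled, "rot13"))
```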

              Self-explanatory. Currently ASCII only.

       prn7pl Converts Polish characters to control sequences for
              EPSON-compatible printers. Using only 7-bit characters,
              backspacing of the print head, and the vertical positioning
              characters ,.'` it creates pseudo-Polish glyphs. You can specify
              options: /nlq (default), which optimizes output for better
              quality printers, and /draft, useful for example for 9-pin


       trs(1), filterm(1)

       The APPLE character in mac* charsets, and the CH and ch characters in
       koi8cs, are not preserved in conversion even when they are available.
       They also ignore the /1 option. Reason: they are not in Unicode.

       Konwert is a package for conversion between various character encodings.

       Copyright (c) 1998 Marcin 'Qrczak' Kowalczyk

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General
       Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

        __("<   Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.home.ml.org/
        \__/       GCS/M d- s+:-- a21 C+++>+++$ UL++>++++$ P+++ L++>++++$ E->++
         ^^                W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
       QRCZAK                  5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-

Konwert                            30 Jul 1998                        KONWERT(1)