KONWERT(1)                    Linux User's Manual                   KONWERT(1)



NAME
       konwert - interface for various character encoding conversions

SYNOPSIS
       konwert FILTER [FILE]... [-o DEST | -O]

DESCRIPTION
       Konwert allows filtering multiple files through multiple filters.  It
       filters the specified FILEs, or stdin if none are given.

       A simple FILTER is the name of an executable file in the directory
       ~/.konwert/filters or in the system-wide one, normally
       /usr/share/konwert/filters.  Such a program filters stdin to stdout.

       The filtering rule can be more complex:

       konwert FILTER1+FILTER2 means konwert FILTER1 | konwert FILTER2.

       konwert FORMAT1-FORMAT2, unless such a filter exists, tries to find a
       common FORMAT3 such that both filters FORMAT1-FORMAT3 and
       FORMAT3-FORMAT2 exist.
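
       The search for an intermediate format can be pictured with the
       following Python sketch. It is an illustration only, not konwert's
       actual code: the function name find_route and the idea of passing the
       installed filters as a set of names are assumptions.

```python
def find_route(src, dst, available):
    """Return the filter names to run for src-dst, or None if no route exists.

    `available` is a set of filter names such as {"cp437-utf8", "utf8-iso2"}.
    """
    direct = f"{src}-{dst}"
    if direct in available:
        return [direct]
    # Look for a FORMAT3 with both src-FORMAT3 and FORMAT3-dst installed.
    for name in sorted(available):
        first, sep, middle = name.partition("-")
        if sep and first == src and f"{middle}-{dst}" in available:
            return [name, f"{middle}-{dst}"]
    return None
```

       With the filters cp437-utf8 and utf8-iso2 installed, a request for
       cp437-iso2 would resolve to those two filters run in sequence.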

       konwert FILTER/ARG/... passes arguments to the filter. Arguments can
       also be specified here: FORMAT1/ARGS-FORMAT2.  The meaning of arguments
       depends on the particular filter.

       konwert '(COMMAND ARGS...)' executes this arbitrary shell command. This
       is useful with -o or -O options. The command cannot contain the string
       )+, which will terminate this filter's specification.
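
       One way to read the rules above is as a small splitting procedure: a
       specification is cut at each +, except that a parenthesized command
       runs up to the next )+ or to the end of the string. The Python sketch
       below follows that reading. The name split_chain is hypothetical, and
       konwert's own parser may differ in details.

```python
def split_chain(spec):
    """Split 'A+B+(cmd args)+C' into its individual filter specifications."""
    parts = []
    i = 0
    while i < len(spec):
        if spec[i] == "(":
            # A shell command ends at ")+" or at the end of the string.
            end = spec.find(")+", i)
            if end == -1:
                parts.append(spec[i:])
                break
            parts.append(spec[i:end + 1])
            i = end + 2
        else:
            # An ordinary filter name ends at the next "+".
            end = spec.find("+", i)
            if end == -1:
                parts.append(spec[i:])
                break
            parts.append(spec[i:end])
            i = end + 1
    return parts
```

       Under this reading a + inside the parentheses is harmless unless it
       directly follows a closing parenthesis.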

   OPTIONS
       -o DEST   output goes to this file/directory instead of stdout

       -O        every input file is replaced with its translation

       --help    display help and exit

       --version output version information and exit

       Redirecting output to one of the source files with either -o or >
       instead of -O will corrupt it! Option -O creates a temporary file in
       /tmp and later copies it back onto the source.
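
       The reason -O is safe can be shown with a short Python sketch of the
       same pattern: read the source, write the result to a temporary file,
       then copy it back. The function filter_in_place is hypothetical and
       only mirrors the behaviour described above.

```python
import os
import shutil
import tempfile


def filter_in_place(path, transform):
    """Replace the file at `path` with transform(original bytes)."""
    with open(path, "rb") as src:
        data = src.read()
    # Write the result to a temporary file first, so the source is never
    # truncated while it is still being read.
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(transform(data))
        shutil.copyfile(tmp_path, path)  # copy the result back onto the source
    finally:
        os.remove(tmp_path)
```

       A shell redirection such as > truncates the target before the filter
       has read it, which is why it corrupts a file used as both input and
       output.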

CHARACTER ENCODING CONVERSIONS
       You can convert text between any two charsets, for example konwert
       cp437-iso2.

       Characters unavailable in the target charset are replaced with
       approximations built from available characters. An approximation need
       not be a single character.
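
       For comparison, a conversion like konwert cp437-iso2 can be sketched
       with Python's built-in codecs. This is not how konwert is implemented.
       The codec names cp437 and iso8859_2 are Python's, and Python has no
       approximation tables, so a missing character simply becomes a
       question mark here.

```python
def convert(data, source="cp437", target="iso8859_2"):
    """Recode bytes from `source` to `target`, going through Unicode."""
    text = data.decode(source)
    # Characters missing from the target become "?" in this sketch;
    # konwert would substitute an approximation instead.
    return text.encode(target, errors="replace")
```

       For example, a-umlaut exists in both charsets and is recoded, while
       a-grave exists only in CP-437 and is lost in this sketch.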

       The following character sets are currently supported:

       ascii  7bit ASCII

       utf8 = unicode  Unicode UTF-8

       iso1 = isolatin1
              ISO-8859-1 aka ISO Latin 1 (Western European)
       iso2 = isolatin2
              ISO-8859-2 aka ISO Latin 2 (Central European)
       iso3 = isolatin3
              ISO-8859-3 aka ISO Latin 3 (Esperanto)
       iso4 = isolatin4
              ISO-8859-4 aka ISO Latin 4 (Baltic)
       iso5 = isolatincyr
              ISO-8859-5 (Cyrillic)
       iso6 = isolatinarabic
              ISO-8859-6 (Arabic)
       iso7 = isolatingreek
              ISO-8859-7 (Greek)
       iso8 = isolatinhebrew
              ISO-8859-8 (Hebrew)
       iso9 = isolatin5 = isolatintur
              ISO-8859-9 aka ISO Latin 5 (Turkish)
       iso10 = isolatin6 = isolatinnordic
              ISO-8859-10 aka ISO Latin 6 (Nordic)
       iso12 = isolatin7 = isolatinceltic
              ISO-8859-12 aka ISO Latin 7 (Celtic) - Draft
       iso13 = isolatin8 = isolatinbaltic
              ISO-8859-13 aka ISO Latin 8 (Baltic) - Draft
       iso14 = isolatin9 = isolatinsami
              ISO-8859-14 aka ISO Latin 9 (Sámi) - Draft
       iso15  ISO-8859-15 - Draft

       koi8r    KOI8-R (Russian)
       koi8u    KOI8-U (Ukrainian, Byelorussian)
       koi8uni  KOI8-Uni (Cyrillic)

       cp1250 = wince = winlatin2    Windows CP-1250 aka Win Latin 2 (Central
                                     European)
       cp1251 = wincyr               Windows CP-1251 (Cyrillic)
       cp1252 = winwest = winlatin1  Windows CP-1252 aka Win Latin 1 (Western
                                     European)
       cp1253 = wingr                Windows CP-1253 (Greek)
       cp1254 = wintur               Windows CP-1254 (Turkish)
       cp1255 = winhebrew            Windows CP-1255 (Hebrew)
       cp1256 = winarabic            Windows CP-1256 (Arabic)
       cp1257 = winbaltic            Windows CP-1257 (Baltic)
       cp1258 = winviet              Windows CP-1258 (Vietnamese)

       cp437 = icmeng               DOS CP-437 (English)
       cp737 = dosgreek             DOS CP-737 (Greek)
       cp775 = dosbaltic            DOS CP-775 (Baltic)
       cp850 = doswest = doslatin1  DOS CP-850 aka DOS Latin 1 (Western
                                    European)
       cp852 = dosce = doslatin2    DOS CP-852 aka DOS Latin 2 (Central
                                    European)
       cp855 = doscyr               DOS CP-855 (Cyrillic)
       cp857 = dostur               DOS CP-857 (Turkish)
       cp860 = dosportugal          DOS CP-860 (Portuguese)
       cp861 = dosiceland           DOS CP-861 (Icelandic)
       cp862 = doshebrew            DOS CP-862 (Hebrew)
       cp863 = doscanadfr           DOS CP-863 (Canadian French)
       cp864 = dosarabic            DOS CP-864 (Arabic)
       cp865 = dosnordic            DOS CP-865 (Nordic)
       cp866 = dosrussian           DOS CP-866 (Russian)
       cp869 = dosgreek2            DOS CP-869 (Greek2)
       cp874 = dosthai              DOS CP-874 (Thai)

       mac         Macintosh Roman (Western European)
       macce       Macintosh Central European
       maccyr      Macintosh Cyrillic
       macgreek    Macintosh Greek
       maciceland  Macintosh Icelandic
       mactur      Macintosh Turkish

       csk,
       cyfromat,
       dhn,
       fidomazovia,
       iea,
       logic,
       mazovia,
       microvex     DOS charsets for Polish

       amigapl,
       fat,
       xjp      Amiga charsets for Polish

       kamenicky  DOS charset for Czech and Slovak

       wingreek  WinGreek (Windows font-based encoding for ancient Greek)

       babelpl  TeX [polish]{babel}: "a"c"e"l"n"o"s"z"r
       ciachy   TeX \prefixing: /a/c/e/l/n/o/s/x/z

       xmetodo        Esperanto: cx gx hx jx sx ux (vx w)
       hmetodo        Esperanto: ch gh hh jh sh u
       antauxcxap     Esperanto: ^c ^g ^h ^j ^s ^u (~u)
       postcxap       Esperanto: c^ g^ h^ j^ s^ u^ (u~)
       apostrofoj     Esperanto: c' g' h' j' s' u'
       malapostrofoj  Esperanto: c` g` h` j` s` u`

       viscii  VISCII (Vietnamese)
       viqri   Vietnamese Quoted Readable Implicit

       htmldec  SGML/HTML character references (decimal): &#198; &#283;
                &#8594;
       htmlhex  SGML/HTML character references (hexadecimal): &#xC6; &#x11B;
                &#x2192;
       htmlent  SGML/HTML character entities (names): &AElig; &ecaron; &rarr;
       html     All three above (only as input format)
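
       The numeric reference formats are simple enough to sketch in Python.
       The helper names below are hypothetical. html.unescape from the
       standard library plays the role of the combined html input format,
       since it accepts decimal, hexadecimal, and named references alike.

```python
import html


def to_htmldec(text):
    """Write every non-ASCII character as a decimal character reference."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


def to_htmlhex(text):
    """Write every non-ASCII character as a hexadecimal character reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


def from_html(text):
    """Decode decimal, hexadecimal, and named references in one pass."""
    return html.unescape(text)
```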

       tex    TeX with some LaTeX or AMS-TeX extensions. There is no
              distinction between normal and math mode - you will probably
              have to insert some $'s manually.

       mnemonic   RFC 1345 mnemonics preceded by &
       mnemonic1  RFC 1345 mnemonics preceded by `

       any/LANGUAGE (e.g. any/pl-iso2)
              This special input format detects the encoding automatically,
              based on the frequencies of characters found in the text.
              Every language is associated with a set of possible
              encodings used for it and average frequencies of its letters
              (excluding ASCII letters). The best fitting encoding is used for
              conversion. Currently supported languages are cs (Czech), de
              (German), el (Greek), eo (Esperanto), es (Spanish), fr (French),
              he (Hebrew), it (Italian), pl (Polish), pt (Portuguese), ru
              (Russian), and sv (Swedish).
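
              The idea of frequency-based detection can be sketched in
              Python as follows. Everything here is an assumption made for
              illustration: the weights are invented rather than measured,
              the candidate list is limited to three Python codecs, and
              konwert's real scoring is not documented on this page.

```python
# Invented weights for Polish non-ASCII letters (illustrative only).
POLISH_WEIGHTS = {"ą": 1.0, "ć": 0.4, "ę": 1.1, "ł": 1.8, "ń": 0.2,
                  "ó": 0.9, "ś": 0.7, "ź": 0.1, "ż": 0.8}
CANDIDATES = ["iso8859_2", "cp1250", "utf_8"]


def rank_encodings(data, weights=POLISH_WEIGHTS, candidates=CANDIDATES):
    """Return the candidate encodings, best fitting first."""
    scored = []
    for enc in candidates:
        try:
            text = data.decode(enc).lower()
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding at all
        # Expected letters add to the score, anything else non-ASCII
        # counts against the encoding.
        score = sum(weights.get(c, -1.0) for c in text if ord(c) >= 128)
        scored.append((score, enc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [enc for _, enc in scored]
```

              The first element of the result corresponds to the encoding
              that would be chosen for conversion.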

       varpl  Mixed Polish ISO-8859-2, CP-1250, and UTF-8. If you read Polish
              newsgroups, I suggest installing it as a filter in your
              newsreader (for speed, it is better to call it directly rather
              than through konwert).

       vareo  Mixed various Esperanto encodings.

OPTIONS CONTROLLING THE ABOVE CONVERSIONS
       /1 (e.g. konwert iso2-ascii/1)
              Each unavailable character will be replaced with a single
              approximate character, not a string. This is useful with the
              filterm program or with preformatted text. This option is
              automatically turned on when a filter is used as output for
              filterm.
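
              The difference between string and single-character
              approximations can be sketched in Python. Both tables below
              are hypothetical and tiny. They only show why /1 keeps
              preformatted text aligned: the output has exactly as many
              characters as the input.

```python
# Hypothetical approximation tables for an ASCII target.
APPROX = {"©": "(c)", "½": "1/2", "≥": ">=", "ł": "l"}
SINGLE = {"©": "c", "½": "?", "≥": ">", "ł": "l"}


def to_ascii(text, single=False):
    """Approximate non-ASCII characters; with single=True mimic /1."""
    table = SINGLE if single else APPROX
    return "".join(c if ord(c) < 128 else table.get(c, "?") for c in text)
```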

       /html  Text is assumed to be HTML. The characters " & < > resulting
              from other characters' approximations will be properly escaped
              as &quot; &amp; &lt; &gt;.  The <META http-equiv="content-type"
              content="text/html; charset=..."> header will be fixed if
              present.

       /htmldec
              Convert META as above. Unavailable characters will be encoded in
              &#Unicode;.

       /htmlhex
              Convert META as above. Unavailable characters will be encoded in
              hexadecimal &#xUnicode;.

       /tex   Unavailable characters will be described in TeX. Characters # $
              % & \ ^ _ { | } ~ resulting from some characters' approximations
              will be properly escaped into \# \$ \% \& $\backslash$ \^{} \_
              \{ $|$ \} \~{}.

       /asciichar
              Recognizes some ASCII representations of characters, e.g. (c)
              ... 1/2 >=.

       /rosyjski
              Russian text will be replaced with its Polish phonetic
              transcription.

       Some output filters can use language information to choose better
       approximations of unavailable letters, for example /de (German):
       ä becomes ae instead of a.

OTHER FILTERS
       any/LANGUAGE-test
              Detects the encoding, but instead of text conversion only shows
              the encoding's name. The additional option /all shows all
              possible encodings, sorted from better to worse ones.

       cr
       lf
       crlf   Force specific end-of-line marker convention.  cr = Macintosh,
              lf = Unix and Amiga, crlf = Windows and DOS.  The input
              convention is detected automatically.
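
              A rough Python equivalent is shown below. It is a sketch, not
              konwert's implementation: instead of detecting one input
              convention for the whole file, it treats every CR LF, CR, or
              LF as a line end, which is an assumption of this example.

```python
import re

MARKERS = {"cr": b"\r", "lf": b"\n", "crlf": b"\r\n"}


def convert_eol(data, target):
    """Rewrite every line ending in `data` to the `target` convention."""
    marker = MARKERS[target]
    # "\r\n" is listed first so that a CR LF pair counts as one line end.
    return re.sub(rb"\r\n|\r|\n", lambda match: marker, data)
```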

       expand Expands tabs into spaces (uses the textutils program expand).

       unexpand
              Compresses spaces into tabs (uses the textutils program
              unexpand).

       rmspacesateol
              Removes spaces and tabs at end of line.

       qp-8bit
       8bit-qp
              MIME Quoted Printable encoding: =A3=F3d=BC.
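
              The sample above is the word Łódź in ISO-8859-2. The same
              transformation can be reproduced with Python's standard quopri
              module. The wrapper names to_qp and from_qp are hypothetical.

```python
import quopri


def to_qp(data):
    """Encode 8-bit bytes as MIME Quoted Printable."""
    return quopri.encodestring(data)


def from_qp(data):
    """Decode MIME Quoted Printable back to 8-bit bytes."""
    return quopri.decodestring(data)
```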

       rtf-8bit
       8bit-rtf
              Rich Text Format: \'a3\'f3d\'9f.
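
              The sample above is the word Łódź in CP-1250. A minimal Python
              sketch of the 8bit-rtf direction follows. The function name is
              hypothetical, and escaping of backslash and braces is added
              because RTF requires it, not because this page says konwert
              does it.

```python
def to_rtf_escapes(data):
    """Turn 8-bit bytes into RTF text with \\'xx escapes for high bytes."""
    out = []
    for byte in data:
        if byte >= 0x80:
            out.append("\\'%02x" % byte)   # e.g. 0xA3 becomes \'a3
        elif chr(byte) in "\\{}":
            out.append("\\" + chr(byte))   # RTF control characters
        else:
            out.append(chr(byte))
    return "".join(out)
```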

       txt-htmlchar
              Escapes " & < > into SGML/HTML entities &quot; &amp; &lt; &gt;.
              Useful for including a text file inside HTML <PRE> </PRE> tags.

       htmlchar-txt
              The reverse of txt-htmlchar.
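
              Both directions can be sketched in a few lines of Python. The
              function names are hypothetical. The order of replacements
              matters: & must be escaped first and unescaped last, otherwise
              already converted text is converted a second time.

```python
ESCAPES = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;")]


def txt_to_htmlchar(text):
    """Escape the four special characters, handling "&" first."""
    for char, entity in ESCAPES:
        text = text.replace(char, entity)
    return text


def htmlchar_to_txt(text):
    """Undo txt_to_htmlchar, handling "&amp;" last."""
    for char, entity in reversed(ESCAPES):
        text = text.replace(entity, char)
    return text
```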

       rot13  Guvf vf n qrzbafgengvba bs ebg13.
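
              The description above is itself rot13-encoded. Python's
              standard rot13 codec decodes it, and applying the
              transformation twice returns the original text. The wrapper
              name rot13 is only a convenience for this example.

```python
import codecs


def rot13(text):
    """Rotate ASCII letters by 13 places; other characters are unchanged."""
    return codecs.encode(text, "rot13")


print(rot13("Guvf vf n qrzbafgengvba bs ebg13."))
# prints: This is a demonstration of rot13.
```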

       toupper
       tolower
              Self-explanatory. Currently ASCII only.

       prn7pl Converts Polish characters to control sequences for
              EPSON-compatible printers. Using only 7-bit characters, by
              backspacing the print head and vertically positioning the
              characters , . ' ` it creates pseudo-Polish glyphs. Two options
              are available: /nlq (default), which optimizes output for
              better-quality printers, and /draft, which is useful, for
              example, with 9-pin printers.

FILES
       /usr/share/konwert/filters/*
       ~/.konwert/filters/*

SEE ALSO
       trs(1), filterm(1)

BUGS
       The APPLE character in mac* charsets and the CH and ch characters in
       koi8cs are not preserved in conversion even when they are available.
       They also do not respect the /1 option. Reason: they are not in
       Unicode.

COPYRIGHT
       Konwert is a package for conversion between various character
       encodings.

       Copyright (c) 1998 Marcin 'Qrczak' Kowalczyk

       This program is free software; you can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation; either version 2 of the License, or (at your
       option) any later version.

       This program is distributed in the hope that it will be useful, but
       WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
       General Public License for more details.

       You should have received a copy of the GNU General Public License along
       with this program; if not, write to the Free Software Foundation, Inc.,
       59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

AUTHOR
        __("<   Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.home.ml.org/
        \__/       GCS/M d- s+:-- a21 C+++>+++$ UL++>++++$ P+++ L++>++++$ E->++
         ^^                W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
       QRCZAK                  5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-



Konwert                           30 Jul 1998                       KONWERT(1)