tr

PROLOG

This manual page is part of the POSIX Programmer’s Manual.
The Linux implementation of this interface may differ
(consult the corresponding Linux manual page for details of
Linux behavior), or the interface may not be implemented on
Linux.


NAME

tr - translate characters

SYNOPSIS

tr [-c|-C] [-s] string1 string2
tr -s [-c|-C] string1
tr -d [-c|-C] string1
tr -ds [-c|-C] string1 string2

DESCRIPTION

The tr utility shall copy the standard input to the standard
output with substitution or deletion of selected characters.
The options specified and the string1 and string2 operands
shall control translations that occur while copying
characters and single‐character collating elements.

OPTIONS

The tr utility shall conform to the Utility Syntax Guidelines
in the Base Definitions volume of POSIX.1‐2008.

The following options shall be supported:

-c        Complement the set of values specified by string1.
          See the EXTENDED DESCRIPTION section.

-C        Complement the set of characters specified by
          string1. See the EXTENDED DESCRIPTION section.

-d        Delete all occurrences of input characters that
          are specified by string1.

-s        Replace instances of repeated characters with a
          single character, as described in the EXTENDED
          DESCRIPTION section.
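
As a rough illustration of the options, the following sketch runs
three of them on a fixed sample string. The sample text and the
variable names are invented here for illustration, and LC_ALL=C is
set only to keep the results predictable.

```shell
# Hypothetical sample input; any text would do.
sample='aabbcc  112233'

# -d: delete every character listed in string1.
deleted=$(printf '%s' "$sample" | LC_ALL=C tr -d 'abc')

# -s: squeeze runs of the characters listed in string1 to one each.
squeezed=$(printf '%s' "$sample" | LC_ALL=C tr -s 'abc123 ')

# -c with -d: delete everything except the values in string1.
digits_only=$(printf '%s' "$sample" | LC_ALL=C tr -cd '0123456789')

printf '%s\n%s\n%s\n' "$deleted" "$squeezed" "$digits_only"
```
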

OPERANDS

The following operands shall be supported:

string1, string2
          Translation control strings. Each string shall
          represent a set of characters to be converted into
          an array of characters used for the translation.
          For a detailed description of how the strings are
          interpreted, see the EXTENDED DESCRIPTION section.

STDIN

The standard input can be any type of file.

INPUT FILES

None.

ENVIRONMENT VARIABLES

The following environment variables shall affect the
execution of tr:

LANG      Provide a default value for the
          internationalization variables that are unset or
          null. (See the Base Definitions volume of
          POSIX.1‐2008 for the precedence of
          internationalization variables used to determine
          the values of locale categories.)

LC_ALL    If set to a non‐empty string value, override the
          values of all the other internationalization
          variables.

LC_COLLATE
          Determine the locale for the behavior of range
          expressions and equivalence classes.

LC_CTYPE  Determine the locale for the interpretation of
          sequences of bytes of text data as characters (for
          example, single‐byte as opposed to multi‐byte
          characters in arguments) and the behavior of
          character classes.

LC_MESSAGES
          Determine the locale that should be used to affect
          the format and contents of diagnostic messages
          written to standard error.

NLSPATH   Determine the location of message catalogs for the
          processing of LC_MESSAGES.
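
Because LC_COLLATE and LC_CTYPE change how range expressions and
character classes are interpreted, a script that depends on them may
want to pin the locale for the single tr invocation. A minimal
sketch, assuming ASCII input; the variable name is illustrative only.

```shell
# LC_ALL overrides LC_COLLATE and LC_CTYPE for this one command,
# so the ranges below follow the POSIX (C) locale collation order.
upper=$(printf 'hello, world' | LC_ALL=C tr 'a-z' 'A-Z')
printf '%s\n' "$upper"
```
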

ASYNCHRONOUS EVENTS

Default.

STDOUT

The output shall be identical to the input, with the
exception of the specified transformations.

STDERR

The standard error shall be used only for diagnostic
messages.

OUTPUT FILES

None.

EXTENDED DESCRIPTION

The operands string1 and string2 (if specified) define two
arrays of characters. The constructs in the following list
can be used to specify characters or single‐character
collating elements. If any of the constructs result in
multi‐character collating elements, tr shall exclude, without
a diagnostic, those multi‐character elements from the
resulting array.

character Any character not described by one of the
          conventions below shall represent itself.

\octal    Octal sequences can be used to represent
          characters with specific coded values. An octal
          sequence shall consist of a <backslash> followed
          by the longest sequence of one, two, or three
          octal‐digit characters (01234567). The sequence
          shall cause the value whose encoding is
          represented by the one‐, two‐, or three‐digit
          octal integer to be placed into the array.
          Multi‐byte characters require multiple,
          concatenated escape sequences of this type,
          including the leading <backslash> for each byte.

\character
          The <backslash>‐escape sequences in the Base
          Definitions volume of POSIX.1‐2008 ('\\', '\a',
          '\b', '\f', '\n', '\r', '\t', '\v') shall be
          supported. The results of using any other
          character, other than an octal digit, following
          the <backslash> are unspecified. Also, if there is
          no character following the <backslash>, the
          results are unspecified.

c-c       In the POSIX locale, this construct shall
          represent the range of collating elements between
          the range endpoints (as long as neither endpoint
          is an octal sequence of the form \octal),
          inclusive, as defined by the collation sequence.
          The characters or collating elements in the range
          shall be placed in the array in ascending
          collation sequence. If the second endpoint
          precedes the starting endpoint in the collation
          sequence, it is unspecified whether the range of
          collating elements is empty, or this construct is
          treated as invalid. In locales other than the
          POSIX locale, this construct has unspecified
          behavior.

                    If either or both of the range endpoints
                    are octal sequences of the form \octal,
                    this shall represent the range of
                    specific coded values between the two
                    range endpoints, inclusive.

[:class:] Represents all characters belonging to the defined
          character class, as defined by the current setting
          of the LC_CTYPE locale category. The following
          character class names shall be accepted when
          specified in string1:

          alnum   blank   digit   lower   punct   upper
          alpha   cntrl   graph   print   space   xdigit

                    In addition, character class expressions
                    of the form [:name:] shall be recognized
                    in those locales where the name keyword
                    has been given a charclass definition in
                    the LC_CTYPE category.

                    When both the -d and -s options are
                    specified, any of the character class
                    names shall be accepted in string2.
                    Otherwise, only character class names
                    lower or upper are valid in string2 and
                    then only if the corresponding character
                    class (upper and lower, respectively) is
                    specified in the same relative position
                    in string1. Such a specification shall
                    be interpreted as a request for case
                    conversion. When [:lower:] appears in
                    string1 and [:upper:] appears in
                    string2, the arrays shall contain the
                    characters from the toupper mapping in
                    the LC_CTYPE category of the current
                    locale. When [:upper:] appears in
                    string1 and [:lower:] appears in
                    string2, the arrays shall contain the
                    characters from the tolower mapping in
                    the LC_CTYPE category of the current
                    locale. The first character from each
                    mapping pair shall be in the array for
                    string1 and the second character from
                    each mapping pair shall be in the array
                    for string2 in the same relative
                    position.

                    Except for case conversion, the
                    characters specified by a character
                    class expression shall be placed in the
                    array in an unspecified order.

                    If the name specified for class does not
                    define a valid character class in the
                    current locale, the behavior is
                    undefined.

[=equiv=] Represents all characters or collating elements
          belonging to the same equivalence class as equiv,
          as defined by the current setting of the
          LC_COLLATE locale category. An equivalence class
          expression shall be allowed only in string1, or in
          string2 when it is being used by the combined -d
          and -s options. The characters belonging to the
          equivalence class shall be placed in the array in
          an unspecified order.

[x*n]     Represents n repeated occurrences of the character
          x. Because this expression is used to map multiple
          characters to one, it is only valid when it occurs
          in string2. If n is omitted or is zero, it shall
          be interpreted as large enough to extend the
          string2‐based sequence to the length of the
          string1‐based sequence. If n has a leading zero,
          it shall be interpreted as an octal value.
          Otherwise, it shall be interpreted as a decimal
          value.

When the -d option is not specified:

 *  If string2 is present, each input character found in the
    array specified by string1 shall be replaced by the
    character in the same relative position in the array
    specified by string2. If the array specified by string2
    is shorter than the one specified by string1, or if a
    character occurs more than once in string1, the results
    are unspecified.

 *  If the -C option is specified, the complements of the
    characters specified by string1 (the set of all
    characters in the current character set, as defined by
    the current setting of LC_CTYPE, except for those
    actually specified in the string1 operand) shall be
    placed in the array in ascending collation sequence, as
    defined by the current setting of LC_COLLATE.

 *  If the -c option is specified, the complement of the
    values specified by string1 shall be placed in the array
    in ascending order by binary value.

 *  Because the order of the characters specified by
    character class expressions or equivalence class
    expressions is undefined, such expressions should only
    be used if the intent is to map several characters into
    one. An exception is case conversion, as described
    previously.

When the -d option is specified:

 *  Input characters found in the array specified by string1
    shall be deleted.

 *  When the -C option is specified with -d, all characters
    except those specified by string1 shall be deleted. The
    contents of string2 are ignored, unless the -s option is
    also specified.

 *  When the -c option is specified with -d, all values
    except those specified by string1 shall be deleted. The
    contents of string2 shall be ignored, unless the -s
    option is also specified.

 *  The same string cannot be used for both the -d and the
    -s option; when both options are specified, both string1
    (used for deletion) and string2 (used for squeezing)
    shall be required.
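
The translation and deletion rules above can be tried on small
inputs. The following is a sketch only: the input strings and
variable names are made up for illustration, and LC_ALL=C is used so
that ranges and complements behave predictably.

```shell
# Position-by-position translation: a->x, b->y, c->z.
mapped=$(printf 'cabbage' | LC_ALL=C tr 'abc' 'xyz')

# [x*n] in string2: the first two characters of string1 map to 'x',
# the third maps to 'y'.
repeated=$(printf 'abc' | LC_ALL=C tr 'abc' '[x*2]y')

# -c without -d: every value NOT in string1 is translated; here each
# byte that is not a lowercase letter becomes '_'.
underscored=$(printf 'a1b2 c' | LC_ALL=C tr -c 'a-z' '[_*]')

# -d: delete the listed characters; no string2 is given.
stripped=$(printf 'a1b2 c' | LC_ALL=C tr -d '0-9')

printf '%s\n%s\n%s\n%s\n' "$mapped" "$repeated" "$underscored" "$stripped"
```
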
When the -s option is specified, after any deletions or
translations have taken place, repeated sequences of the
same character shall be replaced by one occurrence of the
same character, if the character is found in the array
specified by the last operand. If the last operand contains
a character class, such as the following example:

    tr -s '[:space:]'

the last operand’s array shall contain all of the characters
in that character class. However, in a case conversion, as
described previously, such as:

    tr -s '[:upper:]' '[:lower:]'

the last operand’s array shall contain only those characters
defined as the second characters in each of the toupper or
tolower character pairs, as appropriate.

An empty string used for string1 or string2 produces
undefined results.
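
The squeeze rules can be sketched on inline input as follows. The
sample strings and variable names are illustrative assumptions, and
LC_ALL=C is set so the character classes cover only ASCII.

```shell
# Squeeze only: runs of white-space characters collapse to one.
single_spaced=$(printf 'too    many   spaces' | LC_ALL=C tr -s '[:space:]')

# Translate, then squeeze: the squeeze set is the last operand, that
# is, the lowercase letters produced by the case conversion. The two
# spaces are left alone because they are not in that set.
lowered=$(printf 'AAB  ccD' | LC_ALL=C tr -s '[:upper:]' '[:lower:]')

# -d and -s together: string1 (digits) is deleted, then runs of the
# characters in string2 (space) are squeezed.
cleaned=$(printf 'a11b    c' | LC_ALL=C tr -ds '0-9' ' ')

printf '%s\n%s\n%s\n' "$single_spaced" "$lowered" "$cleaned"
```
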

EXIT STATUS

The following exit values shall be returned:

 0    All input was processed successfully.

>0    An error occurred.










CONSEQUENCES OF ERRORS

Default.
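
The exit values listed earlier can be checked from a script. The
sketch below assumes that invoking tr with no operands is reported
as an error; that holds on common implementations but is an
assumption here, not something this page states. The variable names
are illustrative.

```shell
# Successful run: all input is processed, so the status is 0.
ok_status=0
printf 'abc' | tr 'a' 'b' >/dev/null || ok_status=$?

# Assumed error case: no operands at all (diagnostic discarded).
err_status=0
tr </dev/null >/dev/null 2>&1 || err_status=$?

printf 'success=%s error=%s\n' "$ok_status" "$err_status"
```
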


APPLICATION USAGE

If necessary, string1 and string2 can be quoted to avoid
pattern matching by the shell.

If an ordinary digit (representing itself) is to follow an
octal sequence, the octal sequence must use the full three
digits to avoid ambiguity.

When string2 is shorter than string1, a difference results
between historical System V and BSD systems. A BSD system
pads string2 with the last character found in string2. Thus,
it is possible to do the following:

     tr 0123456789 d

which would translate all digits to the letter 'd'. Since
this area is specifically unspecified in this volume of
POSIX.1‐2008, both the BSD and System V behaviors are
allowed, but a conforming application cannot rely on the BSD
behavior. It would have to code the example in the following
way:

     tr 0123456789 '[d*]'
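
The portable form can be tried on inline text. This is a small
sketch; the sample string and the variable name are assumptions
made for illustration.

```shell
# [d*] extends string2 to the length of string1, so every digit maps
# to 'd' without relying on BSD-style padding.
padded=$(printf 'room 101, floor 3' | LC_ALL=C tr '0123456789' '[d*]')
printf '%s\n' "$padded"
```
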

It should be noted that, despite similarities in appearance,
the string operands used by tr are not regular expressions.

Unlike some historical implementations, this definition of
the tr utility correctly processes NUL characters in its
input stream. NUL characters can be stripped by using:

     tr -d '\000'
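
The points about octal sequences and NUL handling can be sketched as
follows. The operands are single-quoted so the shell passes the
backslashes through to tr unchanged; the inputs and variable names
are illustrative, and ASCII is assumed.

```shell
# '\0111' is the octal sequence \011 (TAB) followed by the digit 1,
# because an octal sequence takes at most three digits.
tab_and_one=$(printf 'a\t1' | LC_ALL=C tr '\0111' 'XY')

# '\111' is a single value: octal 111 is 'I' in ASCII.
letter_i=$(printf 'HI' | LC_ALL=C tr '\111' 'i')

# Strip NUL bytes from the stream.
no_nul=$(printf 'a\000b' | LC_ALL=C tr -d '\000')

printf '%s\n%s\n%s\n' "$tab_and_one" "$letter_i" "$no_nul"
```
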



EXAMPLES

 1. The following example creates a list of all words in
    file1 one per line in file2, where a word is taken to be
    a maximal string of letters.

             tr -cs "[:alpha:]" "[\n*]" <file1 >file2

 2. The next example translates all lowercase characters in
    file1 to uppercase and writes the results to standard
    output.

             tr "[:lower:]" "[:upper:]" <file1

 3. This example uses an equivalence class to identify
    accented variants of the base character 'e' in file1,
    which are stripped of diacritical marks and written to
    file2.

             tr "[=e=]" "[e*]" <file1 >file2
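
The first two examples can also be run on inline text instead of
files. This is a sketch with made-up input and variable names; the
equivalence class example is left out because its result depends on
the locale and on the implementation.

```shell
# Example 1 on inline text: one word per line.
words=$(printf 'Hello, world! 42 times' |
    LC_ALL=C tr -cs '[:alpha:]' '[\n*]')

# Example 2 on inline text: lowercase to uppercase.
shout=$(printf 'quiet please' | LC_ALL=C tr '[:lower:]' '[:upper:]')

printf '%s\n%s\n' "$words" "$shout"
```
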

RATIONALE

In some early proposals, an explicit option -n was added to
disable the historical behavior of stripping NUL characters
from the input. It was considered that automatically
stripping NUL characters from the input was not correct
functionality. However, the removal of -n in a later
proposal does not remove the requirement that tr correctly
process NUL characters in its input stream. NUL characters
can be stripped by using tr -d '\000'.

Historical implementations of tr differ widely in syntax and
behavior. For example, the BSD version has not needed the
bracket characters for the repetition sequence. The tr
utility syntax is based more closely on the System V and
XPG3 model while attempting to accommodate historical BSD
implementations. In the case of the short string2 padding,
the decision was to unspecify the behavior and preserve
System V and XPG3 scripts, which might find difficulty with
the BSD method. The assumption was made that BSD users of tr
have to make accommodations to meet the syntax defined here.
Since it is possible to use the repetition sequence to
duplicate the desired behavior, whereas there is no simple
way to achieve the System V method, this was the correct, if
not desirable, approach.
The use of octal values to specify control characters, while
having historical precedents, is not portable. The
introduction of escape sequences for control characters
should provide the necessary portability. It is recognized
that this may cause some historical scripts to break.

An early proposal included support for multi‐character
collating elements. It was pointed out that, while tr does
employ some syntactical elements from REs, the aim of tr is
quite different; ranges, for example, do not have a similar
meaning ("any of the chars in the range matches", versus
"translate each character in the range to the output
counterpart"). As a result, the previously included support
for multi‐character collating elements has been removed.
What remains are ranges in current collation order (to
support, for example, accented characters), character
classes, and equivalence classes.

In XPG3 the [:class:] and [=equiv=] conventions are shown
with double brackets, as in RE syntax. However, tr does not
implement RE principles; it just borrows part of the syntax.
Consequently, [:class:] and [=equiv=] should be regarded as
syntactical elements on a par with [x*n], which is not an RE
bracket expression.

The standard developers will consider changes to tr that
allow it to translate characters between different character
encodings, or they will consider providing a new utility to
accomplish this.

On historical System V systems, a range expression requires
enclosing square‐brackets, such as:

    tr '[a-z]' '[A-Z]'
However, BSD‐based systems did not require the brackets, and
this convention is used here to avoid breaking large numbers
of BSD scripts:

    tr a-z A-Z

The preceding System V script will continue to work because
the brackets, treated as regular characters, are translated
to themselves. However, any System V script that relied on
"a-z" representing the three characters 'a', '-', and 'z'
has to be rewritten as "az-".

The ISO POSIX‐2:1993 standard had a -c option that behaved
similarly to the -C option, but did not supply functionality
equivalent to the -c option specified in POSIX.1‐2008. This
meant that historical practice of being able to specify
tr -cd '\000-\177' (which would delete all bytes with the
top bit set) would have no effect because, in the C locale,
bytes with the values octal 200 to octal 377 are not
characters.

The earlier version also said that octal sequences referred
to collating elements and could be placed adjacent to each
other to specify multi‐byte characters. However, it was
noted that this caused ambiguities because tr would not be
able to tell whether adjacent octal sequences were intending
to specify multi‐byte characters or multiple single byte
characters. POSIX.1‐2008 specifies that octal sequences
always refer to single byte binary values when used to
specify an endpoint of a range of collating elements.

Earlier versions of this standard allowed for
implementations with bytes other than eight bits, but this
has been modified in this version.
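
The compatibility points discussed in this rationale can be sketched
on inline input. The sample bytes and variable names are
illustrative assumptions, and the results shown in the comments
assume an implementation that follows the behavior described here.

```shell
# System V style: the brackets are ordinary characters that map to
# themselves, so this behaves like 'a-z' 'A-Z'.
sysv=$(printf '[abc]' | LC_ALL=C tr '[a-z]' '[A-Z]')

# A literal hyphen is placed last so it is not read as a range:
# a->A, z->Z, '-'->'_'.
literal=$(printf 'a-z' | LC_ALL=C tr 'az-' 'AZ_')

# -c complements by value, so the bytes octal 200 to 377 are deleted
# even though they are not characters in the C locale.
ascii_only=$(printf 'caf\303\251' | LC_ALL=C tr -cd '\000-\177')

printf '%s\n%s\n%s\n' "$sysv" "$literal" "$ascii_only"
```
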

FUTURE DIRECTIONS

None.

SEE ALSO

The Base Definitions volume of POSIX.1‐2008

COPYRIGHT

Portions of this text are reprinted and reproduced in
electronic form from IEEE Std 1003.1, 2013 Edition, Standard
for Information Technology ‐‐ Portable Operating System
Interface (POSIX), The Open Group Base Specifications Issue
7, Copyright (C) 2013 by the Institute of Electrical and
Electronics Engineers, Inc and The Open Group.  (This is
POSIX.1‐2008 with the 2013 Technical Corrigendum 1 applied.)
In the event of any discrepancy between this version and the
original IEEE and The Open Group Standard, the original IEEE
and The Open Group Standard is the referee document. The
original Standard can be obtained online at
http://www.unix.org/online.html .

Any typographical or formatting errors that appear in this
page are most likely to have been introduced during the
conversion of the source files to man page format. To report
such errors, see
https://www.kernel.org/doc/man-pages/reporting_bugs.html .