samtools

samtools(1)                   Bioinformatics tools                   samtools(1)



NAME
       samtools - Utilities for the Sequence Alignment/Map (SAM) format

SYNOPSIS
       samtools view -bt ref_list.txt -o aln.bam aln.sam.gz

       samtools sort -T /tmp/aln.sorted -o aln.sorted.bam aln.bam

       samtools index aln.sorted.bam

       samtools idxstats aln.sorted.bam

       samtools flagstat aln.sorted.bam

       samtools stats aln.sorted.bam

       samtools bedcov aln.sorted.bam

       samtools depth aln.sorted.bam

       samtools view aln.sorted.bam chr2:20,100,000-20,200,000

       samtools merge out.bam in1.bam in2.bam in3.bam

       samtools faidx ref.fasta

       samtools fqidx ref.fastq

       samtools tview aln.sorted.bam ref.fasta

       samtools split merged.bam

       samtools quickcheck in1.bam in2.cram

       samtools dict -a GRCh38 -s "Homo sapiens" ref.fasta

       samtools fixmate in.namesorted.sam out.bam

       samtools mpileup -C50 -f ref.fasta -r chr3:1,000-2,000 in1.bam in2.bam

       samtools flags PAIRED,UNMAP,MUNMAP

       samtools fastq input.bam > output.fastq

       samtools fasta input.bam > output.fasta

       samtools addreplacerg -r 'ID:fish' -r 'LB:1334' -r 'SM:alpha' -o
       output.bam input.bam

       samtools collate -o aln.name_collated.bam aln.sorted.bam

       samtools depad input.bam

       samtools markdup in.algnsorted.bam out.bam


DESCRIPTION
       Samtools is a set of utilities that manipulate alignments in the BAM
       format. It imports from and exports to the SAM (Sequence Alignment/Map)
       format, does sorting, merging and indexing, and allows to retrieve reads
       in any regions swiftly.

       Samtools is designed to work on a stream. It regards an input file `-' as
       the standard input (stdin) and an output file `-' as the standard output
       (stdout). Several commands can thus be combined with Unix pipes. Samtools
       always output warning and error messages to the standard error output
       (stderr).

       Samtools is also able to open a BAM (not SAM) file on a remote FTP or
       HTTP server if the BAM file name starts with `ftp://' or `http://'.
       Samtools checks the current working directory for the index file and will
       download the index upon absence. Samtools does not retrieve the entire
       alignment file unless it is asked to do so.


COMMANDS AND OPTIONS
       view      samtools view [options] in.sam|in.bam|in.cram [region...]

                 With no options or regions specified, prints all alignments in
                 the specified input alignment file (in SAM, BAM, or CRAM
                 format) to standard output in SAM format (with no header).

                 You may specify one or more space-separated region
                 specifications after the input filename to restrict output to
                 only those alignments which overlap the specified region(s).
                 Use of region specifications requires a coordinate-sorted and
                 indexed input file (in BAM or CRAM format).

                 The -b, -C, -1, -u, -h, -H, and -c options change the output
                 format from the default of headerless SAM, and the -o and -U
                 options set the output file name(s).

                 The -t and -T options provide additional reference data. One of
                 these two options is required when SAM input does not contain
                 @SQ headers, and the -T option is required whenever writing
                 CRAM output.

                 The -L, -M, -r, -R, -s, -q, -l, -m, -f, -F, and -G options
                 filter the alignments that will be included in the output to
                 only those alignments that match certain criteria.

                 The -x and -B options modify the data which is contained in
                 each alignment.

                 Finally, the -@ option can be used to allocate additional
                 threads to be used for compression, and the -?  option requests
                 a long help message.


       REGIONS:
                 Regions can be specified as: RNAME[:STARTPOS[-ENDPOS]] and all
                 position coordinates are 1-based.

                 Important note: when multiple regions are given, some
                 alignments may be output multiple times if they overlap more
                 than one of the specified regions.

                 Examples of region specifications:

                 chr1      Output all alignments mapped to the reference
                           sequence named `chr1' (i.e. @SQ SN:chr1).

                 chr2:1000000
                           The region on chr2 beginning at base position
                           1,000,000 and ending at the end of the chromosome.

                 chr3:1000-2000
                           The 1001bp region on chr3 beginning at base position
                           1,000 and ending at base position 2,000 (including
                           both end positions).

                 '*'       Output the unmapped reads at the end of the file.
                           (This does not include any unmapped reads placed on a
                           reference sequence alongside their mapped mates.)

                 .         Output all alignments.  (Mostly unnecessary as not
                           specifying a region at all has the same effect.)

       OPTIONS:

                 -b        Output in the BAM format.

                 -C        Output in the CRAM format (requires -T).

                 -1        Enable fast BAM compression (implies -b).

                 -u        Output uncompressed BAM. This option saves time spent
                           on compression/decompression and is thus preferred
                           when the output is piped to another samtools command.

                 -h        Include the header in the output.

                 -H        Output the header only.

                 -c        Instead of printing the alignments, only count them
                           and print the total number. All filter options, such
                           as -f, -F, and -q, are taken into account.

                 -?        Output long help and exit immediately.

                 -o FILE   Output to FILE [stdout].

                 -U FILE   Write alignments that are not selected by the various
                           filter options to FILE.  When this option is used,
                           all alignments (or all alignments intersecting the
                           regions specified) are written to either the output
                           file or this file, but never both.

                 -t FILE   A tab-delimited FILE.  Each line must contain the
                           reference name in the first column and the length of
                           the reference in the second column, with one line for
                           each distinct reference.  Any additional fields
                           beyond the second column are ignored. This file also
                           defines the order of the reference sequences in
                           sorting. If you run: `samtools faidx <ref.fa>', the
                           resulting index file <ref.fa>.fai can be used as this
                           FILE.

                 -T FILE   A FASTA format reference FILE, optionally compressed
                           by bgzip and ideally indexed by samtools faidx.  If
                           an index is not present, one will be generated for
                           you.

                 -L FILE   Only output alignments overlapping the input BED FILE
                           [null].

                 -M        Use the multi-region iterator on the union of the BED
                           file and command-line region arguments.  This avoids
                           re-reading the same regions of files so can sometimes
                           be much faster.  Note this also removes duplicate
                           sequences.  Without this a sequence that overlaps
                           multiple regions specified on the command line will
                           be reported multiple times.

                 -r STR    Output alignments in read group STR [null].  Note
                           that records with no RG tag will also be output when
                           using this option.  This behaviour may change in a
                           future release.

                 -R FILE   Output alignments in read groups listed in FILE
                           [null].  Note that records with no RG tag will also
                           be output when using this option.  This behaviour may
                           change in a future release.

                 -q INT    Skip alignments with MAPQ smaller than INT [0].

                 -l STR    Only output alignments in library STR [null].

                 -m INT    Only output alignments with number of CIGAR bases
                           consuming query sequence ≥ INT [0]

                 -f INT    Only output alignments with all bits set in INT
                           present in the FLAG field.  INT can be specified in
                           hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or
                           in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -F INT    Do not output alignments with any bits set in INT
                           present in the FLAG field.  INT can be specified in
                           hex by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or
                           in octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -G INT    Do not output alignments with all bits set in INT
                           present in the FLAG field.  This is the opposite of
                           -f such that -f12 -G12 is the same as no filtering at
                           all.  INT can be specified in hex by beginning with
                           `0x' (i.e. /^0x[0-9A-F]+/) or in octal by beginning
                           with `0' (i.e. /^0[0-7]+/) [0].

                 -x STR    Read tag to exclude from output (repeatable) [null]

                 -B        Collapse the backward CIGAR operation.

                 -s FLOAT  Output only a proportion of the input alignments.
                           This subsampling acts in the same way on all of the
                           alignment records in the same template or read pair,
                           so it never keeps a read but not its mate.

                           The integer and fractional parts of the -s INT.FRAC
                           option are used separately: the part after the
                           decimal point sets the fraction of templates/pairs to
                           be kept, while the integer part is used as a seed
                           that influences which subset of reads is kept.

                           When subsampling data that has previously been
                           subsampled, be sure to use a different seed value
                           from those used previously; otherwise more reads will
                           be retained than expected.

                 -@ INT    Number of BAM compression threads to use in addition
                           to main thread [0].

                 -S        Ignored for compatibility with previous samtools
                           versions.  Previously this option was required if
                           input was in SAM format, but now the correct format
                           is automatically detected by examining the first few
                           characters of input.


       sort      samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format]
                 [-n] [-t tag] [-T tmpprefix] [-@ threads]
                 [in.sam|in.bam|in.cram]

                 Sort alignments by leftmost coordinates, or by read name when
                 -n is used.  An appropriate @HD-SO sort order header tag will
                 be added or an existing one updated if necessary.

                 The sorted output is written to standard output by default, or
                 to the specified file (out.bam) when -o is used.  This command
                 will also create temporary files tmpprefix.%d.bam as needed
                 when the entire alignment data cannot fit into memory (as
                 controlled via the -m option).

                 Options:

                 -l INT     Set the desired compression level for the final
                            output file, ranging from 0 (uncompressed) or 1
                            (fastest but minimal compression) to 9 (best
                            compression but slowest to write), similarly to
                            gzip(1)'s compression level setting.

                            If -l is not used, the default compression level
                            will apply.

                 -m INT     Approximately the maximum required memory per
                            thread, specified either in bytes or with a K, M, or
                            G suffix.  [768 MiB]

                            To prevent sort from creating a huge number of
                            temporary files, it enforces a minimum value of 1M
                            for this setting.

                 -n         Sort by read names (i.e., the QNAME field) rather
                            than by chromosomal coordinates.

                 -t TAG     Sort first by the value in the alignment tag TAG,
                            then by position or name (if also using -n).  -o
                            FILE Write the final sorted output to FILE, rather
                            than to standard output.

                 -O FORMAT  Write the final output as sam, bam, or cram.

                            By default, samtools tries to select a format based
                            on the -o filename extension; if output is to
                            standard output or no format can be deduced, bam is
                            selected.

                 -T PREFIX  Write temporary files to PREFIX.nnnn.bam, or if the
                            specified PREFIX is an existing directory, to
                            PREFIX/samtools.mmm.mmm.tmp.nnnn.bam, where mmm is
                            unique to this invocation of the sort command.

                            By default, any temporary files are written
                            alongside the output file, as out.bam.tmp.nnnn.bam,
                            or if output is to standard output, in the current
                            directory as samtools.mmm.mmm.tmp.nnnn.bam.

                 -@ INT     Set number of sorting and compression threads.  By
                            default, operation is single-threaded.

                 Ordering Rules

                 The following rules are used for ordering records.

                 If option -t is in use, records are first sorted by the value
                 of the given alignment tag, and then by position or name (if
                 using -n).  For example, “-t RG” will make read group the
                 primary sort key.  The rules for ordering by tag are:


                 •   Records that do not have the tag are sorted before ones
                     that do.

                 •   If the types of the tags are different, they will be sorted
                     so that single character tags (type A) come before array
                     tags (type B), then string tags (types H and Z), then
                     numeric tags (types f and i).

                 •   Numeric tags (types f and i) are compared by value.  Note
                     that comparisons of floating-point values are subject to
                     issues of rounding and precision.

                 •   String tags (types H and Z) are compared based on the
                     binary contents of the tag using the C strcmp(3) function.

                 •   Character tags (type A) are compared by binary character
                     value.

                 •   No attempt is made to compare tags of other types — notably
                     type B array values will not be compared.

                 When the -n option is present, records are sorted by name.
                 Names are compared so as to give a “natural” ordering — i.e.
                 sections consisting of digits are compared numerically while
                 all other sections are compared based on their binary
                 representation.  This means “a1” will come before “b1” and “a9”
                 will come before “a10”.  Records with the same name will be
                 ordered according to the values of the READ1 and READ2 flags
                 (see flags).

                 When the -n option is not present, reads are sorted by
                 reference (according to the order of the @SQ header records),
                 then by position in the reference, and then by the REVERSE
                 flag.

                 Note


                 Historically samtools sort also accepted a less flexible way of
                 specifying the final and temporary output filenames:

                        samtools sort [-f] [-o] in.bam out.prefix

                 This has now been removed.  The previous out.prefix argument
                 (and -f option, if any) should be changed to an appropriate
                 combination of -T PREFIX and -o FILE.  The previous -o option
                 should be removed, as output defaults to standard output.


       index     samtools index [-bc] [-m INT] aln.bam|aln.cram [out.index]

                 Index a coordinate-sorted BAM or CRAM file for fast random
                 access.  (Note that this does not work with SAM files even if
                 they are bgzip compressed — to index such files, use tabix(1)
                 instead.)

                 This index is needed when region arguments are used to limit
                 samtools view and similar commands to particular regions of
                 interest.

                 If an output filename is given, the index file will be written
                 to out.index.  Otherwise, for a CRAM file aln.cram, index file
                 aln.cram.crai will be created; for a BAM file aln.bam, either
                 aln.bam.bai or aln.bam.csi will be created, depending on the
                 index format selected.

                 Options:

                 -b      Create a BAI index.  This is currently the default when
                         no format options are used.

                 -c      Create a CSI index.  By default, the minimum interval
                         size for the index is 2^14, which is the same as the
                         fixed value used by the BAI format.

                 -m INT  Create a CSI index, with a minimum interval size of
                         2^INT.


       idxstats  samtools idxstats in.sam|in.bam|in.cram

                 Retrieve and print stats in the index file corresponding to the
                 input file.  Before calling idxstats, the input BAM file should
                 be indexed by samtools index.

                 If run on a SAM or CRAM file or an unindexed BAM file, this
                 command will still produce the same summary statistics, but
                 does so by reading through the entire file.  This is far slower
                 than using the BAM indices.

                 The output is TAB-delimited with each line consisting of
                 reference sequence name, sequence length, # mapped reads and #
                 unmapped reads. It is written to stdout.


       flagstat  samtools flagstat in.sam|in.bam|in.cram

                 Does a full pass through the input file to calculate and print
                 statistics to stdout.

                 Provides counts for each of 13 categories based primarily on
                 bit flags in the FLAG field. Each category in the output is
                 broken down into QC pass and QC fail, which is presented as
                 "#PASS + #FAIL" followed by a description of the category.

                 The first row of output gives the total number of reads that
                 are QC pass and fail (according to flag bit 0x200). For
                 example:

                   122 + 28 in total (QC-passed reads + QC-failed reads)

                 Which would indicate that there are a total of 150 reads in the
                 input file, 122 of which are marked as QC pass and 28 of which
                 are marked as "not passing quality controls"

                 Following this, additional categories are given for reads which
                 are:


                         secondary
                                0x100 bit set

                         supplementary
                                0x800 bit set

                         duplicates
                                0x400 bit set

                         mapped 0x4 bit not set

                         paired in sequencing
                                0x1 bit set

                         read1  both 0x1 and 0x40 bits set

                         read2  both 0x1 and 0x80 bits set

                         properly paired
                                both 0x1 and 0x2 bits set and 0x4 bit not set

                         with itself and mate mapped
                                0x1 bit set and neither 0x4 nor 0x8 bits set

                         singletons
                                both 0x1 and 0x8 bits set and bit 0x4 not set

                 And finally, two rows are given that additionally filter on the
                 reference name (RNAME), mate reference name (MRNM), and mapping
                 quality (MAPQ) fields:


                         with mate mapped to a different chr
                                0x1 bit set and neither 0x4 nor 0x8 bits set and
                                MRNM not equal to RNAME

                         with mate mapped to a different chr (mapQ>=5)
                                0x1 bit set and neither 0x4 nor 0x8 bits set and
                                MRNM not equal to RNAME and MAPQ >= 5


       stats     samtools stats [options] in.sam|in.bam|in.cram [region...]

                 samtools stats collects statistics from BAM files and outputs
                 in a text format.  The output can be visualized graphically
                 using plot-bamstats.

                 Options:

                 -c, --coverage MIN,MAX,STEP
                         Set coverage distribution to the specified range (MIN,
                         MAX, STEP all given as integers) [1,1000,1]

                 -d, --remove-dups
                         Exclude from statistics reads marked as duplicates

                 -f, --required-flag STR|INT
                         Required flag, 0 for unset. See also `samtools flags`
                         [0]

                 -F, --filtering-flag STR|INT
                         Filtering flag, 0 for unset. See also `samtools flags`
                         [0]

                 --GC-depth FLOAT
                         the size of GC-depth bins (decreasing bin size
                         increases memory requirement) [2e4]

                 -h, --help
                         This help message

                 -i, --insert-size INT
                         Maximum insert size [8000]

                 -I, --id STR
                         Include only listed read group or sample name []

                 -l, --read-length INT
                         Include in the statistics only reads with the given
                         read length []

                 -m, --most-inserts FLOAT
                         Report only the main part of inserts [0.99]

                 -P, --split-prefix STR
                         A path or string prefix to prepend to filenames output
                         when creating categorised statistics files with
                         -S/--split.  [input filename]

                 -q, --trim-quality INT
                         The BWA trimming parameter [0]

                 -r, --ref-seq FILE
                         Reference sequence (required for GC-depth and
                         mismatches-per-cycle calculation).  []

                 -S, --split TAG
                         In addition to the complete statistics, also output
                         categorised statistics based on the tagged field TAG
                         (e.g., use --split RG to split into read groups).

                         Categorised statistics are written to files named
                         <prefix>_<value>.bamstat, where prefix is as given by
                         --split-prefix (or the input filename by default) and
                         value has been encountered as the specified tagged
                         field's value in one or more alignment records.

                 -t, --target-regions FILE
                         Do stats in these regions only. Tab-delimited file
                         chr,from,to, 1-based, inclusive.  []

                 -x, --sparse
                         Suppress outputting IS rows where there are no
                         insertions.


       bedcov    samtools bedcov [options] region.bed
                 in1.sam|in1.bam|in1.cram[...]

                 Reports the total read base count (i.e. the sum of per base
                 read depths) for each genomic region specified in the supplied
                 BED file. The regions are output as they appear in the BED file
                 and are 0-based.  Counts for each alignment file supplied are
                 reported in separate columns.

                 Options:

                 -Q INT Only count reads with mapping quality greater than INT

                 -j     Do not include deletions (D) and ref skips (N) in bedcov
                        computation.


       depth     samtools depth [options] [in1.sam|in1.bam|in1.cram
                 [in2.sam|in2.bam|in2.cram] [...]]

                 Computes the depth at each position or region.

                 Options:

                 -a      Output all positions (including those with zero depth)

                 -a -a, -aa
                         Output absolutely all positions, including unused
                         reference sequences.  Note that when used in
                         conjunction with a BED file the -a option may sometimes
                         operate as if -aa was specified if the reference
                         sequence has coverage outside of the region specified
                         in the BED file.

                 -b FILE Compute depth at list of positions or regions in
                         specified BED FILE.  []

                 -f FILE Use the BAM files specified in the FILE (a file of
                         filenames, one file per line) []

                 -l INT  Ignore reads shorter than INT

                 -m, -d INT
                         Truncate reported depth at a maximum of INT reads.
                         [8000]. If 0, depth is set to the maximum integer
                         value, effectively removing any depth limit.

                 -q INT  Only count reads with base quality greater than INT

                 -Q INT  Only count reads with mapping quality greater than INT

                 -r CHR:FROM-TO
                         Only report depth in specified region.


       merge     samtools merge [-nur1f] [-h inh.sam] [-R reg] [-b <list>]
                 <out.bam> <in1.bam> [<in2.bam> <in3.bam> ... <inN.bam>]

                 Merge multiple sorted alignment files, producing a single
                 sorted output file that contains all the input records and
                 maintains the existing sort order.

                 If -h is specified the @SQ headers of input files will be
                 merged into the specified header, otherwise they will be merged
                 into a composite header created from the input headers.  If in
                 the process of merging @SQ lines for coordinate sorted input
                 files, a conflict arises as to the order (for example
                 input1.bam has @SQ for a,b,c and input2.bam has b,a,c) then the
                 resulting output file will need to be re-sorted back into
                 coordinate order.

                 Unless the -c or -p flags are specified then when merging @RG
                 and @PG records into the output header then any IDs found to be
                 duplicates of existing IDs in the output header will have a
                 suffix appended to them to differentiate them from similar
                 header records from other files and the read records will be
                 updated to reflect this.

                 The ordering of the records in the input files must match the
                 usage of the -n and -t command-line options.  If they do not,
                 the output order will be undefined.  See sort for information
                 about record ordering.

                 OPTIONS:

                 -1      Use zlib compression level 1 to compress the output.

                 -b FILE List of input BAM files, one file per line.

                 -f      Force to overwrite the output file if present.

                 -h FILE Use the lines of FILE as `@' headers to be copied to
                         out.bam, replacing any header lines that would
                         otherwise be copied from in1.bam.  (FILE is actually in
                         SAM format, though any alignment records it may contain
                         are ignored.)

                 -n      The input alignments are sorted by read names rather
                         than by chromosomal coordinates

                 -t TAG  The input alignments have been sorted by the value of
                         TAG, then by either position or name (if -n is given).

                 -R STR  Merge files in the specified region indicated by STR
                         [null]

                 -r      Attach an RG tag to each alignment. The tag value is
                         inferred from file names.

                 -u      Uncompressed BAM output

                 -c      When several input files contain @RG headers with the
                         same ID, emit only one of them (namely, the header line
                         from the first file we find that ID in) to the merged
                         output file.  Combining these similar headers is
                         usually the right thing to do when the files being
                         merged originated from the same file.

                         Without -c, all @RG headers appear in the output file,
                         with random suffixes added to their IDs where necessary
                         to differentiate them.

                 -p      Similarly, for each @PG ID in the set of files to
                         merge, use the @PG line of the first file we find that
                         ID in rather than adding a suffix to differentiate
                         similar IDs.


       faidx     samtools faidx <ref.fasta> [region1 [...]]

                 Index reference sequence in the FASTA format or extract
                 subsequence from indexed reference sequence. If no region is
                 specified, faidx will index the file and create <ref.fasta>.fai
                 on the disk. If regions are specified, the subsequences will be
                 retrieved and printed to stdout in the FASTA format.

                 The input file can be compressed in the BGZF format.

                 The sequences in the input file should all have different
                 names.  If they do not, indexing will emit a warning about
                 duplicate sequences and retrieval will only produce
                 subsequences from the first sequence with the duplicated name.

                 FASTQ files can be read and indexed by this command.  Without
                 using --fastq any extracted subsequence will be in FASTA
                 format.

                 Options

                 -o, --output FILE
                         Write FASTA to file rather than to stdout.

                 -n, --length INT
                         Length of FASTA sequence line.  [60]

                 -c, --continue
                         Continue working if a non-existant region is requested.

                 -r, --region-file FILE
                         Read regions from a file. Format is chr:from-to, one
                         per line.

                 -f, --fastq
                         Read FASTQ files and output extracted sequences in
                         FASTQ format.  Same as using samtools fqidx.

                 -i, --reverse-complement
                         Output the sequence as the reverse complement.  When
                         this option is used, “/rc” will be appended to the
                         sequence names.  To turn this off or change the string
                         appended, use the --mark-strand option.

                 --mark-strand TYPE
                         Append strand indicator to sequence name.  TYPE can be
                         one of:

                         rc     Append '/rc' when writing the reverse
                                complement.  This is the default.

                         no     Do not append anything.

                         sign   Append '(+)' for forward strand or '(-)' for
                                reverse complement.  This matches the output of
                                “bedtools getfasta -s”.

                         custom,<pos>,<neg>
                                Append string <pos> to names when writing the
                                forward strand and <neg> when writing the
                                reverse strand.  Spaces are preserved, so it is
                                possible to move the indicator into the comment
                                part of the description line by including a
                                leading space in the strings <pos> and <neg>.

                 -h, --help
                         Print help message and exit.


       fqidx     samtools fqidx <ref.fastq> [region1 [...]]

                 Index reference sequence in the FASTQ format or extract
                 subsequence from indexed reference sequence. If no region is
                 specified, fqidx will index the file and create <ref.fastq>.fai
                 on the disk. If regions are specified, the subsequences will be
                 retrieved and printed to stdout in the FASTQ format.

                 The input file can be compressed in the BGZF format.

                 The sequences in the input file should all have different
                 names.  If they do not, indexing will emit a warning about
                 duplicate sequences and retrieval will only produce
                 subsequences from the first sequence with the duplicated name.

                 samtools fqidx should only be used on fastq files with a small
                 number of entries.  Trying to use it on a file containing
                 millions of short sequencing reads will produce an index that
                 is almost as big as the original file, and searches using the
                 index will be very slow and use a lot of memory.

                 Options

                 -o, --output FILE
                         Write FASTQ to file rather than to stdout.

                 -n, --length INT
                         Length of FASTQ sequence line.  [60]

                 -c, --continue
                         Continue working if a non-existant region is requested.

                 -r, --region-file FILE
                         Read regions from a file. Format is chr:from-to, one
                         per line.

                 -i, --reverse-complement
                         Output the sequence as the reverse complement.  When
                         this option is used, “/rc” will be appended to the
                         sequence names.  To turn this off or change the string
                         appended, use the --mark-strand option.

                 --mark-strand TYPE
                         Append strand indicator to sequence name.  TYPE can be
                         one of:

                         rc     Append '/rc' when writing the reverse
                                complement.  This is the default.

                         no     Do not append anything.

                         sign   Append '(+)' for forward strand or '(-)' for
                                reverse complement.  This matches the output of
                                “bedtools getfasta -s”.

                         custom,<pos>,<neg>
                                Append string <pos> to names when writing the
                                forward strand and <neg> when writing the
                                reverse strand.  Spaces are preserved, so it is
                                possible to move the indicator into the comment
                                part of the description line by including a
                                leading space in the strings <pos> and <neg>.

                 -h, --help
                         Print help message and exit.


       tview     samtools tview [-p chr:pos] [-s STR] [-d display]
                 <in.sorted.bam> [ref.fasta]

                 Text alignment viewer (based on the ncurses library). In the
                 viewer, press `?' for help and press `g' to check the alignment
                 start from a region in the format like `chr10:10,000,000' or
                 `=10,000,000' when viewing the same reference sequence.

                 Options:

                 -d display    Output as (H)tml or (C)urses or (T)ext

                 -p chr:pos    Go directly to this position

                 -s STR        Display only alignments from this sample or read
                               group


       split     samtools split [options] merged.sam|merged.bam|merged.cram

                 Splits a file by read group.

                 Options:

                 -u FILE1      Put reads with no RG tag or an unrecognised RG
                               tag into FILE1

                 -u FILE1:FILE2
                               As above, but assigns an RG tag as given in the
                               header of FILE2

                 -f STRING     Output filename format string (see below)
                               ["%*_%#.%."]

                 -v            Verbose output

                 Format string expansions:

                              %%   %
                              %*   basename
                              %#   @RG index
                              %!   @RG ID
                              %.   output format filename extension


       quickcheck
                 samtools quickcheck [options] in.sam|in.bam|in.cram [ ... ]

                 Quickly check that input files appear to be intact. Checks that
                 beginning of the file contains a valid header (all formats)
                 containing at least one target sequence and then seeks to the
                 end of the file and checks that an end-of-file (EOF) is present
                 and intact (BAM only).

                 Data in the middle of the file is not read since that would be
                 much more time consuming, so please note that this command will
                 not detect internal corruption, but is useful for testing that
                 files are not truncated before performing more intensive tasks
                 on them.

                 This command will exit with a non-zero exit code if any input
                 files don't have a valid header or are missing an EOF block.
                 Otherwise it will exit successfully (with a zero exit code).

                 Options:

                 -v      Verbose output: will additionally print the names of
                         all input files that don't pass the check to stdout.
                         Multiple -v options will cause additional messages
                         regarding check results to be printed to stderr.

                 -q      Quiet mode: disables warning messages on stderr about
                         files that fail.  If both -q and -v options are used
                         then the appropriate level of -v takes precedence.


       dict      samtools dict <ref.fasta|ref.fasta.gz>

                 Create a sequence dictionary file from a fasta file.

                 OPTIONS:

                 -a, --assembly STR
                            Specify the assembly for the AS tag.

                 -H, --no-header
                            Do not print the @HD header line.

                 -o, --output FILE
                            Output to FILE [stdout].

                 -s, --species STR
                            Specify the species for the SP tag.

                 -u, --uri STR
                            Specify the URI for the UR tag. Defaults to the
                            absolute path of ref.fasta unless reading from
                            stdin.


       fixmate   samtools fixmate [-rpcm] [-O format] in.nameSrt.bam out.bam

                 Fill in mate coordinates, ISIZE and mate related flags from a
                 name-sorted alignment.

                 OPTIONS:

                 -r         Remove secondary and unmapped reads.

                 -p         Disable FR proper pair check.

                 -c         Add template cigar ct tag.

                 -m         Add ms (mate score) tags.  These are used by markdup
                            to select the best reads to keep.

                 -O FORMAT  Write the final output as sam, bam, or cram.

                            By default, samtools tries to select a format based
                            on the output filename extension; if output is to
                            standard output or no format can be deduced, bam is
                            selected.


       mpileup   samtools mpileup [-EB] [-C capQcoef] [-r reg] [-f in.fa] [-l
                 list] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]

                 Generate pileup for one or multiple BAM files. Alignment
                 records are grouped by sample (SM) identifiers in @RG header
                 lines. If sample identifiers are absent, each input file is
                 regarded as one sample.

                 Samtools mpileup can still produce VCF and BCF output, but this
                 feature is deprecated and will be removed in a future release.
                 Please use bcftools mpileup for this instead.  (Documentation
                 on the deprecated options has been removed from this manual
                 page, but older versions are available online at
                 <http://www.htslib.org/doc/>.)

                 In the pileup format (without -u or -g), each line represents a
                 genomic position, consisting of chromosome name, 1-based
                 coordinate, reference base, the number of reads covering the
                 site, read bases, base qualities and alignment mapping
                 qualities. Information on match, mismatch, indel, strand,
                 mapping quality and start and end of a read are all encoded at
                 the read base column. At this column, a dot stands for a match
                 to the reference base on the forward strand, a comma for a
                 match on the reverse strand, a '>' or '<' for a reference skip,
                 `ACGTN' for a mismatch on the forward strand and `acgtn' for a
                 mismatch on the reverse strand. A pattern
                 `\+[0-9]+[ACGTNacgtn]+' indicates there is an insertion between
                 this reference position and the next reference position. The
                 length of the insertion is given by the integer in the pattern,
                 followed by the inserted sequence. Similarly, a pattern
                 `-[0-9]+[ACGTNacgtn]+' represents a deletion from the
                 reference. The deleted bases will be presented as `*' in the
                 following lines. Also at the read base column, a symbol `^'
                 marks the start of a read. The ASCII of the character following
                 `^' minus 33 gives the mapping quality. A symbol `$' marks the
                 end of a read segment.

                 Note that there are two orthogonal ways to specify locations in
                 the input file; via -r region and -l file.  The former uses
                 (and requires) an index to do random access while the latter
                 streams through the file contents filtering out the specified
                 regions, requiring no index.  The two may be used in
                 conjunction.  For example a BED file containing locations of
                 genes in chromosome 20 could be specified using -r 20 -l
                 chr20.bed, meaning that the index is used to find chromosome 20
                 and then it is filtered for the regions listed in the bed file.

                 Input Options:

                 -6, --illumina1.3+
                           Assume the quality is in the Illumina 1.3+ encoding.

                 -A, --count-orphans
                           Do not skip anomalous read pairs in variant calling.

                 -b, --bam-list FILE
                           List of input BAM files, one file per line [null]

                 -B, --no-BAQ
                           Disable base alignment quality (BAQ) computation.
                           See BAQ below.

                 -C, --adjust-MQ INT
                           Coefficient for downgrading mapping quality for reads
                           containing excessive mismatches. Given a read with a
                           phred-scaled probability q of being generated from
                           the mapped position, the new mapping quality is about
                           sqrt((INT-q)/INT)*INT. A zero value disables this
                           functionality; if enabled, the recommended value for
                           BWA is 50. [0]

                 -d, --max-depth INT
                           At a position, read maximally INT reads per input
                           file. Setting this limit reduces the amount of memory
                           and time needed to process regions with very high
                           coverage.  Passing zero for this option sets it to
                           the highest possible value, effectively removing the
                           depth limit. [8000]

                           Note that up to release 1.8, samtools would enforce a
                           minimum value for this option.  This no longer
                           happens and the limit is set exactly as specified.

                 -E, --redo-BAQ
                           Recalculate BAQ on the fly, ignore existing BQ tags.
                           See BAQ below.

                 -f, --fasta-ref FILE
                           The faidx-indexed reference file in the FASTA format.
                           The file can be optionally compressed by bgzip.
                           [null]

                           Supplying a reference file will enable base alignment
                           quality calculation for all reads aligned to a
                           reference in the file.  See BAQ below.

                 -G, --exclude-RG FILE
                           Exclude reads from readgroups listed in FILE (one
                           @RG-ID per line)

                 -l, --positions FILE
                           BED or position list file containing a list of
                           regions or sites where pileup or BCF should be
                           generated. Position list files contain two columns
                           (chromosome and position) and start counting from 1.
                           BED files contain at least 3 columns (chromosome,
                           start and end position) and are 0-based half-open.
                           While it is possible to mix both position-list and
                           BED coordinates in the same file, this is strongly
                           ill advised due to the differing coordinate systems.
                           [null]

                 -q, -min-MQ INT
                           Minimum mapping quality for an alignment to be used
                           [0]

                 -Q, --min-BQ INT
                           Minimum base quality for a base to be considered [13]

                 -r, --region STR
                           Only generate pileup in region. Requires the BAM
                           files to be indexed.  If used in conjunction with -l
                           then considers the intersection of the two requests.
                           STR [all sites]

                 -R, --ignore-RG
                           Ignore RG tags. Treat all reads in one BAM as one
                           sample.

                 --rf, --incl-flags STR|INT
                           Required flags: skip reads with mask bits unset
                           [null]

                 --ff, --excl-flags STR|INT
                           Filter flags: skip reads with mask bits set
                           [UNMAP,SECONDARY,QCFAIL,DUP]

                 -x, --ignore-overlaps
                           Disable read-pair overlap detection.

                 Output Options:

                 -o, --output FILE
                           Write pileup output to FILE, rather than the default
                           of standard output.

                           (The same short option is used for both the
                           deprecated --open-prob option and --output .  If -o's
                           argument contains any non-digit characters other than
                           a leading + or - sign, it is interpreted as --output.
                           Usually the filename extension will take care of
                           this, but to write to an entirely numeric filename
                           use -o ./123 or --output 123.)

                 -O, --output-BP
                           Output base positions on reads.

                 -s, --output-MQ
                           Output mapping quality.

                 --output-QNAME
                           Output an extra column containing comma-separated
                           read names.

                 -a        Output all positions, including those with zero
                           depth.

                 -a -a, -aa
                           Output absolutely all positions, including unused
                           reference sequences.  Note that when used in
                           conjunction with a BED file the -a option may
                           sometimes operate as if -aa was specified if the
                           reference sequence has coverage outside of the region
                           specified in the BED file.

                 BAQ (Base Alignment Quality)

                 BAQ is the Phred-scaled probability of a read base being
                 misaligned.  It greatly helps to reduce false SNPs caused by
                 misalignments.  BAQ is calculated using the probabilistic
                 realignment method described in the paper “Improving SNP
                 discovery by base alignment quality”, Heng Li, Bioinformatics,
                 Volume 27, Issue 8
                 <https://doi.org/10.1093/bioinformatics/btr076>

                 BAQ is turned on when a reference file is supplied using the -f
                 option.  To disable it, use the -B option.

                 It is possible to store pre-calculated BAQ values in a SAM BQ:Z
                 tag.  Samtools mpileup will use the precalculated values if it
                 finds them.  The -E option can be used to make it ignore the
                 contents of the BQ:Z tag and force it to recalculate the BAQ
                 scores by making a new alignment.


       flags     samtools flags INT|STR[,...]

                 Convert between textual and numeric flag representation.

                 FLAGS:

                   0x1   PAIRED          paired-end (or multiple-segment) sequencing technology
                   0x2   PROPER_PAIR     each segment properly aligned according to the aligner
                   0x4   UNMAP           segment unmapped
                   0x8   MUNMAP          next segment in the template unmapped
                  0x10   REVERSE         SEQ is reverse complemented
                  0x20   MREVERSE        SEQ of the next segment in the template is reverse complemented
                  0x40   READ1           the first segment in the template
                  0x80   READ2           the last segment in the template
                 0x100   SECONDARY       secondary alignment
                 0x200   QCFAIL          not passing quality controls
                 0x400   DUP             PCR or optical duplicate
                 0x800   SUPPLEMENTARY   supplementary alignment


       fastq/a   samtools fastq [options] in.bam
                 samtools fasta [options] in.bam

                 Converts a BAM or CRAM into either FASTQ or FASTA format
                 depending on the command invoked. The files will be
                 automatically compressed if the file names have a .gz or .bgzf
                 extension.

                 The input to this program must be collated by name.  Use
                 samtools collate or samtools sort -n to ensure this.

                 For each different QNAME, the input records are categorised
                 according to the state of the READ1 and READ2 flag bits.  The
                 three categories used are:

                 1 : Only READ1 is set.

                 2 : Only READ2 is set.

                 0 : Either both READ1 and READ2 are set; or neither is set.

                 The exact meaning of these categories depends on the sequencing
                 technology used.  It is expected that ordinary single and
                 paired-end sequencing reads will be in categories 1 and 2 (in
                 the case of paired-end reads, one read of the pair will be in
                 category 1, the other in category 2).  Category 0 is
                 essentially a “catch-all” for reads that do not fit into a
                 simple paired-end sequencing model.

                 For each category only one sequence will be written for a given
                 QNAME.  If more than one record is available for a given QNAME
                 and category, the first in input file order that has quality
                 values will be used.  If none of the candidate records has
                 quality values, then the first in input file order will be used
                 instead.

                 Sequences will be written to standard output unless one of the
                 -1,-2, or -0 options is used, in which case sequences for that
                 category will be written to the specified file.

                 If a singleton file is specified using the -s option then only
                 paired sequences will be output for categories 1 and 2; paired
                 meaning that for a given QNAME there are sequences for both
                 category 1 and 2.  If there is a sequence for only one of
                 categories 1 or 2 then it will be diverted into the specified
                 singletons file.  This can be used to prepare fastq files for
                 programs that cannot handle a mixture of paired and singleton
                 reads.

                 The -s option only affects category 1 and 2 records.  The
                 output for category 0 will be the same irrespective of the use
                 of this option.

                 OPTIONS:

                 -n      By default, either '/1' or '/2' is added to the end of
                         read names where the corresponding READ1 or READ2 FLAG
                         bit is set.  Using -n causes read names to be left as
                         they are.

                 -N      Always add either '/1' or '/2' to the end of read names
                         even when put into different files.

                 -O      Use quality values from OQ tags in preference to
                         standard quality string if available.

                 -s FILE Write singleton reads to FILE.

                 -t      Copy RG, BC and QT tags to the FASTQ header line, if
                         they exist.

                 -T TAGLIST
                         Specify a comma-separated list of tags to copy to the
                         FASTQ header line, if they exist.

                 -1 FILE Write reads with the READ1 FLAG set (and READ2 not set)
                         to FILE instead of outputting them.  If the -s option
                         is used, only paired reads will be written to this
                         file.

                 -2 FILE Write reads with the READ2 FLAG set (and READ1 not set)
                         to FILE instead of outputting them.  If the -s option
                         is used, only paired reads will be written to this
                         file.

                 -0 FILE Write reads where the READ1 and READ2 FLAG bits set are
                         either both set or both unset to FILE instead of
                         outputting them.

                 -f INT  Only output alignments with all bits set in INT present
                         in the FLAG field.  INT can be specified in hex by
                         beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in octal
                         by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -F INT  Do not output alignments with any bits set in INT
                         present in the FLAG field.  INT can be specified in hex
                         by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in
                         octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -G INT  Only EXCLUDE reads with all of the bits set in INT
                         present in the FLAG field.  INT can be specified in hex
                         by beginning with `0x' (i.e. /^0x[0-9A-F]+/) or in
                         octal by beginning with `0' (i.e. /^0[0-7]+/) [0].

                 -i      add Illumina Casava 1.8 format entry to header (eg
                         1:N:0:ATCACG)

                 -c [0..9]
                         set compression level when writing gz or bgzf fastq
                         files.

                 --i1 FILE
                         write first index reads to FILE

                 --i2 FILE
                         write second index reads to FILE

                 --barcode-tag TAG
                         aux tag to find index reads in [default: BC]

                 --quality-tag TAG
                         aux tag to find index quality in [default: QT]

                 --index-format STR
                         string to describe how to parse the barcode and quality
                         tags. For example:


                         i14i8   the first 14 characters are index 1, the next 8
                                 characters are index 2

                         n8i14   ignore the first 8 characters, and use the next
                                 14 characters for index 1

                                 If the tag contains a separator, then the
                                 numeric part can be replaced with '*' to mean
                                 'read until the separator or end of tag', for
                                 example:

                         n*i*    ignore the left part of the tag until the
                                 separator, then use the second part

                 EXAMPLES

                 Output paired reads to separate files, discarding singletons,
                 supplementary and secondary reads.  The resulting files can be
                 used with, for example, the bwa aligner.

                     samtools fastq -1 paired1.fq -2 paired2.fq -0 /dev/null -s /dev/null -n -F 0x900 in.bam


                 Output paired and singleton reads in a single file, discarding
                 supplementary and secondary reads.  To get all of the reads in
                 a single file, it is necessary to redirect the output of
                 samtools fastq.  The output file is suitable for use with bwa
                 mem -p which understands interleaved files containing a mixture
                 of paired and singleton reads.

                     samtools fastq -0 /dev/null -F 0x900 in.bam > all_reads.fq


                 Output paired reads in a single file, discarding supplementary
                 and secondary reads.  Save any singletons in a separate file.
                 Append /1 and /2 to read names.  This format is suitable for
                 use by NextGenMap when using its -p and -q options.  With this
                 aligner, paired reads must be mapped separately to the
                 singletons.

                     samtools fastq -0 /dev/null -s single.fq -N -F 0x900 in.bam > paired.fq


                 BUGS


                 o The way of specifying output files is far to complicated and
                   easy to get wrong.


                 o The default value for the -F option should really be 0x900 so
                   that secondary and supplementary reads are automatically
                   excluded.  The existing default of 0 is retained for reasons
                   of compatibility.



       collate   samtools collate [options] in.sam|in.bam|in.cram [<prefix>]

                 Shuffles and groups reads together by their names.  A faster
                 alternative to a full query name sort, collate ensures that
                 reads of the same name are grouped together in contiguous
                 groups, but doesn't make any guarantees about the order of read
                 names between groups.

                 The output from this command should be suitable for any
                 operation that requires all reads from the same template to be
                 grouped together.

                 If present, <prefix> is used to name the temporary files that
                 collate uses when sorting the data.  If neither the '-O' nor
                 '-o' options are used, <prefix> must be present and collate
                 will use it to make an output file name by appending a suffix
                 depending on the format written (.bam by default).

                 If either the -O or -o option is used, <prefix> is optional.
                 If <prefix> is absent, collate will write the temporary files
                 to a system-dependent location (/tmp on UNIX).

                 Using -f for fast mode will output only primary alignments that
                 have either the READ1 or READ2 flags set (but not both).  Any
                 other alignment records will be filtered out.  The collation
                 will only work correctly if there are no more than two reads
                 for any given QNAME after filtering.

                 Fast mode keeps a buffer of alignments in memory so that it can
                 write out most pairs as soon as they are found instead of
                 storing them in temporary files.  This allows collate to avoid
                 some work and so finish more quickly compared to the standard
                 mode.  The number of alignments held can be changed using -r,
                 storing more alignments uses more memory but increases the
                 number of pairs that can be written early.

                 While collate normally randomises the ordering of read pairs,
                 fast mode does not.  Position-dependent biases that would
                 normally be broken up can remain in the fast collate output.
                 It is therefore not a good idea to use fast mode when preparing
                 data for programs that expect randomly ordered paired reads.
                 For example using fast collate instead of the standard mode may
                 lead to significantly different results from aligners that
                 estimate library insert sizes on batches of reads.

                 Options:

                 -O      Output to stdout.  This option cannot be used with
                         '-o'.

                 -o FILE Write output to FILE.  This option cannot be used with
                         '-O'.

                 -u      Write uncompressed BAM output

                 -l INT  Compression level.  [1]

                 -n INT  Number of temporary files to use.  [64]

                 -f      Fast mode (primary alignments only).

                 -r INT  Number of reads to store in memory (for use with -f).
                         [10000]


       reheader  samtools reheader [-iP] in.header.sam in.bam

                 Replace the header in in.bam with the header in in.header.sam.
                 This command is much faster than replacing the header with a
                 BAM→SAM→BAM conversion.

                 By default this command outputs the BAM or CRAM file to
                 standard output (stdout), but for CRAM format files it has the
                 option to perform an in-place edit, both reading and writing to
                 the same file.  No validity checking is performed on the
                 header, nor that it is suitable to use with the sequence data
                 itself.

                 OPTIONS:

                 -P, --no-PG
                         Do not generate an @PG header line.

                 -i, --in-place
                         Perform the header edit in-place, if possible.  This
                         only works on CRAM files and only if there is
                         sufficient room to store the new header.  The amount of
                         space available will differ for each CRAM file.


       cat       samtools cat [-b list] [-h header.sam] [-o out.bam] <in1.bam>
                 <in2.bam> [ ... ]

                 Concatenate BAMs or CRAMs. Although this works on either BAM or
                 CRAM, all input files must be the same format as each other.
                 The sequence dictionary of each input file must be identical,
                 although this command does not check this. This command uses a
                 similar trick to reheader which enables fast BAM concatenation.

                 OPTIONS:

                 -b FOFN Read the list of input BAM or CRAM files from FOFN.
                         These are concatenated prior to any files specified on
                         the command line.  Multiple -b FOFN options may be
                         specified to concatenate multiple lists of BAM/CRAM
                         files.

                 -h FILE Uses the SAM header from FILE.  By default the header
                         is taken from the first file to be concatenated.

                 -o FILE Write the concatenated output to FILE.  By default this
                         is sent to stdout.


       rmdup     samtools rmdup [-sS] <input.srt.bam> <out.bam>

                 This command is obsolete. Use markdup instead.

                 Remove potential PCR duplicates: if multiple read pairs have
                 identical external coordinates, only retain the pair with
                 highest mapping quality.  In the paired-end mode, this command
                 ONLY works with FR orientation and requires ISIZE is correctly
                 set. It does not work for unpaired reads (e.g. two ends mapped
                 to different chromosomes or orphan reads).

                 OPTIONS:

                 -s      Remove duplicates for single-end reads. By default, the
                         command works for paired-end reads only.

                 -S      Treat paired-end reads and single-end reads.


       addreplacerg
                 samtools addreplacerg [-r rg line | -R rg ID] [-m mode] [-l
                 level] [-o out.bam] <input.bam>

                 Adds or replaces read group tags in a file.

                 OPTIONS:

                 -r STRING
                         Allows you to specify a read group line to append to
                         the header and applies it to the reads specified by the
                         -m option. If repeated it automatically adds in tabs
                         between invocations.

                 -R STRING
                         Allows you to specify the read group ID of an existing
                         @RG line and applies it to the reads specified.

                 -m MODE If you choose orphan_only then existing RG tags are not
                         overwritten, if you choose overwrite_all, existing RG
                         tags are overwritten. The default is overwrite_all.

                 -o STRING
                         Write the final output to STRING. The default is to
                         write to stdout.

                         By default, samtools tries to select a format based on
                         the output filename extension; if output is to standard
                         output or no format can be deduced, bam is selected.


       calmd     samtools calmd [-Eeubr] [-C capQcoef] <aln.bam> <ref.fasta>

                 Generate the MD tag. If the MD tag is already present, this
                 command will give a warning if the MD tag generated is
                 different from the existing tag. Output SAM by default.

                 Calmd can also read and write CRAM files although in most cases
                 it is pointless as CRAM recalculates MD and NM tags on the fly.
                 The one exception to this case is where both input and output
                 CRAM files have been / are being created with the no_ref
                 option.

                 OPTIONS:

                 -A      When used jointly with -r this option overwrites the
                         original base quality.

                 -e      Convert a the read base to = if it is identical to the
                         aligned reference base. Indel caller does not support
                         the = bases at the moment.

                 -u      Output uncompressed BAM

                 -b      Output compressed BAM

                 -C INT  Coefficient to cap mapping quality of poorly mapped
                         reads. See the pileup command for details. [0]

                 -r      Compute the BQ tag (without -A) or cap base quality by
                         BAQ (with -A).

                 -E      Extended BAQ calculation. This option trades
                         specificity for sensitivity, though the effect is
                         minor.


       targetcut samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1
                 em1] [-2 em2] [-f ref] <in.bam>

                 This command identifies target regions by examining the
                 continuity of read depth, computes haploid consensus sequences
                 of targets and outputs a SAM with each sequence corresponding
                 to a target. When option -f is in use, BAQ will be applied.
                 This command is only designed for cutting fosmid clones from
                 fosmid pool sequencing [Ref. Kitzman et al. (2010)].


       phase     samtools phase [-AF] [-k len] [-b prefix] [-q minLOD] [-Q
                 minBaseQ] <in.bam>

                 Call and phase heterozygous SNPs.

                 OPTIONS:

                 -A      Drop reads with ambiguous phase.

                 -b STR  Prefix of BAM output. When this option is in use,
                         phase-0 reads will be saved in file STR.0.bam and
                         phase-1 reads in STR.1.bam.  Phase unknown reads will
                         be randomly allocated to one of the two files. Chimeric
                         reads with switch errors will be saved in
                         STR.chimeric.bam.  [null]

                 -F      Do not attempt to fix chimeric reads.

                 -k INT  Maximum length for local phasing. [13]

                 -q INT  Minimum Phred-scaled LOD to call a heterozygote. [40]

                 -Q INT  Minimum base quality to be used in het calling. [13]


       depad     samtools depad [-SsCu1] [-T ref.fa] [-o output] <in.bam>

                 Converts a BAM aligned against a padded reference to a BAM
                 aligned against the depadded reference.  The padded reference
                 may contain verbatim "*" bases in it, but "*" bases are also
                 counted in the reference numbering.  This means that a sequence
                 base-call aligned against a reference "*" is considered to be a
                 cigar match ("M" or "X") operator (if the base-call is "A",
                 "C", "G" or "T").  After depadding the reference "*" bases are
                 deleted and such aligned sequence base-calls become insertions.
                 Similarly transformations apply for deletions and padding cigar
                 operations.

                 OPTIONS:

                 -S     Ignored for compatibility with previous samtools
                        versions.  Previously this option was required if input
                        was in SAM format, but now the correct format is
                        automatically detected by examining the first few
                        characters of input.

                 -s     Output in SAM format.  The default is BAM.

                 -C     Output in CRAM format.  The default is BAM.

                 -u     Do not compress the output.  Applies to either BAM or
                        CRAM output format.

                 -1     Enable fastest compression level.  Only works for BAM or
                        CRAM output.

                 -T FILE
                        Provides the padded reference file.  Note that without
                        this the @SQ line lengths will be incorrect, so for most
                        use cases this option will be considered as mandatory.

                 -o FILE
                        Specifies the output filename.  By default output is
                        sent to stdout.


       markdup   samtools markdup [-l length] [-r] [-s] [-T] [-S] in.algsort.bam
                 out.bam

                 Mark duplicate alignments from a coordinate sorted file that
                 has been run through fixmate with the -m option.  This program
                 relies on the MC and ms tags that fixmate provides.


                 -l INT     Expected maximum read length of INT bases.  [300]

                 -r         Remove duplicate reads.

                 -s         Print some basic stats.

                 -T PREFIX  Write temporary files to
                            PREFIX.samtools.nnnn.mmmm.tmp

                 -S         Mark supplementary reads of duplicates as
                            duplicates.


           EXAMPLE

           # The first sort can be omitted if the file is already name ordered
           samtools sort -n -o namesort.bam example.bam

           # Add ms and MC tags for markdup to use later
           samtools fixmate -m namesort.bam fixmate.bam

           # Markdup needs position order
           samtools sort -o positionsort.bam fixmate.bam

           # Finally mark duplicates
           samtools markdup positionsort.bam markdup.bam


       help, --help
                 Display a brief usage message listing the samtools commands
                 available.  If the name of a command is also given, e.g.,
                 samtools help view, the detailed usage message for that
                 particular command is displayed.


       --version Display the version numbers and copyright information for
                 samtools and the important libraries used by samtools.


       --version-only
                 Display the full samtools version number in a machine-readable
                 format.

GLOBAL OPTIONS
       Several long-options are shared between multiple samtools subcommands:
       --input-fmt, --input-fmt-option, --output-fmt, --output-fmt-option, and
       --reference.  The input format is typically auto-detected so specifying
       the format is usually unnecessary and the option is included for
       completeness.  Note that not all subcommands have all options.  Consult
       the subcommand help for more details.

       Format strings recognised are "sam", "bam" and "cram".  They may be
       followed by a comma separated list of options as key or key=value. See
       below for examples.

       The fmt-option arguments accept either a single option or option=value.
       Note that some options only work on some file formats and only on read or
       write streams.  If value is unspecified for a boolean option, the value
       is assumed to be 1.  The valid options are as follows.

       level=INT
           Output only. Specifies the compression level from 1 to 9, or 0 for
           uncompressed.

       nthreads=INT
           Specifies the number of threads to use during encoding and/or
           decoding.  For BAM this will be encoding only.  In CRAM the threads
           are dynamically shared between encoder and decoder.

       reference=fasta_file
           Specifies a FASTA reference file for use in CRAM encoding or
           decoding.  It usually is not required for decoding except in the
           situation of the MD5 not being obtainable via the REF_PATH or
           REF_CACHE environment variables.

       decode_md=0|1
           CRAM input only; defaults to 1 (on).  CRAM does not typically store
           MD and NM tags, preferring to generate them on the fly.  This option
           controls this behaviour.  It can be particularly useful when combined
           with a file encoded using store_md=1 and store_nm=1.

       store_md=0|1
           CRAM output only; defaults to 0 (off).  CRAM normally only stores MD
           tags when no reference is unknown and lets the decoder generate these
           values on-the-fly (see decode_md).

       store_nm=0|1
           CRAM output only; defaults to 0 (off).  CRAM normally only stores NM
           tags when no reference is unknown and lets the decoder generate these
           values on-the-fly (see decode_md).

       ignore_md5=0|1
           CRAM input only; defaults to 0 (off).  When enabled, md5 checksum
           errors on the reference sequence and block checksum errors within
           CRAM are ignored.  Use of this option is strongly discouraged.

       required_fields=bit-field
           CRAM input only; specifies which SAM columns need to be populated.
           By default all fields are used.  Limiting the decode to specific
           columns can have significant performance gains.  The bit-field is a
           numerical value constructed from the following table.

                                       0x1   SAM_QNAME
                                       0x2   SAM_FLAG
                                       0x4   SAM_RNAME
                                       0x8   SAM_POS
                                      0x10   SAM_MAPQ
                                      0x20   SAM_CIGAR
                                      0x40   SAM_RNEXT
                                      0x80   SAM_PNEXT
                                     0x100   SAM_TLEN
                                     0x200   SAM_SEQ
                                     0x400   SAM_QUAL
                                     0x800   SAM_AUX
                                    0x1000   SAM_RGAUX

       name_prefix=string
           CRAM input only; defaults to output filename.  Any sequences with
           auto-generated read names will use string as the name prefix.

       multi_seq_per_slice=0|1
           CRAM output only; defaults to 0 (off).  By default CRAM generates one
           container per reference sequence, except in the case of many small
           references (such as a fragmented assembly).

       version=major.minor
           CRAM output only.  Specifies the CRAM version number.  Acceptable
           values are "2.1" and "3.0".

       seqs_per_slice=INT
           CRAM output only; defaults to 10000.

       slices_per_container=INT
           CRAM output only; defaults to 1.  The effect of having multiple
           slices per container is to share the compression header block between
           multiple slices.  This is unlikely to have any significant impact
           unless the number of sequences per slice is reduced.  (Together these
           two options control the granularity of random access.)

       embed_ref=0|1
           CRAM output only; defaults to 0 (off).  If 1, this will store
           portions of the reference sequence in each slice, permitting decode
           without having requiring an external copy of the reference sequence.

       no_ref=0|1
           CRAM output only; defaults to 0 (off).  If 1, sequences will be
           stored verbatim with no reference encoding.  This can be useful if no
           reference is available for the file.

       use_bzip2=0|1
           CRAM output only; defaults to 0 (off).  Permits use of bzip2 in CRAM
           block compression.

       use_lzma=0|1
           CRAM output only; defaults to 0 (off).  Permits use of lzma in CRAM
           block compression.

       lossy_names=0|1
           CRAM output only; defaults to 0 (off).  If 1, templates with all
           members within the same CRAM slice will have their read names
           removed.  New names will be automatically generated during decoding.
           Also see the name_prefix option.

       For example:

           samtools view --input-fmt-option decode_md=0
               --output-fmt cram,version=3.0 --output-fmt-option embed_ref
               --output-fmt-option seqs_per_slice=2000 -o foo.cram foo.bam


REFERENCE SEQUENCES
       The CRAM format requires use of a reference sequence for both reading and
       writing.

       When reading a CRAM the @SQ headers are interrogated to identify the
       reference sequence MD5sum (M5: tag) and the local reference sequence
       filename (UR: tag).  Note that http:// and ftp:// based URLs in the UR:
       field are not used, but local fasta filenames (with or without file://)
       can be used.

       To create a CRAM the @SQ headers will also be read to identify the
       reference sequences, but M5: and UR: tags may not be present. In this
       case the -T and -t options of samtools view may be used to specify the
       fasta or fasta.fai filenames respectively (provided the .fasta.fai file
       is also backed up by a .fasta file).

       The search order to obtain a reference is:

       1. Use any local file specified by the command line options (eg -T).

       2. Look for MD5 via REF_CACHE environment variable.

       3. Look for MD5 in each element of the REF_PATH environment variable.

       4. Look for a local file listed in the UR: header tag.

ENVIRONMENT VARIABLES
       HTS_PATH
              A colon-separated list of directories in which to search for
              HTSlib plugins.  If $HTS_PATH starts or ends with a colon or
              contains a double colon (::), the built-in list of directories is
              searched at that point in the search.

              If no HTS_PATH variable is defined, the built-in list of
              directories specified when HTSlib was built is used, which
              typically includes /usr/local/libexec/htslib and similar
              directories.


       REF_PATH
              A colon separated (semi-colon on Windows) list of locations in
              which to look for sequences identified by their MD5sums.  This can
              be either a list of directories or URLs. Note that if a URL is
              included then the colon in http:// and ftp:// and the optional
              port number will be treated as part of the URL and not a PATH
              field separator.  For URLs, the text %s will be replaced by the
              MD5sum being read.

              If no REF_PATH has been specified it will default to
              http://www.ebi.ac.uk/ena/cram/md5/%s and if REF_CACHE is also
              unset, it will be set to $XDG_CACHE_HOME/hts-ref/%2s/%2s/%s.  If
              $XDG_CACHE_HOME is unset, $HOME/.cache (or a local system
              temporary directory if no home directory is found) will be used
              similarly.


       REF_CACHE
              This can be defined to a single directory housing a local cache of
              references.  Upon downloading a reference it will be stored in the
              location pointed to by REF_CACHE.  When reading a reference it
              will be looked for in this directory before searching REF_PATH.
              To avoid many files being stored in the same directory, a pathname
              may be constructed using %nums and %s notation, consuming num
              characters of the MD5sum.  For example /local/ref_cache/%2s/%2s/%s
              will create 2 nested subdirectories with the filenames in the
              deepest directory being the last 28 characters of the md5sum.

              The REF_CACHE directory will be searched for before attempting to
              load via the REF_PATH search list.  If no REF_PATH is defined,
              both REF_PATH and REF_CACHE will be automatically set (see above),
              but if REF_PATH is defined and REF_CACHE not then no local cache
              is used.

              To aid population of the REF_CACHE directory a script
              misc/seq_cache_populate.pl is provided in the Samtools
              distribution. This takes a fasta file or a directory of fasta
              files and generates the MD5sum named files.

EXAMPLES
       o Import SAM to BAM when @SQ lines are present in the header:

           samtools view -bS aln.sam > aln.bam

         If @SQ lines are absent:

           samtools faidx ref.fa
           samtools view -bt ref.fa.fai aln.sam > aln.bam

         where ref.fa.fai is generated automatically by the faidx command.


       o Convert a BAM file to a CRAM file using a local reference sequence.

           samtools view -C -T ref.fa aln.bam > aln.cram


       o Attach the RG tag while merging sorted alignments:

           perl -e 'print "@RG\tID:ga\tSM:hs\tLB:ga\tPL:Illumina\n@RG\tID:454\tSM:hs\tLB:454\tPL:454\n"' > rg.txt
           samtools merge -rh rg.txt merged.bam ga.bam 454.bam

         The value in a RG tag is determined by the file name the read is coming
         from. In this example, in the merged.bam, reads from ga.bam will be
         attached RG:Z:ga, while reads from 454.bam will be attached RG:Z:454.


       o Convert a BAM file to a CRAM with NM and MD tags stored verbatim rather
         than calculating on the fly during CRAM decode, so that mixed data sets
         with MD/NM only on some records, or NM calculated using different
         definitions of mismatch, can be decoded without change.  The second
         command demonstrates how to decode such a file.  The request to not
         decode MD here is turning off auto-generation of both MD and NM; it
         will still emit the MD/NM tags on records that had these stored
         verbatim.

           samtools view -C --output-fmt-option store_md=1 --output-fmt-option store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt-option decode_md=0 -o aln.new.bam aln.cram


       o An alternative way of achieving the above is listing multiple options
         after the --output-fmt or -O option.  The commands below are equivalent
         to the two above.

           samtools view -O cram,store_md=1,store_nm=1 -o aln.cram aln.bam
           samtools view --input-fmt cram,decode_md=0 -o aln.new.bam aln.cram



       o Call SNPs and short INDELs:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -mv > var.raw.vcf
           bcftools filter -s LowQual -e '%QUAL<20 || DP>100' var.raw.vcf  > var.flt.vcf

         The bcftools filter command marks low quality sites and sites with the
         read depth exceeding a limit, which should be adjusted to about twice
         the average read depth (bigger read depths usually indicate problematic
         regions which are often enriched for artefacts).  One may consider to
         add -C50 to mpileup if mapping quality is overestimated for reads
         containing excessive mismatches. Applying this option usually helps
         BWA-short but may not other mappers.

         Individuals are identified from the SM tags in the @RG header lines.
         Individuals can be pooled in one alignment file; one individual can
         also be separated into multiple files. The -P option specifies that
         indel candidates should be collected only from read groups with the
         @RG-PL tag set to ILLUMINA.  Collecting indel candidates from reads
         sequenced by an indel-prone technology may affect the performance of
         indel calling.


       o Generate the consensus sequence for one diploid individual:

           samtools mpileup -uf ref.fa aln.bam | bcftools call -c | vcfutils.pl vcf2fq > cns.fq


       o Phase one individual:

           samtools calmd -AEur aln.bam ref.fa | samtools phase -b prefix - > phase.out

         The calmd command is used to reduce false heterozygotes around INDELs.



       o Dump BAQ applied alignment for other SNP callers:

           samtools calmd -bAr aln.bam > aln.baq.bam

         It adds and corrects the NM and MD tags at the same time. The calmd
         command also comes with the -C option, the same as the one in pileup
         and mpileup.  Apply if it helps.


LIMITATIONS
       o Unaligned words used in bam_import.c, bam_endian.h, bam.c and
         bam_aux.c.

       o Samtools paired-end rmdup does not work for unpaired reads (e.g. orphan
         reads or ends mapped to different chromosomes). If this is a concern,
         please use Picard's MarkDuplicates which correctly handles these cases,
         although a little slower.


AUTHOR
       Heng Li from the Sanger Institute wrote the original C version of
       samtools.  Bob Handsaker from the Broad Institute implemented the BGZF
       library.  James Bonfield from the Sanger Institute developed the CRAM
       implementation.  John Marshall and Petr Danecek contribute to the source
       code and various people from the 1000 Genomes Project have contributed to
       the SAM format specification.


SEE ALSO
       bcftools(1), sam(5), tabix(1)

       Samtools website: <http://www.htslib.org/>
       File format specification of SAM/BAM,CRAM,VCF/BCF:
       <http://samtools.github.io/hts-specs>
       Samtools latest source: <https://github.com/samtools/samtools>
       HTSlib latest source: <https://github.com/samtools/htslib>
       Bcftools website: <http://samtools.github.io/bcftools>



samtools-1.9                      18 July 2018                       samtools(1)