pdsh

pdsh(1)                     General Commands Manual                    pdsh(1)



NAME
       pdsh - issue commands to groups of hosts in parallel


SYNOPSIS
       pdsh [options]... command


DESCRIPTION
       pdsh is a variant of the rsh(1) command. Unlike rsh(1), which runs
       commands on a single remote host, pdsh can run multiple remote commands
       in parallel. pdsh uses a "sliding window" (or fanout) of threads to
       conserve resources on the initiating host while allowing some
       connections to time out.

       When pdsh receives SIGINT (ctrl-C), it lists the status of current
       threads. A second SIGINT within one second terminates the program.
       Pending threads may be canceled by issuing ctrl-Z within one second of
       ctrl-C.  Pending threads are those that have not yet been initiated, or
       are still in the process of connecting to the remote host.


       If a remote command is not specified on the command line, pdsh runs
       interactively, prompting for commands and executing them when
       terminated with a carriage return. In interactive mode, target nodes
       that time out on the first command are not contacted for subsequent
       commands, and commands prefixed with an exclamation point will be
       executed on the local system.

       The core functionality of pdsh may be supplemented by dynamically
       loadable modules. The modules may provide a new connection protocol
       (replacing the standard rcmd(3) protocol used by rsh(1)), filtering
       options (e.g. removing hosts that are "down" from the target list),
       and/or host selection options (e.g., -a selects all hosts from a
       configuration file). By default, pdsh must have at least one "rcmd"
       module loaded. See the RCMD MODULES section for more information.


RCMD MODULES
       The method by which pdsh runs commands on remote hosts may be selected
       at runtime using the -R option (See OPTIONS below).  This functionality
       is ultimately implemented via dynamically loadable modules, and so the
       list of available options may be different from installation to
       installation. A list of currently available rcmd modules is printed
       when using any of the -h, -V, or -L options. The default rcmd module
       will also be displayed with the -h and -V options.

       A list of rcmd modules currently distributed with pdsh follows.

       rsh     Uses an internal, thread-safe implementation of BSD rcmd(3) to
               run commands using the standard rsh(1) protocol.

       exec    Executes an arbitrary command for each target host. The first
               of the pdsh remote arguments is the local command to execute,
               followed by any further arguments. Some simple parameters are
               substituted on the command line, including %h for the target
               hostname, %u for the remote username, and %n for the remote
               rank [0-n] (To get a literal % use %%).  For example, the
               following would duplicate using the ssh module to run
               hostname(1) across the hosts foo[0-10]:

                  pdsh -R exec -w foo[0-10] ssh -x -l %u %h hostname

               and this command line would run grep(1) in parallel across the
               files console.foo[0-10]:

                  pdsh -R exec -w foo[0-10] grep BUG console.%h


       ssh     Uses a variant of popen(3) to run multiple copies of the ssh(1)
               command.

       mrsh    This module uses the mrsh(1) protocol to execute jobs on remote
               hosts.  The mrsh protocol uses a credential based
               authentication, forgoing the need to allocate reserved ports.
               In other aspects, it acts just like rsh. Remote nodes must be
               running mrshd(8) in order for the mrsh module to work.

       qsh     Allows pdsh to execute MPI jobs over QsNet. Qshell propagates
               the current working directory, pdsh environment, and Elan
               capabilities to the remote process. The following environment
               variables are also appended to the environment: RMS_RANK,
               RMS_NODEID, RMS_PROCID, RMS_NNODES, and RMS_NPROCS. Since pdsh
               needs to run setuid root for qshell support, qshell does not
               directly support propagation of LD_LIBRARY_PATH and LD_PREOPEN.
               Instead, the QSHELL_REMOTE_LD_LIBRARY_PATH and
               QSHELL_REMOTE_LD_PREOPEN environment variables may be used
               and will be remapped to LD_LIBRARY_PATH and LD_PREOPEN by the
               qshell daemon if set.

       mqsh    Similar to qshell, but uses the mrsh protocol instead of the
               rsh protocol.

       krb4    The krb4 module allows users to execute remote commands after
               authenticating with Kerberos. Of course, the remote rshd
               daemons must be kerberized.

       xcpu    The xcpu module uses the xcpu service to execute remote
               commands.
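
       The parameter substitution described for the exec module above can be
       previewed without contacting any host. The function below is only an
       illustrative sketch written for this page, not pdsh's own code: the
       name expand_exec_args and the sample host, user, and rank values are
       assumptions. It prints the local command line that the documented
       %h, %u, %n, and %% rules would produce for one target.

```shell
# Illustrative sketch only: mimic the exec module's documented
# %h / %u / %n / %% substitution for a single target host.
# Usage: expand_exec_args HOST USER RANK TEMPLATE
expand_exec_args() {
    host=$1 user=$2 rank=$3 rest=$4 out=
    while [ -n "$rest" ]; do
        case $rest in
            %h*) out="$out$host"; rest=${rest#"%h"} ;;
            %u*) out="$out$user"; rest=${rest#"%u"} ;;
            %n*) out="$out$rank"; rest=${rest#"%n"} ;;
            %%*) out="${out}%";   rest=${rest#"%%"} ;;
            *)   # copy one ordinary character and move on
                 out="$out${rest%"${rest#?}"}"; rest=${rest#?} ;;
        esac
    done
    printf '%s\n' "$out"
}

expand_exec_args foo3 alice 3 'ssh -x -l %u %h hostname'
# prints: ssh -x -l alice foo3 hostname
expand_exec_args foo3 alice 3 'grep BUG console.%h'
# prints: grep BUG console.foo3
```

       Because the command runs locally once per target, anything that
       accepts a hostname argument can be driven this way.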


OPTIONS
       The list of available options is determined at runtime by supplementing
       the list of standard pdsh options with any options provided by loaded
       rcmd and misc modules.  In some cases, options provided by modules may
       conflict with each other. In these cases, the modules are incompatible
       and the first module loaded wins.


Standard target nodelist options
       -w TARGETS,...
              Target and/or filter the specified list of hosts. Do not use
              with any other node selection options (e.g. -a, -g, if they are
              available). No spaces are allowed in the comma-separated list.
              Arguments in the TARGETS list may include normal host names, a
              range of hosts in hostlist format (See HOSTLIST EXPRESSIONS), or
              a single `-' character to read the list of hosts on stdin.

              If a host or hostlist is preceded by a `-' character, this
              causes those hosts to be explicitly excluded. If the argument is
              preceded by a single `^' character, it is taken to be the path
              to a file containing a list of hosts, one per line. If the item
              begins with a `/' character, it is taken as a regular
              expression on which to filter the list of hosts (a regex
              argument may also be optionally trailed by another `/', e.g.
              /node.*/). A regex or file name argument may also be preceded
              by a minus `-' to exclude instead of include those hosts.

              A list of hosts may also be preceded by "user@" to specify a
              remote username other than the default, or "rcmd_type:" to
              specify an alternate rcmd connection type for these hosts. When
              used together, the rcmd type must be specified first, e.g.
              "ssh:user1@host0" would use ssh to connect to host0 as user
              "user1."



       -x host,host,...
              Exclude the specified hosts. May be specified in conjunction
              with other target node list options such as -a and -g (when
              available). Hostlists may also be specified to the -x option
              (see the HOSTLIST EXPRESSIONS section below). Arguments to -x
              may also be preceded by the filename (`^') and regex (`/')
              characters as described above, in which case the resulting hosts
              are excluded as if they had been given to -w and preceded with
              the minus `-' character.
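
       As a hedged sketch of the -w and -x forms described above, the
       following writes a small host file and passes it with the `^' prefix
       while excluding one host. The host names are placeholders. The -q
       option is used so that, where pdsh is installed, it only lists the
       resulting targets and exits without running anything.

```shell
# Build a throwaway host file (one host per line); names are placeholders.
hostfile=$(mktemp)
printf '%s\n' foo1 foo2 foo3 >"$hostfile"

# ^FILE reads targets from the file; -x removes foo2 from the result.
set -- pdsh -q -w "^$hostfile" -x foo2

if command -v pdsh >/dev/null 2>&1; then
    "$@" || true      # -q lists option values and targets, then exits
else
    printf 'would run: %s\n' "$*"
fi
```

       Dropping -q and appending a command would run that command on the
       remaining hosts.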


Standard pdsh options
       -S     Return the largest of the remote command return values.

       -h     Output usage menu and quit. A list of available rcmd modules
              will also be printed at the end of the usage message.

       -s     Only on AIX, separate remote command stderr and stdout into two
              sockets.

       -q     List option values and the target nodelist and exit without
              action.

       -b     Disable ctrl-C status feature so that a single ctrl-C kills the
              parallel job (batch mode).

       -l user
              This option may be used to run remote commands as another user,
              subject to authorization. For BSD rcmd, this means the invoking
              user and system must be listed in the user's .rhosts file (even
              for root).

       -t seconds
              Set the connect timeout. Default is 10 seconds.

       -u seconds
              Set a limit on the amount of time a remote command is allowed to
              execute.  Default is no limit. See note in LIMITATIONS if using
              -u with ssh.

       -f number
              Set the maximum number of simultaneous remote commands to
              number.  The default is 32.

       -R name
              Set rcmd module to name. This option may also be set via the
              PDSH_RCMD_TYPE environment variable. A list of available rcmd
              modules may be obtained via the -h, -V, or -L options.  The
              default will be listed with -h or -V.

       -M name,...
              When multiple misc modules provide the same options to pdsh, the
              first module initialized "wins" and subsequent modules are not
              loaded.  The -M option allows a list of modules to be specified
              that will be force-initialized before all others, in effect
              ensuring that they load without conflict (unless they conflict
              with each other). This option may also be set via the
              PDSH_MISC_MODULES environment variable.

       -L     List info on all loaded pdsh modules and quit.

       -N     Disable hostname: prefix on lines of output.

       -d     Include more complete thread status when SIGINT is received, and
              display connect and command time statistics on stderr when done.

       -V     Output pdsh version information, along with list of currently
              loaded modules, and exit.
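
       To illustrate what -S reports, the sketch below reduces a set of
       hypothetical per-host return values to their maximum, which is the
       value the -S description above says pdsh returns. The function name
       largest_rc and the sample values are assumptions made for the example;
       this is not how pdsh is implemented internally.

```shell
# Illustrative only: the largest of several remote return values.
# Usage: largest_rc RC [RC...]
largest_rc() {
    max=0
    for rc in "$@"; do
        if [ "$rc" -gt "$max" ]; then
            max=$rc
        fi
    done
    printf '%s\n' "$max"
}

# Three hosts succeeded, one exited 2, one exited 1.
largest_rc 0 0 0 2 1
# prints: 2
```

       In practice the value would be read from the shell's $? immediately
       after a command such as: pdsh -S -w "foo[0-10]" command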


qsh/mqsh module options
       -n tasks_per_node
              Set the number of tasks spawned per node. Default is 1.

       -m block | cyclic
              Set block versus cyclic allocation of processes to nodes.
              Default is block.

       -r railmask
              Set the rail bitmask for a job on a multirail system. The
              default railmask is 1, which corresponds to rail 0 only. Each
              bit set in the argument to -r corresponds to a rail on the
              system, so a value of 2 would correspond to rail 1 only, and 3
              would indicate to use both rail 1 and rail 0.
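
       The railmask arithmetic above can be checked with shell arithmetic.
       This is a small sketch, with railmask as a hypothetical helper name,
       that sets one bit per requested rail number.

```shell
# Compute a rail bitmask from a list of rail numbers (bit N = rail N).
# Usage: railmask RAIL [RAIL...]
railmask() {
    mask=0
    for rail in "$@"; do
        mask=$((mask | (1 << rail)))
    done
    printf '%s\n' "$mask"
}

railmask 0      # prints 1 (rail 0 only, the default)
railmask 1      # prints 2 (rail 1 only)
railmask 0 1    # prints 3 (both rails)
```

       The printed value is what would be passed as the argument to -r.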


machines module options
       -a     Target all nodes from machines file.


genders module options
       In addition to the genders options presented below, the genders
       attribute pdsh_rcmd_type may also be used in the genders database to
       specify an rcmd connect type other than the pdsh default for hosts
       with this attribute. For example, the following line in the genders
       file

         host0 pdsh_rcmd_type=ssh

       would cause pdsh to use ssh to connect to host0, even if rsh were the
       default.  This can be overridden on the command line with the
       "rcmd_type:host0" syntax.


       -A     Target all nodes in genders database. The -A option will target
              every host listed in genders -- if you want to omit some hosts
              by default, see the -a option below.

       -a     Target all nodes in genders database except those with the
              "pdsh_all_skip" attribute. This is shorthand for running "pdsh
              -A -X pdsh_all_skip ..."

       -g attr[=val][,attr[=val],...]
              Target nodes that match any of the specified genders attributes
              (with optional values). Conflicts with -a and -w options. This
              option targets the alternate hostnames in the genders database
              by default. The -i option provided by the genders module may be
              used to translate these to the canonical genders hostnames. If
              the installed version of genders supports it, attributes
              supplied to -g may also take the form of genders queries.
              Genders queries will query the genders database for the union,
              intersection, difference, or complement of genders attributes
              and values.  The set operation union is represented by two pipe
              symbols ('||'), intersection by two ampersand symbols ('&&'),
              difference by two minus symbols ('--'), and complement by a
              tilde ('~').  Parentheses may be used to change the order of
              operations. See the nodeattr(1) manpage for examples of genders
              queries.

       -X attr[=val][,attr[=val],...]
              Exclude nodes that match any of the specified genders attributes
              (optionally with values).  This option may be used in
              combination with any other of the node selection options (e.g.
              -w, -g, -a). Arguments to -X may also take the form of genders
              queries. Please see documentation for the genders -g option for
              more information about genders queries.

       -i     Request translation between canonical and alternate hostnames.

       -F filename
              Read genders information from filename instead of the system
              default genders file. If filename doesn't specify an absolute
              path then it is taken to be relative to the directory specified
              by the PDSH_GENDERS_DIR environment variable (/etc by default).
              An alternate genders file may also be specified via the
              PDSH_GENDERS_FILE environment variable.
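
       The sketch below ties the genders options together under stated
       assumptions: the host names and the compute and login attributes are
       invented for illustration, and the file layout (a node name followed
       by comma-separated attributes) follows the pdsh_rcmd_type example
       above. It only prints the command it would run.

```shell
# Write a small, hypothetical genders file to a temporary location.
genders_file=$(mktemp)
cat >"$genders_file" <<'EOF'
host0 compute,pdsh_rcmd_type=ssh
host1 compute
host2 login,pdsh_all_skip
EOF

# Quote the query so the shell does not interpret && or ~ itself.
query='compute&&~login'

# The temporary file path is absolute, so PDSH_GENDERS_DIR is not consulted.
printf 'would run: pdsh -q -F %s -g "%s"\n' "$genders_file" "$query"
```

       Whether the query form is accepted depends on the installed genders
       version, as noted in the -g description.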


nodeupdown module options
       -v     Eliminate target nodes that are considered "down" by
              libnodeupdown.


slurm module options
       The slurm module allows pdsh to target nodes based on currently running
       SLURM jobs. The slurm module is typically called after all other node
       selection options have been processed, and if no nodes have been
       selected, the module will attempt to read a running jobid from the
       SLURM_JOBID environment variable (which is set when running under a
       SLURM allocation). If SLURM_JOBID references an invalid job, it will be
       silently ignored.

       -j jobid[,jobid,...]
              Target list of nodes allocated to the SLURM job jobid. This
              option may be used multiple times to target multiple SLURM jobs.
              The special argument "all" can be used to target all nodes
              running SLURM jobs, e.g.  -j all.


torque module options
       The torque module allows pdsh to target nodes based on currently
       running Torque/PBS jobs. Similar to the slurm module, the torque module
       is typically called after all other node selection options have been
       processed, and if no nodes have been selected, the module will attempt
       to read a running jobid from the PBS_JOBID environment variable (which
       is set when running under a Torque allocation).

       -j jobid[,jobid,...]
              Target list of nodes allocated to the Torque job jobid. This
              option may be used multiple times to target multiple Torque
              jobs.


rms module options
       The rms module allows pdsh to target nodes based on an RMS resource.
       The rms module is typically called after all other node selection
       options, and if no nodes have been selected, the module will examine
       the RMS_RESOURCEID environment variable and attempt to set the target
       list of hosts to the nodes in the RMS resource. If an invalid resource
       is denoted, the variable is silently ignored.


SDR module options
       The SDR module supports targeting hosts via the System Data Repository
       on IBM SPs.

       -a     Target all nodes in the SDR. The list is generated from the
              "reliable hostname" in the SDR by default.

       -i     Translate hostnames between reliable and initial in the SDR,
              when applicable.  If a target hostname matches either the
              initial or reliable hostname in the SDR, the alternate name will
              be substituted. Thus a list composed of initial hostnames will
              instead be replaced with a list of reliable hostnames.  For
              example, when used with -a above, all initial hostnames in the
              SDR are targeted.

       -v     Do not target nodes that are marked as not responding in the SDR
              on the targeted interface. (If a hostname does not appear in the
              SDR, then that name will remain in the target hostlist.)

       -G     In combination with -a, include all partitions.


nodeattr module options
       The nodeattr module supports access to the genders database via the
       nodeattr(1) command. See the genders section above for a list of
       supported options with this module. The option usage with the nodeattr
       module is the same as genders, above, with the exception that the -i
       option may only be used with -a or -g. NOTE: This module will only work
       with very old releases of genders where the nodeattr(1) command
       supports the -r option, and before the libgenders API was available.
       Users running newer versions of genders will need to use the genders
       module instead.


dshgroup module options
       The dshgroup module allows pdsh to use dsh (or Dancer's shell) style
       group files from /etc/dsh/group/ or ~/.dsh/group/. The default search
       path may be overridden with the DSHGROUP_PATH environment variable, a
       colon-separated list of directories to search. The default value for
       DSHGROUP_PATH is /etc/dsh/group.

       -g groupname,...
              Target nodes in dsh group file "groupname" found in either
              ~/.dsh/group/groupname or /etc/dsh/group/groupname.

       -X groupname,...
              Exclude nodes in dsh group file "groupname."

       As an enhancement in pdsh, dshgroup files may optionally include other
       dshgroup files via a special #include STRING syntax.  The argument to
       #include may be either a file path, or a group name, in which case the
       path used to search for the group file is the same as if the group had
       been specified to -g.
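
       A minimal sketch of the #include mechanism follows. It assumes group
       files list one host per line; the group names web and frontend, the
       host names, and the temporary directory are all placeholders. The
       script only prepares the files and prints the command it would run.

```shell
# Create a private group directory with two hypothetical groups.
groupdir=$(mktemp -d)
printf '%s\n' foo1 foo2 >"$groupdir/web"

# "frontend" pulls in "web" by group name, then adds one more host.
printf '%s\n' '#include web' foo9 >"$groupdir/frontend"

# Point the dshgroup module at the private directory.
DSHGROUP_PATH=$groupdir
export DSHGROUP_PATH

printf 'would run: pdsh -q -g frontend (DSHGROUP_PATH=%s)\n' "$DSHGROUP_PATH"
```

       Under these assumptions, targeting frontend would be expected to
       select foo1, foo2, and foo9.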


netgroup module options
       The netgroup module allows pdsh to use standard netgroup entries to
       build lists of target hosts (from /etc/netgroup or NIS).

       -g groupname,...
              Target nodes in netgroup "groupname."

       -X groupname,...
              Exclude nodes in netgroup "groupname."


ENVIRONMENT VARIABLES
       PDSH_RCMD_TYPE
              Equivalent to the -R option, the value of this environment
              variable will be used to set the default rcmd module for pdsh to
              use (e.g. ssh, rsh).

       PDSH_SSH_ARGS
              Override the standard arguments that pdsh passes to the ssh(1)
              command ("-2 -a -x -l%u %h"). The use of the parameters %u, %h,
              and %n (as documented in the rcmd/exec section above) is
              optional. If these parameters are missing, pdsh will append
              them to the ssh command line because it is assumed they are
              mandatory.

       PDSH_SSH_ARGS_APPEND
              Append additional options to the ssh(1) command invoked by pdsh.
              For example, PDSH_SSH_ARGS_APPEND="-q" would run ssh in quiet
              mode, or "-v" would increase the verbosity of ssh. (Note: these
              arguments are actually prepended to the ssh command line to
              ensure they appear before any target hostname argument to ssh.)

       WCOLL  If no other node selection option is used, the WCOLL environment
              variable may be set to a filename from which a list of target
              hosts will be read. The file should contain a list of hosts, one
              per line (though each line may contain a hostlist expression;
              see the HOSTLIST EXPRESSIONS section below).

       DSHPATH
              If set, the path in DSHPATH will be used as the PATH for the
              remote processes.

       FANOUT Set the pdsh fanout (See description of -f above).
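
       As a sketch of how these variables combine, the following sets up a
       session-wide default configuration. The host names and values are
       placeholders chosen for the example; nothing is executed on remote
       hosts.

```shell
# A WCOLL file may hold plain host names or hostlist expressions.
wcoll_file=$(mktemp)
printf '%s\n' 'foo[0-3]' 'bar7' >"$wcoll_file"

WCOLL=$wcoll_file            # default targets when no selection option is used
PDSH_RCMD_TYPE=ssh           # equivalent to -R ssh
PDSH_SSH_ARGS_APPEND='-q'    # extra ssh option, as in the example above
FANOUT=16                    # equivalent to -f 16
export WCOLL PDSH_RCMD_TYPE PDSH_SSH_ARGS_APPEND FANOUT

# Show what a subsequent bare "pdsh command" would inherit.
env | grep -E '^(WCOLL|PDSH_RCMD_TYPE|PDSH_SSH_ARGS_APPEND|FANOUT)=' | sort
```

       Command-line options such as -w, -R, and -f can still be given
       explicitly alongside these variables.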


HOSTLIST EXPRESSIONS
       As noted in sections above, pdsh accepts lists of hosts in the general
       form: prefix[n-m,l-k,...], where n < m and l < k, etc., as an
       alternative to explicit lists of hosts. This form should not be
       confused with regular expression character classes (also denoted by
       ``[]''). For example, foo[19] does not represent an expression matching
       foo1 or foo9, but rather represents the degenerate hostlist: foo19.

       The hostlist syntax is meant only as a convenience on clusters with a
       "prefixNNN" naming convention and specification of ranges should not be
       considered necessary -- thus foo1,foo9 could be specified as such, or
       by the hostlist foo[1,9].
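
       To make the range form concrete, here is a rough sketch that expands a
       single prefix[n-m] range, optionally with a suffix. It is written for
       illustration only and does not reproduce pdsh's parser: the function
       names are hypothetical, comma-separated lists are not handled, and
       zero padding is assumed to follow the width of the first number.

```shell
# Strip leading zeros so the number is never read as octal.
strip_zeros() {
    n=$1
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    printf '%s\n' "$n"
}

# Usage: expand_range PREFIX FIRST LAST [SUFFIX]
expand_range() {
    prefix=$1 width=${#2} suffix=${4-}
    i=$(strip_zeros "$2")
    last=$(strip_zeros "$3")
    while [ "$i" -le "$last" ]; do
        # Pad to the width of FIRST, e.g. 01 -> two digits.
        printf "%s%0${width}d%s\n" "$prefix" "$i" "$suffix"
        i=$((i + 1))
    done
}

expand_range foo 01 05        # foo01 foo02 foo03 foo04 foo05, one per line
expand_range foo 0 3 -eth0    # foo0-eth0 ... foo3-eth0, one per line
```
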

       Some examples of usage follow:


       Run command on foo01,foo02,...,foo05
            pdsh -w foo[01-05] command

       Run command on foo7,foo9,foo10
            pdsh -w foo[7,9-10] command

       Run command on foo0,foo4,foo5
            pdsh -w foo[0-5] -x foo[1-3] command


       A suffix on the hostname is also supported:


       Run command on foo0-eth0,foo1-eth0,foo2-eth0,foo3-eth0
            pdsh -w foo[0-3]-eth0 command


       As a reminder to the reader, some shells will interpret brackets ('['
       and ']') for pattern matching.  Depending on your shell, it may be
       necessary to enclose ranged lists within quotes.  For example, in tcsh,
       the first example above should be executed as:

            pdsh -w "foo[01-05]" command


ORIGIN
       Originally a rewrite of IBM dsh(1) by Jim Garlick <garlick@llnl.gov> on
       LLNL's ASCI Blue-Pacific IBM SP system. It is now used on Linux
       clusters at LLNL.


LIMITATIONS
       When using ssh for remote execution, expect the stderr of ssh to be
       folded in with that of the remote command. When invoked by pdsh, it is
       not possible for ssh to prompt for passwords if RSA/DSA keys are not
       configured properly, etc.  For ssh implementations that support a
       connect timeout option, pdsh attempts to use that option to enforce the
       timeout (e.g. -oConnectTimeout=T for OpenSSH), otherwise connect
       timeouts are not supported when using ssh.  Finally, there is no
       reliable way for pdsh to ensure that remote commands are actually
       terminated when using a command timeout. Thus, if -u is used with ssh,
       commands may be left running on remote hosts even after the timeout
       has killed the local ssh processes.
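
       One possible mitigation, offered here as an assumption rather than a
       pdsh feature, is to have the remote side enforce its own limit, for
       example with the timeout(1) utility from GNU coreutils where it is
       installed on the target hosts. The sketch below builds such a command
       line and prints it; long_running_command and the host range are
       placeholders.

```shell
limit=60    # seconds given to pdsh -u

# Let the remote command stop itself slightly before pdsh gives up,
# so nothing is left running after the local ssh process is killed.
remote_cmd="timeout $((limit - 5)) long_running_command"

set -- pdsh -R ssh -u "$limit" -w 'foo[0-10]' "$remote_cmd"

# Print each argument in brackets to make the quoting visible.
printf 'would run:'
printf ' [%s]' "$@"
printf '\n'
```

       The five-second margin is arbitrary and would need tuning for the
       actual workload.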

       Output from multiple processes per node may be interspersed when using
       qshell or mqshell rcmd modules.

       The number of nodes that pdsh can simultaneously execute remote jobs on
       is limited by the maximum number of threads that can be created
       concurrently, as well as the availability of reserved ports in the rsh
       and qshell rcmd modules. On systems that implement POSIX threads, the
       limit is typically defined by the constant PTHREAD_THREADS_MAX.


FILES
SEE ALSO
       rsh(1), ssh(1), dshbak(1), pdcp(1)



pdsh-2.27                          linux-gnu                           pdsh(1)