atopgpud

ATOPGPUD(8)                  System Manager's Manual                 ATOPGPUD(8)



NAME
       atopgpud - GPU statistics daemon

SYNOPSIS
       atopgpud [-v]

DESCRIPTION
       The atopgpud daemon gathers statistical information from all Nvidia GPUs
       in the current system. With a sampling rate of one second, it maintains
       the statistics of every GPU, globally (system level) and per process.
       When atopgpud is active on the target system, atop connects to this
       daemon via a TCP socket and obtains all GPU statistics for every
       interval.

       Gathering the GPU statistics in a separate daemon is necessary because
       the Nvidia driver only reports the GPU busy percentage of the last
       second. If atop ran with a 10-minute interval and fetched the GPU busy
       percentage directly from the Nvidia driver, the value would reflect
       only the last second instead of the average over 600 seconds.
       Therefore, the atopgpud daemon fetches the GPU busy percentage every
       second and accumulates it into a counter that atop can retrieve once
       per interval. The same approach applies to other GPU statistics.
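       To illustrate why accumulation helps, the following minimal sketch
       keeps a running total of one-second busy samples. The class and
       function names are hypothetical and are not taken from the atopgpud
       source. A client that takes a snapshot at the start and at the end of
       its interval divides the differences to obtain the interval average.

```python
class BusyAccumulator:
    """Hypothetical accumulator of one-second GPU busy percentages."""

    def __init__(self):
        self.busy_total = 0   # sum of all one-second busy percentages
        self.samples = 0      # number of one-second samples taken

    def sample(self, busy_percent):
        # Called once per second with the driver's last-second value.
        self.busy_total += busy_percent
        self.samples += 1

    def snapshot(self):
        # What a client would retrieve: the cumulative counters.
        return (self.busy_total, self.samples)


def interval_average(prev, curr):
    """Average busy percentage between two snapshots of the counters."""
    dtotal = curr[0] - prev[0]
    dsamples = curr[1] - prev[1]
    return dtotal / dsamples if dsamples else 0.0


acc = BusyAccumulator()
before = acc.snapshot()
for pct in [100] * 300 + [0] * 300:      # 10 minutes of one-second samples
    acc.sample(pct)
after = acc.snapshot()
avg = interval_average(before, after)    # 50.0, whereas the last sample was 0
```

       Reading only the most recent sample at the end of this 10-minute
       interval would report 0%, while the accumulated counters yield the
       true average of 50%.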

       When the atopgpud daemon runs with root privileges, additional
       process-level counters (i.e. GPU busy and GPU memory busy per process)
       are provided that are otherwise not available.

       Notice that certain GPU statistics are only delivered for specific GPU
       types.  For older or less sophisticated GPUs, the value -1 is returned
       for counters that are not maintained. In the output of atop these
       counters are shown as 'N/A'.
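       A client that presents these counters itself has to apply the same
       convention. A minimal sketch, using a hypothetical helper name that is
       not part of atop:

```python
NOT_MAINTAINED = -1   # value reported for counters a GPU does not maintain


def format_counter(value):
    """Render a GPU counter for display: -1 becomes 'N/A' (hypothetical helper)."""
    return "N/A" if value == NOT_MAINTAINED else str(value)
```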

       When no (Nvidia) GPUs can be found in the target system, atopgpud
       immediately terminates with exit code 0.

       Log messages are written via the rsyslogd daemon with facility 'daemon'.
       With the -v flag (verbose), atopgpud also logs debug messages.

INSTALLATION
       The atopgpud daemon is written in Python, so a Python interpreter must
       be installed on the target system. This can be either Python version 2
       or Python version 3 (the code of atopgpud is written in a generic way).
       Make sure that the first line of the atopgpud script (the "#!" line)
       names a Python interpreter that is actually installed on the target
       system!

       The atopgpud daemon depends on the Python module pynvml to interface with
       the Nvidia driver. This module can be installed with the pip or pip3
       command and is usually packaged under the name nvidia-ml-py.
       Finally, the pynvml module is a Python wrapper around the libnvidia-ml
       shared library, which must be installed as well.
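       One way to check that both layers (pynvml and libnvidia-ml) are usable
       before enabling the service is a small probe such as the following. It
       is a hypothetical helper, not part of atop, and only uses the pynvml
       calls nvmlInit, nvmlDeviceGetCount and nvmlShutdown. A result of None
       means that pynvml or libnvidia-ml cannot be used; 0 means that NVML
       works but reports no GPUs.

```python
def count_nvidia_gpus():
    """Return the number of Nvidia GPUs visible through NVML, or None when
    pynvml (nvidia-ml-py) or the libnvidia-ml library cannot be used."""
    try:
        import pynvml                    # provided by the nvidia-ml-py package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()                # loads libnvidia-ml; fails without it
    except pynvml.NVMLError:
        return None
    try:
        return pynvml.nvmlDeviceGetCount()
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()            # release NVML after initialization


gpus = count_nvidia_gpus()
if gpus is None:
    print("pynvml or libnvidia-ml is not usable")
else:
    print("Nvidia GPUs found:", gpus)
```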

       After installing the atop package, atopgpud is neither started
       automatically nor enabled as a service by default. To activate this
       service (permanently), enter the following commands (as root):

         systemctl enable atopgpu
         systemctl start atopgpu

INTERFACE DESCRIPTION
       Client processes can connect to the atopgpud daemon on TCP port 59123.
       Subsequently, such a client can send a request of two bytes: a one-byte
       request code followed by a one-byte integer holding the API version
       number.
       The request code in the first byte can be 'T' to obtain information about
       the GPU types installed in this system (usually requested only once),
       or 'S' to obtain all statistical counter values (requested for every
       interval).
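       Encoding such a request takes only a couple of lines. The sketch below
       uses a hypothetical helper name; this page does not state which API
       version number to send, so the value 1 used in the example is an
       assumption.

```python
TYPE_REQUEST = "T"    # GPU type information, usually requested once
STATS_REQUEST = "S"   # statistical counters, requested every interval


def build_request(code, api_version):
    """Encode a two-byte atopgpud request: request code + API version."""
    if code not in (TYPE_REQUEST, STATS_REQUEST):
        raise ValueError("request code must be 'T' or 'S'")
    if not 0 <= api_version <= 255:
        raise ValueError("API version must fit in one byte")
    return code.encode("ascii") + bytes([api_version])


stats_request = build_request(STATS_REQUEST, 1)   # assumption: API version 1
```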

       The response of the daemon starts with a 4-byte integer. Its first byte
       is the API version number that determines the response format, while
       the subsequent three bytes hold the length (in big-endian order) of the
       response string that follows.
       In the response string, the character '@' introduces system-level
       information about one specific GPU and the character '#' introduces
       process-level information related to that GPU.
       For further details about the meaning of the counters in a response
       string, please consult the source code.
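       A minimal client sketch that follows this protocol is shown below. The
       function names are hypothetical; the API version passed by the caller
       and the ASCII decoding of the response string are assumptions. The
       sketch only splits the response into per-GPU sections at '@' and '#'
       and does not interpret the counters; anything before the first '@' is
       returned separately as a preamble.

```python
import socket

ATOPGPUD_PORT = 59123


def recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by atopgpud")
        buf += chunk
    return bytes(buf)


def parse_header(hdr):
    """Split the 4-byte header into API version and big-endian length."""
    version = hdr[0]
    length = int.from_bytes(hdr[1:4], "big")
    return version, length


def query(code, api_version, host="localhost", port=ATOPGPUD_PORT,
          timeout=5.0):
    """Send one 'T' or 'S' request and return (version, response string)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(code.encode("ascii") + bytes([api_version]))
        version, length = parse_header(recv_exact(sock, 4))
        body = recv_exact(sock, length).decode("ascii", errors="replace")
    return version, body


def split_sections(body):
    """Return (preamble, [(gpu_info, [process_info, ...]), ...])."""
    preamble, *gpu_chunks = body.split("@")
    gpus = []
    for chunk in gpu_chunks:
        system, *procs = chunk.split("#")   # '#' entries belong to this GPU
        gpus.append((system, procs))
    return preamble, gpus


# Example (requires a running atopgpud on this host):
#   version, body = query("S", 1)
#   preamble, gpus = split_sections(body)
```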

SEE ALSO
       atop(1), atopsar(1), atoprc(5), netatop(4), netatopd(8), atopacctd(8)
       https://www.atoptool.nl

AUTHOR
       Gerlof Langeveld (gerlof.langeveld@atoptool.nl)



Linux                             November 2019                      ATOPGPUD(8)