
Lava Commands and Utilities

The lava utilities are included in the jinlava Python package together with the lava APIs.

Lava includes a number of CLI commands and utilities. Some of these are used by the lava worker itself while others are support tools. Most of them can be used stand-alone or invoked by lava exe, pkg and docker jobs.

All of the utilities support a -h, --help option.

Utility Description
jinja Render a Jinja template with specified parameters.
lava-ami Manage the AMIs used in lava worker CloudFormation stacks.
lava-backup Backup the DynamoDB entries for a specified lava realm.
lava-check Perform some basic health checks on DynamoDB table entries.
lava-checksum Set and validate checksums on lava DynamoDB table entries.
lava-conn-usage Generate an activity map of specified jobs over a defined time period.
lava-dag-gen Generate a DAG specification for lava dag jobs from a dependency matrix.
lava-dispatcher The lava job dispatcher.
lava-dump Extract lava configurations from DynamoDB and dump them to files.
lava-email Send email using lava email connections.
lava-events Query the lava events table.
lava-job-activity Query the lava realm events table to map job activity over a specified time window.
lava-new Create a new lava job framework project.
lava-ps Show lava worker process information on the current host.
lava-schema Perform deep schema validation for lava DynamoDB specification objects.
lava-sharepoint Operate on SharePoint sites using lava sharepoint connections.
lava-slack Send Slack messages using lava slack connections.
lava-smb Operate on SMB file shares using lava smb connections.
lava-sql Run SQL using lava database connections.
lava-state Manipulate lava state items.
lava-stop Perform a controlled shutdown of the lava worker daemons. This also supports AWS auto scaling lifecycle hooks.
lava-version Provides version information on the installed lava version.
lava-worker The main lava worker.
lava-ws Show lava worker status based on the worker SQS queues (queue depths, worker backlog etc.). This does not need to run on the worker host.
s3lambda Trigger an AWS Lambda function by generating synthetic S3 event notifications.

Lava Icons

The following are the official lava icons. These used to change on a whim, much like AWS architecture icons, but are now stable.

PNG

SVG

Lava AMI Utility

The lava-ami utility displays the available, lava compatible, AMIs and the AMI specified in each lava worker CloudFormation stack.

Usage
usage: lava-ami [-h] [-n N] [--sak] [--profile PROFILE] [-U] [-v] [-W]
                [STACK-NAME ...]

Manage the AMIs used in lava worker CloudFormation stacks.

positional arguments:
  STACK-NAME         CloudFormation stack name for a lava worker. Glob style
                     patterns can be used. If not specified, or any of the
                     patterns is *, the -U / --update option is not permitted.

optional arguments:
  -h, --help         show this help message and exit
  -n N               Only include specified number of most recent images of
                     each type in the selection list. Default 5.
  --profile PROFILE  As for AWS CLI.
  -U, --update       Initiate an interactive update process to allow a new AMI
                     to be applied for selected stacks. If specified, one or
                     more stack patterns must be specified (no single *) to
                     make it harder to maniacally update a whole bunch of
                     stacks in one go. You can thank me later.
  -v, --version      show program's version number and exit
  -W, --no-wait      Don't wait for CloudFormation stack updates to complete.

Lava-ami also provides an update mode that allows a (more or less) interactive process to select and apply a different AMI to one or more lava worker stacks. This process is a lot simpler and less error prone than trying to manage the AMI used on multiple workers in the AWS CloudFormation console.

Lava-ami will silently ignore worker stacks that appear to be parasitic workers hosted on another worker instance. These are detected by the absence of a machine type or AMI ID parameter in the CloudFormation stack. If these need to be updated, use the AWS CloudFormation console.

Lava-ami is conservative in its definition of lava compatibility for an AMI. The lava worker itself can run on any Linux machine with the right prerequisite components installed but these two images support the deployment and bootstrapping processes preferred in lava operational environments. Lava-ami will highlight the most recent lava AMI in its output.

Lava Backup Utility

Lava-backup performs a complete extract of all of the configuration tables for a given realm and stores the result in a zip file, either locally or in AWS S3.

Usage
usage: lava-backup [options] realm zip-file

Backup the DynamoDB entries for a specified lava realm. The output is a zip file.

positional arguments:
  realm               Realm name.
  zip-file            Name of the output zip file. Can be on the local machine
                      or in S3 (s3://....). 

optional arguments:
  -h | --help         Print help and exit.
  -y | --yaml         Output the entries in YAML format. The default is JSON.

Lava-backup uses the lava-dump utility under the covers. It can be run as a lava cmd job if required. A lava job specification suitable for backing up the current realm is:

{
  "description": "Backup the DynamoDB entries for the realm",
  "dispatcher": "Sydney",
  "enabled": true,
  "job_id": "lava/dynamo-backup",
  "owner": "lava",
  "parameters": {
    "args": [
      "{{realm.realm}}",
      "{{realm.s3_temp}}/lava/dynamo-backup/{{ustart.strftime('%Y-%m-%d')}}.zip"
    ]
  },
  "payload": "lava-backup",
  "schedule": "0 19 * * *",
  "type": "cmd",
  "worker": "core"
}

Note

The dispatcher, worker and schedule values in the example will need to be adjusted to suit your realm.
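
The args in the example are Jinja-rendered at dispatch time. As a rough illustration using plain strftime (the s3_temp value and ustart timestamp here are hypothetical stand-ins for the real realm values), the zip key resolves like this:

```python
from datetime import datetime, timezone

# Hypothetical values standing in for realm.s3_temp and the job's ustart.
s3_temp = "s3://my-bucket/temp"
ustart = datetime(2024, 2, 15, 19, 0, tzinfo=timezone.utc)

# Mirrors {{realm.s3_temp}}/lava/dynamo-backup/{{ustart.strftime('%Y-%m-%d')}}.zip
zip_key = f"{s3_temp}/lava/dynamo-backup/{ustart.strftime('%Y-%m-%d')}.zip"
print(zip_key)  # s3://my-bucket/temp/lava/dynamo-backup/2024-02-15.zip
```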

Lava Check Utility

The lava-check utility performs some basic health checks on DynamoDB table entries.

Usage
usage: lava-check [-h] [-c GLOB] [--profile PROFILE] [-r REALM] [-S] [-v]
                  [--no-colour] [-l LEVEL] [--log LOG] [--tag TAG]

Check lava specifications for problems.

options:
  -h, --help            show this help message and exit
  -c GLOB, --check GLOB
                        Run the health checks with names matching the given
                        glob patterns. Can be used multiple times. If not
                        specified, print a list of available checks.
  --profile PROFILE     As for AWS CLI.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the environment
                        variable LAVA_REALM must be set.
  -S, --no-suppress     Disable suppression of checks for specific DynamoDB
                        entries via the x-lava-nocheck field. By default
                        suppression of specific checks is permitted for some
                        check types.
  -v, --version         show program's version number and exit

logging arguments:
  --no-colour, --no-color
                        Don't use colour in information messages.
  -l LEVEL, --level LEVEL
                        Print messages of a given severity level or above. The
                        standard logging level names are available but debug,
                        info, warning and error are most useful. The default
                        is info.
  --log LOG             Log to the specified target. This can be either a file
                        name or a syslog facility with an @ prefix (e.g.
                        @local0).
  --tag TAG             Tag log entries with the specified value. The default
                        is lava-check.

See also lava-schema.

Lava-check supports the following checks:

Check Type Description
conmeta Connection specs with missing metadata (e.g. description, owner).
jobjinja * Job specs with Jinja rendering issues. This includes jobs that use globals for which there is no placeholder entry in the job specification, referred to as undeclared globals. While lava tolerates undeclared globals, it is good practice to declare them with a placeholder value.
jobmeta Job specs with missing metadata (e.g. description, owner).
joborphan * Jobs with no recorded run events.
jobrepo * Job specs that don't appear to have an associated repo (no x-lava-git-repo field).
jobrsu * redshift_unload jobs with insecure set to true.
trigmeta S3trigger specs with missing metadata (e.g. description, owner).

The checks marked with a * can be suppressed on an entry specific basis.

Note

The checks need to perform a full table scan on the relevant table. This is not usually a problem but something to remember. Performing multiple checks on a given table in a single invocation will only do a single table scan though.

Output is in markdown formatted tables on the assumption that these issues may end up in a backlog somewhere for correction.

Suppressing Checks for Specific Entries

Some check types can be suppressed for specific DynamoDB table entries by including an x-lava-nocheck field in the table entry. The value is a string identifying a single check type to suppress, or a list of such strings.

For example, the following would suppress the joborphan check for a given job specification:

{
  "job_id": "rarely-run-job",
  "x-lava-nocheck": "joborphan",
  ...
}

This would suppress the joborphan and jobrsu checks:

{
  "job_id": "yet-another-job",
  "x-lava-nocheck": [
    "joborphan",
    "jobrsu"
  ],
  ...
}
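
Based on the description above, a check runner might interpret the x-lava-nocheck field along these lines. This is only a sketch of the documented behaviour, not lava-check's actual implementation:

```python
def check_suppressed(entry: dict, check_name: str) -> bool:
    """Return True if check_name is suppressed for this table entry.

    The x-lava-nocheck value may be a single check name or a list of names.
    """
    nocheck = entry.get("x-lava-nocheck", [])
    if isinstance(nocheck, str):
        nocheck = [nocheck]
    return check_name in nocheck

entry = {"job_id": "rarely-run-job", "x-lava-nocheck": "joborphan"}
print(check_suppressed(entry, "joborphan"))  # True
print(check_suppressed(entry, "jobrsu"))     # False
```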

Lava Checksum Utility

The lava-checksum utility verifies, adds and updates checksums on entries in the lava DynamoDB tables (jobs, connections, s3triggers and realms).

Note

The checksums are intended for drift detection only. They are not a code signing mechanism and they are not cryptographically sealed.

Usage
usage: lava-checksum [-h] [-f {txt,tty,html,md}]
                     [--hash-algorithm ALGORITHM]
                     [-i] [--profile PROFILE] [-r REALM] [-t TABLE] [-v]
                     [--version]
                     {check,add,update} ...

Set and validate checksums on lava DynamoDB entries.

positional arguments:
  {check,add,update}
    check               Validate checksums.
    add                 Add missing checksums.
    update              Update existing checksums.

options:
  -h, --help            show this help message and exit
  -f {txt,tty,html,md}, --format {txt,tty,html,md}
                        Output format. Default is "tty" if stdout is a
                        terminal and "txt" otherwise.
  --hash-algorithm ALGORITHM
                        Algorithm to use for checksums. Default is sha256.
  -i, --ignore-case     Matching of glob patterns is case insensitive.
  --profile PROFILE     As for AWS CLI.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the environment
                        variable LAVA_REALM must be set.
  -t TABLE, --table TABLE
                        Extract from the specified table. This can be one of
                        jobs, connections, s3triggers (or triggers) or realms.
                        Any unique initial sequence is accepted. The default
                        is "jobs".
  -v, --verbose         Increase verbosity. By default, only checksum errors,
                        updates, etc. are reported. Can be specified multiple
                        times.
  --version             show program's version number and exit

To get help on a sub-command, use -h / --help on the sub-command. e.g.

lava-checksum check --help

Key points to note:

  • The checksums are stored in the entry in the field x-lava-chk.

  • Checksum calculation ignores any field starting with x- or X-.

  • The lava-job-framework generates compatible checksums when deploying entries to the tables.

  • The checksum structure and format are internal to lava and subject to change at the capricious whim of the developer. The lava-checksum utility will manage backward compatibility.
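
The points above suggest how such a drift-detection checksum might be computed. The sketch below is only an illustrative approximation: the real structure and canonicalisation are internal to lava and subject to change, so do not expect it to reproduce lava's actual x-lava-chk values:

```python
import hashlib
import json

def entry_checksum(entry: dict, algorithm: str = "sha256") -> str:
    """Approximate a drift-detection checksum for a table entry.

    Fields starting with x- or X- (including x-lava-chk itself) are
    excluded, and the remainder is hashed in a canonical JSON form.
    """
    stable = {k: v for k, v in entry.items() if not k.lower().startswith("x-")}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.new(algorithm, canonical.encode("utf-8")).hexdigest()

job = {"job_id": "demo", "type": "cmd"}
# Adding or changing x- fields does not alter the checksum.
print(entry_checksum(job) == entry_checksum({**job, "X-note": "ignored"}))  # True
```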

Arguments for the lava-checksum utility shown above must be placed before the sub-command. Arguments specific to a sub-command must be placed after the sub-command.

Note:

  • The add sub-command will only add missing checksums and update will only update existing checksums.

  • If any table entries are modified, a ZIP file will be left in the current directory containing the entries before they were updated. Delete this manually if not required.

Examples

# Check all of the jobs in realm "prod"
lava-checksum --realm prod --table jobs -vv check

# Add missing checksums to connections matching app/* in realm "prod"
lava-checksum --realm prod --table conn -vv add 'app/*'

Lava-conn-usage Utility

The lava-conn-usage utility will find the job IDs of jobs that reference specified connectors.

Usage
usage: lava-conn-usage [-h] [--profile PROFILE] [-i] [-r REALM]
                       connector-glob [connector-glob ...]

Find lava jobs that reference specified connectors.

positional arguments:
  connector-glob        Report jobs that use connectors that match any of the
                        specified glob style patterns.

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE     As for AWS CLI.
  -i, --ignore-case     Matching is case insensitive.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the value of the
                        LAVA_REALM environment variable is used. A value must
                        be specified by one of these mechanisms.

For example, the following will find job IDs that reference connectors with IDs containing the string redshift using a glob-style pattern match:

lava-conn-usage -r my-realm '*redshift*'

This can then be used with the lava-job-activity utility to estimate the load lava is placing on particular resources (e.g. a database). See Estimating Lava Load on a Connection.

Info

Only connections referenced in parameters known by lava to hold connection IDs will be found.

Lava DAG Generator

The lava-dag-gen utility generates a DAG specification for lava dag jobs from a dependency matrix. It is provided as part of the standard lava worker installation and is also included in the bin directory with the lava job framework. The lava job framework also provides support for using the utility to automatically generate DAGs at build time.

Usage
usage: lava-dag-gen [-h] [-c] [-g GROUP] [-o] [-p PREFIX] [-r REALM] [-w KEY]
                    [--table TABLE] [-y]
                    source

Generate a DAG specification for lava dag jobs from a dependency matrix.

positional arguments:
  source                Source data for the DAG dependency matrix. CSV, Excel
                        XLSX files and sqlite3 files are supported. The
                        filename suffix is used to determine file type. If the
                        value is not a recognised file type, it is assumed to
                        be a lava database connection ID. In this case the
                        lava realm must be specified via -r, --realm or the
                        LAVA_REALM environment variable. For CSV and Excel,
                        the first column contains successor job names and the
                        first row contains predecessor job names. Any non-
                        empty value in the intersection of row and column
                        indicates a dependency. For database sources, a table
                        with three columns (job_group, job, depends_on) is
                        required. The "job" and "depends_on" columns each
                        contain a single job name. The "depends_on" column may
                        contain a NULL indicating the "job" must be included
                        but has no dependency. There can be multiple rows
                        containing the same "job".

optional arguments:
  -h, --help            show this help message and exit
  -c, --compact         Use a more compact form for singleton and empty
                        dependencies.
  -g GROUP, --group GROUP
                        Select only the specified group of source entries. For
                        CSV files, this is ignored. For Excel files, this
                        specifies the worksheet name and defaults to the first
                        worksheet. For sqlite3 files, this is used as a filter
                        value on the "job_group" column of the source table
                        and defaults to selecting all entries.
  -o, --order           If specified, just print one possible ordering of the
                        jobs instead of the DAG specification.
  -p PREFIX, --prefix PREFIX
                        Prepend the specified prefix to all job IDs.
  -r REALM, --realm REALM
                        Lava realm. Required if the DAG source is specified as
                        a lava connection ID. Defaults to the value of the
                        LAVA_REALM environment variable.
  -w KEY, --wrap KEY    Wrap the DAG specification in the specified map key.
  --table [SCHEMA.]TABLE 
                        Table name for database sources. Default is dag.
  -y, --yaml            Generate YAML output instead of JSON.

Lava-dag-gen can read the dependency information from any of the following:

Columnar Format

Dependency information in columnar format must contain the following three columns (only):

Column Description
job_group An arbitrary grouping label for sets of jobs.
job The job ID of the successor job. If all the jobs in a DAG have a common prefix in the job ID, this can be omitted here and inserted at run-time in the dag job specification.
depends_on The ID of a predecessor job on which the subject job depends. This may be empty/NULL if the job has no dependencies. Once again, a common prefix can be omitted.

Each row contains a single predecessor/successor pair. If a job has multiple predecessors, there will be multiple rows for that job.

Sample DDL for a database:

CREATE TABLE dag
(
    job_group  VARCHAR(50),
    job        VARCHAR(50) NOT NULL,
    depends_on VARCHAR(50)
);
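
The translation from columnar rows to a dag payload can be sketched as follows. This illustrates the mapping described above; it is not lava-dag-gen's actual code:

```python
def rows_to_dag(rows):
    """Convert (job, depends_on) rows into a dag payload mapping.

    depends_on may be None, meaning the job is included but has no
    dependencies; multiple rows for the same job accumulate predecessors.
    """
    dag = {}
    for job, depends_on in rows:
        deps = dag.setdefault(job, None)
        if depends_on is not None:
            dag[job] = (deps or []) + [depends_on]
    return dag

rows = [("J1", "J3"), ("J1", "J5"), ("J2", None), ("J4", "J1"), ("J4", "J3")]
print(rows_to_dag(rows))
# {'J1': ['J3', 'J5'], 'J2': None, 'J4': ['J1', 'J3']}
```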

Matrix Format

In matrix format, the first column contains successor job names and the first row contains predecessor job names. Any non-empty value in the intersection of row and column indicates a dependency. Like so:

Jobs  J1  J2  J3  J5
J1            x   x
J2
J4    x       x
J5

This would result in the following dag payload:

{
    "J1": [
        "J3",
        "J5"
    ],
    "J2": null,
    "J4": [
        "J1",
        "J3"
    ]
}

Note that J5 doesn't require its own entry as it is present as a predecessor of J1 and has no predecessors of its own.
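
The matrix-to-payload mapping can be sketched similarly. Again this is only an illustration of the rules above, not the utility's implementation; rows with no predecessors of their own (like J5) are simply omitted from the input here:

```python
def matrix_to_dag(predecessors, rows):
    """Convert a dependency matrix into a dag payload mapping.

    predecessors is the header row of predecessor job names; each row
    pairs a successor job with its matrix cells, where any non-empty
    cell marks a dependency.
    """
    dag = {}
    for job, cells in rows:
        deps = [p for p, cell in zip(predecessors, cells) if cell]
        dag[job] = deps or None
    return dag

header = ["J1", "J2", "J3", "J5"]
rows = [
    ("J1", ["", "", "x", "x"]),
    ("J2", ["", "", "", ""]),
    ("J4", ["x", "", "x", ""]),
]
print(matrix_to_dag(header, rows))
# {'J1': ['J3', 'J5'], 'J2': None, 'J4': ['J1', 'J3']}
```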

Lava Dispatcher Utility

The lava dispatcher utility is typically run by cron(8) to dispatch jobs on a schedule. It can also be run as a stand-alone utility to dispatch jobs on demand.

Usage
usage: lava-dispatcher [-h] [--profile PROFILE] [-v] [--check-dispatch]
                       [-d DELAY] [-q QUEUE] [-r REALM] [-w WORKER]
                       [-g name=VALUE] [-p name=VALUE] [-c] [-l LEVEL]
                       [--log-json] [--log LOG] [--tag TAG]
                       job-id [job-id ...]

Lava job dispatcher.

options:
  -h, --help            show this help message and exit
  --profile PROFILE     As for AWS CLI.
  -v, --version         show program's version number and exit
  --check-dispatch      If specified, check for the existence of a
                        dispatch suppression file "/tmp/lava/__nodispatch__".
                        If the file is present, all dispatches are suppressed.
                        This is typically only used for scheduled dispatches
                        when a dispatcher node is in the process of shutting
                        down.

dispatch control options:
  -d DELAY, --delay DELAY
                        Delay dispatch by the specified duration. Default is
                        0. Maximum is 15 minutes.
  -q QUEUE, --queue QUEUE
                        AWS SQS queue name. If not specified, the queue name
                        is derived from the realm and worker name.
  -r REALM, --realm REALM
                        Lava realm name. Defaults to the value of the
                        LAVA_REALM environment variable. A value must be
                        specified by one of these mechanisms.
  -w WORKER, --worker WORKER
                        Lava worker name. The worker must be a member of the
                        specified realm. If specified, the worker name must
                        match the value in the job specification. If not
                        specified, the correct value will be looked up in the
                        jobs table.

job options:
  -g name=VALUE, --global name=VALUE
                        Additional global attribute to include in the job
                        dispatch event. This option can be used multiple
                        times. If global names contain dots, they will be
                        converted into a hierarchy using the dots as level
                        separators.
  -p name=VALUE, --param name=VALUE
                        Additional parameter to include in the job dispatch
                        event. This option can be used multiple times. If
                        parameter names contain dots, they will be converted
                        into a hierarchy using the dots as level separators.
  job-id                One or more job IDs for the specified realm.

logging arguments:
  -c, --no-colour, --no-color
                        Don't use colour in information messages.
  -l LEVEL, --level LEVEL
                        Print messages of a given severity level or above. The
                        standard logging level names are available but debug,
                        info, warning and error are most useful. The default
                        is info.
  --log-json            Log messages in JSON format. This is particularly
                        useful when log messages end up in CloudWatch logs as
                        it simplifies searching.
  --log LOG             Log to the specified target. This can be either a file
                        name or a syslog facility with an @ prefix (e.g.
                        @local0).
  --tag TAG             Tag log entries with the specified value. The default
                        is lava-dispatcher.

See also The Lava Dispatch Process.

Info

To enable JSON format logging when performing scheduled dispatches, add --log-json to the args parameter in the lavasched jobs.
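
The dot-to-hierarchy conversion for -g / -p values described above could work roughly like this. This is a sketch of the documented behaviour, not the dispatcher's actual code:

```python
def set_dotted(target: dict, assignment: str) -> None:
    """Apply a name=VALUE assignment, expanding dots into nesting."""
    name, _, value = assignment.partition("=")
    keys = name.split(".")
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = value

# e.g. lava-dispatcher -p db.schema=analytics -p db.table=sales ...
params: dict = {}
set_dotted(params, "db.schema=analytics")
set_dotted(params, "db.table=sales")
print(params)  # {'db': {'schema': 'analytics', 'table': 'sales'}}
```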

Lava Dump Utility

Lava-dump performs a bulk extract of data from a single table to a local directory. It can extract all entries with keys that match any of a list of GLOB style patterns. By default, all entries are extracted.

Usage
usage: lava-dump [-h] [-d DIR] [--profile PROFILE] [-i] [-n] [-r REALM] [-q]
                 [-t TABLE] [-y]
                 [glob-pattern [glob-pattern ...]]

Extract lava configurations from DynamoDB and dump them to files.

positional arguments:
  glob-pattern          Only extract items with keys that match any of the
                        specified glob style patterns. This test is inverted by
                        the -n / --not-match option.

optional arguments:
  -h, --help            show this help message and exit
  -d DIR, --dir DIR     Store files in the specified directory, which will be
                        created if it does not exist. Defaults to the current
                        directory.
  --profile PROFILE     As for AWS CLI.
  -i, --ignore-case     Matching is case insensitive.
  -n, --not-match       Only extract items with keys that do not match any of
                        the specified glob patterns.
  -r REALM, --realm REALM
                        Lava realm name. This is required for all tables
                        except the realms table.
  -q, --quiet           Quiet mode.
  -t TABLE, --table TABLE
                        Extract from the specified table. This can be one of
                        jobs, connections, s3triggers (or triggers) or realms.
                        Any unique initial sequence is accepted.
  -y, --yaml            Dump items in YAML format. The default is JSON.

As well as supporting backups, lava-dump is useful for importing existing items into the lava job framework.
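
The key-selection semantics (-i and -n) can be illustrated with Python's fnmatch module. This is a sketch of the pattern behaviour described above, not lava-dump's implementation:

```python
import fnmatch

def select_keys(keys, patterns, ignore_case=False, invert=False):
    """Return keys matching any glob pattern; invert flips the test."""
    def matches(key):
        k = key.lower() if ignore_case else key
        pats = [p.lower() for p in patterns] if ignore_case else patterns
        return any(fnmatch.fnmatchcase(k, p) for p in pats)
    return [k for k in keys if matches(k) != invert]

keys = ["app/extract", "app/load", "ops/backup"]
print(select_keys(keys, ["app/*"]))               # ['app/extract', 'app/load']
print(select_keys(keys, ["app/*"], invert=True))  # ['ops/backup']
```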

See also lava-backup.

Lava Email Utility

The lava-email utility uses the lava email connector to send emails.

Usage
usage: lava-email [-h] [--profile PROFILE] [-v] -c CONN_ID [-r REALM]
                  [-a FILE] [--bcc EMAIL] [--cc EMAIL] [--from EMAIL]
                  [--reply-to EMAIL] [--to EMAIL] -s SUBJECT [--html FILENAME]
                  [--text FILENAME] [--no-colour] [-l LEVEL] [--log LOG]
                  [--tag TAG]
                  [FILENAME]

Send email using lava email connections.

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE     As for AWS CLI.
  -v, --version         show program's version number and exit

lava arguments:
  -c CONN_ID, --conn-id CONN_ID
                        Lava connection ID. Required.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the environment
                        variable LAVA_REALM must be set.

email arguments:
  -a FILE, --attach FILE
                        Add the specified file as an attachment. Can be a
                        local file or an object in S3 in the form
                        s3://bucket/key. Can be used multiple times.
  --bcc EMAIL           Recipients to place on the Bcc: line of the message.
                        Can be used multiple times.
  --cc EMAIL            Recipients to place on the Cc: line of the message.
                        Can be used multiple times.
  --from EMAIL          Message sender. If not specified, a value must be
                        available in either the connection specification or
                        the realm specification.
  --reply-to EMAIL      Reply-to address of the message. Can be used multiple
                        times.
  --to EMAIL            Recipients to place on the To: line of the message.
                        Can be used multiple times.
  -s SUBJECT, --subject SUBJECT
                        Message subject. Required.

message source arguments:
  At most one of the following arguments is permitted.

  --html FILENAME       This is a legacy argument for backward compatibility.
  --text FILENAME       This is a legacy argument for backward compatibility.
  FILENAME              Name of file containing the message body. If not
                        specified or "-", the body will be read from stdin. An
                        attempt is made to determine if the message is HTML
                        and send it accordingly. Only the first 2MB is read.

logging arguments:
  --no-colour, --no-color
                        Don't use colour in information messages.
  -l LEVEL, --level LEVEL
                        Print messages of a given severity level or above. The
                        standard logging level names are available but debug,
                        info, warning and error are most useful. The default
                        is info.
  --log LOG             Log to the specified target. This can be either a file
                        name or a syslog facility with an @ prefix (e.g.
                        @local0).
  --tag TAG             Tag log entries with the specified value. The default
                        is lava-email.

Lava Job Activity Utility

The lava-job-activity utility queries the realm events table to generate an activity map of specified jobs over a defined time period. This is useful to see when specified jobs are running, particularly for event triggered jobs.

Usage
usage: lava-job-activity [-h] [--profile PROFILE] [-q] -r REALM [-s START_DTZ]
                         [-e END_DTZ] [--status STATUS] [--dump FILE]
                         [--load FILE] [-i MINUTES]
                         [job-id ...]

Lava event log query utility

positional arguments:
  job-id                Retrieve records for the specified job-id. Required
                        unless --load is used. If used with --load, this acts
                        as a further filter on event records loaded from the
                        dump file.

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE     As for AWS CLI.
  -q, --quiet           Don't print progress messages on stderr.
  -r REALM, --realm REALM
                        Lava realm name.

query arguments:
  -s START_DTZ, --start START_DTZ
                        Start datetime. Preferred format is ISO 8601. If a
                        timezone is not specified, UTC is assumed. When using
                        --load, the default is the value from the source file.
                        Otherwise, the default is the most recent midnight
                        (UTC).
  -e END_DTZ, --end END_DTZ
                        End datetime. Preferred format is ISO 8601. If a
                        timezone is not specified, UTC is assumed. When using
                        --load, the default is the value from the source file.
                        Otherwise the default is 24 hours after the start
                        time.
  --status STATUS       Only include events with the given status.

dump / load arguments:
  --dump FILE           Dump the raw data into the specified file in JSON
                        format. The format is suitable for loading using the
                        --load option. If both --load and --dump are used,
                        they must be different files.
  --load FILE           Load the raw data from the specified file instead of
                        reading it from DynamoDB. The file will have been
                        produced by a previous run using the --dump option.
                        This allows a set of data to be reprocessed without
                        re-extracting the same data.

output arguments:
  -i MINUTES, --interval MINUTES
                        Aggregate job activity into intervals of the specified
                        duration (minutes). Stick to divisors or multiples of
                        60. Default is 10.

Tip

See also Lava-conn-usage Utility.

The process of extracting data from the events table can be expensive in usage of DynamoDB table read capacity. Hence the extraction process has two optimisations:

  1. Specific job IDs must be requested. This avoids full table scans on the often large events table.

  2. The extracted event data can be stored in a JSON formatted dump file. This file can be read back in subsequent runs of the utility to alter other parameters, such as the aggregation granularity. See the --dump and --load arguments.

The output to stdout is a CSV file containing two tables:

  1. Run-seconds per time-slice by job ID

  2. Run-seconds per time-slice by lava worker.

For example, the following command will extract the data for a given day and job ID. The output CSV will have activity sliced into 10 minute blocks. The raw data is retained for further analysis:

lava-job-activity -r my-realm --start 2024-02-15T00:00:00+11:00 \
    --dump evdata.json job-id > job-10.csv

Each time-slice column in the output CSV will indicate for how many seconds that job ran within that time-slice (i.e. run seconds). This can be greater than the number of seconds in the time-slice if the job ran more than once in that slice.
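
The run-seconds figure can be understood with a small sketch (hypothetical run data; lava-job-activity's own aggregation is more involved):

```python
def run_seconds_per_slice(runs, slice_start, slice_seconds):
    """Seconds of job activity falling within a single time-slice.

    runs is a list of (start, end) epoch-second pairs; overlapping runs
    are counted separately, so the total can exceed the slice length.
    """
    slice_end = slice_start + slice_seconds
    total = 0
    for start, end in runs:
        overlap = min(end, slice_end) - max(start, slice_start)
        if overlap > 0:
            total += overlap
    return total

# Two overlapping runs within a 600-second (10 minute) slice starting at t=0.
runs = [(0, 400), (100, 550)]
print(run_seconds_per_slice(runs, 0, 600))  # 850 - more than the slice length
```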

The extracted data can be reprocessed into 4 hour time-slices by reusing the dump file thus:

lava-job-activity -r my-realm --start 2024-02-15T00:00:00+11:00 \
    --dump evdata.json --interval 240 job-id > job-240.csv

The CSV file will contain information something like this:

Job ID   15/2/2024 0:00   15/2/2024 4:00   15/2/2024 8:00   15/2/2024 12:00   15/2/2024 16:00   15/2/2024 20:00
job-id   288              348              376              335               287               344

Worker   15/2/2024 0:00   15/2/2024 4:00   15/2/2024 8:00   15/2/2024 12:00   15/2/2024 16:00   15/2/2024 20:00
core     288              348              376              335               287               344

Info

It is important to keep and re-use dump files where possible, particularly when extracting data for a large number of job IDs. However, the dump file will only contain raw event data requested for the initial set of job IDs and time window. It is not possible to reuse data that wasn't extracted in the first place.
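
The run-seconds in the example table can be turned into a rough utilisation figure. This derived metric is our illustration, not a lava-job-activity output:

```python
# Run-seconds from the 4-hour (240 minute) slices in the example above.
slice_seconds = 240 * 60
run_seconds = [288, 348, 376, 335, 287, 344]

# Fraction of each time-slice the job spent running.
utilisation = [round(s / slice_seconds, 4) for s in run_seconds]
print(utilisation)  # [0.02, 0.0242, 0.0261, 0.0233, 0.0199, 0.0239]
```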

Estimating Lava Load on a Connection

Sometimes it's useful to estimate how much load lava is placing on a particular resource (such as a database) over a defined time window. This can be done by combining the lava-job-activity and lava-conn-usage utilities.

The following example will produce the time-slice view described above for all jobs that reference connectors with IDs starting with redshift:

lava-job-activity -r my-realm --start 2024-02-15T00:00:00+11:00 \
    --dump evdata.json \
    $(lava-conn-usage -r my-realm 'redshift*') > conn-activity.csv

Note that this does not mean the connection was in active use at all times in the activity window. Lava has no way of knowing that. It does give a proxy view of load intensity.

Lava-new Utility

Note

New in v8.2 (Kīlauea).

The lava-new utility is used to create a new lava job framework project.

Usage
usage: lava-new [-h] [-v] [--no-input] [-p KEY=VALUE] directory

Create a new lava job framework project.

positional arguments:
  directory             Create the template source in the specified directory
                        (which must not already exist).

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --no-input            Do not prompt for user input. The -p / --param option
                        should be used to specify parameter values.
  -p KEY=VALUE, --param KEY=VALUE
                        Specify default parameters for the underlying
                        cookiecutter used to create the new lava project. Can
                        be used multiple times. Available parameters are
                        config_checks, description, docker_platform,
                        docker_prefix, environment, git_setup,
                        include_lava_libs, job_prefix, jupyter_support, owner,
                        payload_prefix, pip_index_url, project_dir,
                        project_name, realm, rule_prefix, s3trigger_prefix.

Lava Schema Utility

The lava-schema utility performs deep schema validation for lava DynamoDB specification objects.

Usage
usage: lava-schema.py [-h] [-d] [-r REALM] [-s DIRNAME]
                      [-t {job,s3trigger,connection}] [-v]
                      [SPEC ...]

Deep schema validation for lava DynamoDB specification objects.

positional arguments:
  SPEC                  If specified, specifications are read directly from
                        DynamoDB and any SPEC arguments are treated as GLOB style
                        patterns that the ID of the specifications must match. If
                        the -d / --dynamodb option is not specified, JSON formatted
                        lava object specifications are read from the named files.

optional arguments:
  -h, --help            show this help message and exit
  -d, --dynamodb        Read lava specifications from DynamoDB instead of the local
                        file system. The lava realm must be specified, either via
                        the -r / --realm option or the LAVA_REALM environment
                        variable.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the environment variable
                        LAVA_REALM will be used. If -d / --dynamodb is specified,
                        a value must be specified by one of these mechanisms.
  -s DIRNAME, --schema-dir DIRNAME
                        Directory containing lava schema specifications. Default is
                        /usr/local/lib/lava/lava/lib/schema.
  -t {job,s3trigger,connection}, --type {job,s3trigger,connection}
                        Use the schema appropriate to the specified lava object
                        type. Options are job, s3trigger, connection. The default
                        is job.
  -v, --verbose         Print results for all specifications. By default, only
                        validation failures are printed.

Lava-schema can read specifications directly from DynamoDB or from the local file system. The latter is useful for checking the installable components produced in the dist directory by a lava job framework project.

Whereas lava-check is focused on basic configuration management hygiene, lava-schema is focused on strict compliance with lava DynamoDB table specifications using detailed JSON Schema specifications. (They may merge at some point.)

Lava-schema vs Lava Worker Validation

As of v7.1.0 (Pichincha), deep schema validation only manifests in the lava-schema utility. The lava worker doesn't use this. Instead, it uses its traditional process of checking just enough to validate that it can try to run the job.

This will change in a future release and the worker will also perform deep schema validation of the fully resolved augmented job specification and the other DynamoDB object types at run-time. Malformed jobs that could run under the current validation process will be rejected outright.

I'm from the Government and I'm here to help you.

Commonly observed configuration errors that the lava worker will tolerate, but lava-schema will not, include:

  • Malformed action specifications
    These don't prevent the job from running but the malformed actions will fail, potentially causing operational issues or unreported errors.

  • Optional fields in the wrong place
    For example, the timeout parameter of exe jobs occasionally pops up at the top level of the job specification instead of within the job parameters. The lava worker will silently ignore the incorrectly placed timeout and do its best to run the job -- with the default timeout.

  • Imaginary fields
    For example, the sqlc and sqlv job types support a timeout parameter but the sql and sqli jobs do not. If a job is migrated between these types, it's easy to miss the need to add / remove the timeout parameter. The lava worker will happily run the job -- with the default timeout.

  • Incorrect parameter types
    For example, the args parameter of the exe job type expects a list of strings (although numbers are OK also). Booleans are not. In a YAML job specification file in a lava job framework project, it is easy to mix up an argument value of true (boolean) with "true" (string). The first one will get presented to the job payload at run-time as True and the second one will get presented as true. This could be a problem.

  • It will be alright on the night
    When the lava worker runs a job, it grabs the job specification from DynamoDB and then merges in any parameters received as part of the dispatch message to produce the fully resolved augmented job specification. It's tempting to leave out of the job specification parameters that are going to be replaced at run-time anyway. Problem is, this can make it tricky to work out what the job will do just by looking at the job specification. Lava-schema will complain about this because when it does its static analysis, the job is malformed. The lava worker is happy because it gets everything required at run-time. DevOps folk are less than pleased with a partially specified job in DynamoDB that frustrates attempts to diagnose problems.

    It's good practice to include a placeholder in the job specification for every parameter, global and state item the job requires, even if it's replaced at run-time.
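
The boolean/string confusion in the args example above is easy to demonstrate. This sketch shows only how Python renders the two values; the surrounding job machinery is simplified away:

```python
# How an exe job payload would see each args value at run-time, after the
# YAML job specification has been parsed.
args_from_yaml_boolean = [True]      # YAML: args: [true]
args_from_yaml_string = ["true"]     # YAML: args: ["true"]

# The payload receives the stringified value on its command line.
print(str(args_from_yaml_boolean[0]))  # True  -- probably not what was intended
print(str(args_from_yaml_string[0]))   # true
```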

Caveats

Because of the way JSON Schema works, some of the error messages that occur when non-compliance is detected can be rather obscure. It's not uncommon for a single specification error to generate multiple error messages. Generally, the error messages give a pretty good indication of what's wrong.

There are, possibly, (rare) occasions when it's legitimate for a job specification in DynamoDB to not comply with lava schema requirements but the fully resolved augmented job specification at run-time does comply.

Did I mention that this is rare?

Trying to think of a situation where this is good design practice ...

Still thinking ...

Lava SQL Utility

The lava-sql utility provides a simple, uniform CLI across different database types with connectivity managed by the lava connection subsystem.

Usage
usage: lava-sql [-h] [--profile PROFILE] [-v] [-a NAME] [-b BATCH_SIZE]
                [--format {csv,jsonl,html,parquet}] [--header] [-o FILENAME]
                [--raw] [--transaction] -c CONN_ID [-r REALM] [--no-colour]
                [-l LEVEL] [--log LOG] [--tag TAG] [--delimiter CHAR]
                [--dialect {excel,excel-tab,unix}] [--doublequote]
                [--escapechar CHAR] [--quotechar CHAR] [--quoting QUOTING]
                [--sort-keys]
                [SQL-FILE ...]

Run SQL using lava database connections.

positional arguments:
  SQL-FILE              SQL files. These can be local or in S3 (s3://...). If
                        not specified or "-", stdin is used.

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE     As for AWS CLI.
  -v, --version         show program's version number and exit
  -a NAME, --app-name NAME, --application-name NAME
                        Use the specified application name when connecting to
                        the database. Ignored for database types that don't
                        support this concept.
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Number of records per batch when processing SELECT
                        queries. Default is 1024.
  --format {csv,jsonl,html,parquet}
                        Output format. Default is csv.
  --header              Print a header for SELECT queries (output format
                        dependent).
  -o FILENAME, --output FILENAME
                        Write output to the specified file which may be local
                        or in S3 (s3://...). If not specified, output is
                        written to stdout.
  --raw                 Don't split SQL source files into individual
                        statements. By default, an attempt will be made to
                        split each source file into individual SQL statements.
  --transaction         Disable auto-commit and run all SQLs in a transaction.

lava arguments:
  -c CONN_ID, --conn-id CONN_ID
                        Lava database connection ID. Required.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the environment
                        variable LAVA_REALM must be set.

logging arguments:
  --no-colour, --no-color
                        Don't use colour in information messages.
  -l LEVEL, --level LEVEL
                        Print messages of a given severity level or above. The
                        standard logging level names are available but debug,
                        info, warning and error are most useful. The default
                        is info.
  --log LOG             Log to the specified target. This can be either a file
                        name or a syslog facility with an @ prefix (e.g.
                        @local0).
  --tag TAG             Tag log entries with the specified value. The default
                        is lava-sql.

CSV format arguments:
  --delimiter CHAR      Single character field delimiter. Default |.
  --dialect {excel,excel-tab,unix}
                        CSV dialect (as per the Python csv module). Default is
                        excel.
  --doublequote         See Python csv.writer.
  --escapechar CHAR     See Python csv.writer. Escaping is disabled by
                        default.
  --quotechar CHAR      See Python csv.writer. Default is ".
  --quoting QUOTING     As for csv.writer QUOTE_* parameters (without the
                        QUOTE_ prefix). Default is minimal (i.e.
                        QUOTE_MINIMAL).

JSONL format arguments:
  --sort-keys           Sort keys in JSON objects.

Lava-sql can run one or more queries in a transaction and also capture output from SELECT queries in various formats, either to a local file or to AWS S3.

Info

Do not have more than one SELECT query in the batch unless you are deliberately trying to create a mess.

CSV

Note that the default delimiter for csv format is the pipe symbol |, not a comma. The original rationale for this was for consistency with the Redshift COPY and UNLOAD commands. All I can say is that it seemed to make sense at the time.
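
The pipe delimiter behaves like any other csv.writer delimiter; fields containing the delimiter are quoted under the default QUOTE_MINIMAL rules. This is a plain Python sketch of the output format, not lava-sql's own code:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter="|", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(["id", "name"])
writer.writerow([1, "value|with|pipes"])  # embedded delimiter forces quoting

print(buf.getvalue(), end="")
# id|name
# 1|"value|with|pipes"
```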

HTML Format

The output data is encoded as an HTML table with a class of lava-sql. Only the table element itself is produced (no HTML, BODY tags, etc.) so the output can be incorporated into a larger HTML document.

Values will be escaped as needed to ensure HTML correctness.
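
The escaping follows the usual HTML rules, as in this stdlib sketch (illustrative only, not lava-sql's actual code):

```python
import html

# A value that would break the surrounding HTML if emitted verbatim.
value = '<b>5 < 6 & "quoted"</b>'

# html.escape handles <, >, & and (by default) double quotes.
print("<td>" + html.escape(value) + "</td>")
# <td>&lt;b&gt;5 &lt; 6 &amp; &quot;quoted&quot;&lt;/b&gt;</td>
```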

JSONL Format

Each row of output data is encoded as a single line JSON formatted object.
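
The format is equivalent to emitting each row with json.dumps, one object per line; the --sort-keys argument orders the keys within each object. A stdlib sketch (not lava-sql's own implementation):

```python
import json

rows = [{"name": "b", "id": 2}, {"name": "a", "id": 1}]

# One JSON object per line; sort_keys mirrors the --sort-keys argument.
for row in rows:
    print(json.dumps(row, sort_keys=True))
# {"id": 2, "name": "b"}
# {"id": 1, "name": "a"}
```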

Parquet Format

Parquet compression will generally benefit from a larger batch size. The default of 1024 is reasonable for many purposes but increasing it will often give a smaller output file. Don't get carried away though. Each batch has to be held entirely in memory.

A word of caution about the Parquet schema ... It's quite difficult to handle schema inference in a predictable or consistent way, particularly with data sourced via a DBAPI 2 connector as the standard does not provide any consistency in how, or if, implementations signal type information in query responses. The approach used by lava-sql is to let PyArrow form an educated guess based on the first record batch. This should be fine for most purposes.

Lava State Utility

The lava state utility provides a CLI to the lava state manager.

Usage
usage: lava-state [-h] [--profile PROFILE] [-r REALM] [-v] [--no-colour]
                  [-l LEVEL] [--log LOG] [--tag TAG]
                  {put,get} ...

Manipulate lava state entries.

positional arguments:
  {put,get}
    put                 Add / replace a state entry.
    get                 Get a state entry.

optional arguments:
  -h, --help            show this help message and exit
  --profile PROFILE     As for AWS CLI.
  -r REALM, --realm REALM
                        Lava realm name. If not specified, the environment
                        variable LAVA_REALM must be set.
  -v, --version         show program's version number and exit

logging arguments:
  --no-colour, --no-color
                        Don't use colour in information messages.
  -l LEVEL, --level LEVEL
                        Print messages of a given severity level or above. The
                        standard logging level names are available but debug,
                        info, warning and error are most useful. The default
                        is info.
  --log LOG             Log to the specified target. This can be either a file
                        name or a syslog facility with an @ prefix (e.g.
                        @local0).
  --tag TAG             Tag log entries with the specified value. The default
                        is lava-state.

Creating a Lava State Item

State items are created with the put sub-command.

Info

Do not create state items with a state_id starting with lava. This prefix is reserved.

Usage: lava-state put
usage: lava-state put [-h] [-p KEY=VALUE | -v VALUE] [--kms-key KMS_KEY]
                      [--publisher PUBLISHER] [--ttl DURATION]
                      [--type STATE_TYPE]
                      state_id

positional arguments:
  state_id              State ID.

optional arguments:
  -h, --help            show this help message and exit
  -p KEY=VALUE, --param KEY=VALUE
                        Add the specified key/value pair to the state item.
                        Can be repeated to set multiple key/value pairs.
  -v VALUE, --value VALUE
                        Set the value to the specified string.
  --kms-key KMS_KEY     The "secure" state item type supports KMS encryption
                        of the value. This argument specifies the KMS key to
                        use, either as a KMS key ARN or a key alias in the
                        form "alias/key-id". Defaults to the "sys" key for the
                        lava realm. Ignored for other state item types.
  --publisher PUBLISHER
                        Set the state item publisher to the specified value.
                        Default is the contents of the LAVA_JOB_ID environment
                        variable, if set, or else "lava-state CLI".
  --ttl DURATION        Time to live as a duration (e.g. 10m, 2h, 1d).
  --type STATE_TYPE     State item type. Options are json, raw, secure.
                        Default is json.

Retrieving a Lava State Item

State items are retrieved with the get sub-command.

Usage: lava-state get
usage: lava-state get [-h] [-i] state_id [template]

positional arguments:
  state_id              State ID.
  template              An optional Jinja2 template that will be rendered with
                        the retrieved value as the "state" and "s" parameters.
                        e.g. if set to "{{ state }}" (the default) the value is
                        printed as is.

optional arguments:
  -h, --help            show this help message and exit
  -i, --ignore-missing  Ignore errors for missing state items and return an
                        empty string. By default, attempting to get a non-
                        existent state item is an error.

Lava Stop Utility

The lava-stop utility initiates a controlled shutdown of the lava worker daemons.

Usage
usage: lava-stop [-h] [-D] [--profile PROFILE] [--signal SIGNAL] [-v]
                 [-w DURATION] [--auto-scaling-group-name NAME]
                 [--instance-id ID] [--lifecycle-action-token UUID]
                 [--lifecycle-hook-name NAME] [--lifecycle-heartbeat DURATION]
                 [-c] [-l LEVEL] [--log LOG] [--tag TAG]

Stop lava worker processes.

options:
  -h, --help            show this help message and exit
  -D, --no-dispatch     Inhibit further scheduled dispatches by creating
                        /tmp/lava/__nodispatch__. This requires the lava-
                        dispatcher utility to check for this file by
                        specifying the --check-dispatch argument.
  --profile PROFILE     As for AWS CLI.
  --signal SIGNAL, --sig SIGNAL
                        Send the specified signal to the lava worker
                        processes. Can be specified as a signal name (e.g.
                        SIGHUP or HUP) or a signal number. The default is 0
                        which only tests if the process exists. SIGHUP is
                        interpreted as a controlled shutdown instruction
                        allowing running jobs to complete. SIGTERM is
                        interpreted as a controlled, but immediate,
                        termination that allows final cleanup tasks but takes
                        no account of running jobs. See -w, --wait.
  -v, --version         show program's version number and exit
  -w DURATION, --wait DURATION
                        Wait for up to the specified duration for the lava
                        workers to finish voluntarily before killing them.
                        This requires the signal to be set to SIGHUP / HUP as
                        this is interpreted by the lava worker daemons as a
                        controlled shutdown request. The duration must be in
                        the form nn[X] where nn is a number and X is one of s
                        (seconds), m (minutes) or h (hours). If X is not
                        specified, seconds are assumed.
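
The nn[X] duration form described above can be parsed along these lines (an illustrative sketch, not lava's actual parser):

```python
import re

def parse_duration(text):
    """Parse nn[X] durations into seconds: s(econds), m(inutes), h(ours).
    If the unit suffix is omitted, seconds are assumed."""
    match = re.fullmatch(r"(\d+)([smh]?)", text)
    if not match:
        raise ValueError(f"bad duration: {text!r}")
    multiplier = {"": 1, "s": 1, "m": 60, "h": 3600}[match.group(2)]
    return int(match.group(1)) * multiplier

print(parse_duration("10m"))  # 600
print(parse_duration("2h"))   # 7200
print(parse_duration("30"))   # 30
```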

AWS auto scaling lifecycle options:
  These arguments are designed to complete an AWS auto scaling "EC2
  Instance-terminate Lifecycle Action". See the AWS CLI or AWS auto scaling
  documentation for meaning and usage. Note that the lifecycle action result
  is always set to CONTINUE which means the auto scaling group _will_
  terminate the instance.

  --auto-scaling-group-name NAME
                        Send a complete-lifecycle-action signal for the
                        specified AWS auto scaling group. If specified,
                        --lifecycle-hook-name is also required.
  --instance-id ID      The ID of the EC2 instance (optional). If specified,
                        --auto-scaling-group-name / --lifecycle-hook-name are
                        required.
  --lifecycle-action-token UUID
                        lifecycle action identifier (optional). If specified,
                        --auto-scaling-group-name / --lifecycle-hook-name are
                        required.
  --lifecycle-hook-name NAME
                        The name of the AWS auto scaling lifecycle hook. If
                        specified, --auto-scaling-group-name is also required.
  --lifecycle-heartbeat DURATION
                        Record a heartbeat for the lifecycle action at
                        specified intervals (optional). If specified, --auto-
                        scaling-group-name / --lifecycle-hook-name are
                        required. The duration must be in the form nn[X] where
                        nn is a number and X is one of s (seconds), m
                        (minutes) or h (hours). If X is not specified, seconds
                        are assumed. The minimum permitted value is 60
                        seconds.

logging arguments:
  -c, --no-colour, --no-color
                        Don't use colour in information messages.
  -l LEVEL, --level LEVEL
                        Print messages of a given severity level or above. The
                        standard logging level names are available but debug,
                        info, warning and error are most useful. The default
                        is info.
  --log LOG             Log to the specified target. This can be either a file
                        name or a syslog facility with an @ prefix (e.g.
                        @local0).
  --tag TAG             Tag log entries with the specified value. The default
                        is lava-stop.

The process for stopping a lava worker is:

  1. Send it a SIGHUP signal. This tells the worker to complete any in-flight or queued jobs but not to accept any more jobs.

  2. Wait a while.

  3. Send it another SIGHUP signal. The second SIGHUP is a more aggressive shutdown command and will interrupt in-flight jobs but still allow the worker an opportunity to cleanup.

  4. Give it another 10-20 seconds.

  5. If the worker is still running, kill it with SIGKILL.

Lava-stop will do a process listing to find worker processes. It can be used interactively and is also designed for use within an AWS auto scaling lifecycle hook for terminating worker nodes. This is all built into a standard lava deployment using the provided CloudFormation templates.
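
The five-step sequence above can be sketched against a stand-in process. This uses a plain sleep instead of a real lava worker (which handles SIGHUP gracefully) and shortens the timeouts for illustration; it assumes a POSIX host:

```python
import signal
import subprocess

# Stand-in for a worker process; sleep's default SIGHUP action is to terminate.
proc = subprocess.Popen(["sleep", "60"])

proc.send_signal(signal.SIGHUP)              # 1. request controlled shutdown
try:
    proc.wait(timeout=5)                     # 2. wait a while
except subprocess.TimeoutExpired:
    proc.send_signal(signal.SIGHUP)          # 3. second, more aggressive SIGHUP
    try:
        proc.wait(timeout=5)                 # 4. another short grace period
    except subprocess.TimeoutExpired:
        proc.kill()                          # 5. SIGKILL as the last resort

# A negative return code means the process died from that signal number.
print(proc.wait())
```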

Lava Version Utility

The lava-version utility provides version information on the installed lava version.

Usage
usage: lava-version [-h] [-n | -a | --ge VERSION | --eq VERSION]

Print lava version information.

optional arguments:
  -h, --help    show this help message and exit
  -n, --name    Print version name only.
  -a, --all     Print all version information.
  --ge VERSION  Exit with zero status if the lava version is greater than or
                equal to the specified version.
  --eq VERSION  Exit with zero status if the lava version is equal to the
                specified version.

If no arguments are specified the lava version number is printed.

Lava Worker Status Utility

The lava-ws utility displays worker status information based on the worker SQS queues (queue depths, worker backlog etc.).

Usage
usage: lava-ws [-h] [-f FORMAT] [-l] -r REALM [-w WORKER] [-v]

Get status info about lava workers.

optional arguments:
  -h, --help            show this help message and exit
  -f FORMAT, --format FORMAT
                        Output table format (see below). The formats supported
                        by tabulate (https://pypi.org/project/tabulate/) can
                        be used. The default is fancy_grid.
  -l                    Show more information. Repeat up to 2 times to get
                        more details.
  -r REALM, --realm REALM
                        Lava realm name.
  -w WORKER, --worker WORKER
                        Lava worker name prefix. If not specified, report on
                        all workers in the realm (assumes lava standard queue
                        naming conventions).
  -v, --version         show program's version number and exit

output columns:
  BCKAVG   Average worker backlog in the last 15 minutes
  BCKMAX   Maximum worker backlog in the last 15 minutes
  BCKNOW   Current backlog
  DELAVG   Average run delay in the last 15 minutes
  DELMAX   Maximum run delay in the last 15 minutes
  EC2      Number of running EC2 instances
  EC2TYPE  EC2 instance type
  MSGS     Messages visible
  NVIS     Messages not visible
  QUEUE    SQS queue name
  RET      Message retention period
  VIS      Visibility timeout

output formats:
  fancy_grid  fancy_outline   github           grid       html        jira
  latex       latex_booktabs  latex_longtable  latex_raw  mediawiki   moinmoin
  orgtbl      pipe            plain            presto     pretty      psql
  rst         simple          textile          tsv        unsafehtml  youtrack

Unlike lava-ps, which displays worker process information, lava-ws does not need to run on the worker host.