Lava Commands and Utilities
The lava utilities are included in the
jinlava Python package together with the lava APIs.
Lava includes a number of CLI commands and utilities. Some of these are used by
the lava worker itself while others are support tools. Most of them can be used
stand-alone or invoked by lava exe, pkg and
docker jobs.
All of the utilities support a -h / --help option.
| Utility | Description |
| --- | --- |
| jinja | Render a Jinja template with specified parameters. |
| lava-ami | Manage the AMIs used in lava worker CloudFormation stacks. |
| lava-backup | Backup the DynamoDB entries for a specified lava realm. |
| lava-check | Perform some basic health checks on DynamoDB table entries. |
| lava-checksum | Set and validate checksums on lava DynamoDB table entries. |
| lava-conn-usage | Find lava jobs that reference specified connectors. |
| lava-dag-gen | Generate a DAG specification for lava dag jobs from a dependency matrix. |
| lava-dispatcher | The lava job dispatcher. |
| lava-dump | Extract lava configurations from DynamoDB and dump them to files. |
| lava-email | Send email using lava email connections. |
| lava-events | Query the lava events table. |
| lava-job-activity | Query the lava realm events table to map job activity over a specified time window. |
| lava-new | Create a new lava job framework project. |
| lava-ps | Show lava worker process information on the current host. |
| lava-schema | Perform deep schema validation for lava DynamoDB specification objects. |
| lava-sharepoint | Operate on SharePoint sites using lava sharepoint connections. |
| lava-slack | Send Slack messages using lava slack connections. |
| lava-smb | Operate on SMB file shares using lava smb connections. |
| lava-sql | Run SQL using lava database connections. |
| lava-state | Manipulate lava state items. |
| lava-stop | Perform a controlled shutdown of the lava worker daemons. This also supports AWS auto scaling lifecycle hooks. |
| lava-version | Provides version information on the installed lava version. |
| lava-worker | The main lava worker. |
| lava-ws | Show lava worker status based on the worker SQS queues (queue depths, worker backlog etc.). This does not need to run on the worker host. |
| s3lambda | Trigger an AWS Lambda function by generating synthetic S3 event notifications. |
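The synthetic notifications s3lambda generates follow the standard S3 event notification shape. As a rough sketch (the helper name and exact field set here are illustrative assumptions, not s3lambda's actual implementation), such an event can be built like this:

```python
def build_s3_event(bucket, key, event_name="ObjectCreated:Put"):
    """Build a minimal synthetic S3 event notification (illustrative;
    the real s3lambda utility may populate additional fields)."""
    return {
        "Records": [
            {
                "eventVersion": "2.1",
                "eventSource": "aws:s3",
                "eventName": event_name,
                "s3": {
                    "bucket": {"name": bucket, "arn": f"arn:aws:s3:::{bucket}"},
                    "object": {"key": key},
                },
            }
        ]
    }

# The payload could then be sent to a function, e.g. with boto3:
#   boto3.client("lambda").invoke(FunctionName=fn, Payload=json.dumps(event))
event = build_s3_event("my-bucket", "incoming/data.csv")
assert event["Records"][0]["s3"]["object"]["key"] == "incoming/data.csv"
```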
Lava Icons
The following are the official lava icons. These used to change on a whim,
much like the AWS architecture icons, but are now stable.
PNG
SVG
Lava AMI Utility
The lava-ami utility displays the available, lava compatible, AMIs and the
AMI specified in each lava worker CloudFormation stack.
Usage
usage: lava-ami [-h] [-n N] [--sak] [--profile PROFILE] [-U] [-v] [-W]
[STACK-NAME ...]
Manage the AMIs used in lava worker CloudFormation stacks.
positional arguments:
STACK-NAME CloudFormation stack name for a lava worker. Glob style
patterns can be used. If not specified, or any of the
patterns is *, the -U / --update option is not permitted.
optional arguments:
-h, --help show this help message and exit
-n N Only include specified number of most recent images of
each type in the selection list. Default 5.
--profile PROFILE As for AWS CLI.
-U, --update Initiate an interactive update process to allow a new AMI
to be applied for selected stacks. If specified, one or
more stack patterns must be specified (no single *) to
make it harder to maniacally update a whole bunch of
stacks in one go. You can thank me later.
-v, --version show program's version number and exit
-W, --no-wait Don't wait for CloudFormation stack updates to complete.
Lava-ami also provides an update mode with a (more or less) interactive
process to select and apply a different AMI to one or more lava worker stacks.
This process is a lot simpler and less error-prone than trying to manage the
AMI used on multiple workers in the AWS CloudFormation console.
Lava-ami will silently ignore worker stacks that appear to be parasitic
workers hosted on another worker instance. These are detected by the absence of
a machine type or AMI ID parameter in the CloudFormation stack. If these need to
be updated, use the AWS CloudFormation console.
Lava-ami is conservative in its definition of lava compatibility for an AMI.
The lava worker itself can run on any Linux machine with the right prerequisite
components installed, but the compatible images support the deployment and
bootstrapping processes preferred in lava operational environments.
Lava-ami will highlight the most recent lava AMI in its output.
Lava Backup Utility
Lava-backup performs a complete extract of all of the configuration tables
for a given realm and stores the result in a zip file, either locally or in AWS
S3.
Usage
usage: lava-backup [options] realm zip-file
Backup the DynamoDB entries for a specified lava realm. The output is a zip file.
positional arguments:
realm Realm name.
zip-file Name of the output zip file. Can be on the local machine
or in S3 (s3://....).
optional arguments:
-h | --help Print help and exit.
-y | --yaml Output the entries in YAML format. The default is JSON.
Lava-backup uses the lava-dump utility under the
covers. It can be run as a lava cmd job if required. A lava
job specification suitable for backing up the current realm is:
{
"description": "Backup the DynamoDB entries for the realm",
"dispatcher": "Sydney",
"enabled": true,
"job_id": "lava/dynamo-backup",
"owner": "lava",
"parameters": {
"args": [
"{{realm.realm}}",
"{{realm.s3_temp}}/lava/dynamo-backup/{{ustart.strftime('%Y-%m-%d')}}.zip"
]
},
"payload": "lava-backup",
"schedule": "0 19 * * *",
"type": "cmd",
"worker": "core"
}
Note
Dispatcher, worker and schedule will need to be adjusted in the example.
Lava Check Utility
The lava-check utility performs some basic health checks on DynamoDB table
entries.
Usage
usage: lava-check [-h] [-c GLOB] [--profile PROFILE] [-r REALM] [-S] [-v]
[--no-colour] [-l LEVEL] [--log LOG] [--tag TAG]
Check lava specifications for problems.
options:
-h, --help show this help message and exit
-c GLOB, --check GLOB
Run the health checks with names matching the given
glob patterns. Can be used multiple times. If not
specified, print a list of available checks.
--profile PROFILE As for AWS CLI.
-r REALM, --realm REALM
Lava realm name. If not specified, the environment
variable LAVA_REALM must be set.
-S, --no-suppress Disable suppression of checks for specific DynamoDB
entries via the x-lava-nocheck field. By default
suppression of specific checks is permitted for some
check types.
-v, --version show program's version number and exit
logging arguments:
--no-colour, --no-color
Don't use colour in information messages.
-l LEVEL, --level LEVEL
Print messages of a given severity level or above. The
standard logging level names are available but debug,
info, warning and error are most useful. The Default
is info.
--log LOG Log to the specified target. This can be either a file
name or a syslog facility with an @ prefix (e.g.
@local0).
--tag TAG Tag log entries with the specified value. The default
is lava-check.
See also lava-schema.
Lava-check supports the following checks:
| Check Type | Description |
| --- | --- |
| conmeta | Connection specs with missing metadata (e.g. description, owner). |
| jobjinja * | Job specs with Jinja rendering issues. This includes jobs that use globals for which there is no placeholder entry in the job specification, referred to as undeclared globals. While lava tolerates undeclared globals, it is good practice to declare them with a placeholder value. |
| jobmeta | Job specs with missing metadata (e.g. description, owner). |
| joborphan * | Jobs with no recorded run events. |
| jobrepo * | Job specs that don't appear to have an associated repo (no x-lava-git-repo field). |
| jobrsu * | redshift_unload jobs with insecure set to true. |
| trigmeta | S3trigger specs with missing metadata (e.g. description, owner). |
The checks marked with a * can be suppressed on an entry-specific basis.
Note
The checks need to perform a full table scan on the relevant table. This is
not usually a problem but something to remember. Performing multiple checks
on a given table in a single invocation will only do a single table scan
though.
Output is in markdown formatted tables on the assumption that these issues may
end up in a backlog somewhere for correction.
Suppressing Checks for Specific Entries
Some check types can be suppressed for specific DynamoDB table entries by
including an x-lava-nocheck field in the table entry. The value is a string
identifying a single check type to suppress, or a list of such strings.
For example, the following would suppress the joborphan check for a given job
specification:
{
"job_id": "rarely-run-job",
"x-lava-nocheck": "joborphan",
...
}
This would suppress the joborphan and jobrsu checks:
{
"job_id": "yet-another-job",
"x-lava-nocheck": [
"joborphan",
"jobrsu"
],
...
}
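The string-or-list semantics of x-lava-nocheck can be sketched as follows (an illustrative reimplementation of the documented behaviour, not lava-check's actual code):

```python
def is_check_suppressed(entry, check_name):
    """Return True if the entry's x-lava-nocheck field suppresses the
    named check. The field may be a single string or a list of strings."""
    value = entry.get("x-lava-nocheck")
    if value is None:
        return False
    if isinstance(value, str):
        return value == check_name
    return check_name in value

job = {"job_id": "rarely-run-job", "x-lava-nocheck": "joborphan"}
assert is_check_suppressed(job, "joborphan")   # suppressed
assert not is_check_suppressed(job, "jobrsu")  # still checked
```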
Lava Checksum Utility
The lava-checksum utility verifies, adds and updates checksums on entries in
the following lava DynamoDB tables.
Note
The checksums are intended for drift detection only. They are not a code
signing mechanism and they are not cryptographically sealed.
Usage
usage: lava-checksum [-h] [-f {txt,tty,html,md}]
[--hash-algorithm ALGORITHM]
[-i] [--profile PROFILE] [-r REALM] [-t TABLE] [-v]
[--version]
{check,add,update} ...
Set and validate checksums on lava DynamoDB entries.
positional arguments:
{check,add,update}
check Validate checksums.
add Add missing checksums.
update Update existing checksums.
options:
-h, --help show this help message and exit
-f {txt,tty,html,md}, --format {txt,tty,html,md}
Output format. Default is "tty" if stdout is a
terminal and "txt" otherwise.
--hash-algorithm ALGORITHM
Algorithm to use for checksums. Default is sha256.
-i, --ignore-case Matching of glob patterns is case insensitive.
--profile PROFILE As for AWS CLI.
-r REALM, --realm REALM
Lava realm name. If not specified, the environment
variable LAVA_REALM must be set.
-t TABLE, --table TABLE
Extract from the specified table. This can be one of
jobs, connections, s3triggers (or triggers) or realms.
Any unique initial sequence is accepted. The default
is "jobs".
-v, --verbose Increase verbosity. By default, only checksum errors,
updates etc are reported. Can be specified multiple
times.
--version show program's version number and exit
To get help on a sub-command, use -h / --help on the sub-command. e.g.
lava-checksum check --help
Key points to note:
-
The checksums are stored in the entry in the field x-lava-chk.
-
Checksum calculation ignores any field starting with x- or X-.
-
The lava-job-framework generates compatible checksums
when deploying entries to the tables.
-
The checksum structure and format are internal to lava and subject to change
at the capricious whim of the developer. The lava-checksum utility will
manage backward compatibility.
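Conceptually, the x- / X- exclusion means a checksum might be computed along these lines (a minimal sketch only; lava's actual checksum structure is internal and subject to change, and the canonicalisation shown here is an assumption):

```python
import hashlib
import json

def entry_checksum(entry, algorithm="sha256"):
    """Checksum a table entry for drift detection, ignoring any field
    whose name starts with "x-" or "X-" (illustrative sketch only)."""
    filtered = {k: v for k, v in entry.items() if not k.lower().startswith("x-")}
    # Sort keys so the checksum is independent of field ordering.
    canonical = json.dumps(filtered, sort_keys=True)
    return hashlib.new(algorithm, canonical.encode()).hexdigest()

entry = {"job_id": "demo", "type": "cmd", "x-lava-chk": "ignored"}
same = {"job_id": "demo", "type": "cmd"}  # x- fields don't affect the result
assert entry_checksum(entry) == entry_checksum(same)
```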
Arguments for the lava-checksum utility shown above must be placed before
the sub-command. Arguments specific to sub-command must be placed after the
sub-command.
Note:
-
The add sub-command will only add missing checksums and update will only
update existing checksums.
-
If any table entries are modified, a ZIP file will be left in the current
directory containing the entries before they were updated. Delete this
manually if not required.
Examples
# Check all of the jobs in realm "prod"
lava-checksum --realm prod --table jobs -vv check
# Add missing checksums to connections matching app/* in realm "prod"
lava-checksum --realm prod --table conn -vv add 'app/*'
Lava-conn-usage Utility
The lava-conn-usage utility will find the job IDs of jobs that reference
specified connectors.
Usage
usage: lava-conn-usage [-h] [--profile PROFILE] [-i] [-r REALM]
connector-glob [connector-glob ...]
Find lava jobs that reference specified connectors.
positional arguments:
connector-glob Report jobs that use connectors that match any of the
specified glob style patterns.
optional arguments:
-h, --help show this help message and exit
--profile PROFILE As for AWS CLI.
-i, --ignore-case Matching is case insensitive.
-r REALM, --realm REALM
Lava realm name. If not specified, the value of the
LAVA_REALM environment variable is used. A value must
be specified by one of these mechanisms.
For example, the following will find job IDs that reference connectors with IDs
containing the string redshift using a glob-style pattern match:
lava-conn-usage -r my-realm '*redshift*'
This can then be used with the lava-job-activity
utility to estimate the load lava is placing on particular resources (e.g. a
database). See Estimating Lava Load on a
Connection.
Info
Only connections referenced in parameters known by lava to hold connection IDs
will be found.
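The matching itself is ordinary glob matching over connection IDs. A minimal sketch, assuming (hypothetically) that connection IDs live in a conn_id parameter; the real utility knows the full set of parameters that hold connection IDs:

```python
from fnmatch import fnmatchcase

def jobs_using_connectors(job_specs, patterns, ignore_case=False):
    """Return job IDs whose conn_id parameter matches any glob pattern
    (illustrative sketch, not lava-conn-usage's actual code)."""
    def norm(s):
        return s.lower() if ignore_case else s

    matched = []
    for spec in job_specs:
        conn = spec.get("parameters", {}).get("conn_id", "")
        if any(fnmatchcase(norm(conn), norm(p)) for p in patterns):
            matched.append(spec["job_id"])
    return matched

jobs = [
    {"job_id": "etl/load", "parameters": {"conn_id": "redshift-prod"}},
    {"job_id": "mail/send", "parameters": {"conn_id": "smtp-main"}},
]
assert jobs_using_connectors(jobs, ["*redshift*"]) == ["etl/load"]
```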
Lava DAG Generator
The lava-dag-gen utility generates a DAG specification for lava
dag jobs from a dependency matrix. It is
provided as part of the standard lava worker installation and is also included
in the bin directory with the
lava job framework. The lava job
framework also provides support for using the utility to automatically generate
DAGs at build time.
Usage
usage: lava-dag-gen [-h] [-c] [-g GROUP] [-o] [-p PREFIX] [-r REALM] [-w KEY]
[--table TABLE] [-y]
source
Generate a DAG specification for lava dag jobs from a dependency matrix.
positional arguments:
source Source data for the DAG dependency matrix. CSV, Excel
XLSX files and sqlite3 files are supported. The
filename suffix is used to determine file type. If the
value is not a recognised file type, it is assumed to
be a lava database connection ID. In this case the
lava realm must be specified via -r, --realm or the
LAVA_REALM environment variable. For CSV and Excel,
the first column contains successor job names and the
first row contains predecessor job names. Any non-
empty value in the intersection of row and column
indicates a dependency. For database sources, a table
with three columns (job_group, job, depends_on) is
required. The "job" and "depends_on" columns each
contain a single job name. The "depends_on" column may
contain a NULL indicating the "job" must be included
but has no dependency. There can be multiple rows
containing the same "job".
optional arguments:
-h, --help show this help message and exit
-c, --compact Use a more compact form for singleton and empty
dependencies.
-g GROUP, --group GROUP
Select only the specified group of source entries. For
CSV files, this is ignored. For Excel files, this
specifies the worksheet name and defaults to the first
worksheet. For sqlite3 files, this is used as a filter
value on the "job_group" column of the source table
and defaults to selecting all entries.
-o, --order If specified, just print one possible ordering of the
jobs instead of the DAG specification.
-p PREFIX, --prefix PREFIX
Prepend the specified prefix to all job IDs.
-r REALM, --realm REALM
Lava realm. Required if the DAG source is specified as
a lava connection ID. Defaults to the value of the
LAVA_REALM environment variable.
-w KEY, --wrap KEY Wrap the DAG specification in the specified map key.
--table [SCHEMA.]TABLE
Table name for database sources. Default is dag.
-y, --yaml Generate YAML output instead of JSON.
Lava-dag-gen can read the dependency information from any of the following:
-
CSV files (matrix format)
-
Excel XLSX files (matrix format; the worksheet can be selected with -g / --group)
-
sqlite3 files (columnar format)
-
a lava database connection (columnar format; the realm must be specified)
Dependency information in columnar format must contain the following three
columns (only):
| Column | Description |
| --- | --- |
| job_group | An arbitrary grouping label for sets of jobs. |
| job | The job ID of the successor job. If all the jobs in a DAG have a common prefix in the job ID, this can be omitted here and inserted at run-time in the dag job specification. |
| depends_on | The ID of a predecessor job on which the subject job depends. This may be empty/NULL if the job has no dependencies. Once again, a common prefix can be omitted. |
Each row contains a single predecessor/successor pair. If a job has multiple
predecessors, there will be multiple rows for that job.
Sample DDL for a database:
CREATE TABLE dag
(
job_group VARCHAR(50),
job VARCHAR(50) NOT NULL,
depends_on VARCHAR(50)
);
In matrix format, the first column contains successor job names and the first
row contains predecessor job names. Any non-empty value in the intersection of
row and column indicates a dependency. Like so:
| Jobs | J1 | J2 | J3 | J5 |
| --- | --- | --- | --- | --- |
| J1 |  |  | x | x |
| J2 |  |  |  |  |
| J4 | x |  |  |  |
| J4 |  |  | x |  |
| J5 |  |  |  |  |
This would result in the following dag payload:
{
"J1": [
"J3",
"J5"
],
"J2": null,
"J4": [
"J1",
"J3"
]
}
Note that J5 doesn't require its own entry as it is present as a predecessor
of J1 and has no predecessors of its own.
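The transformation from matrix to payload can be sketched in a few lines (illustrative only, not lava-dag-gen's actual code). Note how repeated successor rows are merged, and how jobs that already appear as predecessors don't need their own entry:

```python
def matrix_to_dag(header, rows):
    """Convert a dependency matrix into a dag payload (illustrative).

    header: predecessor job names (the first spreadsheet row).
    rows:   (successor, [cell, ...]) pairs; a non-empty cell marks a
            dependency. Repeated successor rows are merged.
    """
    deps_by_job = {}
    for job, cells in rows:
        deps = [h for h, cell in zip(header, cells) if cell]
        deps_by_job.setdefault(job, []).extend(deps)
    referenced = {d for deps in deps_by_job.values() for d in deps}
    payload = {}
    for job, deps in deps_by_job.items():
        if deps:
            payload[job] = deps
        elif job not in referenced:
            payload[job] = None  # keep isolated jobs so they still run
    return payload

header = ["J1", "J2", "J3", "J5"]
rows = [
    ("J1", ["", "", "x", "x"]),
    ("J2", ["", "", "", ""]),
    ("J4", ["x", "", "", ""]),
    ("J4", ["", "", "x", ""]),
    ("J5", ["", "", "", ""]),
]
assert matrix_to_dag(header, rows) == {
    "J1": ["J3", "J5"],
    "J2": None,
    "J4": ["J1", "J3"],
}
```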
Lava Dispatcher Utility
The lava dispatcher utility is typically run by cron(8) to dispatch jobs on
a schedule. It can also be run as a stand-alone utility to dispatch jobs on
demand.
Usage
usage: lava-dispatcher [-h] [--profile PROFILE] [-v] [--check-dispatch]
[-d DELAY] [-q QUEUE] [-r REALM] [-w WORKER]
[-g name=VALUE] [-p name=VALUE] [-c] [-l LEVEL]
[--log-json] [--log LOG] [--tag TAG]
job-id [job-id ...]
Lava job dispatcher.
options:
-h, --help show this help message and exit
--profile PROFILE As for AWS CLI.
-v, --version show program's version number and exit
--check-dispatch If specified, check for the existence of a
dispatch suppression file "/tmp/lava/__nodispatch__".
If the file is present, all dispatches are suppressed.
This is typically only used for scheduled dispatches
when a dispatcher node is in the process of shutting
down.
dispatch control options:
-d DELAY, --delay DELAY
Delay dispatch by the specified duration. Default is
0. Maximum is 15 minutes.
-q QUEUE, --queue QUEUE
AWS SQS queue name. If not specified, the queue name
is derived from the realm and worker name.
-r REALM, --realm REALM
Lava realm name. Defaults to the value of the
LAVA_REALM environment variable. A value must be
specified by one of these mechanisms.
-w WORKER, --worker WORKER
Lava worker name. The worker must be a member of the
specified realm. If specified, the worker name must
match the value in the job specification. If not
specified, the correct value will be looked up in the
jobs table.
job options:
-g name=VALUE, --global name=VALUE
Additional global attribute to include in the job
dispatch event. This option can be used multiple
times. If global names contain dots, they will be
converted into a hierarchy using the dots as level
separators.
-p name=VALUE, --param name=VALUE
Additional parameter to include in the job dispatch
event. This option can be used multiple times. If
parameter names contain dots, they will be converted
into a hierarchy using the dots as level separators.
job-id One or more job IDs for the specified realm.
logging arguments:
-c, --no-colour, --no-color
Don't use colour in information messages.
-l LEVEL, --level LEVEL
Print messages of a given severity level or above. The
standard logging level names are available but debug,
info, warning and error are most useful. The Default
is info.
--log-json Log messages in JSON format. This is particularly
useful when log messages end up in CloudWatch logs as
it simplifies searching.
--log LOG Log to the specified target. This can be either a file
name or a syslog facility with an @ prefix (e.g.
@local0).
--tag TAG Tag log entries with the specified value. The default
is lava-dispatcher.
See also The Lava Dispatch Process.
Info
To enable JSON format logging when performing scheduled dispatches, add
--log-json to the args parameter in the lavasched
jobs.
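The dot-to-hierarchy conversion applied to -g / --global and -p / --param values can be sketched as follows (an illustrative reimplementation of the documented behaviour, not the dispatcher's actual code):

```python
def params_to_hierarchy(pairs):
    """Convert "a.b.c=VALUE" style option values into a nested dict,
    using the dots as level separators."""
    result = {}
    for pair in pairs:
        name, _, value = pair.partition("=")
        node = result
        *parents, leaf = name.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result

assert params_to_hierarchy(["db.host=h1", "db.port=5439", "mode=full"]) == {
    "db": {"host": "h1", "port": "5439"},
    "mode": "full",
}
```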
Lava Dump Utility
Lava-dump performs a bulk extract of data from a single table to a local
directory. It can extract all entries with keys that match any of a list of GLOB
style patterns. By default, all entries are extracted.
Usage
usage: lava-dump [-h] [-d DIR] [--profile PROFILE] [-i] [-n] [-r REALM] [-q]
[-t TABLE] [-y]
[glob-pattern [glob-pattern ...]]
Extract lava configurations from DynamoDB and dump them to files.
positional arguments:
glob-pattern Only extract items with keys that match any of the
specified glob style patterns. This test is inverted by
the -n / --not-match option.
optional arguments:
-h, --help show this help message and exit
-d DIR, --dir DIR Store files in the specified directory, which will be
created if it does not exist. Defaults to the current
directory.
--profile PROFILE As for AWS CLI.
-i, --ignore-case Matching is case insensitive.
-n, --not-match Only extract items with keys that do not match any of
the specified glob patterns.
-r REALM, --realm REALM
Lava realm name. This is required for all tables
except the realms table.
-q, --quiet Quiet mode.
-t TABLE, --table TABLE
Extract from the specified table. This can be one of
jobs, connections, s3triggers (or triggers) or realms.
Any unique initial sequence is accepted.
-y, --yaml Dump items in YAML format. The default is JSON.
As well as being useful for backups, lava-dump is useful for importing
existing items into the lava job framework.
See also lava-backup.
Lava Email Utility
The lava-email utility uses the lava email
connector to send emails.
Usage
usage: lava-email [-h] [--profile PROFILE] [-v] -c CONN_ID [-r REALM]
[-a FILE] [--bcc EMAIL] [--cc EMAIL] [--from EMAIL]
[--reply-to EMAIL] [--to EMAIL] -s SUBJECT [--html FILENAME]
[--text FILENAME] [--no-colour] [-l LEVEL] [--log LOG]
[--tag TAG]
[FILENAME]
Send email using lava email connections.
optional arguments:
-h, --help show this help message and exit
--profile PROFILE As for AWS CLI.
-v, --version show program's version number and exit
lava arguments:
-c CONN_ID, --conn-id CONN_ID
Lava connection ID. Required.
-r REALM, --realm REALM
Lava realm name. If not specified, the environment
variable LAVA_REALM must be set.
email arguments:
-a FILE, --attach FILE
Add the specified file as an attachment. Can be a
local file or an object in S3 in the form
s3://bucket/key. Can be used multiple times.
--bcc EMAIL Recipients to place on the Bcc: line of the message.
Can be used multiple times.
--cc EMAIL Recipients to place on the Cc: line of the message.
Can be used multiple times.
--from EMAIL Message sender. If not specified, a value must be
available in either the connection specification or
the realm specification.
--reply-to EMAIL Reply-to address of the message. Can be used multiple
times.
--to EMAIL Recipients to place on the To: line of the message.
Can be used multiple times.
-s SUBJECT, --subject SUBJECT
Message subject. Required.
message source arguments:
At most one of the following arguments is permitted.
--html FILENAME This is a legacy argument for backward compatibility.
--text FILENAME This is a legacy argument for backward compatibility.
FILENAME Name of file containing the message body. If not
specified or "-", the body will be read from stdin. An
attempt is made to determine if the message is HTML
and send it accordingly. Only the first 2MB is read.
logging arguments:
--no-colour, --no-color
Don't use colour in information messages.
-l LEVEL, --level LEVEL
Print messages of a given severity level or above. The
standard logging level names are available but debug,
info, warning and error are most useful. The Default
is info.
--log LOG Log to the specified target. This can be either a file
name or a syslog facility with an @ prefix (e.g.
@local0).
--tag TAG Tag log entries with the specified value. The default
is lava-email.
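The utility's HTML detection heuristic is not specified. A crude heuristic of the kind it might use (purely illustrative, not lava-email's actual logic) is:

```python
def looks_like_html(body):
    """Guess whether a message body is HTML by inspecting its opening
    tag (illustrative only; lava-email's actual detection may differ)."""
    head = body.lstrip().lower()
    return head.startswith(("<!doctype html", "<html"))

assert looks_like_html("<html><body>Hi</body></html>")
assert not looks_like_html("Plain text report")
```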
Lava Job Activity Utility
The lava-job-activity utility queries the realm
events table to generate an activity map of specified jobs
over a defined time period. This is useful to see when specified jobs are
running, particularly for event triggered jobs.
Usage
usage: lava-job-activity [-h] [--profile PROFILE] [-q] -r REALM [-s START_DTZ]
[-e END_DTZ] [--status STATUS] [--dump FILE]
[--load FILE] [-i MINUTES]
[job-id ...]
Lava event log query utility
positional arguments:
job-id Retrieve records for the specified job-id. Required
unless --load is used. If used with --load, this acts
as a further filter on event records loaded from the
dump file.
optional arguments:
-h, --help show this help message and exit
--profile PROFILE As for AWS CLI.
-q, --quiet Don't print progress messages on stderr.
-r REALM, --realm REALM
Lava realm name.
query arguments:
-s START_DTZ, --start START_DTZ
Start datetime. Preferred format is ISO 8601. If a
timezone is not specified, UTC is assumed. When using
--load, the default is the value from the source file.
Otherwise, the default is the most recent midnight
(UTC).
-e END_DTZ, --end END_DTZ
End datetime. Preferred format is ISO 8601. If a
timezone is not specified, UTC is assumed. When using
--load, the default is the value from the source file.
Otherwise the default is 24 hours after the start
time.
--status STATUS Only include events with the given status.
dump / load arguments:
--dump FILE Dump the raw data into the specified file in JSON
format. The format is suitable for loading using the
--load option. If both --load and --dump are used,
they must be different files.
--load FILE Load the raw data from the specified file instead of
reading it from DynamoDB. The file will have been
produced by a previous run using the --dump option.
This allows a set of data to be reprocessed without
re-extracting the same data.
output arguments:
-i MINUTES, --interval MINUTES
Aggregate job activity into intervals of the specified
duration (minutes). Stick to divisors or multiples of
60. Default is 10.
The process of extracting data from the events table can be expensive in usage
of DynamoDB table read capacity. Hence the extraction process has two
optimisations:
-
Specific job IDs must be requested. This enables full table scans to be
avoided on the, often large, events table.
-
The extracted event data can be stored in a JSON formatted dump file. This
file can be read back in subsequent runs of the utility to alter other
parameters, such as the aggregation granularity. See the --dump and
--load arguments.
The output to stdout is a CSV file containing two tables:
-
Run-seconds per time-slice by job ID
-
Run-seconds per time-slice by lava worker.
For example, the following command will extract the data for a given day and
job ID. The output CSV will have activity sliced into 10 minute blocks. The raw
data is retained for further analysis:
lava-job-activity -r my-realm --start 2024-02-15T00:00:00+11:00 \
--dump evdata.json job-id > job-10.csv
Each time-slice column in the output CSV will indicate for how many seconds that
job ran within that time-slice (i.e. run-seconds). This can be greater than
the number of seconds in the time-slice if the job ran more than once in that
slice.
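The run-seconds aggregation can be sketched as follows (an illustrative sketch, not the utility's actual code). Overlapping runs are summed, which is why a slice can report more seconds than it contains:

```python
from datetime import datetime, timedelta

def run_seconds_per_slice(runs, start, end, interval_minutes=10):
    """Aggregate (run_start, run_end) pairs into run-seconds per
    fixed-width time-slice. Concurrent or repeated runs are summed."""
    width = timedelta(minutes=interval_minutes)
    slices = []
    t = start
    while t < end:
        total = 0.0
        for run_start, run_end in runs:
            overlap = min(run_end, t + width) - max(run_start, t)
            total += max(overlap.total_seconds(), 0.0)
        slices.append((t, total))
        t += width
    return slices

day = datetime(2024, 2, 15)
runs = [
    (day, day + timedelta(minutes=4)),                        # 240 s
    (day + timedelta(minutes=2), day + timedelta(minutes=8)), # 360 s, overlaps
]
result = run_seconds_per_slice(runs, day, day + timedelta(minutes=20))
assert result[0][1] == 600.0  # first 10-minute slice: 240 + 360 run-seconds
assert result[1][1] == 0.0
```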
The extracted data can be reprocessed into 4 hour time-slices by reusing the
dump file thus:
lava-job-activity -r my-realm --start 2024-02-15T00:00:00+11:00 \
--dump evdata.json --interval 240 job-id > job-240.csv
The CSV file will contain information something like this:
| Job ID | 15/2/2024 0:00 | 15/2/2024 4:00 | 15/2/2024 8:00 | 15/2/2024 12:00 | 15/2/2024 16:00 | 15/2/2024 20:00 |
| --- | --- | --- | --- | --- | --- | --- |
| job-id | 288 | 348 | 376 | 335 | 287 | 344 |

| Worker | 15/2/2024 0:00 | 15/2/2024 4:00 | 15/2/2024 8:00 | 15/2/2024 12:00 | 15/2/2024 16:00 | 15/2/2024 20:00 |
| --- | --- | --- | --- | --- | --- | --- |
| core | 288 | 348 | 376 | 335 | 287 | 344 |
Info
It is important to keep and re-use dump files where possible, particularly
when extracting data for a large number of job IDs. However, the dump file
will only contain raw event data requested for the initial set of job IDs
and time window. It is not possible to reuse data that wasn't extracted in
the first place.
Estimating Lava Load on a Connection
Sometimes it's useful to estimate how much load lava is placing on a particular
resource (such as a database) over a defined time window. This can be done by
combining the lava-job-activity and lava-conn-usage
utilities.
The following example will produce the time-slice view described above for all
jobs that reference connectors with IDs starting with redshift:
lava-job-activity -r my-realm --start 2024-02-15T00:00:00+11:00 \
--dump evdata.json \
$(lava-conn-usage -r my-realm 'redshift*') > conn-activity.csv
Note that this does not mean the connection was in active use at all times in
the activity window. Lava has no way of knowing that. It does give a proxy view
of load intensity.
Lava-new Utility
Note
New in v8.2 (Kīlauea).
The lava-new utility is used to create a new
lava job framework project.
Usage
usage: lava-new [-h] [-v] [--no-input] [-p KEY=VALUE] directory
Create a new lava job framework project.
positional arguments:
directory Create the template source in the specified directory
(which must not already exist).
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--no-input Do not prompt for user input. The -p / --param option
should be used to specify parameter values.
-p KEY=VALUE, --param KEY=VALUE
Specify default parameters for the underlying
cookiecutter used to create the new lava project. Can
be used multiple times. Available parameters are
config_checks, description, docker_platform,
docker_prefix, environment, git_setup,
include_lava_libs, job_prefix, jupyter_support, owner,
payload_prefix, pip_index_url, project_dir,
project_name, realm, rule_prefix, s3trigger_prefix.
Lava Schema Utility
The lava-schema utility performs deep schema validation for lava DynamoDB
specification objects.
Usage
usage: lava-schema.py [-h] [-d] [-r REALM] [-s DIRNAME]
[-t {job,s3trigger,connection}] [-v]
[SPEC ...]
Deep schema validation for lava DynamoDB specification objects.
positional arguments:
SPEC If specified, specifications are read directly from
DynamoDB and any SPEC arguments are treated as GLOB style
patterns that the ID of the specifications must match. If
the -d / --dynamodb option is not specified, JSON formatted
lava object specifications are read from the named files.
optional arguments:
-h, --help show this help message and exit
-d, --dynamodb Read lava specifications from DynamoDB instead of the local
file system. The lava realm must be specified, either via
the -r / --realm option or the LAVA_REALM environment
variable.
-r REALM, --realm REALM
Lava realm name. If not specified, the environment variable
LAVA_REALM will be used. If -d / --dynamodb is specified,
a value must be specified by one of these mechanisms.
-s DIRNAME, --schema-dir DIRNAME
Directory containing lava schema specifications. Default is
/usr/local/lib/lava/lava/lib/schema.
-t {job,s3trigger,connection}, --type {job,s3trigger,connection}
Use the schema appropriate to the specified lava object
type. Options are job, s3trigger, connection. The default
is job.
-v, --verbose Print results for all specifications. By default, only
validation failures are printed.
Lava-schema can read specifications directly from DynamoDB or from the local
file system. The latter is useful to check the install components produced in
the dist directory by a lava job framework project.
Whereas lava-check is focused on basic configuration
management hygiene, lava-schema is focused on strict compliance with
lava DynamoDB table specifications using detailed
JSON Schema specifications. (They may merge at some
point.)
Lava-schema vs Lava Worker Validation
As of v7.1.0 (Pichincha), deep schema validation only manifests in the
lava-schema utility. The lava worker doesn't use this. Instead, it uses
its traditional process of checking just enough to validate that it
can try to run the job.
This will change in a future release and the worker will also perform deep
schema validation of the fully resolved augmented job
specification and the other DynamoDB object
types at run-time. Malformed jobs that could run under the
current validation process will be rejected outright.
I'm from the Government and I'm here to help you.
Commonly observed configuration errors that the lava worker will tolerate but
lava-schema will not include:
-
Malformed action specifications
These don't prevent the job from running but the malformed actions will
fail, potentially causing operational issues or unreported errors.
-
Optional fields in the wrong place
For example, the timeout parameter of exe jobs
occasionally pops up at the top level of the job specification instead of
within the job parameters. The lava worker will silently ignore the
incorrectly placed timeout and do its best to run the job -- with the
default timeout.
-
Imaginary fields
For example, the sqlc and sqlv job types
support a timeout parameter but the sql and
sqli jobs do not. If a job is migrated between these
types, it's easy to miss the need to add / remove the timeout parameter.
The lava worker will happily run the job -- with the default timeout.
-
Incorrect parameter types
For example, the args parameter of the exe job type
expects a list of strings (although numbers are OK also). Booleans
are not. In a YAML job specification file in a
lava job framework project, it is easy to mix up
an argument value of true (boolean) with "true" (string). The first one
will get presented to the job payload at run-time as True and the second
one will get presented as true. This could be a problem.
-
It will be alright on the night
When the lava worker runs a job, it grabs the job specification from
DynamoDB and then merges in any parameters received as part of the dispatch
message to produce the fully resolved
augmented job specification. It's
tempting to leave out of the job specification parameters that are going to
be replaced at run-time anyway. Problem is, this can make it tricky to work
out what the job will do just by looking at the job specification.
Lava-schema will complain about this because when it does its static
analysis, the job is malformed. The lava worker is happy because it gets
everything required at run-time. DevOps folk are less than pleased with a
partially specified job in DynamoDB that frustrates attempts to diagnose
problems.
It's good practice to include a placeholder in the job specification for
every parameter, global and state item the job requires, even if it's
replaced at run-time.
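As a sketch of that practice, a job specification might carry explicit placeholders for every value the dispatch message will overwrite. The field names below are illustrative only, not lava's actual schema:

```yaml
# Hypothetical job specification fragment -- field names are illustrative.
params:
  target_date: "1970-01-01"   # placeholder; replaced by the dispatch message
  batch_size: 1024            # placeholder; replaced by the dispatch message
```

A reader of the DynamoDB entry can now see every parameter the job expects, even though the values themselves are supplied at dispatch time.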
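Separately, the boolean/string distinction in the args example above is easy to reproduce. The sketch below uses the stdlib json module to make the point (YAML applies the same convention for a bare true versus a quoted "true"):

```python
import json

# A bare `true` parses to a boolean; a quoted `"true"` parses to a string.
# (Shown with the stdlib json module; YAML follows the same convention.)
as_bool = json.loads('true')
as_str = json.loads('"true"')

# When stringified for a job payload, the two render differently:
print(str(as_bool))  # True
print(str(as_str))   # true
```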
Caveats
Because of the way JSON Schema works, some of the error messages that occur when
non-compliance is detected can be rather obscure. It's not uncommon for a
single specification error to generate multiple error messages. Generally, the
error messages give a pretty good indication of what's wrong.
There are, possibly, (rare) occasions when it's legitimate for a job
specification in DynamoDB to not comply with lava schema requirements but the
fully resolved augmented job specification
at run-time does comply.
Did I mention that this is rare?
Trying to think of a situation where this is good design practice ...
Still thinking ...
Lava SQL Utility
The lava-sql utility provides a simple, uniform CLI across different
database types with connectivity managed by the lava connection subsystem.
Usage
usage: lava-sql [-h] [--profile PROFILE] [-v] [-a NAME] [-b BATCH_SIZE]
[--format {csv,jsonl,html,parquet}] [--header] [-o FILENAME]
[--raw] [--transaction] -c CONN_ID [-r REALM] [--no-colour]
[-l LEVEL] [--log LOG] [--tag TAG] [--delimiter CHAR]
[--dialect {excel,excel-tab,unix}] [--doublequote]
[--escapechar CHAR] [--quotechar CHAR] [--quoting QUOTING]
[--sort-keys]
[SQL-FILE ...]
Run SQL using lava database connections.
positional arguments:
SQL-FILE SQL files. These can be local or in S3 (s3://...). If
not specified or "-", stdin is used.
optional arguments:
-h, --help show this help message and exit
--profile PROFILE As for AWS CLI.
-v, --version show program's version number and exit
-a NAME, --app-name NAME, --application-name NAME
Use the specified application name when connecting to
the database. Ignored for database types that don't
support this concept.
-b BATCH_SIZE, --batch-size BATCH_SIZE
Number of records per batch when processing SELECT
queries. Default is 1024.
--format {csv,jsonl,html,parquet}
Output format. Default is csv.
--header Print a header for SELECT queries (output format
dependent).
-o FILENAME, --output FILENAME
Write output to the specified file which may be local
or in S3 (s3://...). If not specified, output is
written to stdout.
--raw Don't split SQL source files into individual
statements. By default, an attempt will be made to
split each source file into individual SQL statements.
--transaction Disable auto-commit and run all SQLs in a transaction.
lava arguments:
-c CONN_ID, --conn-id CONN_ID
Lava database connection ID. Required.
-r REALM, --realm REALM
Lava realm name. If not specified, the environment
variable LAVA_REALM must be set.
logging arguments:
--no-colour, --no-color
Don't use colour in information messages.
-l LEVEL, --level LEVEL
Print messages of a given severity level or above. The
standard logging level names are available but debug,
info, warning and error are most useful. The default
is info.
--log LOG Log to the specified target. This can be either a file
name or a syslog facility with an @ prefix (e.g.
@local0).
--tag TAG Tag log entries with the specified value. The default
is lava-sql.
CSV format arguments:
--delimiter CHAR Single character field delimiter. Default |.
--dialect {excel,excel-tab,unix}
CSV dialect (as per the Python csv module). Default is
excel.
--doublequote See Python csv.writer.
--escapechar CHAR See Python csv.writer. Escaping is disabled by
default.
--quotechar CHAR See Python csv.writer. Default is ".
--quoting QUOTING As for csv.writer QUOTE_* parameters (without the
QUOTE_ prefix). Default is minimal (i.e.
QUOTE_MINIMAL).
JSONL format arguments:
--sort-keys Sort keys in JSON objects.
Lava-sql can run one or more queries in a transaction and also capture
output from SELECT queries in various formats, either to a local file or to
AWS S3.
Info
Do not have more than one SELECT query in the batch unless you are
deliberately trying to create a mess.
CSV
Note that the default delimiter for csv format is the pipe symbol |, not a
comma. The original rationale for this was for consistency with the Redshift
COPY and UNLOAD commands. All I can say is that it seemed to make sense at
the time.
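The CSV format arguments map directly onto Python's csv module, so the defaults can be sketched as follows (a minimal illustration of the documented defaults, not lava-sql's actual output path):

```python
import csv
import io

buf = io.StringIO()
# Mirror the lava-sql defaults: excel dialect, '|' delimiter, QUOTE_MINIMAL.
writer = csv.writer(buf, dialect="excel", delimiter="|",
                    quoting=csv.QUOTE_MINIMAL)
writer.writerow(["id", "name"])
writer.writerow([1, "piped|value"])  # a value containing the delimiter is quoted
print(buf.getvalue())
```

With QUOTE_MINIMAL, only the second field of the data row is quoted because it contains the delimiter.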
HTML
The output data is encoded as an HTML table with a class of lava-sql. Only
the table HTML is produced (i.e. no HTML or BODY tags) so that the output can
be incorporated into a larger HTML document.
Values will be escaped as needed to ensure HTML correctness.
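The escaping is standard HTML entity escaping; Python's stdlib shows the idea (an illustration of what "escaped as needed" means, not lava-sql's exact routine):

```python
import html

# Characters with HTML significance are replaced by entities.
print(html.escape('a < b & "c"'))  # a &lt; b &amp; &quot;c&quot;
```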
JSONL
Each row of output data is encoded as a single-line JSON-formatted object.
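In Python terms, each row is serialised much like a one-line json.dumps call, with --sort-keys corresponding to the sort_keys option (an illustration, not the actual implementation):

```python
import json

row = {"name": "widget", "id": 42}
print(json.dumps(row))                  # {"name": "widget", "id": 42}
print(json.dumps(row, sort_keys=True))  # {"id": 42, "name": "widget"}
```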
Parquet
Parquet compression will generally benefit from a larger batch size. The default
of 1024 is reasonable for many purposes but increasing it will often give a
smaller output file. Don't get carried away though. Each batch has to be held
entirely in memory.
A word of caution about the Parquet schema ... It's quite difficult to handle
schema inference in a predictable or consistent way, particularly with data
sourced via a DBAPI 2 connector as the standard does not provide any
consistency in how, or if, implementations signal type information in query
responses. The approach used by lava-sql is to let
PyArrow form an educated
guess based on the first record batch. This should be fine for most purposes.
Lava State Utility
The lava state utility provides a CLI to the
lava state manager.
Usage
usage: lava-state [-h] [--profile PROFILE] [-r REALM] [-v] [--no-colour]
[-l LEVEL] [--log LOG] [--tag TAG]
{put,get} ...
Manipulate lava state entries.
positional arguments:
{put,get}
put Add / replace a state entry.
get Get a state entry.
optional arguments:
-h, --help show this help message and exit
--profile PROFILE As for AWS CLI.
-r REALM, --realm REALM
Lava realm name. If not specified, the environment
variable LAVA_REALM must be set.
-v, --version show program's version number and exit
logging arguments:
--no-colour, --no-color
Don't use colour in information messages.
-l LEVEL, --level LEVEL
Print messages of a given severity level or above. The
standard logging level names are available but debug,
info, warning and error are most useful. The default
is info.
--log LOG Log to the specified target. This can be either a file
name or a syslog facility with an @ prefix (e.g.
@local0).
--tag TAG Tag log entries with the specified value. The default
is lava-state.
Creating a Lava State Item
State items are created with the put sub-command.
Info
Do not create state items with a state_id starting with lava.
This prefix is reserved.
Usage: lava-state put
usage: lava-state put [-h] [-p KEY=VALUE | -v VALUE] [--kms-key KMS_KEY]
[--publisher PUBLISHER] [--ttl DURATION]
[--type STATE_TYPE]
state_id
positional arguments:
state_id State ID.
optional arguments:
-h, --help show this help message and exit
-p KEY=VALUE, --param KEY=VALUE
Add the specified key/value pair to the state item.
Can be repeated to set multiple key/value pairs.
-v VALUE, --value VALUE
Set the value to the specified string.
--kms-key KMS_KEY The "secure" state item type supports KMS encryption
of the value. This argument specifies the KMS key to
use, either as a KMS key ARN or a key alias in the
form "alias/key-id". Defaults to the "sys" key for the
lava realm. Ignored for other state item types.
--publisher PUBLISHER
Set the state item publisher to the specified value.
Default is the contents of the LAVA_JOB_ID environment
variable, if set, or else "lava-state CLI".
--ttl DURATION Time to live as a duration (e.g. 10m, 2h, 1d).
--type STATE_TYPE State item type. Options are json, raw, secure.
Default is json.
Retrieving a Lava State Item
State items are retrieved with the get sub-command.
Usage: lava-state get
usage: lava-state get [-h] [-i] state_id [template]
positional arguments:
state_id State ID.
template An optional Jinja2 template that will be rendered with
the retrieved value as the "state" and "s" parameters.
e.g. if set to "{{ state }}" (the default) the value is
printed as is.
optional arguments:
-h, --help show this help message and exit
-i, --ignore-missing Ignore errors for missing state items and return an
empty string. By default, attempting to get a non-
existent state item is an error.
Lava Stop Utility
The lava-stop utility initiates a controlled shutdown of the lava worker
daemons.
Usage
usage: lava-stop [-h] [-D] [--profile PROFILE] [--signal SIGNAL] [-v]
[-w DURATION] [--auto-scaling-group-name NAME]
[--instance-id ID] [--lifecycle-action-token UUID]
[--lifecycle-hook-name NAME] [--lifecycle-heartbeat DURATION]
[-c] [-l LEVEL] [--log LOG] [--tag TAG]
Stop lava worker processes.
options:
-h, --help show this help message and exit
-D, --no-dispatch Inhibit further scheduled dispatches by creating
/tmp/lava/__nodispatch__. This requires the lava-
dispatcher utility to check for this file by
specifying the --check-dispatch argument.
--profile PROFILE As for AWS CLI.
--signal SIGNAL, --sig SIGNAL
Send the specified signal to the lava worker
processes. Can be specified as a signal name (e.g.
SIGHUP or HUP) or a signal number. The default is 0
which only tests if the process exists. SIGHUP is
interpreted as a controlled shutdown instruction
allowing running jobs to complete. SIGTERM is
interpreted as a controlled, but immediate,
termination that allows final cleanup tasks but takes
no account of running jobs. See -w, --wait.
-v, --version show program's version number and exit
-w DURATION, --wait DURATION
Wait for up to the specified duration for the lava
workers to finish voluntarily before killing them.
This requires the signal to be set to SIGHUP / HUP as
this is interpreted by the lava worker daemons as a
controlled shutdown request. The duration must be in
the form nn[X] where nn is a number and X is one of s
(seconds), m (minutes) or h (hours). If X is not
specified, seconds are assumed.
AWS auto scaling lifecycle options:
These arguments are designed to complete an AWS auto scaling "EC2
Instance-terminate Lifecycle Action". See the AWS CLI or AWS auto scaling
documentation for meaning and usage. Note that the lifecycle action result
is always set to CONTINUE which means the auto scaling group _will_
terminate the instance.
--auto-scaling-group-name NAME
Send a complete-lifecycle-action signal for the
specified AWS auto scaling group. If specified,
--lifecycle-hook-name is also required.
--instance-id ID The ID of the EC2 instance (optional). If specified,
--auto-scaling-group-name / --lifecycle-hook-name are
required.
--lifecycle-action-token UUID
lifecycle action identifier (optional). If specified,
--auto-scaling-group-name / --lifecycle-hook-name are
required.
--lifecycle-hook-name NAME
The name of the AWS auto scaling lifecycle hook. If
specified, --auto-scaling-group-name is also required.
--lifecycle-heartbeat DURATION
Record a heartbeat for the lifecycle action at
specified intervals (optional). If specified, --auto-
scaling-group-name / --lifecycle-hook-name are
required. The duration must be in the form nn[X] where
nn is a number and X is one of s (seconds), m
(minutes) or h (hours). If X is not specified, seconds
are assumed. The minimum permitted value is 60
seconds.
logging arguments:
-c, --no-colour, --no-color
Don't use colour in information messages.
-l LEVEL, --level LEVEL
Print messages of a given severity level or above. The
standard logging level names are available but debug,
info, warning and error are most useful. The default
is info.
--log LOG Log to the specified target. This can be either a file
name or a syslog facility with an @ prefix (e.g.
@local0).
--tag TAG Tag log entries with the specified value. The default
is lava-stop.
The process for stopping a lava worker is:
-
Send it a SIGHUP signal. This tells the worker to complete any in-flight
or queued jobs but not to accept any more jobs.
-
Wait a while.
-
Send it another SIGHUP signal. The second SIGHUP is a more aggressive
shutdown command and will interrupt in-flight jobs but still allow the
worker an opportunity to cleanup.
-
Give it another 10-20 seconds.
-
If the worker is still running, kill it with SIGKILL.
Lava-stop will do a process listing to find worker processes. It can be used
interactively and is also designed for use within an AWS auto scaling
lifecycle hook for terminating worker nodes. This is all built into a standard
lava deployment using the provided
CloudFormation templates.
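Both -w / --wait and --lifecycle-heartbeat accept the nn[X] duration form. A minimal parser for that form might look like this (a sketch of the documented syntax, not lava's implementation):

```python
def parse_duration(spec: str) -> int:
    """Parse an nn[X] duration into seconds.

    X is one of s (seconds), m (minutes) or h (hours); if X is
    omitted, seconds are assumed. Sketch only -- not lava's parser.
    """
    units = {"s": 1, "m": 60, "h": 3600}
    if spec and spec[-1] in units:
        return int(spec[:-1]) * units[spec[-1]]
    return int(spec)

print(parse_duration("90"))   # 90
print(parse_duration("10m"))  # 600
print(parse_duration("2h"))   # 7200
```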
Lava Version Utility
The lava-version utility provides version information on the installed lava
version.
Usage
usage: lava-version [-h] [-n | -a | --ge VERSION | --eq VERSION]
Print lava version information.
optional arguments:
-h, --help show this help message and exit
-n, --name Print version name only.
-a, --all Print all version information.
--ge VERSION Exit with zero status if the lava version is greater than or
equal to the specified version.
--eq VERSION Exit with zero status if the lava version is equal to the
specified version.
If no arguments are specified the lava version number is printed.
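The --ge and --eq exit statuses make lava-version convenient in shell scripts and installers. The comparison is, in effect, a numeric tuple comparison of the dotted version components (a sketch of the behaviour described above, not lava's code):

```python
def version_tuple(version: str) -> tuple:
    """Split a dotted version string into a tuple of ints for comparison.
    Sketch only -- lava's exact comparison rules are not documented here."""
    return tuple(int(part) for part in version.split("."))

# --ge semantics: succeed when the installed version >= the required version.
print(version_tuple("7.1.0") >= version_tuple("7.0.3"))   # True
print(version_tuple("7.1.0") >= version_tuple("7.10.0"))  # False
```

Note that tuple comparison handles multi-digit components correctly, where a plain string comparison would rank "7.10.0" below "7.1.0".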
Lava Worker Status Utility
The lava-ws utility displays worker status information based on the worker
SQS queues (queue depths, worker backlog etc.).
Usage
usage: lava-ws [-h] [-f FORMAT] [-l] -r REALM [-w WORKER] [-v]
Get status info about lava workers.
optional arguments:
-h, --help show this help message and exit
-f FORMAT, --format FORMAT
Output table format (see below). The formats supported
by tabulate (https://pypi.org/project/tabulate/) can
be used. The default is fancy_grid.
-l Show more information. Repeat up to 2 times to get
more details.
-r REALM, --realm REALM
Lava realm name.
-w WORKER, --worker WORKER
Lava worker name prefix. If not specified, report on
all workers in the realm (assumes lava standard queue
naming conventions).
-v, --version show program's version number and exit
output columns:
BCKAVG Average worker backlog in the last 15 minutes
BCKMAX Maximum worker backlog in the last 15 minutes
BCKNOW Current backlog
DELAVG Average run delay in the last 15 minutes
DELMAX Maximum run delay in the last 15 minutes
EC2 Number of running EC2 instances
EC2TYPE EC2 instance type
MSGS Messages visible
NVIS Messages not visible
QUEUE SQS queue name
RET Message retention period
VIS Visibility timeout
output formats:
fancy_grid fancy_outline github grid html jira
latex latex_booktabs latex_longtable latex_raw mediawiki moinmoin
orgtbl pipe plain presto pretty psql
rst simple textile tsv unsafehtml youtrack
Unlike lava-ps, which displays worker process information, lava-ws does
not need to run on the worker host.