DynamoDB Tables¶

Each lava realm uses a number of DynamoDB tables to hold configuration and status information.

The tables are created by the CloudFormation templates.

The Realms Table¶

The realms table is named lava.realms. This is a global lava table and is the only object shared across realms.

Field	Type	Required	Description
config	Map	No	An optional map of configuration values that will be applied to all workers in the realm. Refer to Lava Worker Configuration for more information.
on_fail	List[Map]	No	The default on_fail actions for jobs in the realm.
on_success	List[Map]	No	The default on_success actions for jobs in the realm.
realm	String	Yes	A unique identifier for the realm. Keep it simple.
s3_key	String	Yes	A KMS key identifier used when objects are written to S3 by a worker. Typically, either a key ARN or `alias/<KEY-NAME>`.
s3_payloads	String	Yes	A location in S3 where payloads are stored for the realm in the form `s3://<BUCKET>/<PREFIX>`.
s3_temp	String	Yes	A location in S3 where job outputs are stored for the realm in the form `s3://<BUCKET>/<PREFIX>`.
X-*	String	No	Any fields beginning with `x-` or `X-` are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). A number of these fields are used as part of the boot process for EC2 based lava workers for configuration control.
*	*	*	Other fields in the realms table may be present to set defaults for other lava subcomponents. These are described in the relevant section.
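
Because lava ignores any field whose name begins with `x-` or `X-`, tooling that inspects a table entry may want to separate those annotation fields from the fields lava acts on. The following is a minimal Python sketch of that split, assuming the entry has already been read into a plain dict. The function name `split_x_fields` and the sample values are hypothetical and are not part of lava.

```python
def split_x_fields(item: dict) -> tuple[dict, dict]:
    """Split a table entry into (fields lava uses, ignored x-/X- fields)."""
    used, ignored = {}, {}
    for name, value in item.items():
        # Fields starting with "x-" or "X-" are ignored by lava.
        target = ignored if name.lower().startswith("x-") else used
        target[name] = value
    return used, ignored


# Hypothetical realm entry, for illustration only.
entry = {"realm": "dev", "s3_temp": "s3://my-bucket/tmp", "X-version": "1.2.3"}
used, ignored = split_x_fields(entry)
```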

The Jobs Table¶

The jobs table for a given <REALM> is named lava.<REALM>.jobs. It contains information about jobs and their associated run information.

Field	Type	Required	Description
cw_metrics	Boolean	No	If specified, enable/disable generation of CloudWatch custom metrics for this job. If present, overrides any value set at the worker/realm level.
description	String	Not yet	A short description of the job. This field will be mandatory in a future release.
dispatcher	String	No	An identifier specifying the dispatcher for the job.
enabled	Boolean\|String	No	Whether or not the job is enabled. Defaults to `false`. String values are Jinja rendered, providing a means to dynamically enable / disable jobs at run-time. More information.
event_log	*	No	The specified value is Jinja rendered and recorded as part of the job run event information in the events table. More information.
globals	Map[String,*]	No	A map of named values that are made available for Jinja rendering of job actions and job parameters for those job types that use Jinja parameter rendering. Names beginning with `lava` (case insensitive) are reserved for lava's use. More information on parameters and globals.
iteration_delay	String	No	The delay between attempts to run the job in the form `nnX` where `nn` is a number and `X` is `s` (seconds) or `m` (minutes). Default is `0s`. The maximum allowed value is specified by the ITERATION_MAX_DELAY configuration parameter. See Job Retries for more information.
iteration_limit	Integer	No	The number of attempts that will be made to run the job. Default is 1. The maximum allowed value is specified by the ITERATION_MAX_LIMIT configuration parameter. This is unrelated to the SQS related `max_tries` parameter. See Job Retries for more information.
job_id	String	Yes	The unique job identifier for the realm. It is possible to have some grouping of jobs using a path like structure. e.g. `job_group/myjob_01`.
max_run_delay	String	No	The maximum allowed delay between when a job is dispatched and when it is run in the form `nnX` where `nn` is a number and `X` is `s` (seconds), `m` (minutes), `h` (hours) or `d` (days). If this limit is exceeded, the job run is discarded with an error. If not specified, no limit is imposed other than the message retention period of the worker SQS job queue.
max_tries	Number	No	By default, if a lava worker fails mid-job, SQS will resubmit the dispatch request at the end of the visibility timeout. If `max_tries` is set to a positive integer value, then the dispatch message will be discarded when the SQS message `ApproximateReceiveCount` exceeds the specified value. Note that the effective upper limit is still the smaller of the `Maximum Receives` value for the worker SQS queue (if set) and any limit specified by the `--retries` worker option.
on_fail	List[Map]	No	Job specific on_fail actions for the job. Overrides any realm level setting.
on_retry	List[Map]	No	Job specific on_retry actions for the job. Overrides any realm level setting.
on_success	List[Map]	No	Job specific on_success actions for the job. Overrides any realm level setting.
owner	String	Not yet	Name or email address of the job owner. This field will be mandatory in a future release.
parameters	Map[String,*]	No	A map of parameters that will be passed to the job. The parameter structure is dependent on the job type.
payload	*	Yes	The job payload. The type and format is job type dependent. Currently, a value is required even for job types that do not need it. In this case set the value to `null`.
schedule	String	No	A cron schedule that specifies when the job will run. Refer to the section Schedule Specifications for more information. If not specified, the job can be dispatched on demand but will not be scheduled.
state	Map[String,*]	No	A map of state items. For each item, the key is the `state_id` and the value is a default value. The default values are replaced at run-time by the current value of the specified state item in the state table, if it exists.
type	String	Yes	The name of the job handler to run.
worker	String	Yes	The name of a worker that can run the job.
X-*	String	No	Any fields beginning with `x-` or `X-` are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). The lava job framework uses a number of these fields for various purposes.

Warning

Currently, unknown fields in the job specification will result in a deprecation warning being written to the worker's log but the job is permitted to run. A future release will reject jobs that have unknown fields.
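
The `nnX` duration form used by `iteration_delay` and `max_run_delay` can be checked before an entry is loaded into the jobs table. The following Python sketch illustrates the format as described in the table above. It is not lava's own parser: the function name `parse_duration` is hypothetical, and treating `nn` as a non-negative whole number is an assumption.

```python
import re
from datetime import timedelta

# Unit letters as described for max_run_delay.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str, allowed: str = "smhd") -> timedelta:
    """Parse an nnX duration such as '30s' or '2h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None or match.group(2) not in allowed:
        raise ValueError(f"invalid duration: {text!r}")
    number, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(number)})


# iteration_delay only accepts seconds and minutes.
delay = parse_duration("15m", allowed="sm")
# max_run_delay also accepts hours and days.
limit = parse_duration("2d")
```

Restricting `allowed` per field lets one helper cover both fields while still rejecting, for example, an `iteration_delay` given in hours.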

The enabled Field¶

Prior to v8.0 (Incahuasi), the enabled field was a simple, static boolean value. If false, the job would be skipped. This behaviour is unchanged in v8+ if the value is boolean.

If the value is a string, it is Jinja rendered. If the resulting value is the string true (case and surrounding whitespace are ignored), the job is enabled to run. Any other value will result in the job being skipped. This allows job execution to be conditional on run-time values.
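
The decision rule above can be sketched in Python as follows. This is an illustration only, not lava's implementation: the function name `is_enabled` is hypothetical, string values are assumed to have been Jinja rendered already, and treating any other value type as disabled is an assumption.

```python
def is_enabled(value) -> bool:
    """Interpret an `enabled` value; strings are assumed to be already rendered."""
    if isinstance(value, bool):
        # Static boolean behaviour.
        return value
    if isinstance(value, str):
        # Only the string "true" enables the job, ignoring case and
        # surrounding whitespace.
        return value.strip().lower() == "true"
    # Assumption: anything else (including a missing field) means disabled.
    return False
```

Jinja renders the result of a boolean expression as `True` or `False`, so an expression such as `{{ a == b }}` satisfies this rule because case is ignored.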

The following variables are made available to the Jinja renderer.

Name	Type	Description
globals	dict[str,*]	The `globals` from the job specification updated with any globals received in the job dispatch.
job	dict[str,*]	The augmented job specification.
realm	dict[str,*]	The realm specification.
start	datetime	The local time when the job run started.
state	dict[str,*]	A dictionary of the state items imported into the job, keyed on state_id. The default values are updated at run-time with any current values obtainable from the state table.
ustart	datetime	The UTC time when the job run started.
utils	dict[str,runnable]	A dictionary of utility functions that can be used in the Jinja markup.

As an example, the following job specification fragment will only enable the job if the value of the global a is palindromic (which it is, in the example):

{
  "enabled": "{% set word = globals.a | lower %}{{ word == word | reverse }}",
  "globals": {
    "#": "Global a is a palindrome.",
    "a": "Tattarrattat"
  }
}

In this example, the job is only enabled if the value of the x field in state item sid is odd:

{
  "enabled": "{% set val = state['sid'].x | int %}{{ val % 2 == 1 }}",
  "state": {
    "sid": "-- set at run-time --"
  }
}

As a more complex example, this job specification fragment only enables a job if it has not run successfully in the last 4 hours:

{
  "schedule": "30 * * * *",
  "enabled": "{% set ts=utils.parsedate.parse(state['sid']) %}{{ ustart-ts > utils.timedelta(hours=4) }}",
  "on_success": [
    {
      "action": "state",
      "state_id": "sid",
      "value": "{{ ustart.isoformat() }}"
    }
  ],
  "state": {
    "sid": "2000-01-01T00:00:00Z"
  },
  "event_log": "Time since last run: {% set ts=utils.parsedate.parse(state['sid']) %}{{ ustart-ts }}"
}

Some points to note:

A state variable is declared to hold an ISO 8601 datetime containing the last successful run-time for the job. This has an initial value defined in the job specification.
The enabled field checks the start time of the current run against the previous run start time, to ensure the required period of time has passed (4 hours in the example).
Once the job runs, an on_success action creates a state item that records the start time of the successful run.
An event_log field records for posterity the time since the previous run.
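
For readers less familiar with Jinja, the comparison made by the `enabled` expression in that example is equivalent to the following Python sketch. It uses the standard library's `datetime.fromisoformat` in place of the `utils.parsedate.parse` helper, and the function name `due` and the sample times are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def due(last_success: str, ustart: datetime, min_gap: timedelta) -> bool:
    """Return True if more than min_gap has elapsed since last_success."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards,
    # so normalise it to an explicit UTC offset first.
    ts = datetime.fromisoformat(last_success.replace("Z", "+00:00"))
    return ustart - ts > min_gap


ustart = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
first_run = due("2000-01-01T00:00:00Z", ustart, timedelta(hours=4))
too_soon = due("2024-05-01T09:30:00+00:00", ustart, timedelta(hours=4))
```

With the initial state value from the job specification, `first_run` is `True`, so the job runs the first time it is scheduled. `too_soon` is `False` because only three hours have passed since the recorded run.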

The event_log Field¶

Jobs may receive critical configuration as part of the dispatch process via parameters, globals, and state items. It can be tricky to determine from the events table exactly what a job run was doing, particularly in the event that a job run fails.

While it would be possible to record the entire augmented job specification, this is not safe in all cases as sensitive values may be exposed in the events table.

The event_log field allows the job specification to specify that certain information should be recorded in the job run record in the events table. The value of the field is an arbitrary object that will be Jinja rendered and added to the events table before the first iteration of the job begins. This information is added in a new entry with a logging status in the log field of the event record.

The following variables are made available to the Jinja renderer.

Name	Type	Description
globals	dict[str,*]	The `globals` from the job specification updated with any globals received in the job dispatch.
job	dict[str,*]	The augmented job specification.
realm	dict[str,*]	The realm specification.
start	datetime	The local time when the job run started.
state	dict[str,*]	A dictionary of the state items imported into the job, keyed on state_id. The default values are updated at run-time with any current values obtainable from the state table.
ustart	datetime	The UTC time when the job run started.
utils	dict[str,runnable]	A dictionary of utility functions that can be used in the Jinja markup.

As an example, the following job specification fragment would record the value of a specific global.

{
  "event_log": "The value of global g1 is '{{ globals.g1 }}'"
}

This fragment would record two globals in a map format:

{
  "event_log": {
    "g1": "{{ globals.g1 }}",
    "g2": "{{ globals.g2 }}"
  }
}

This fragment would record all of the globals (as a Python object converted to a string):

{
  "event_log": "{{ globals }}"
}

Warning

DO NOT be cavalier with this. Take care to avoid logging sensitive information.

The Connections Table¶

The connections table for a given <REALM> is named lava.<REALM>.connections. It contains information to help job handlers make connections to external resources, typically databases.

Field	Type	Required	Description
conn_id	String	Yes	Connection identifier.
description	String	No	A short description of the connection.
enabled	Boolean	Yes	Whether or not the connection is enabled. Defaults to `false`.
owner	String	Not yet	Name or email address of the connection owner. This field will be mandatory in a future release.
type	String	Yes	The connection type. This is used to identify a connector plugin to establish the connection.
X-*	String	No	Any fields beginning with `x-` or `X-` are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes).
*	*	*	All other fields are connection type specific. For more information see the section on connectors.

The Events Table¶

The events table for a given <REALM> is named lava.<REALM>.events. It is populated by lava workers as they start and finish job runs.

Querying the table via the DynamoDB console can be a bit tedious so a utility lava-events is provided to assist with this. Get help thus:

lava-events --help

The lava GUI also provides the ability to query the events table.

Field	Type	Required	Description
hostname	String	Yes	The hostname of the worker that ran the job.
instance_id	String	No	The AWS EC2 instance ID. Not present when the worker is not an EC2 instance.
job_id	String	Yes	The unique job identifier for the realm.
log	List[Map]	Yes	A list of events that have occurred for this run of the job. Each entry in the list is a map which will contain `info`, `status` and `ts_event` fields. The contents of the `info` field are job type dependent.
run_id	String	Yes	The UUID for the job run. This is used in the naming of job outputs.
status	String	Yes	The most recent `status` value for this job run. It will reflect the `status` value of the latest entry in the `log` list.
ts_dispatch	String	Yes	A timezone aware ISO 8601 format timestamp for the time the job was dispatched.
ts_event	String	Yes	A timezone aware ISO 8601 format timestamp for the most recent event for this job run. It will reflect the `ts_event` value of the latest entry in the `log` list.
ttl	Number	Yes	The epoch timestamp when the event record will expire. DynamoDB manages expiry automatically provided the TTL attribute for the table is set to `ttl`.
tu_event	String	Yes	A timezone naive ISO 8601 format timestamp for the UTC time for the most recent event for this job run. It will reflect the `tu_event` value for this job run.
worker_id	String	No	If the worker is an AWS EC2 instance, the instance ID.
worker	String	Yes	The name of the worker that ran the job.
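
When post-processing event records outside lava-events or the GUI, the relationship between the top-level fields and the `log` list can be used as in the following Python sketch. It assumes the record has already been read into a plain dict and that `log` entries are stored in chronological order. The function name `summarise_event` and all sample values, including the status strings, are hypothetical.

```python
from datetime import datetime, timezone


def summarise_event(record: dict) -> dict:
    """Pull out the latest log entry and the expiry time from an event record."""
    latest = record["log"][-1]  # assumes entries are appended in time order
    return {
        "job_id": record["job_id"],
        "run_id": record["run_id"],
        "status": latest["status"],
        "ts_event": latest["ts_event"],
        # ttl is an epoch timestamp; int() also copes with Decimal values.
        "expires": datetime.fromtimestamp(int(record["ttl"]), tz=timezone.utc),
    }


# Hypothetical record, for illustration only.
record = {
    "job_id": "job_group/myjob_01",
    "run_id": "00000000-0000-0000-0000-000000000000",
    "ttl": 1700000000,
    "log": [
        {"status": "status-1", "ts_event": "2023-11-01T10:00:00+00:00", "info": "..."},
        {"status": "status-2", "ts_event": "2023-11-01T10:05:00+00:00", "info": "..."},
    ],
}
summary = summarise_event(record)
```

The `int()` conversion matters when records are read with the boto3 resource API, which returns DynamoDB numbers as `Decimal` values.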

The S3triggers Table¶

The s3triggers table for a given <REALM> is named lava.<REALM>.s3triggers. It is used to map S3 bucket events to jobs. When an S3 event occurs with a bucket and object prefix matching an entry in the table, the corresponding lava job is dispatched. See Triggering Jobs from S3 Events for more information.

Field	Type	Required	Description
bucket	String	Yes	Bucket name.
delay	String	No	Dispatch message sending delay in the form `nnX` where `nn` is a number and `X` is `s` (seconds) or `m` (minutes). The maximum allowed value is 15 minutes.
description	String	No	A short description of the trigger.
enabled	Boolean	Yes	Whether or not the s3trigger is enabled.
globals	Map[String,*]	No	A map of named values that are included in the dispatch request. These are Jinja rendered. Names beginning with `lava` (case insensitive) are reserved for lava's use. More information on parameters and globals.
jinja	Boolean	No	If `false`, disable Jinja rendering of the parameters and globals. Defaults to `true`.
job_id	String \| List[String]	Yes	Job identifier for the lava job that will be dispatched, or a list of job identifiers. Each is Jinja rendered before use.
owner	String	Not yet	Name or email address of the trigger owner. This field will be mandatory in a future release.
parameters	Map[String,*]	No	A map of parameters for the job that will be included in the dispatch. These will be Jinja rendered.
prefix	String	Yes	Object prefix. Do not include a trailing `/` or matches will fail. To indicate the root of the bucket, use a prefix value of `*`.
trigger_id	String	Yes	A unique identifier within the realm for the trigger entry.
X-*	String	No	Any fields beginning with `x-` or `X-` are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes).

In addition to the fields described above, entries in the S3 triggers table may also contain fields starting with if_ and if_not_. These cause a test to be applied to the S3 object event. The dispatch is only performed if the test passes in the case of if_ fields, or fails in the case of if_not_ fields. The available if_ tests are described below. There is a corresponding if_not_ test for each.

Field	Type	Required	Description
if_fnmatch	String \| List[String]	No	Perform a glob style match against the S3 object key. If a list of patterns is provided, returns true if any of the patterns match the key. The matching rules defined by Python's fnmatch apply.
if_size_gt	Integer \| String	No	Check that the S3 object is larger than the specified size. Values can be specified as an integer or a string in the form `nnX`, where `nn` is a number and `X` is an optional unit such as 'K', 'KB', 'KiB', 'MiB' etc. Default for `X` is bytes.
if_event_type	String	No	Check that the S3 event type matches the specified value (e.g. `ObjectCreated:Put`). Glob style patterns are accepted (e.g. `ObjectCreated:*`).
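
The behaviour of `if_fnmatch` and its `if_not_fnmatch` counterpart can be previewed with Python's fnmatch module before a trigger entry is deployed. The sketch below is illustrative rather than lava's implementation: the function names are hypothetical, the use of the case-sensitive `fnmatchcase` is an assumption, and so is the rule that every test present in the entry must be satisfied.

```python
from fnmatch import fnmatchcase


def key_matches(key: str, patterns) -> bool:
    """True if the key matches the pattern, or any pattern in a list."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(key, pattern) for pattern in patterns)


def should_dispatch(trigger: dict, key: str) -> bool:
    """Apply if_fnmatch / if_not_fnmatch from a trigger entry to an object key."""
    if "if_fnmatch" in trigger and not key_matches(key, trigger["if_fnmatch"]):
        return False  # an if_ test must pass
    if "if_not_fnmatch" in trigger and key_matches(key, trigger["if_not_fnmatch"]):
        return False  # an if_not_ test must fail
    return True


# Hypothetical trigger entry fragment.
trigger = {"if_fnmatch": ["*.csv", "*.json"], "if_not_fnmatch": "*/tmp/*"}
```

Note that in fnmatch patterns `*` also matches `/`, so `*.csv` matches `in/2024/data.csv` as well as `data.csv`.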

The State Table¶

The state table for a given <REALM> is named lava.<REALM>.state. It is used to allow lava jobs to save state information for limited periods of time that can be accessed by authorised external actors or other lava jobs.

Creation and reading of entries in the state table are managed by the lava state manager. Other tools should not be used for this purpose.

Field	Type	Required	Description
state_id	String	Yes	The unique identifier for the state item.
publisher	String	No	An arbitrary identifier for the entity posting the state item. Not used by lava itself.
timestamp	String	No	An ISO 8601 format timestamp for the state item creation. Not used by lava itself.
ttl	Number	Yes	The epoch timestamp when the state record will expire. DynamoDB manages expiry automatically provided the TTL attribute for the table is set to `ttl`. The default and maximum time-to-live for entries in the table can be controlled by worker configuration parameters.
type	String	Yes	The state value type. This tells the lava worker how to decode the value. See State Item Types.
value	*	Yes	The state value. The structure depends on the state type.

State Item Types¶

Each state record has a specified type that tells the worker how to decode the value. Within lava itself, this is largely transparent as the worker handles all the necessary encoding and decoding.

The following types are supported:

Type	Description
json	The value is stored as a JSON encoded object. This is the default as it provides the most fidelity in the encoding / decoding process. Lava does this automatically within its own universe. External actors should use the lava state API or the lava state utility rather than attempt to reproduce this process natively.
raw	The value is stored as a DynamoDB object. This can sometimes do unhelpful type conversions on numbers.
secure	This uses the same value encoding mechanism as `json` with the addition of KMS encryption. Once again, external actors should use the lava state API or the lava state utility rather than attempt to reproduce this process natively. KMS encryption imposes a maximum size limit of 4096 bytes on the JSON encoded state item value.
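
Because of the 4096 byte limit on `secure` values, it can be useful to estimate the encoded size of a value before handing it to the lava state API or the lava state utility. The Python sketch below gives only an approximation: the exact JSON encoding options lava uses are not described here, so Python's default `json.dumps` output is an assumption, and the function name `fits_secure_limit` is hypothetical.

```python
import json

SECURE_MAX_BYTES = 4096  # limit stated for the `secure` state type


def fits_secure_limit(value) -> bool:
    """Estimate whether a value's JSON encoding fits the secure size limit."""
    # Assumption: default json.dumps formatting, encoded as UTF-8.
    encoded = json.dumps(value).encode("utf-8")
    return len(encoded) <= SECURE_MAX_BYTES


small_ok = fits_secure_limit({"token": "abc123"})
large_ok = fits_secure_limit("x" * 5000)
```

This is a pre-check only. Writing the item should still go through the lava state API or the lava state utility.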