DynamoDB Tables

Each lava realm uses a number of DynamoDB tables to hold configuration and status information.

The tables are created by the CloudFormation templates.

See also Maintaining DynamoDB Table Entries.

The Realms Table

The realms table is named lava.realms. This is a global lava table and is the only object shared across realms.

Field Type Required Description
config Map No An optional map of configuration values that will be applied to all workers in the realm. Refer to Lava Worker Configuration for more information.
on_fail List[Map] No The default on_fail actions for jobs in the realm.
on_success List[Map] No The default on_success actions for jobs in the realm.
realm String Yes A unique identifier for the realm. Keep it simple.
s3_key String Yes A KMS key identifier used when objects are written to S3 by a worker. Typically, either a key ARN or alias/<KEY-NAME>.
s3_payloads String Yes A location in S3 where payloads are stored for the realm in the form s3://<BUCKET>/<PREFIX>.
s3_temp String Yes A location in S3 where job outputs are stored for the realm in the form s3://<BUCKET>/<PREFIX>.
X-* String No Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). A number of these fields are used as part of the boot process for EC2 based lava workers for configuration control.
* * * Other fields in the realms table may be present to set defaults for other lava subcomponents. These are described in the relevant section.
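
The realms table is global, whereas the per-realm tables described in the rest of this page are named lava.<REALM>.<TABLE>. As a minimal sketch of that naming pattern, the helper below builds table names. The function table_name and the constant PER_REALM_TABLES are hypothetical names used for illustration and are not part of lava.

```python
from typing import Optional

# Hypothetical helper, not part of lava: builds DynamoDB table names from
# the naming pattern described in this document.
PER_REALM_TABLES = ("jobs", "connections", "events", "s3triggers", "state")


def table_name(table: str, realm: Optional[str] = None) -> str:
    """Return the DynamoDB table name for a lava table."""
    if table == "realms":
        # The realms table is global and shared across realms.
        return "lava.realms"
    if table not in PER_REALM_TABLES:
        raise ValueError(f"Unknown lava table: {table}")
    if not realm:
        raise ValueError(f"Table {table} requires a realm")
    return f"lava.{realm}.{table}"


print(table_name("realms"))
print(table_name("jobs", "dev"))
```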

The Jobs Table

The jobs table for a given <REALM> is named lava.<REALM>.jobs. It contains information about jobs and their associated run information.

Field Type Required Description
cw_metrics Boolean No If specified, enable/disable generation of CloudWatch custom metrics for this job. If present, overrides any value set at the worker/realm level.
description String Not yet A short description of the job. This field will be mandatory in a future release.
dispatcher String No An identifier specifying the dispatcher for the job.
enabled Boolean|String No Whether or not the job is enabled. Defaults to false. String values are Jinja rendered, providing a means to dynamically enable / disable jobs at run-time. More information.
event_log * No The specified value is Jinja rendered and recorded as part of the job run event information in the events table. More information.
globals Map[String,*] No A map of named values that are made available for Jinja rendering of job actions and job parameters for those job types that use Jinja parameter rendering. Names beginning with lava (case insensitive) are reserved for lava's use. More information on parameters and globals.
iteration_delay String No The delay between attempts to run the job in the form nnX where nn is a number and X is s (seconds) or m (minutes). Default is 0s. The maximum allowed value is specified by the ITERATION_MAX_DELAY configuration parameter. See Job Retries for more information.
iteration_limit Integer No The number of attempts that will be made to run the job. Default is 1. The maximum allowed value is specified by the ITERATION_MAX_LIMIT configuration parameter. This is unrelated to the SQS related max_tries parameter. See Job Retries for more information.
job_id String Yes The unique job identifier for the realm. Jobs can be grouped using a path-like structure, e.g. job_group/myjob_01.
max_run_delay String No The maximum allowed delay between when a job is dispatched and when it is run in the form nnX where nn is a number and X is s (seconds), m (minutes), h (hours) or d (days). If this limit is exceeded, the job run is discarded with an error. If not specified, no limit is imposed other than the message retention period of the worker SQS job queue.
max_tries Number No By default, if a lava worker fails mid-job, SQS will resubmit the dispatch request at the end of the visibility timeout. If max_tries is set to a positive integer value, then the dispatch message will be discarded when the SQS message ApproximateReceiveCount exceeds the specified value. Note that the upper limit is still the lesser of the Maximum Receives value for the worker SQS queue (if set) and any limit specified by the --retries worker option.
on_fail List[Map] No Job specific on_fail actions for the job. Overrides any realm level setting.
on_retry List[Map] No Job specific on_retry actions for the job. Overrides any realm level setting.
on_success List[Map] No Job specific on_success actions for the job. Overrides any realm level setting.
owner String Not yet Name or email address of the job owner. This field will be mandatory in a future release.
parameters Map[String,*] No A map of parameters that will be passed to the job. The parameter structure is dependent on the job type.
payload * Yes The job payload. The type and format is job type dependent. Currently, a value is required even for job types that do not need it. In this case set the value to null.
schedule String No A cron schedule that specifies when the job will run. Refer to the section Schedule Specifications for more information. If not specified, the job can be dispatched on demand but will not be scheduled.
state Map[String,*] No A map of state items. For each item, the key is the state_id and the value is a default value. The default values are replaced at run-time by the current value of the specified state item in the state table, if it exists.
type String Yes The name of the job handler to run.
worker String Yes The name of a worker that can run the job.
X-* String No Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). The lava job framework uses a number of these fields for various purposes.
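
The iteration_delay and max_run_delay fields both use the nnX duration form. The following is a minimal sketch of how such a value could be parsed. It is not lava's own parser: the function parse_duration is hypothetical, and it assumes nn is a whole number with no embedded whitespace.

```python
import re
from datetime import timedelta

# Units named in this document: s and m for iteration_delay, plus h and d
# for max_run_delay.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str, allowed_units: str = "smhd") -> timedelta:
    """Parse a duration of the form nnX (e.g. '30s' or '5m').

    ASSUMPTION: nn is a non-negative whole number. Lava's real parser may
    accept other forms.
    """
    match = re.fullmatch(r"(\d+)([a-z])", text.strip())
    if not match:
        raise ValueError(f"Invalid duration: {text!r}")
    number, unit = int(match.group(1)), match.group(2)
    if unit not in _UNITS or unit not in allowed_units:
        raise ValueError(f"Unit {unit!r} not allowed in {text!r}")
    return timedelta(**{_UNITS[unit]: number})


print(parse_duration("90s"))
print(parse_duration("2d"))
```

Passing allowed_units="sm" restricts the parser to the units accepted by iteration_delay.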

Warning

Currently, unknown fields in the job specification will result in a deprecation warning being written to the worker's log but the job is permitted to run. A future release will reject jobs that have unknown fields.

The enabled Field

Prior to v8.0 (Incahuasi), the enabled field was a simple, static boolean value. If false, the job would be skipped. This behaviour is unchanged in v8+ if the value is boolean.

If the value is a string, it is Jinja rendered. If the resulting value is the string true (case and surrounding whitespace are ignored), the job is enabled to run. Any other value will result in the job being skipped. This allows job execution to be conditional on run-time values.
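
The decision rules above can be sketched in Python as follows. This is an illustration only, not lava's implementation: is_enabled is a hypothetical function, and render is a stand-in for whatever performs the Jinja rendering.

```python
from typing import Callable, Union


def is_enabled(value: Union[bool, str, None], render: Callable[[str], str]) -> bool:
    """Decide whether a job should run, following the rules described above."""
    if value is None:
        # The enabled field defaults to false when it is absent.
        return False
    if isinstance(value, bool):
        # Static boolean values behave as they did before v8.0.
        return value
    # Strings are rendered first. Only the string 'true' enables the job,
    # ignoring case and surrounding whitespace.
    return render(value).strip().lower() == "true"


# A trivial stand-in renderer that returns its input unchanged.
print(is_enabled("  True \n", lambda text: text))
print(is_enabled("yes", lambda text: text))
```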

The following variables are made available to the Jinja renderer.

Name Type Description
globals dict[str,*] The globals from the job specification updated with any globals received in the job dispatch.
job dict[str,*] The augmented job specification.
realm dict[str,*] The realm specification.
start datetime The local time when the job run started.
state dict[str,*] A dictionary of the state items imported into the job, keyed on state_id. The default values are updated at run-time with any current values obtainable from the state table.
ustart datetime The UTC time when the job run started.
utils dict[str,runnable] A dictionary of utility functions that can be used in the Jinja markup.
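
As a rough sketch of how the globals and state entries in the table above are described as being assembled, the following merges job defaults with run-time values. The function build_context and its arguments are hypothetical, and the real worker builds a richer context than this.

```python
def build_context(job: dict, dispatch_globals: dict, stored_state: dict) -> dict:
    """Assemble illustrative 'globals' and 'state' rendering variables."""
    # Globals from the job specification, updated with any globals
    # received in the job dispatch.
    merged_globals = {**job.get("globals", {}), **dispatch_globals}
    # Each state default is replaced by the current value from the state
    # table, if one exists.
    state = {
        state_id: stored_state.get(state_id, default)
        for state_id, default in job.get("state", {}).items()
    }
    return {"globals": merged_globals, "state": state}


job = {"globals": {"a": 1, "b": 2}, "state": {"sid": "default", "other": 0}}
print(build_context(job, {"b": 3}, {"sid": "current"}))
```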

As an example, the following job specification fragment will only enable the job if the value of the global a is palindromic (which it is, in the example):

{
  "enabled": "{% set word = globals.a | lower %}{{ word == word | reverse }}",
  "globals": {
    "#": "Global a is a palindrome.",
    "a": "Tattarrattat"
  }
}

In this example, the job is only enabled if the value of the x field in state item sid is odd:

{
  "enabled": "{% set val = state['sid'].x | int %}{{ val % 2 == 1 }}",
  "state": {
    "sid": "-- set at run-time --"
  }
}

As a more complex example, this job specification fragment only enables a job if it has not run successfully in the last 4 hours:

{
  "schedule": "30 * * * *",
  "enabled": "{% set ts=utils.parsedate.parse(state['sid']) %}{{ ustart-ts > utils.timedelta(hours=4) }}",
  "on_success": [
    {
      "action": "state",
      "state_id": "sid",
      "value": "{{ ustart.isoformat() }}"
    }
  ],
  "state": {
    "sid": "2000-01-01T00:00:00Z"
  },
  "event_log": "Time since last run: {% set ts=utils.parsedate.parse(state['sid']) %}{{ ustart-ts }}"
}

Some points to note:

  • A state variable is declared to hold an ISO 8601 datetime containing the last successful run-time for the job. This has an initial value defined in the job specification.

  • The enabled field checks the start time of the current run against the previous run start time, to ensure the required period of time has passed (4 hours in the example).

  • Once the job runs, an on_success action creates a state item that records the start time of the successful run.

  • An event_log field records for posterity the time since the previous run.
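
The time comparison performed by the enabled expression in this example can be sketched in plain Python as follows. The function due_to_run is hypothetical, and the sketch uses the standard library rather than the utils functions available in the Jinja markup. It assumes ustart is a timezone aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def due_to_run(last_success: str, ustart: datetime, min_gap: timedelta) -> bool:
    """Return True if more than min_gap has elapsed since last_success."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    ts = datetime.fromisoformat(last_success.replace("Z", "+00:00"))
    return ustart - ts > min_gap


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# The initial state value from the job specification is far in the past,
# so the first run is always enabled.
print(due_to_run("2000-01-01T00:00:00Z", now, timedelta(hours=4)))
# A success recorded 3 hours ago means the job is skipped.
print(due_to_run("2024-01-01T09:00:00+00:00", now, timedelta(hours=4)))
```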

The event_log Field

Jobs may receive critical configuration as part of the dispatch process via parameters, globals, and state items. It can be tricky to determine from the events table exactly what a job run was doing, particularly in the event that a job run fails.

While it would be possible to record the entire augmented job specification, this is not safe in all cases as sensitive values may be exposed in the events table.

The event_log field allows the job specification to specify that certain information should be recorded in the job run record in the events table. The value of the field is an arbitrary object that will be Jinja rendered and added to the events table before the first iteration of the job begins. This information is added as a new entry with a logging status in the log field of the event record.

The following variables are made available to the Jinja renderer.

Name Type Description
globals dict[str,*] The globals from the job specification updated with any globals received in the job dispatch.
job dict[str,*] The augmented job specification.
realm dict[str,*] The realm specification.
start datetime The local time when the job run started.
state dict[str,*] A dictionary of the state items imported into the job, keyed on state_id. The default values are updated at run-time with any current values obtainable from the state table.
ustart datetime The UTC time when the job run started.
utils dict[str,runnable] A dictionary of utility functions that can be used in the Jinja markup.

As an example, the following job specification fragment would record the value of a specific global.

{
  "event_log": "The value of global g1 is '{{ globals.g1 }}'"
}

This fragment would record two globals in a map format:

{
  "event_log": {
    "g1": "{{ globals.g1 }}",
    "g2": "{{ globals.g2 }}"
  }
}

This fragment would record all of the globals (as a Python object converted to a string):

{
  "event_log": "{{ globals }}"
}

Warning

DO NOT be cavalier with this. Take care to avoid logging sensitive information.

The Connections Table

The connections table for a given <REALM> is named lava.<REALM>.connections. It contains information that helps job handlers connect to external resources, typically databases.

Field Type Required Description
conn_id String Yes Connection identifier.
description String No A short description of the connection.
enabled Boolean Yes Whether or not the connection is enabled. Defaults to false.
owner String Not yet Name or email address of the connection owner. This field will be mandatory in a future release.
type String Yes The connection type. This is used to identify a connector plugin to establish the connection.
X-* String No Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes).
* * * All other fields are connection type specific. For more information see the section on connectors.

The Events Table

The events table for a given <REALM> is named lava.<REALM>.events. It is populated by lava workers as they start and finish job runs.

Querying the table via the DynamoDB console can be tedious, so the lava-events utility is provided to assist with this. To display its usage help, run:

lava-events --help

The lava GUI also provides the ability to query the events table.

Field Type Required Description
hostname String Yes The hostname of the worker that ran the job.
instance_id String No The AWS EC2 instance ID. Not present when the worker is not an EC2 instance.
job_id String Yes The unique job identifier for the realm.
log List[Map] Yes A list of events that have occurred for this run of the job. Each entry in the list is a map which will contain info, status and ts_event fields. The contents of the info field are job type dependent.
run_id String Yes The UUID for the job run. This is used in the naming of job outputs.
status String Yes The most recent status value for this job run. It will reflect the status value of the latest entry in the log list.
ts_dispatch String Yes A timezone aware ISO 8601 format timestamp for the time the job was dispatched.
ts_event String Yes A timezone aware ISO 8601 format timestamp for the most recent event for this job run. It will reflect the ts_event value of the latest entry in the log list.
ttl Number Yes The epoch timestamp when the event record will expire. DynamoDB manages expiry automatically provided the TTL attribute for the table is set to ttl.
tu_event String Yes A timezone naive ISO 8601 format timestamp for the UTC time for the most recent event for this job run. It will reflect the tu_event value for this job run.
worker_id String No If the worker is an AWS EC2 instance, the instance ID.
worker String Yes The name of the worker that ran the job.
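
The three time-related formats in the table above (timezone aware ISO 8601, timezone naive UTC ISO 8601, and an epoch expiry) can be illustrated with a short sketch. The function event_timestamps and the retention argument are hypothetical. How lava actually computes the ttl value is not covered here.

```python
from datetime import datetime, timedelta, timezone


def event_timestamps(now: datetime, retention: timedelta) -> dict:
    """Build illustrative ts_event, tu_event and ttl values.

    'now' must be a timezone aware datetime.
    """
    utc = now.astimezone(timezone.utc)
    return {
        # Timezone aware ISO 8601 timestamp (includes the UTC offset).
        "ts_event": now.isoformat(),
        # Timezone naive ISO 8601 timestamp expressed in UTC.
        "tu_event": utc.replace(tzinfo=None).isoformat(),
        # Epoch seconds, the form DynamoDB expects in a TTL attribute.
        "ttl": int((utc + retention).timestamp()),
    }


local = datetime(2024, 1, 1, 10, 0, tzinfo=timezone(timedelta(hours=10)))
print(event_timestamps(local, timedelta(days=1)))
```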

The S3triggers Table

The s3triggers table for a given <REALM> is named lava.<REALM>.s3triggers. It is used to map S3 bucket events to jobs. When an S3 event occurs with a bucket and object prefix matching an entry in the table, the corresponding lava job is dispatched. See Triggering Jobs from S3 Events for more information.

Field Type Required Description
bucket String Yes Bucket name.
delay String No Dispatch message sending delay in the form nnX where nn is a number and X is s (seconds) or m (minutes). The maximum allowed value is 15 minutes.
description String No A short description of the trigger.
enabled Boolean Yes Whether or not the s3trigger is enabled.
globals Map[String,*] No A map of named values that are included in the dispatch request. These are Jinja rendered. Names beginning with lava (case insensitive) are reserved for lava's use. More information on parameters and globals.
jinja Boolean No If false, disable Jinja rendering of the parameters and globals. Defaults to true.
job_id String | List[String] Yes Job identifier for the lava job that will be dispatched, or a list of job identifiers. Each is Jinja rendered before use.
owner String Not yet Name or email address of the trigger owner. This field will be mandatory in a future release.
parameters Map[String,*] No A map of parameters for the job that will be included in the dispatch. These will be Jinja rendered.
prefix String Yes Object prefix. Do not include a trailing / or matches will fail. To indicate the root of the bucket, use a prefix value of *.
trigger_id String Yes A unique identifier within the realm for the trigger entry.
X-* String No Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes).

In addition to the fields described above, entries in the s3triggers table may also contain fields starting with if_ and if_not_. These cause a test to be applied to the S3 object event. The dispatch is only performed if the test passes in the case of if_ fields, or fails in the case of if_not_ fields. The available if_ tests are described below. There is a corresponding if_not_ test for each.

Field Type Required Description
if_fnmatch String | List[String] No Perform a glob style match against the S3 object key. If a list of patterns is provided, returns true if any of the patterns match the key. The matching rules defined by Python's fnmatch apply.
if_size_gt Integer | String No Check that the S3 object is larger than the specified size. Values can be specified as an integer or a string in the form nnX, where nn is a number and X is an optional unit such as K, KB, KiB, MiB etc. If X is omitted, the size is in bytes.
if_event_type String No Check that the S3 event type matches the specified value (e.g. ObjectCreated:Put). Glob style patterns are accepted (e.g. ObjectCreated:*).
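
The behaviour of the if_ and if_not_ tests can be sketched as follows. This is an approximation for illustration, not lava's implementation. The functions parse_size, matches_any and passes are hypothetical, and the unit multipliers are an assumption (decimal for K/KB/M/MB/G/GB, binary for KiB/MiB/GiB), because the precise meaning of each unit is not specified here.

```python
import re
from fnmatch import fnmatch

# ASSUMPTION: decimal multipliers for K/KB etc., binary for KiB/MiB etc.
_MULTIPLIERS = {
    "": 1, "B": 1,
    "K": 10**3, "KB": 10**3, "M": 10**6, "MB": 10**6, "G": 10**9, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(size) -> int:
    """Convert an integer or an nnX string (e.g. '10KiB') to bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", size)
    if not match or match.group(2).upper() not in _MULTIPLIERS:
        raise ValueError(f"Invalid size: {size!r}")
    return int(match.group(1)) * _MULTIPLIERS[match.group(2).upper()]


def matches_any(key: str, patterns) -> bool:
    """Glob match a key against one pattern or a list of patterns."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatch(key, pattern) for pattern in patterns)


def passes(trigger: dict, key: str, size: int, event_type: str) -> bool:
    """Apply the if_ and if_not_ tests of a trigger entry to an S3 event."""
    tests = {
        "fnmatch": lambda arg: matches_any(key, arg),
        "size_gt": lambda arg: size > parse_size(arg),
        "event_type": lambda arg: fnmatch(event_type, arg),
    }
    for name, test in tests.items():
        # An if_ test must pass; the corresponding if_not_ test must fail.
        if f"if_{name}" in trigger and not test(trigger[f"if_{name}"]):
            return False
        if f"if_not_{name}" in trigger and test(trigger[f"if_not_{name}"]):
            return False
    return True


trigger = {
    "if_fnmatch": ["*.csv", "*.json"],
    "if_size_gt": "1KiB",
    "if_not_event_type": "ObjectRemoved:*",
}
print(passes(trigger, "in/data.csv", 2048, "ObjectCreated:Put"))
```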

The State Table

The state table for a given <REALM> is named lava.<REALM>.state. It is used to allow lava jobs to save state information for limited periods of time that can be accessed by authorised external actors or other lava jobs.

Creation and reading of entries in the state table are managed by the lava state manager. Other tools should not be used for this purpose.

Field Type Required Description
state_id String Yes The unique identifier for the state item.
publisher String No An arbitrary identifier for the entity posting the state item. Not used by lava itself.
timestamp String No An ISO 8601 format timestamp for the state item creation. Not used by lava itself.
ttl Number Yes The epoch timestamp when the state record will expire. DynamoDB manages expiry automatically provided the TTL attribute for the table is set to ttl. The default and maximum time-to-live for entries in the table can be controlled by worker configuration parameters.
type String Yes The state value type. This tells the lava worker how to decode the value. See State Item Types.
value * Yes The state value. The structure depends on the state type.

State Item Types

Each state record has a specified type that tells the worker how to decode the value. Within lava itself, this is largely transparent as the worker handles all the necessary encoding and decoding.

The following types are supported:

Type Description
json The value is stored as a JSON encoded object. This is the default as it provides the most fidelity in the encoding / decoding process. Lava does this automatically within its own universe. External actors should use the lava state API or the lava state utility rather than attempt to reproduce this process natively.
raw The value is stored as a DynamoDB object. This can sometimes do unhelpful type conversions on numbers.
secure This uses the same value encoding mechanism as json with the addition of KMS encryption. Once again, external actors should use the lava state API or the lava state utility rather than attempt to reproduce this process natively. KMS encryption imposes a maximum size limit of 4096 bytes on the JSON encoded state item value.
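
To illustrate why the json type is described as giving the most fidelity, and what the 4096 byte limit on secure items means in practice, here is a small sketch. The functions json_round_trip and fits_secure_limit are hypothetical, and the exact JSON encoding lava uses may differ, so treat the size check as approximate. The Decimal values stand in for the kind of number conversion a DynamoDB client library can apply to raw values (boto3's resource interface, for instance, returns numbers as decimal.Decimal).

```python
import json
from decimal import Decimal

SECURE_MAX_BYTES = 4096  # Limit stated above for secure state items.


def json_round_trip(value):
    """Encode a value as JSON and decode it again."""
    return json.loads(json.dumps(value))


def fits_secure_limit(value) -> bool:
    """Check whether the JSON encoded value is within the secure size limit.

    ASSUMPTION: plain json.dumps output approximates lava's encoding.
    """
    return len(json.dumps(value).encode("utf-8")) <= SECURE_MAX_BYTES


original = {"count": 3, "ratio": 0.5}
decoded = json_round_trip(original)
# JSON preserves the distinction between int and float.
print(type(decoded["count"]).__name__, type(decoded["ratio"]).__name__)

# A raw value read back through a client library may look like this instead.
raw_like = {"count": Decimal("3"), "ratio": Decimal("0.5")}
print(type(raw_like["count"]).__name__)

print(fits_secure_limit({"token": "x" * 100}))
print(fits_secure_limit("x" * 5000))
```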