DynamoDB Tables
Each lava realm uses a number of DynamoDB tables to hold configuration and status information.
The tables are created by the CloudFormation templates.
See also Maintaining DynamoDB Table Entries.
The Realms Table
The realms table is named lava.realms. This is a global lava table and is the
only object shared across realms.
| Field | Type | Required | Description |
|---|---|---|---|
| config | Map | No | An optional map of configuration values that will be applied to all workers in the realm. Refer to Lava Worker Configuration for more information. |
| on_fail | List[Map] | No | The default on_fail actions for jobs in the realm. |
| on_success | List[Map] | No | The default on_success actions for jobs in the realm. |
| realm | String | Yes | A unique identifier for the realm. Keep it simple. |
| s3_key | String | Yes | A KMS key identifier used when objects are written to S3 by a worker. Typically, either a key ARN or alias/<KEY-NAME>. |
| s3_payloads | String | Yes | A location in S3 where payloads are stored for the realm in the form s3://<BUCKET>/<PREFIX>. |
| s3_temp | String | Yes | A location in S3 where job outputs are stored for the realm in the form s3://<BUCKET>/<PREFIX>. |
| X-* | String | No | Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). A number of these fields are used as part of the boot process for EC2 based lava workers for configuration control. |
| * | * | * | Other fields in the realms table may be present to set defaults for other lava subcomponents. These are described in the relevant section. |
The Jobs Table
The jobs table for a given <REALM> is named lava.<REALM>.jobs. It contains
information about jobs and their associated run information.
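The per-realm tables described in this section all follow the `lava.<REALM>.<TABLE>` naming pattern, while the realms table has a fixed name. As a minimal sketch of that convention (the helper name `realm_table` and the realm name `prod` are hypothetical, not part of lava):

```python
# Fixed name of the global realms table, as documented above.
REALMS_TABLE = "lava.realms"


def realm_table(realm: str, table: str) -> str:
    """Build the DynamoDB table name for a per-realm table.

    Hypothetical helper: it only formats the documented naming pattern
    and does not check that the realm or table actually exists.
    """
    return f"lava.{realm}.{table}"


print(realm_table("prod", "jobs"))    # lava.prod.jobs
print(realm_table("prod", "events"))  # lava.prod.events
```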
| Field | Type | Required | Description |
|---|---|---|---|
| cw_metrics | Boolean | No | If specified, enable/disable generation of CloudWatch custom metrics for this job. If present, overrides any value set at the worker/realm level. |
| description | String | Not yet | A short description of the job. This field will be mandatory in a future release. |
| dispatcher | String | No | An identifier specifying the dispatcher for the job. |
| enabled | Boolean \| String | No | Whether or not the job is enabled. Defaults to false. String values are Jinja rendered, providing a means to dynamically enable / disable jobs at run-time. More information. |
| event_log | * | No | The specified value is Jinja rendered and recorded as part of the job run event information in the events table. More information. |
| globals | Map[String,*] | No | A map of named values that are made available for Jinja rendering of job actions and job parameters for those job types that use Jinja parameter rendering. Names beginning with lava (case insensitive) are reserved for lava's use. More information on parameters and globals. |
| iteration_delay | String | No | The delay between attempts to run the job in the form nnX where nn is a number and X is s (seconds) or m (minutes). Default is 0s. The maximum allowed value is specified by the ITERATION_MAX_DELAY configuration parameter. See Job Retries for more information. |
| iteration_limit | Integer | No | The number of attempts that will be made to run the job. Default is 1. The maximum allowed value is specified by the ITERATION_MAX_LIMIT configuration parameter. This is unrelated to the SQS related max_tries parameter. See Job Retries for more information. |
| job_id | String | Yes | The unique job identifier for the realm. Jobs can be grouped using a path-like structure, e.g. job_group/myjob_01. |
| max_run_delay | String | No | The maximum allowed delay between when a job is dispatched and when it is run in the form nnX where nn is a number and X is s (seconds), m (minutes), h (hours) or d (days). If this limit is exceeded, the job run is discarded with an error. If not specified, no limit is imposed other than the message retention period of the worker SQS job queue. |
| max_tries | Number | No | By default, if a lava worker fails mid-job, SQS will resubmit the dispatch request at the end of the visibility timeout. If max_tries is set to a positive integer value, then the dispatch message will be discarded when the SQS message ApproximateReceiveCount exceeds the specified value. Note that the lower of the Maximum Receives value for the worker SQS queue (if set) and any limit specified by the --retries worker option still applies as the upper limit. |
| on_fail | List[Map] | No | Job specific on_fail actions for the job. Overrides any realm level setting. |
| on_retry | List[Map] | No | Job specific on_retry actions for the job. Overrides any realm level setting. |
| on_success | List[Map] | No | Job specific on_success actions for the job. Overrides any realm level setting. |
| owner | String | Not yet | Name or email address of the job owner. This field will be mandatory in a future release. |
| parameters | Map[String,*] | No | A map of parameters that will be passed to the job. The parameter structure is dependent on the job type. |
| payload | * | Yes | The job payload. The type and format is job type dependent. Currently, a value is required even for job types that do not need it. In this case set the value to null. |
| schedule | String | No | A cron schedule that specifies when the job will run. Refer to the section Schedule Specifications for more information. If not specified, the job can be dispatched on demand but will not be scheduled. |
| state | Map[String,*] | No | A map of state items. For each item, the key is the state_id and the value is a default value. The default values are replaced at run-time by the current value of the specified state item in the state table, if it exists. |
| type | String | Yes | The name of the job handler to run. |
| worker | String | Yes | The name of a worker that can run the job. |
| X-* | String | No | Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). The lava job framework uses a number of these fields for various purposes. |
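The `nnX` duration form used by iteration_delay and max_run_delay can be illustrated with a small parser. This is a sketch only, not lava's own parser: the function name `parse_duration` and the `allowed` argument are hypothetical, and it assumes `nn` is a non-negative integer.

```python
import re
from datetime import timedelta

# Unit letters described in the table above, mapped to timedelta keywords.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"(\d+)([a-zA-Z])")


def parse_duration(text: str, allowed: str = "sm") -> timedelta:
    """Parse an nnX duration such as '30s' or '5m'.

    `allowed` lists the accepted unit letters: 'sm' suits iteration_delay,
    'smhd' suits max_run_delay. Assumption: nn is a whole number.
    """
    match = _DURATION.fullmatch(text.strip())
    if not match:
        raise ValueError(f"Bad duration: {text!r}")
    count, unit = int(match.group(1)), match.group(2)
    if unit not in allowed or unit not in _UNITS:
        raise ValueError(f"Unit {unit!r} not allowed in {text!r}")
    return timedelta(**{_UNITS[unit]: count})


print(parse_duration("30s"))                  # 0:00:30
print(parse_duration("2h", allowed="smhd"))   # 2:00:00
```

Any cap such as ITERATION_MAX_DELAY would be applied after parsing; that check is omitted here.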
Warning
Currently, unknown fields in the job specification will result in a deprecation warning being written to the worker's log but the job is permitted to run. A future release will reject jobs that have unknown fields.
The enabled Field
Prior to v8.0 (Incahuasi), the enabled field was a simple, static boolean
value. If false, the job would be skipped. This behaviour is unchanged in
v8+ if the value is boolean.
If the value is a string, it is Jinja rendered. If the resulting value is the
string true (case and surrounding whitespace are ignored), the job is enabled
to run. Any other value will result in the job being skipped. This allows job
execution to be conditional on run-time values.
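The decision rule described above can be sketched as follows. The function name `is_enabled` is hypothetical, and the `render` argument is a stand-in for whatever performs the Jinja rendering; this is an illustration of the documented rule, not lava's implementation.

```python
from typing import Callable, Optional, Union


def is_enabled(
    value: Optional[Union[bool, str]],
    render: Callable[[str], str],
) -> bool:
    """Apply the documented rule for the enabled field.

    `render` is a stand-in for the Jinja renderer (an assumption of this
    sketch); it receives the raw string and returns the rendered text.
    """
    if value is None:
        return False  # the field defaults to false when absent
    if isinstance(value, bool):
        return value  # static boolean behaviour, unchanged in v8+
    # String: render it, then ignore case and surrounding whitespace.
    return render(value).strip().lower() == "true"


# Stand-in renderers that simply return a fixed result.
print(is_enabled("{{ 1 == 1 }}", lambda s: " True\n"))  # True
print(is_enabled("{{ 1 == 2 }}", lambda s: "False"))    # False
```

Anything other than the rendered string `true`, including `yes` or `1`, leaves the job skipped.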
The following variables are made available to the Jinja renderer.
| Name | Type | Description |
|---|---|---|
| globals | dict[str,*] | The globals from the job specification updated with any globals received in the job dispatch. |
| job | dict[str,*] | The augmented job specification. |
| realm | dict[str,*] | The realm specification. |
| start | datetime | The local time when the job run started. |
| state | dict[str,*] | A dictionary of the state items imported into the job, keyed on state_id. The default values are updated at run-time with any current values obtainable from the state table. |
| ustart | datetime | The UTC time when the job run started. |
| utils | dict[str,runnable] | A dictionary of utility functions that can be used in the Jinja markup. |
As an example, the following job specification fragment will only enable the
job if the value of the global a is palindromic (which it is, in the example):
{
"enabled": "{% set word = globals.a | lower %}{{ word == word | reverse }}",
"globals": {
"#": "Global a is a palindrome.",
"a": "Tattarrattat"
}
}
In this example, the job is only enabled if the value of the x field in
state item sid is odd:
{
"enabled": "{% set val = state['sid'].x | int %}{{ val % 2 == 1 }}",
"state": {
"sid": "-- set at run-time --"
}
}
As a more complex example, this job specification fragment only enables a job if it has not run successfully in the last 4 hours:
{
"schedule": "30 * * * *",
"enabled": "{% set ts=utils.parsedate.parse(state['sid']) %}{{ ustart-ts > utils.timedelta(hours=4) }}",
"on_success": [
{
"action": "state",
"state_id": "sid",
"value": "{{ ustart.isoformat() }}"
}
],
"state": {
"sid": "2000-01-01T00:00:00Z"
},
"event_log": "Time since last run: {% set ts=utils.parsedate.parse(state['sid']) %}{{ ustart-ts }}"
}
Some points to note:
- A state variable is declared to hold an ISO 8601 datetime containing the last successful run-time for the job. This has an initial value defined in the job specification.
- The enabled field checks the start time of the current run against the previous run start time, to ensure the required period of time has passed (4 hours in the example).
- Once the job runs, an on_success action creates a state item that records the start time of the successful run.
- An event_log field records for posterity the time since the previous run.
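For readers less familiar with Jinja, the comparison made by the enabled expression in this example can be written in plain Python. This is a sketch under assumptions: the names `parse_iso` and `should_run` are hypothetical, the standard library stands in for `utils.parsedate`, and `ustart` is assumed to be a timezone aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def parse_iso(text: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def should_run(
    last_success: str,
    ustart: datetime,
    min_gap: timedelta = timedelta(hours=4),
) -> bool:
    """Return True if more than min_gap has passed since last_success."""
    return ustart - parse_iso(last_success) > min_gap


ustart = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(should_run("2000-01-01T00:00:00Z", ustart))       # True
print(should_run("2024-01-01T09:00:00+00:00", ustart))  # False
```

The first call corresponds to the initial state value in the job specification, so the first scheduled run is always enabled.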
The event_log Field
Jobs may receive critical configuration as part of the dispatch process via parameters, globals, and state items. It can be tricky to determine from the events table exactly what a job run was doing, particularly in the event that a job run fails.
While it would be possible to record the entire augmented job specification, this is not safe in all cases as sensitive values may be exposed in the events table.
The event_log field allows the job specification to specify that certain
information should be recorded in the job run record in the
events table. The value of the field is an arbitrary
object that will be Jinja rendered and added to the
events table before the first iteration of the job begins.
This information is added in a new entry with a logging status in the log
field of the event record.
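Because the event_log value can be an arbitrary object, rendering has to reach the strings nested inside maps and lists. One way this could work is sketched below. The function name `render_object` is hypothetical, `render` is a stand-in for the Jinja renderer, and the choice to render values but not map keys is an assumption of this sketch.

```python
from typing import Any, Callable


def render_object(obj: Any, render: Callable[[str], str]) -> Any:
    """Recursively apply `render` to every string inside obj.

    Maps and lists are walked; other values (numbers, booleans, None)
    are returned unchanged. Map keys are left as-is (an assumption).
    """
    if isinstance(obj, str):
        return render(obj)
    if isinstance(obj, dict):
        return {key: render_object(val, render) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [render_object(item, render) for item in obj]
    return obj


# Stand-in renderer: substitutes one known expression instead of using Jinja.
def fake_render(text: str) -> str:
    return text.replace("{{ globals.g1 }}", "42")


print(render_object({"g1": "{{ globals.g1 }}", "count": 3}, fake_render))
# {'g1': '42', 'count': 3}
```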
The following variables are made available to the Jinja renderer.
| Name | Type | Description |
|---|---|---|
| globals | dict[str,*] | The globals from the job specification updated with any globals received in the job dispatch. |
| job | dict[str,*] | The augmented job specification. |
| realm | dict[str,*] | The realm specification. |
| start | datetime | The local time when the job run started. |
| state | dict[str,*] | A dictionary of the state items imported into the job, keyed on state_id. The default values are updated at run-time with any current values obtainable from the state table. |
| ustart | datetime | The UTC time when the job run started. |
| utils | dict[str,runnable] | A dictionary of utility functions that can be used in the Jinja markup. |
As an example, the following job specification fragment would record the value of a specific global.
{
"event_log": "The value of global g1 is '{{ globals.g1 }}'"
}
This fragment would record two globals in a map format:
{
"event_log": {
"g1": "{{ globals.g1 }}",
"g2": "{{ globals.g2 }}"
}
}
This fragment would record all of the globals (as a Python object converted to a string):
{
"event_log": "{{ globals }}"
}
Warning
DO NOT be cavalier with this. Take care to avoid logging sensitive information.
The Connections Table
The connections table for a given <REALM> is named lava.<REALM>.connections.
It contains information that helps job handlers make connections to external
resources, typically databases.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | A short description of the connection. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. Defaults to false. |
| owner | String | Not yet | Name or email address of the connection owner. This field will be mandatory in a future release. |
| type | String | Yes | The connection type. This is used to identify a connector plugin to establish the connection. |
| X-* | String | No | Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). |
| * | * | * | All other fields are connection type specific. For more information see the section on connectors. |
The Events Table
The events table for a given <REALM> is named lava.<REALM>.events. It is
populated by lava workers as they start and finish job runs.
Querying the table via the DynamoDB console can be a bit tedious so a
utility lava-events is provided to assist with this. Get help thus:
lava-events --help
The lava GUI also provides the ability to query the events table.
| Field | Type | Required | Description |
|---|---|---|---|
| hostname | String | Yes | The hostname of the worker that ran the job. |
| instance_id | String | No | The AWS EC2 instance ID. Not present when the worker is not an EC2 instance. |
| job_id | String | Yes | The unique job identifier for the realm. |
| log | List[Map] | Yes | A list of events that have occurred for this run of the job. Each entry in the list is a map which will contain info, status and ts_event fields. The contents of the info field are job type dependent. |
| run_id | String | Yes | The UUID for the job run. This is used in the naming of job outputs. |
| status | String | Yes | The most recent status value for this job run. It will reflect the status value of the latest entry in the log list. |
| ts_dispatch | String | Yes | A timezone aware ISO 8601 format timestamp for the time the job was dispatched. |
| ts_event | String | Yes | A timezone aware ISO 8601 format timestamp for the most recent event for this job run. It will reflect the ts_event value of the latest entry in the log list. |
| ttl | Number | Yes | The epoch timestamp when the event record will expire. DynamoDB manages expiry automatically provided the TTL attribute for the table is set to ttl. |
| tu_event | String | Yes | A timezone naive ISO 8601 format timestamp for the UTC time for the most recent event for this job run. It will reflect the tu_event value for this job run. |
| worker_id | String | No | If the worker is an AWS EC2 instance, the instance ID. |
| worker | String | Yes | The name of the worker that ran the job. |
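The three time-related fields in the table above differ only in representation. The following sketch shows one way such values could be produced; the function name `event_timestamps` and the retention period are hypothetical, and whether lava records ts_event in local time or UTC is not stated here, so local time is an assumption.

```python
from datetime import datetime, timedelta, timezone


def event_timestamps(now_utc: datetime, retention: timedelta) -> dict:
    """Build ts_event, tu_event and ttl values from one instant.

    `now_utc` must be a timezone aware UTC datetime. `retention` is a
    hypothetical setting controlling how long the record is kept.
    """
    return {
        # Timezone aware ISO 8601 (assumption: the worker's local zone).
        "ts_event": now_utc.astimezone().isoformat(),
        # Timezone naive ISO 8601 holding the UTC time.
        "tu_event": now_utc.replace(tzinfo=None).isoformat(),
        # Epoch seconds, the form DynamoDB TTL expects.
        "ttl": int((now_utc + retention).timestamp()),
    }


stamps = event_timestamps(datetime.now(timezone.utc), timedelta(days=30))
print(stamps)
```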
The S3triggers Table
The s3triggers table for a given <REALM> is named lava.<REALM>.s3triggers.
It is used to map S3 bucket events to jobs. When an S3 event occurs with a
bucket and object prefix matching an entry in the table, the corresponding lava
job is dispatched. See
Triggering Jobs from S3 Events
for more information.
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | String | Yes | Bucket name. |
| delay | String | No | Dispatch message sending delay in the form nnX where nn is a number and X is s (seconds) or m (minutes). The maximum allowed value is 15 minutes. |
| description | String | No | A short description of the trigger. |
| enabled | Boolean | Yes | Whether or not the s3trigger is enabled. |
| globals | Map[String,*] | No | A map of named values that are included in the dispatch request. These are Jinja rendered. Names beginning with lava (case insensitive) are reserved for lava's use. More information on parameters and globals. |
| jinja | Boolean | No | If false, disable Jinja rendering of the parameters and globals. Defaults to true. |
| job_id | String \| List[String] | Yes | Job identifier for the lava job that will be dispatched, or a list of job identifiers. Each is Jinja rendered before use. |
| owner | String | Not yet | Name or email address of the trigger owner. This field will be mandatory in a future release. |
| parameters | Map[String,*] | No | A map of parameters for the job that will be included in the dispatch. These will be Jinja rendered. |
| prefix | String | Yes | Object prefix. Do not include a trailing / or matches will fail. To indicate the root of the bucket, use a prefix value of *. |
| trigger_id | String | Yes | A unique identifier within the realm for the trigger entry. |
| X-* | String | No | Any fields beginning with x- or X- are ignored by lava. These can be used as required for other purposes (e.g. CI/CD, versioning or other related purposes). |
In addition to the fields described above, entries in the s3triggers table may
also contain fields starting with if_ and if_not_. These cause a test
to be applied to the S3 object event. The dispatch is only performed if the test
passes in the case of if_ fields, or fails in the case of if_not_ fields.
The available if_ tests are described below. There is a corresponding
if_not_ test for each.
| Field | Type | Required | Description |
|---|---|---|---|
| if_fnmatch | String \| List[String] | No | Perform a glob style match against the S3 object key. If a list of patterns is provided, returns true if any of the patterns match the key. The matching rules defined by Python's fnmatch apply. |
| if_size_gt | Integer \| String | No | Check that the S3 object is larger than the specified size. Values can be specified as an integer or a string in the form nnX, where nn is a number and X is an optional unit such as 'K', 'KB', 'KiB', 'MiB' etc. Default for X is bytes. |
| if_event_type | String | No | Check that the S3 event type matches the specified value (e.g. ObjectCreated:Put). Glob style patterns are accepted (e.g. ObjectCreated:*). |
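The interaction between if_ and if_not_ fields can be sketched as below. This is an illustration under several assumptions, not lava's implementation: the `event` dictionary shape and all function names are hypothetical, every test present must be satisfied for a dispatch, `fnmatchcase` is used so matching is case sensitive on all platforms, and if_size_gt handles integer byte counts only (unit strings are omitted).

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Accept either a single pattern or a list of patterns."""
    return [value] if isinstance(value, str) else list(value)


# Each entry: test name -> (test function, event attribute it inspects).
TESTS = {
    "fnmatch": (lambda pats, key: any(fnmatchcase(key, p) for p in _as_list(pats)), "key"),
    "size_gt": (lambda limit, size: size > limit, "size"),
    "event_type": (lambda pat, etype: fnmatchcase(etype, pat), "event_type"),
}


def should_dispatch(trigger: dict, event: dict) -> bool:
    """Return True if every if_ test passes and every if_not_ test fails."""
    for field, expected in trigger.items():
        if field.startswith("if_not_"):  # check the longer prefix first
            name, want = field[len("if_not_"):], False
        elif field.startswith("if_"):
            name, want = field[len("if_"):], True
        else:
            continue  # not a test field
        test, attr = TESTS[name]
        if test(expected, event[attr]) != want:
            return False
    return True


trigger = {"if_fnmatch": ["*.csv", "*.tsv"], "if_not_event_type": "ObjectRemoved:*"}
event = {"key": "in/data.csv", "size": 10, "event_type": "ObjectCreated:Put"}
print(should_dispatch(trigger, event))  # True
```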
The State Table
The state table for a given <REALM> is named lava.<REALM>.state. It is used
to allow lava jobs to save state information for limited periods of time that
can be accessed by authorised external actors or other lava jobs.
Creation and reading of entries in the state table are managed by the lava state manager. Other tools should not be used for this purpose.
| Field | Type | Required | Description |
|---|---|---|---|
| state_id | String | Yes | The unique identifier for the state item. |
| publisher | String | No | An arbitrary identifier for the entity posting the state item. Not used by lava itself. |
| timestamp | String | No | An ISO 8601 format timestamp for the state item creation. Not used by lava itself. |
| ttl | Number | Yes | The epoch timestamp when the state record will expire. DynamoDB manages expiry automatically provided the TTL attribute for the table is set to ttl. The default and maximum time-to-live for entries in the table can be controlled by worker configuration parameters. |
| type | String | Yes | The state value type. This tells the lava worker how to decode the value. See State Item Types. |
| value | * | Yes | The state value. The structure depends on the state type. |
State Item Types
Each state record has a specified type that tells the worker how to decode the
value. Within lava itself, this is largely transparent as the worker handles
all the necessary encoding and decoding.
The following types are supported:
| Type | Description |
|---|---|
| json | The value is stored as a JSON encoded object. This is the default as it provides the most fidelity in the encoding / decoding process. Lava does this automatically within its own universe. External actors should use the lava state API or the lava state utility rather than attempt to reproduce this process natively. |
| raw | The value is stored as a DynamoDB object. This can sometimes do unhelpful type conversions on numbers. |
| secure | This uses the same value encoding mechanism as json with the addition of KMS encryption. Once again, external actors should use the lava state API or the lava state utility rather than attempt to reproduce this process natively. KMS encryption imposes a maximum size limit of 4096 bytes on the JSON encoded state item value. |
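To illustrate why JSON encoding preserves fidelity and how the 4096 byte limit for secure items might be checked before writing, here is a rough sketch. The function names are hypothetical, and the exact JSON encoding lava uses (separators, key order, character escaping) is not specified here, so the size computed below is only an approximation. External actors should still use the lava state API or utility rather than this code.

```python
import json

# Maximum size stated above for the JSON encoded value of a secure item.
SECURE_MAX_BYTES = 4096


def encode_json_value(value) -> str:
    """Encode a state value as a JSON string (encoding details assumed)."""
    return json.dumps(value)


def fits_secure_limit(value) -> bool:
    """Check the UTF-8 size of the JSON encoded value against the limit."""
    return len(encode_json_value(value).encode("utf-8")) <= SECURE_MAX_BYTES


original = {"count": 3, "ratio": 0.5, "tags": ["a", "b"]}
restored = json.loads(encode_json_value(original))
print(restored == original)           # True
print(type(restored["count"]).__name__)  # int
print(fits_secure_limit("x" * 5000))  # False
```

By contrast, a raw value is subject to DynamoDB's own number handling; for example, boto3 returns DynamoDB numbers as `Decimal` objects rather than `int` or `float`.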