Frequently Asked Questions

Scheduling and Dispatch FAQ

Scheduling

Can a job be scheduled to run when a worker reboots?

Yes … and no.

The @reboot crontab specification can be used in a job schedule, but what does that actually mean? Because schedule evaluation happens on the dispatcher node, a job scheduled with @reboot will only be dispatched when the dispatcher itself reboots, not when a worker does. This may be useful, but it's not obvious how.

Jobs FAQ

This section covers questions related to jobs in general as well as those specific to particular job types.

General Questions

Job failed: [Errno 26] Text file busy: '/tmp/lava/...'

This is the result of an S3 race condition. Essentially, it is a lava internal error that should not happen.

The PAYLOAD_SETTLING_TIME setting can help mitigate this.

No payload files downloaded from S3

The payload prefix specified does not refer to any objects in S3. It generally means the payload value in the job specification is incorrect. Note that this value is always relative to the payloads area in S3, not absolute.

Recursive download not supported

This error can occur for multiple job types that involve downloading a payload from the payload area in S3.

It occurs if the specified payload points to an S3 prefix that has nested folders underneath it (in the sense that S3 fakes the notion of a folder).

Overlapping runs for the same job

When a lava job is dispatched, a run ID (UUID) is assigned at the time of dispatch and a dispatch message is placed onto the job SQS queue for the target worker.

The lava worker must complete the job, successfully or otherwise, before the SQS queue message visibility timeout expires, otherwise the job will be re-queued by SQS for another run. The re-run will have the same run configuration, including run ID.

If multiple instances of a job run are active at the same time, it is likely that the job is running longer than the SQS worker queue visibility timeout.

Options in this situation are (in order of preference):

  1. Redesign the job to run more efficiently so it completes within the message visibility period.

  2. Use the timeout parameter on the job where available so that lava will kill the job before the message visibility timeout expires.

  3. Set the max_tries field in the job specification.

  4. Increase the visibility timeout on the worker SQS queue. This should be done by updating the CloudFormation stack for the worker. SQS limits the visibility timeout to a maximum of 12 hours.
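For example, options 2 and 3 might look something like this in a job specification. The values here are illustrative only, and the exact placement of these fields may vary by job type:

```json
{
    "type": "cmd",
    "payload": "...",
    "timeout": 3600,
    "max_tries": 1
}
```

A timeout shorter than the queue's message visibility timeout means lava kills the job before SQS can re-queue the message for another run.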

What timezone applies when a job is run?

Lava controls the timezone when a job is dispatched as it's obviously important that a job is run at the correct time.

Lava does not control the timezone in which a job executes and lava makes no promises about the timezone of the execution environment.

If a job runs natively on a lava worker, it will generally inherit the timezone of the host, which can be anything. If it runs as a docker job it may have a different timezone to the host.

The bottom line is that jobs need to either:

  1. Avoid any timezone dependency in the execution environment (e.g. by always working with UTC or timezone aware timestamps); or

  2. Explicitly control the execution timezone. For cmd, docker, exe and pkg jobs, this can be done by specifying the TZ environment variable in the env parameter of the job specification. e.g.

{
    "type": "exe",
    "payload": "...",
    "env": {
        "TZ": "Australia/Darwin"
    }
}

Danger signs in a job payload are usage of the date(1) command or use of datetime.datetime.now() in a Python script.
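Following option 1 above, a payload can avoid timezone dependence by working only in UTC or with timezone aware timestamps, e.g. instead of the dangerous naive datetime.datetime.now():

```python
from datetime import datetime, timezone

# Timezone-dependent: the result varies with the execution environment's TZ
local_now = datetime.now()

# Timezone-independent: always UTC, whatever TZ the worker or container has
utc_now = datetime.now(timezone.utc)

print(utc_now.isoformat())
```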

Why are timezones all over the place in event records?

Records in the events table contain a bunch of different timestamps that may appear to have a random selection of timezones associated with them.

Timezone aware timestamps are not all rendered into UTC for two reasons:

  1. The timezone information can sometimes be useful when investigating issues.

  2. There is no particular need to convert to UTC as the timestamp is unambiguous.

This is why they have the timezones that they do.

Field by field:

ts_event: This timezone aware timestamp is generated by the job execution environment. For most jobs, the timezone is that of the host running the lava worker. For docker jobs, it is the timezone of the container, which may be different from the host.

ts_dispatch: This timezone aware timestamp is generated by the job dispatch environment. For scheduled jobs, this is the timezone of the host running the dispatch command via cron(8) and is unrelated to the timezone associated with the job schedule. For jobs dispatched by the dispatch helper or s3trigger, the dispatch event comes from an AWS lambda function and the associated timezone is UTC. For directly dispatched jobs, the timezone will be whatever timezone the dispatching entity happens to have.

tu_event: This is the timezone naive equivalent of ts_event in UTC. It's present purely as a convenience.

ttl: This is a lava internal field. It's the UNIX epoch timestamp used by DynamoDB to automatically expire and remove old event records. Don't touch it or bother trying to interpret it.
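The relationship between ts_event and tu_event can be reproduced with the standard datetime module (the timestamp below is illustrative):

```python
from datetime import datetime, timezone

# A ts_event-style aware timestamp, e.g. from a worker in ACST (+09:30)
ts_event = datetime.fromisoformat("2024-05-01T09:30:00+09:30")

# tu_event is the same instant converted to UTC, with the tzinfo dropped
tu_event = ts_event.astimezone(timezone.utc).replace(tzinfo=None)

print(tu_event)  # 2024-05-01 00:00:00
```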

db_from_s3

Error: invalid byte sequence for encoding "UTF8"

This typically means either:

  1. The source data contains non-UTF8 characters; or

  2. The source is actually a gzip file but the GZIP option has not been specified in the job parameters.
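When in doubt about cause 2, checking for the gzip magic number on a downloaded copy of the object is a quick local test. This is a sketch; the helper name and path are illustrative, not part of lava:

```python
def is_gzip(path):
    """Return True if the file starts with the gzip magic number (0x1f 0x8b)."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"
```

If is_gzip() returns True for the source data, add the GZIP option to the job parameters.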

Error: HTTP 412. The file has been modified since the import call started

This is probably the result of an S3 race condition. It shouldn't happen. Report it if it does.

db_to_s3

There is no db_to_s3 job type. What should I do instead?

Correct, there isn't. Sorry about that.

There are a number of alternatives.

For Redshift, use redshift_unload which is optimised to extract data from Redshift efficiently.

For other database types, the lava-sql utility can cover many use cases. It can extract data from all supported database types into CSV, JSONL and Parquet formats. It can be called directly from within the payload of exe, pkg and docker jobs.

It can also be used directly in a cmd job as a DIY version of a db_to_s3 job. This is what such a job might look like in the YAML format supported by the lava job framework:

description: Ersatz db_to_s3 type of job

type: cmd

job_id: <{ prefix.job }>/ersatz-db-to-s3

worker: <{ worker.main }>
enabled: true
owner: <{ owner }>

payload: >-
  /bin/sh -c
  "
  echo 'SELECT a, b, c FROM my_schema.my_table' |
  lava-sql
  --format csv
  --header
  --conn-id '<{ db.conn_id }>'
  --output '{{ realm.s3_temp }}/{{ job.job_id }}/output.csv'
  -
  "

event_log: Output file is {{ realm.s3_temp }}/{{ job.job_id }}/output.csv

Note that the final - argument tells lava-sql to read the query from stdin.

We can also load the query from S3 instead of embedding it in the job, thus:

description: Ersatz db_to_s3 type of job

type: cmd

job_id: <{ prefix.job }>/ersatz-db-to-s3

worker: <{ worker.main }>
enabled: true
owner: <{ owner }>

payload: >-
  /bin/sh -c
  "
  lava-sql
  --format csv
  --header
  --conn-id '<{ db.conn_id }>'
  --output '{{ realm.s3_temp }}/{{ job.job_id }}/output.csv'
  '{{ realm.s3_payloads }}/<{ prefix.payload }>/xyz.rsc/query.sql'
  "

event_log: Output file is {{ realm.s3_temp }}/{{ job.job_id }}/output.csv

docker

You must specify a region

If a docker job requires access to AWS resources (e.g. S3, the lava connections manager, etc.), it will be using the boto3 Python module to do so. boto3 requires the AWS region in which it runs to be specified. Rather than hard-wire that into the container, the easiest way to set it is by adding the following environment variable to the parameters in the job specification:

{
    "parameters": {
        "env": {
            "AWS_DEFAULT_REGION": "ap-southeast-2"
        }
    }
}

Note that the IAM role for the worker should provide the authentication requirements for docker containers running on that worker.

exe

Error: job ... (...) failed: [Errno 8] Exec format error: '/tmp/lava/...'

This error indicates that the operating system tried to run a script as a binary executable. It generally means that the hashbang line, which indicates what interpreter to use, is missing from the beginning of the script file. For a Python script it will look exactly like this:

#!/usr/bin/env python3

This error can also occur when an executable has been edited on a DOS system and has acquired DOS CRLF line endings instead of UNIX LF line endings.
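Both causes can be checked with a few lines of Python. This is a sketch; check_script is an illustrative helper, not part of lava:

```python
def check_script(path):
    """Return a list of problems likely to cause 'Exec format error'."""
    problems = []
    with open(path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        problems.append("missing hashbang line")
    if first_line.endswith(b"\r\n"):
        problems.append("DOS CRLF line endings")
    return problems
```

An empty list means neither problem is present; anything else names the likely culprit.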

lavasched

Failing lavasched jobs

This is typically the result of a bad crontab spec in a newly added job.

Prior to version 5.0.0 (Cotopaxi), it could be tricky to localise the bad job specification. Since Cotopaxi, output from lavasched jobs now includes a context diff of the new crontab vs the old one to simplify tracking of changes and problem diagnosis.

pkg

Error: job ... (...) failed: [Errno 8] Exec format error: '/tmp/lava/...'

This error indicates that the operating system tried to run a script as a binary executable. It generally means that the hashbang line, which indicates what interpreter to use, is missing from the beginning of the script file. For a Python script it will look exactly like this:

#!/usr/bin/env python3

This error can also occur when an executable has been edited on a DOS system and has acquired DOS CRLF line endings instead of UNIX LF line endings.

In what timezone will a pkg job run?

By default, all jobs run in the timezone setting of the worker itself. This is unrelated to the dispatch timezone.

For a pkg job, the timezone for the job run can be specified by adding the TZ environment variable to the job specification, thus:

{
    "type": "pkg",
    "payload": "...",
    "env": {
        "TZ": "Australia/Darwin"
    }
}

sql

Error: need to escape, but no escapechar set

When lava collects the output of a query from an sql job, it uses the standard Python CSV writer to format the data for output. If it detects that the data requires an escape character to be specified but this has not been done in the job specification, this error will result.
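The behaviour is easy to reproduce with the standard library csv module directly:

```python
import csv
import io

row = ["value,with,commas", "plain"]

# With QUOTE_NONE and no escapechar, embedded delimiters cannot be written
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONE)
try:
    writer.writerow(row)
except csv.Error as exc:
    print(exc)  # need to escape, but no escapechar set

# Supplying an escapechar resolves it
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\")
writer.writerow(row)
print(buf.getvalue())  # value\,with\,commas,plain
```

The fix is therefore to specify an escape character in the job specification when the output data can contain the delimiter or quote characters.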

Connectors FAQ

This section covers questions related to connectors in general as well as those specific to particular job types.

General Questions

Why is psycopg2 not supported?

In a word, licensing. Psycopg2 is LGPL.

Why is pygresql not supported?

As of v5.0.0 (Cotopaxi), pygresql can be specified as the driver to use for Postgres family connectors using the subtype field in the connection specification.

It is strongly recommended to stick with the default of pg8000 for the db_from_s3 and redshift_unload job types.

Why would I use pygresql instead of pg8000?

Insert performance of pg8000 is poor. For Python based jobs inserting a lot of data, pygresql should give better performance.

Is SQLAlchemy supported?

As of v5.0.0 (Cotopaxi), the SQL based connectors provide native support for SQLAlchemy. An SQLAlchemy engine can be created using a lava connector to manage the underlying connection process.

ssh, scp, sftp

Authentication Failures

A possible cause of authentication failures is the default behaviour of ssh, scp and sftp when they cannot verify the fingerprint of the remote host's public key. When this occurs, the client drops into interactive mode to ask for confirmation before accepting the key, like so:

The authenticity of host 'sftp.xyzzy.com (192.219.1.1)' can't be established.
RSA key fingerprint is SHA256:TXYwAhoSEfm6Me6RtFHJRUEGL9lTuHqySI6GyxVe//M.
RSA key fingerprint is MD5:52:b4:70:1d:c1:0e:aa:4d:32:8e:f8:7a:cb:f9:b8:7e.
Are you sure you want to continue connecting (yes/no)?

Lava cannot handle this.

To avoid this problem, add -o StrictHostKeyChecking=no to the arguments when invoking the connector script provided by lava.

Installation and Operation FAQ

The Lava Worker

ModuleNotFoundError: No module named 'psutil'

This error appears to be unique to the Amazon Linux 1 AMI due to a bug in the Python configuration on that AMI. Basically, binary modules such as psutil are being installed by pip into a lib64 directory instead of lib, and Python is not looking in lib64.

The fix is:

unset PYTHON_INSTALL_LAYOUT
python3 -m pip install psutil --upgrade

Note that this problem can also affect other modules, including jinja2 and cx_Oracle. The fix is the same in each case.

The Lava Docker Images

Docker build fails : ERROR [internal] load metadata for docker.io...

There is a bug in some versions of docker buildkit which seems to occur when connecting to Docker Hub via a proxy. If this message occurs, try building from a location that does not require a proxy connection.

Info

As of v8.1.0 (Kīlauea), lava docker images must be built with buildkit enabled.

The Lava Framework FAQ

General Questions

Retrospectively adding Jupyter notebook support

If Jupyter support was not requested when the project was originally created using cookiecutter, it can be enabled afterwards using the following process:

# Go to project root
cd <PROJECT-ROOT>

# Add jupyter to requirements file
echo jupyter >> etc/requirements.txt

# Install jupyter support
make init

Retrospectively adding lava libraries

If the lava libraries were not requested when the project was originally created using cookiecutter, they can be added afterwards using the following process:

# Go to project root
cd <PROJECT-ROOT>

# Add lava to requirements file
echo jinlava >> etc/requirements.txt

# Install
make init

Jinja Rendering FAQ

Does lava support recursive rendering?

No.

For any job type that supports Jinja rendering, lava performs only a single rendering pass. So, for example, it is not possible to include Jinja syntax in a vars variable that references an element in globals as this would require two rendering passes to first inject the global into the var and then render the Jinja content of the var.
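This single-pass behaviour can be demonstrated with Jinja directly (the variable names are illustrative):

```python
from jinja2 import Template

context = {
    "region": "ap-southeast-2",          # a "globals"-style value
    "bucket": "s3://data-{{ region }}",  # a "vars"-style value containing Jinja
}

# A single rendering pass, as lava performs:
result = Template("{{ bucket }}").render(context)
print(result)  # s3://data-{{ region }} -- the nested Jinja is NOT expanded
```

The value of bucket is substituted verbatim; a second pass would be needed to expand the {{ region }} it contains, and lava does not do that.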

You are in a maze of twisty little passages, all alike.