Frequently Asked Questions¶
Scheduling and Dispatch FAQ¶
Scheduling¶
Can a job be scheduled to run when a worker reboots?¶
Yes … and no.
The @reboot crontab specification can be used in a job schedule,
but what does that actually mean? Since scheduling occurs on the
dispatcher node, a job scheduled on @reboot will only be dispatched
when the dispatcher itself reboots. This may be useful, but it's not
obvious how.
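As an illustration only, such a schedule might look like the sketch below. The schedule field name is an assumption here, not a confirmed part of the job specification; check the scheduling reference for the actual field name:

```json
{
    "type": "cmd",
    "schedule": "@reboot",
    "payload": "..."
}
```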
Jobs FAQ¶
This section covers questions related to jobs in general as well as those specific to particular job types.
General Questions¶
Job failed: [Errno 26] Text file busy: '/tmp/lava/...'¶
This is the result of an S3 race condition. It is essentially a lava internal error that should not happen.
The PAYLOAD_SETTLING_TIME setting can help control this.
No payload files downloaded from S3¶
The payload prefix specified does not refer to any objects in S3. It generally means the payload value in the job specification is incorrect. Note that this value is always relative to the payloads area in S3, not absolute.
Recursive download not supported¶
This error can occur for multiple job types that involve downloading a payload from the payload area in S3.
It occurs if the specified payload points to an S3 prefix that has nested folders underneath (in the sense that S3 fakes the notion of folder).
Overlapping runs for the same job¶
When a lava job is dispatched, a run ID (UUID) is assigned at the time of dispatch and a dispatch message is placed onto the job SQS queue for the target worker.
The lava worker must complete the job, successfully or otherwise, before the SQS queue message visibility timeout expires, otherwise the job will be re-queued by SQS for another run. The re-run will have the same run configuration, including run ID.
If multiple instances of a job run are active at the same time, it is likely that the job is running longer than the SQS worker queue visibility timeout.
Options in this situation are (in order of preference):

- Redesign the job to run more efficiently so it completes within the message visibility period.
- Use the timeout parameter on the job where available so that lava will kill the job before the message visibility timeout expires.
- Set the max_tries field in the job specification.
- Increase the visibility timeout on the worker SQS queue. This should be done by updating the CloudFormation stack for the worker. SQS limits the visibility timeout to a maximum of 12 hours.
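As a sketch, the timeout and max_tries options might be combined in a job specification as follows. The exact placement and units of the timeout value are assumptions; consult the job specification reference for your version:

```json
{
    "type": "exe",
    "payload": "...",
    "timeout": 3600,
    "max_tries": 1
}
```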
What timezone applies when a job is run?¶
Lava controls the timezone when a job is dispatched as it's obviously important that a job is run at the correct time.
Lava does not control the timezone in which a job executes and lava makes no promises about the timezone of the execution environment.
If a job runs natively on a lava worker, it will generally inherit the timezone of the host, which can be anything. If it runs as a docker job it may have a different timezone to the host.
The bottom line is that jobs need to either:
- Avoid any timezone dependency in the execution environment (e.g. by always working with UTC or timezone aware timestamps); or
- Explicitly control the execution timezone. For cmd, docker, exe and pkg jobs, this can be done by specifying the TZ environment variable in the env parameter of the job specification. e.g.
```json
{
    "type": "exe",
    "payload": "...",
    "env": {
        "TZ": "Australia/Darwin"
    }
}
```
Danger signs in a job payload are usage of the date(1) command or use of
datetime.datetime.now() in a Python script.
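To illustrate the safe pattern, the sketch below contrasts a timezone naive timestamp (which silently depends on the execution environment) with a timezone aware UTC timestamp:

```python
from datetime import datetime, timezone

# Naive: the result depends on the TZ of the worker or container,
# which lava makes no promises about
naive_now = datetime.now()

# Aware: unambiguous no matter where the job runs
utc_now = datetime.now(timezone.utc)
```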
Why are timezones all over the place in event records?¶
Records in the events table contain a bunch of different timestamps that may appear to have a random selection of timezones associated with them.
Timezone aware timestamps are not all rendered into UTC for two reasons:
- The timezone information can sometimes be useful when investigating issues.
- There is no particular need to convert to UTC, as the timestamp is unambiguous.

This is why they have the timezones that they do.
| Field | Timezone Explanation |
|---|---|
| ts_event | This timezone aware timestamp is generated by the job execution environment. For most jobs, the timezone is that of the host running the lava worker. For docker jobs, it is the timezone of the container, which may be different from the host. |
| ts_dispatch | This timezone aware timestamp is generated by the job dispatch environment. For scheduled jobs, this is the timezone of the host running the dispatch command via cron(8) and is unrelated to the timezone associated with the job schedule. For jobs dispatched by the dispatch helper or s3trigger, the dispatch event comes from an AWS lambda function and the associated timezone is UTC. For directly dispatched jobs, the timezone will be whatever timezone the dispatching entity happens to have. |
| tu_event | This is the timezone naive equivalent of ts_event in UTC. It's present purely as a convenience. |
| ttl | This is a lava internal field. It's the UNIX epoch timestamp used by DynamoDB to automatically expire and remove old event records. Don't touch it or bother trying to interpret it. |
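As an illustration of the relationship between ts_event and tu_event, this Python sketch converts a timezone aware timestamp into its timezone naive UTC equivalent:

```python
from datetime import datetime, timezone

# A ts_event recorded by a worker running in UTC+09:30 (e.g. Darwin)
ts_event = datetime.fromisoformat("2024-01-01T09:30:00+09:30")

# tu_event: the same instant, rendered as a timezone naive UTC timestamp
tu_event = ts_event.astimezone(timezone.utc).replace(tzinfo=None)
```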
db_from_s3¶
Error: invalid byte sequence for encoding “UTF8”¶
This typically means either:
- The source data contains non-UTF8 characters; or
- The source is actually a gzip file but the GZIP option has not been specified in the job parameters.
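When in doubt about whether a source file is gzipped, checking for the gzip magic bytes is a quick test. The sketch below is illustrative only; the helper function is not part of lava:

```python
import gzip
import tempfile

def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == b"\x1f\x8b"

# Demo: write a small gzip file to check against
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
    gz_path = tmp.name
with gzip.open(gz_path, "wt") as f:
    f.write("a,b,c\n")
```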
Error: HTTP 412. The file has been modified since the import call started¶
This is probably the result of an S3 race condition. It shouldn't happen. Report it if it does.
db_to_s3¶
There is no db_to_s3 job type. What should I do instead?¶
Correct, there isn't. Sorry about that.
There are a number of alternatives.
For Redshift, use redshift_unload which is optimised to extract data from Redshift efficiently.
For other database types, the lava-sql utility can cover many use cases. It can extract data from all supported database types into CSV, JSONL and Parquet format. It can be called directly from within the payload of exe, pkg and docker jobs.
It can also be used directly in a cmd job as a DIY version of a
db_to_s3 job. This is what such a job might look like in the YAML format
supported by the lava job framework:
```yaml
description: Ersatz db_to_s3 type of job
type: cmd
job_id: <{ prefix.job }>/ersatz-db-to-s3
worker: <{ worker.main }>
enabled: true
owner: <{ owner }>
payload: >-
  /bin/sh -c
  "
  echo 'SELECT a, b, c FROM my_schema.my_table' |
  lava-sql
  --format csv
  --header
  --conn-id '<{ db.conn_id }>'
  --output '{{ realm.s3_temp }}/{{ job.job_id }}/output.csv'
  -
  "
event_log: Output file is {{ realm.s3_temp }}/{{ job.job_id }}/output.csv
```
Note that the final - argument tells lava-sql to read the query from stdin.
We can also load the query from S3 instead of embedding it in the job, thus:
```yaml
description: Ersatz db_to_s3 type of job
type: cmd
job_id: <{ prefix.job }>/ersatz-db-to-s3
worker: <{ worker.main }>
enabled: true
owner: <{ owner }>
payload: >-
  /bin/sh -c
  "
  lava-sql
  --format csv
  --header
  --conn-id '<{ db.conn_id }>'
  --output '{{ realm.s3_temp }}/{{ job.job_id }}/output.csv'
  '{{ realm.s3_payloads }}/<{ prefix.payload }>/xyz.rsc/query.sql'
  "
event_log: Output file is {{ realm.s3_temp }}/{{ job.job_id }}/output.csv
```
docker¶
You must specify a region¶
If a docker job requires access to AWS resources (e.g. S3, the lava connections manager etc.) it will be using the boto3 Python module to do so. This requires the AWS region in which it runs to be specified. Rather than hard-wire that into the container, the easiest way to set this is by adding the following environment variable to the parameters in the job specification:
```json
{
    "parameters": {
        "env": {
            "AWS_DEFAULT_REGION": "ap-southeast-2"
        }
    }
}
```
Note that the IAM role for the worker should provide the authentication requirements for docker containers running on that worker.
exe¶
Error: job ... (...) failed: [Errno 8] Exec format error: '/tmp/lava/...'¶
This error indicates that the operating system tried to run a script as a binary when it isn't. It generally means that the hashbang line is missing from the beginning of a script file indicating what interpreter to use. For a Python script it will look exactly like this:
```
#!/usr/bin/env python3
```
This error can also occur when an executable has been edited on a DOS system and has acquired DOS CRLF line endings instead of UNIX LF line endings.
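The line endings can be fixed with dos2unix(1) or any equivalent. As a sketch, the following Python snippet rewrites a script with UNIX LF endings (the filename is illustrative):

```python
# Illustrative filename only
path = "script.py"

# Simulate a script that has acquired DOS CRLF line endings
with open(path, "wb") as f:
    f.write(b'#!/usr/bin/env python3\r\nprint("ok")\r\n')

# Rewrite it with UNIX LF endings (the equivalent of dos2unix)
with open(path, "rb") as f:
    data = f.read()
with open(path, "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))
```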
lavasched¶
Failing lavasched jobs¶
This is typically the result of a bad crontab specification in a newly added job.
Prior to version 5.0.0 (Cotopaxi), it could be tricky to localise the bad job specification. Since Cotopaxi, output from lavasched jobs now includes a context diff of the new crontab vs the old one to simplify tracking of changes and problem diagnosis.
pkg¶
Error: job ... (...) failed: [Errno 8] Exec format error: '/tmp/lava/...'¶
This error indicates that the operating system tried to run a script as a binary when it isn't. It generally means that the hashbang line is missing from the beginning of a script file indicating what interpreter to use. For a Python script it will look exactly like this:
```
#!/usr/bin/env python3
```
This error can also occur when an executable has been edited on a DOS system and has acquired DOS CRLF line endings instead of UNIX LF line endings.
In what timezone will a pkg job run?¶
By default, all jobs run in the timezone setting of the worker itself. This is unrelated to the dispatch timezone.
For a pkg job, the timezone for the job run
can be specified by adding the TZ environment variable to the job
specification, thus:
```json
{
    "type": "pkg",
    "payload": "...",
    "env": {
        "TZ": "Australia/Darwin"
    }
}
```
sql¶
Error: need to escape, but no escapechar set¶
When lava collects the output of a query from an sql job, it uses the standard Python CSV writer to format the data for output. If it detects that the data requires an escape character to be specified but this has not been done in the job specification, this error will result.
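The behaviour can be reproduced directly with Python's csv module, which is where this error message comes from:

```python
import csv
import io

row = ["a,b", "plain"]

# With QUOTE_NONE and no escapechar, a field containing the delimiter
# triggers: "need to escape, but no escapechar set"
try:
    csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE).writerow(row)
    raised = False
except csv.Error:
    raised = True

# Supplying an escapechar resolves it
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
```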
Connectors FAQ¶
This section covers questions related to connectors in general as well as those specific to particular job types.
General Questions¶
Why is psycopg2 not supported?¶
In a word, licensing. Psycopg2 is LGPL.
Why is pygresql not supported?¶
As of v5.0.0 (Cotopaxi), pygresql can be specified
as the driver to use for Postgres family connectors using the subtype field in
the connection specification.
It is strongly recommended to stick with the default of pg8000 for the db_from_s3 and redshift_unload job types.
Why would I use pygresql instead of pg8000?¶
Insert performance of pg8000 is poor. For Python based jobs inserting a lot of data, pygresql should give better performance.
Is SQLAlchemy supported?¶
As of v5.0.0 (Cotopaxi), the SQL based connectors provide native support for SQLAlchemy. An SQLAlchemy engine can be created using a lava connector to manage the underlying connection process.
ssh, scp, sftp¶
Authentication Failures¶
A possible cause of authentication failures is the default behaviour of ssh, scp and sftp in situations where it cannot verify the fingerprint of the remote host's public key. When this occurs, the client will drop into interactive mode to ask for confirmation to accept the key, like so:
```
The authenticity of host 'sftp.xyzzy.com (192.219.1.1)' can't be established.
RSA key fingerprint is SHA256:TXYwAhoSEfm6Me6RtFHJRUEGL9lTuHqySI6GyxVe//M.
RSA key fingerprint is MD5:52:b4:70:1d:c1:0e:aa:4d:32:8e:f8:7a:cb:f9:b8:7e.
Are you sure you want to continue connecting (yes/no)?
```
Lava cannot handle this.
To avoid this problem, add -o StrictHostKeyChecking=no to the arguments when
invoking the connector script provided by lava.
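As a sketch, in a cmd style job payload the option would be added where the connector script is invoked. The script name and arguments below are illustrative placeholders, not actual lava names:

```yaml
payload: >-
  lava-sftp-connector
  -o StrictHostKeyChecking=no
  sftp.xyzzy.com:/data/file.csv
  /tmp/file.csv
```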
Installation and Operation FAQ¶
The Lava Worker¶
ModuleNotFoundError: No module named 'psutil'¶
This error appears to be unique to the Amazon Linux 1 AMI due to a bug in the
Python configuration on that AMI. Basically, binary modules, such as psutil,
are getting installed in a lib64 directory by pip instead of lib and Python
is not looking in lib64.
The fix is:

```shell
unset PYTHON_INSTALL_LAYOUT
python3 -m pip install psutil --upgrade
```

The same problem can occur with other modules containing binary components, such as jinja2 and cx_Oracle. The fix is the same in each case.
The Lava Docker Images¶
Docker build fails : ERROR [internal] load metadata for docker.io...¶
There is a bug in some versions of docker buildkit which seems to occur when connecting to docker hub via a proxy. If this message occurs, try building from a location that does not require a proxy connection.
Info
As of v8.1.0 (Kīlauea), lava docker images must be built with buildkit enabled.
The Lava Framework FAQ¶
General Questions¶
Retrospectively adding Jupyter notebook support¶
If Jupyter support was not requested when the project was originally created using cookiecutter, it can be enabled afterwards using the following process:
```shell
# Go to project root
cd <PROJECT-ROOT>

# Add jupyter to requirements file
echo jupyter >> etc/requirements.txt

# Install jupyter support
make init
```
Retrospectively adding lava libraries¶
If the lava libraries were not requested when the project was originally created using cookiecutter, they can be added afterwards using the following process:
```shell
# Go to project root
cd <PROJECT-ROOT>

# Add lava to requirements file
echo jinlava >> etc/requirements.txt

# Install
make init
```
Jinja Rendering FAQ¶
Does lava support recursive rendering?¶
No.
For any job type that supports Jinja rendering, lava performs only a single
rendering pass. So, for example, it is not possible to include Jinja syntax in
a vars variable that references an element in globals as this would require
two rendering passes to first inject the global into the var and then render the
Jinja content of the var.
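The single-pass behaviour can be demonstrated with the jinja2 library directly, outside lava:

```python
from jinja2 import Template

# A vars value that itself contains Jinja syntax
greeting = "hello {{ name }}"

# Single pass: {{ greeting }} is substituted, but the Jinja syntax
# inside its value is NOT rendered again
out = Template("{{ greeting }}").render(greeting=greeting, name="world")
```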