Connectors¶
Lava provides a mechanism to assist with connections to external resources to minimise the need for individual jobs to manage connectivity details and credentials. This also simplifies the process of migrating jobs from one environment or lava realm to another.
Configuration information for connection handlers (aka connectors) is stored in the connections table. The required fields are dependent on the connector type.
Note
Loosely speaking, a connector is a handler for a specific type of resource and a connection is an instance of a connector for a specific instance of a resource.
It probably doesn't matter that much and, historically, the user guide has played fast and loose with this distinction.
The underlying implementation of connectors is specific to the type of target resource and the job type. Lava attempts to provide a connection handle to jobs in a form that is relatively native to the job type. For example, a database connection for an sql job is provided as a Python DBAPI 2.0 connection instance. For an exe or pkg job, it is provided as a command line wrapper that handles credential management and connectivity behind the scenes.
Connector credentials are typically stored in the AWS SSM Parameter Store to
provide a level of isolation and security. SSM parameters for a given realm
should be stored with parameter names starting with /lava/<REALM>/.
Connectors are implemented using a simple plugin architecture and new ones can be added relatively easily.
Database Connectors¶
Lava provides database connectors for a number of common database types, including MySQL, Postgres and Oracle.
If used with sql, sqlc, sqli, sqlv, db_from_s3 and redshift_unload jobs, lava manages the connection process in the background.
If used with exe, pkg and docker jobs, lava provides an environment variable pointing to a script that will connect to the database to run SQL. The executable in the job payload can run the script to access the database without worrying about managing database connectivity.
Python programs in job payloads can access the lava connector subsystem directly to obtain either a DBAPI 2.0 connection object or an SQLAlchemy engine object. Refer to Developing Lava Jobs for more information.
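As a rough illustration of what working with a DBAPI 2.0 connection object looks like, the sketch below runs a query through the standard cursor interface. It uses the standard library `sqlite3` module purely as a stand-in; inside a lava job the connection would come from the lava connector subsystem instead. The helper names `fetch_rows` and `demo` are hypothetical.

```python
import sqlite3


def fetch_rows(conn, sql, params=()):
    """Run a query on any DBAPI 2.0 connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


def demo():
    # sqlite3 is only a stand-in for a connection supplied by lava.
    conn = sqlite3.connect(":memory:")
    try:
        # conn.execute() is an sqlite3 shortcut used here for setup only.
        conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO t VALUES (1, 'one'), (2, 'two')")
        return fetch_rows(conn, "SELECT id, name FROM t ORDER BY id")
    finally:
        conn.close()
```

Code written against the cursor interface in this way should carry over between database types, although the parameter placeholder style differs between drivers.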
Database Authentication Using AWS SSM Parameter Store¶
The database connectors typically require a number of connection and authentication parameters to be specified, such as:
- host name
- port
- user name
- password.
These can be defined explicitly in the connection specification, with the exception of the password. By default, the value of the password field is interpreted as the name of an encrypted SSM parameter that contains the actual password.
The standard lava worker IAM policies will provide read access to SSM parameters
with names of the form /lava/<REALM>/*. These must be encrypted with the realm
KMS key lava-<REALM>-sys.
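The lookup described above can be sketched as follows. This is an assumption-laden illustration, not lava's own code: the function names are hypothetical, and `StubSSMClient` exists only so the example runs offline. With boto3 installed, a real client from `boto3.client("ssm")` offers the same `get_parameter` call.

```python
def ssm_parameter_name_ok(name, realm):
    """Check that a parameter name sits under the realm prefix."""
    return name.startswith(f"/lava/{realm}/")


def read_secure_parameter(ssm_client, name, realm):
    """Fetch and decrypt a parameter after checking its name."""
    if not ssm_parameter_name_ok(name, realm):
        raise ValueError(f"{name} is not under /lava/{realm}/")
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class StubSSMClient:
    """Offline stand-in that mimics the shape of the boto3 response."""

    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name, WithDecryption=False):
        return {"Parameter": {"Name": Name, "Value": self._values[Name]}}
```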
Database Authentication Using AWS Secrets Manager¶
The lava database connectors support the AWS Secrets Manager as an alternative source for some of the connection specification parameters where they are not provided directly in the specification.
If the connection specification contains a secret_id field, fields in the
named secret are used to populate any components missing from the connection
specification.
Note that Secrets Manager and lava use slightly different naming conventions for fields. Lava will map Secrets Manager fields to lava fields automatically using the following translation:
| Secrets Manager Field | Lava Field |
|---|---|
| dbClusterIdentifier | description |
| dbname | database |
| host | host |
| password | password |
| port | port |
| serviceName | service_name |
| sid | sid |
| username | user |
The standard lava worker IAM policies will provide read access to secrets with
names of the form /lava/<REALM>/*. These must be encrypted with the realm KMS
key lava-<REALM>-sys.
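The field translation in the table above can be sketched as a simple merge in which values already present in the connection specification take precedence. The function name `apply_secret` is hypothetical and the secret is assumed to be a JSON object; this is an illustration of the described behaviour, not lava's implementation.

```python
import json

# Mapping taken from the translation table in this section.
SECRET_TO_LAVA = {
    "dbClusterIdentifier": "description",
    "dbname": "database",
    "host": "host",
    "password": "password",
    "port": "port",
    "serviceName": "service_name",
    "sid": "sid",
    "username": "user",
}


def apply_secret(conn_spec, secret_string):
    """Fill fields missing from conn_spec using values from the secret."""
    secret = json.loads(secret_string)
    merged = dict(conn_spec)
    for secret_field, lava_field in SECRET_TO_LAVA.items():
        # Only populate components that the specification does not supply.
        if secret_field in secret and lava_field not in merged:
            merged[lava_field] = secret[secret_field]
    return merged
```

Fields in the secret that have no counterpart in the table (such as an `engine` entry) are simply ignored in this sketch.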
Database Authentication Using IAM Credential Generation¶
Some AWS database types provide an IAM based mechanism for obtaining temporary database credentials. Lava supports this mechanism for some connectors. The mechanism will be used where the connection specification (after inclusion of any AWS Secrets Manager components) does not contain a password.
Refer to individual connector details for more information.
Database Client Application Identification¶
Some database types support a mechanism for the client to identify itself when connecting, in addition to the user authentication. This information may then be available in things such as connection logs, activity logs etc. The mechanism used is database dependent and not all databases provide a mechanism.
Lava attempts to provide a uniform interface to the underlying database client identification mechanism where possible.
For most of the built in database related job types, lava will automatically
provide a client identifier when connecting. By default, this is in the form
lv-<REALM>-<JOB-ID>. (See the CONN_APP_NAME worker configuration
parameter.)
Support in sqlc jobs is dependent on the capabilities of the database specific CLI tool used to support the connection. Likewise for executable job types (e.g. exe and pkg) using a CLI based connector. See also Connection Handling for Executable Jobs.
When using the lava API get_pysql_connection(), a new, optional
application_name parameter is available. If a value is not provided, a value
in the form described above is used if the lava job ID can be determined from
the presence of a LAVA_JOB_ID environment variable. This should work whenever
the API is being used within a lava job. See also
Connection Handling for Python Based Jobs.
In short, in most normal usage patterns for databases for which lava supports client identification, it will, more or less, do the right thing without modifying jobs or additional configuration.
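For code running outside the built-in job types, the default identifier format can be approximated as in the sketch below. It assumes that both `LAVA_REALM` and `LAVA_JOB_ID` are present in the environment of a lava job, and it ignores any override from the `CONN_APP_NAME` worker configuration parameter. The function name is hypothetical.

```python
import os


def default_app_name(environ=None):
    """Build an lv-<REALM>-<JOB-ID> identifier, or None outside a job."""
    env = os.environ if environ is None else environ
    realm = env.get("LAVA_REALM")
    job_id = env.get("LAVA_JOB_ID")
    if not realm or not job_id:
        # Not running inside a lava job, so no default can be derived.
        return None
    return f"lv-{realm}-{job_id}"
```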
Lava's support for a client identification mechanism is summarised in the following table:
| Job Type | MS SQL | MySQL | Oracle | Postgres | Redshift | SQLite |
|---|---|---|---|---|---|---|
| sql | Yes | Yes | Yes | Yes | ||
| sqli | Yes | Yes | Yes | Yes | ||
| sqlc | Yes | Yes | ||||
| sqlv | Yes | Yes | Yes | Yes | ||
| db_from_s3 | Yes | Yes | Yes | Yes | ||
| redshift_unload | Yes | |||||
| lava-sql CLI | (1) | (1) | (1) | (1) | ||
| Lava API | (2) | (2) | (2) | (2) |
Notes:
1. The lava-sql utility will automatically populate a client connection identifier when used as part of a lava job payload. In other usages, the -a/--app-name argument will need to be specified.
2. The get_pysql_connection() API will automatically populate a client connection identifier when used as part of a lava job payload. In other usages, the otherwise optional application_name parameter will need to be specified.
Note
This article by Andy Grunwald was very helpful when implementing database client identification in lava: your database connection deserves a name
Client Application Identification for Postgres¶
Postgres flavoured databases use the application_name connection parameter to
identify client connections. Postgres will truncate the supplied value to 63
characters.
The following sample query will display connected application names.
SELECT usename, application_name, client_addr, backend_type
FROM pg_stat_activity;
Client Application Identification for Redshift¶
Redshift, like Postgres, uses the application_name connection parameter to
identify client connections. Redshift allows application names up to 250
characters.
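The length limits quoted in this guide (63 characters for Postgres, 250 for Redshift) can be applied before connecting so that the identifier seen in the logs is predictable. The helper below is a hypothetical convenience, not part of lava, and it treats the limits as character counts, which is only accurate for ASCII names.

```python
# Limits quoted in this guide; the dictionary keys are hypothetical labels.
APP_NAME_LIMITS = {"postgres": 63, "redshift": 250}


def fit_app_name(name, db_type):
    """Trim name to the limit for db_type; unknown types are left alone."""
    limit = APP_NAME_LIMITS.get(db_type)
    if limit is None:
        return name
    return name[:limit]
```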
The following sample query can display application names:
SELECT RTRIM(username) AS user,
sessionid,
SUBSTRING(event, 1, 20) AS event,
recordtime,
RTRIM(authmethod) AS auth,
RTRIM(sslversion) AS ssl,
RTRIM(application_name) AS app_name
FROM stl_connection_log
ORDER BY recordtime DESC;
Client Application Identification for MySQL¶
MySQL uses the program_name connection parameter to identify client
connections.
The performance schema must be enabled to run queries that access the
program_name parameter. For AWS Aurora instances, see
Turning on the Performance Schema for Performance Insights on Aurora MySQL
for information on enabling the performance schema.
The following sample query, when run as an admin user, shows currently active connections:
SELECT
session_connect_attrs.ATTR_VALUE AS program_name,
processlist.*
FROM information_schema.processlist
LEFT JOIN performance_schema.session_connect_attrs ON (
processlist.ID = session_connect_attrs.PROCESSLIST_ID
AND session_connect_attrs.ATTR_NAME = "program_name"
);
The following query shows active connections for the current user:
SELECT
session_account_connect_attrs.ATTR_VALUE AS program_name,
processlist.*
FROM information_schema.processlist
LEFT JOIN performance_schema.session_account_connect_attrs ON (
processlist.ID = session_account_connect_attrs.PROCESSLIST_ID
AND session_account_connect_attrs.ATTR_NAME = "program_name"
);
Client Application Identification for SQL Server (MS SQL)¶
SQL Server uses the program_name connection parameter to identify client
connections.
The following sample query, when run as an admin user, shows currently active connections:
SELECT hostname, program_name, loginame, cmd
FROM sys.sysprocesses
WHERE loginame != 'rdsa';
Other Connectors¶
Lava also provides connectors for various other types of resource, including sFTP servers, SharePoint sites, SMB fileshares and the AWS CLI. These are typically used either by a job type that is specific to the target resource or in exe, pkg and docker jobs.
Connector type: aws¶
The aws connector manages access to AWS access keys. It supports static access keys as well as session credentials obtained by assuming an IAM role in either the current AWS account or another account.
Note
IAM assumed role session credentials are new in version 8.1 (Kīlauea).
When used with redshift_unload jobs,
this connector provides the access keys that are used in the S3 AUTHORIZATION
parameters in the UNLOAD command.
When used with db_from_s3 jobs, this connector provides the access keys that are used to provide the database the required access to S3 to load the data.
When used with exe and pkg jobs, it provides an environment variable pointing to a script that will run the AWS CLI with an appropriate AWS authentication profile.
| Field | Type | Required | Description |
|---|---|---|---|
| access_keys | String | Note 1 | The name of an encrypted SSM parameter containing the access keys. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/.... The value must be in the format access_key_id,access_secret_key and must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| duration | Duration | No | The duration of a session created when an IAM role is assumed. Defaults to the value of the AWS_CONN_DURATION configuration parameter. As AWS credentials are cached, it is critical that this is significantly longer than the cache duration as specified by the AWS_ACCESS_KEY_CACHE_TTL parameter. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| external_id | String | No | Name of an SSM parameter containing an external ID to use when assuming a role to obtain session credentials. While AWS does not consider this to be a sensitive security parameter, it is stored in the SSM parameter store for ease of management. It is still recommended to use a secure parameter. Can't hurt. |
| policy_arns | String or List[String] | No | The ARNs of IAM managed policies to use as managed session policies. The policies must exist in the same account as the role. The session permissions are the intersection of these policies and the policies of the role being assumed. It is not possible to expand the underlying role permissions. |
| policy | Map[String,*] | No | An IAM policy to use as an inline session policy. The value must be a fully-formed AWS IAM policy. The session permissions are the intersection of the specified policy and the policies of the role being assumed. It is not possible to expand the underlying role permissions. |
| region | String | No | The AWS region name. If not specified, the current region is assumed. |
| role_arn | String | Note 1 | The ARN of an IAM role to assume to obtain session credentials. |
| tags | Map[String,String] | No | A map of session tags to pass. See Tagging Amazon Web Services STS Sessions. |
| type | String | Yes | aws. |
Notes:
1. One of access_keys or role_arn must be specified.
2. If a role_arn is specified, the trust policy on the role must allow it to be assumed by the lava worker. If session tags are specified using the tags field, the trust policy must also permit this.
3. When assuming a role, the lava worker will set the role session name. By default, this is in the form lv-<REALM>-<JOB-ID>, cleansed as necessary to satisfy the requirements for session names. (See the CONN_APP_NAME worker configuration parameter.)
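The field rules for the aws connector can be checked before a specification is loaded into the connections table. The sketch below is a hypothetical pre-flight check built only from the rules stated in this section; it is not lava's own validation and the function names are assumptions.

```python
def check_aws_connection(spec, realm):
    """Return a list of problems found in an aws connection spec."""
    problems = []
    if spec.get("type") != "aws":
        problems.append("type must be 'aws'")
    for field in ("conn_id", "enabled"):
        if field not in spec:
            problems.append(f"missing required field: {field}")
    if "access_keys" not in spec and "role_arn" not in spec:
        problems.append("one of access_keys or role_arn must be specified")
    access_keys = spec.get("access_keys")
    if access_keys is not None and not access_keys.startswith(f"/lava/{realm}/"):
        problems.append(f"access_keys must name a parameter under /lava/{realm}/")
    return problems


def split_access_keys(value):
    """Split an 'access_key_id,access_secret_key' parameter value."""
    key_id, sep, secret = value.partition(",")
    if not sep or not key_id or not secret:
        raise ValueError("expected access_key_id,access_secret_key")
    return key_id, secret
```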
Using the AWS Connector in Shell Scripts¶
The aws connector creates a small shell script that wraps the AWS CLI and handles the access keys. The shell script is a drop-in replacement for the AWS CLI when used in lava jobs.
Consider the following exe job:
{
"description": "Show usage of aws CLI connector in a shell script",
"enabled": true,
"job_id": "aws-cli-example",
"parameters": {
"connections": {
"aws1": "aws-conn-id-1",
"aws2": "aws-conn-id-2"
}
},
"payload": "example/aws-cli-conn.sh",
"type": "exe",
"worker": "core"
}
The connections element in the parameters will result in lava preparing
connector shell scripts whose names are placed in the environment variables
LAVA_CONN_AWS1 and LAVA_CONN_AWS2 respectively.
The payload (example/aws-cli-conn.sh in this example) can then use these
scripts just like the AWS CLI. For example:
#!/bin/bash
$LAVA_CONN_AWS1 sts get-caller-identity
$LAVA_CONN_AWS2 s3 ls
Using the AWS Connector in Python¶
Note
New in version 8.1 (Kīlauea).
Python jobs can call the lava connector subsystem directly via the lava API.
Consider the following exe job:
{
"description": "Show usage of aws CLI connector in a Python program",
"enabled": true,
"job_id": "aws-python-example",
"parameters": {
"connections": {
"aws3": "aws-conn-id-3"
}
},
"payload": "example/aws-cli-conn.py",
"type": "exe",
"worker": "core"
}
Once again, lava will create a shell script accessed via the LAVA_CONN_AWS3
environment variable. It will also populate the LAVA_CONNID_AWS3 environment
variable with the connection ID. This can be used with the lava connector API to
obtain a boto3 Session object, thus:
import os
from lava.connection import get_aws_session
realm = os.environ['LAVA_REALM']
# Note we want the connection ID, not the CLI script here.
conn_id = os.environ['LAVA_CONNID_AWS3']
# Use the lava API to the connection subsystem to obtain a boto3 Session.
aws_session = get_aws_session(conn_id, realm)
sts = aws_session.client('sts')
print(sts.get_caller_identity())
Note
A Python script can use the CLI script as well (e.g. via the subprocess module) but why would you want to?
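The environment variable names in the examples above follow a visible pattern: the connection label from the job specification, upper-cased, is appended to `LAVA_CONN_` (the CLI script) or `LAVA_CONNID_` (the connection ID). The helper below is a hypothetical convenience based on that pattern. It assumes simple alphanumeric labels; how lava treats other characters in labels is not covered here.

```python
import os


def connection_env_names(label):
    """Return (script_var, conn_id_var) for a connection label."""
    suffix = label.upper()
    return f"LAVA_CONN_{suffix}", f"LAVA_CONNID_{suffix}"


def lookup_connection(label, environ=None):
    """Fetch the script path and connection ID for a label, if present."""
    env = os.environ if environ is None else environ
    script_var, conn_id_var = connection_env_names(label)
    return env.get(script_var), env.get(conn_id_var)
```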
Connector type: docker¶
The docker connector manages access to a docker daemon and docker registry for use with docker jobs.
Lava supports the following registry options:
- AWS ECR
- Private docker registries
- The standard docker public registry.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| email | String | No | Email address for registry login. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| password | String | No | Name of the SSM parameter containing the password for authenticating to the registry. Required for private docker repositories. Ignored for ECR registries. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| registry | String | No | Either the URL for a standard registry or ecr[:account-id]. In the latter case, lava will connect to the AWS ECR registry in the specified AWS account or the current account if no account-id is specified. If no registry is specified, the default public docker registry is used. |
| server | String | No | URL for the docker server. If not specified, then the normal docker environment variables are used. Generally, this means using the local docker daemon accessed via the UNIX socket. |
| timeout | Number | No | Timeout on docker API calls in seconds. |
| tls | Boolean | No | Use TLS when connecting to the docker server. Default True. |
| type | String | Yes | docker. |
| user | String | No | User name for authenticating to the registry. Required for private docker repositories. Ignored for ECR registries. |
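The three forms accepted by the registry field (absent, `ecr[:account-id]`, or a registry URL) can be told apart with a small parser. This is a sketch of the rules in the table, with a hypothetical function name and return shape, and should not be read as lava's parsing logic.

```python
def parse_registry(registry):
    """Classify a registry value as ('default' | 'ecr' | 'url', detail)."""
    if registry is None:
        # No registry given: the default public docker registry applies.
        return ("default", None)
    if registry == "ecr":
        # ECR in the current AWS account.
        return ("ecr", None)
    if registry.startswith("ecr:"):
        account_id = registry[len("ecr:"):]
        if not account_id:
            raise ValueError("ecr: must be followed by an account ID")
        return ("ecr", account_id)
    return ("url", registry)
```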
Accessing External Registries¶
Lava prefers to obtain its docker images from the local AWS ECR. It's safer, simpler and more robust than relying on external registries to provide safe, secure code at run-time, particularly for a production environment.
Tip
If you need to use an external image, copy it to the local AWS ECR and use it from there. The lava job framework will place the built payloads for docker jobs in ECR. A trivial Dockerfile can copy an external image as part of the build process.
If you must do this damn fool thing, lava permits it. There are some considerations:
-
Private registries (i.e. requiring authentication to access) will require a connection specification as described above, including the registry identifier and credentials. The registry will also be part of the image name as usual.
-
Public registries, such as Docker Hub and public repositories on GitHub Container Registry (GHCR), can be addressed by a common connection specification containing neither registry, nor credentials. The registry will be part of the image name as usual (except for Docker Hub which is the default registry).
-
Proxies can be a problem. Lava will not help you here. The docker daemon proxy configuration will need to be handled at the platform level, however that is done.
Examples¶
This is the standard connection specification for the local AWS ECR.
{
"type": "docker",
"conn_id": "docker/ecr",
"description": "Docker ECR connection",
"enabled": true,
"registry": "ecr"
}
This connection specification should handle most public registries.
{
"type": "docker",
"conn_id": "docker/public",
"description": "Docker basic connection (covers public repos)",
"enabled": true
}
This connection specification is for a private registry on the GitHub Container Registry:
{
"type": "docker",
"conn_id": "docker/ghcr/xyzzy",
"description": "GitHub Container Registry for user xyzzy",
"enabled": true,
"registry": "ghcr.io",
"user": "not-used-by-ghcr",
"password": "/lava/my-realm/ghcr/xyzzy/access-token"
}
Connector type: email¶
The email connector provides a generic interface for an email sending
subsystem. It is implemented by one or more actual email handlers. The email
subsystem type is selected by the subtype field in the connection
specification. Each subtype may have extra field requirements of its own.
Currently supported email handler subtypes are:
- ses: AWS Simple Email Service (SES)
- smtp: SMTP, including optional TLS support.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| from | String | No | The email address that is sending the email. While this is not a mandatory field in the connector, there must be a value available at the time an email is sent, either from the job itself, the connection specification or an email handler specific mechanism. It is strongly recommended to include a default value in the connection specification. |
| reply_to | String or List[String] | No | The default reply-to email address(es) for messages. |
| subtype | String | No | Specifies the underlying email handler. If not specified, ses is assumed, in which case the field requirements for this subtype must be met. |
| type | String | Yes | email. |
Subtype: ses¶
The ses subtype uses AWS Simple Email Service to send email.
The following fields are specific to the ses subtype.
| Field | Type | Required | Description |
|---|---|---|---|
| configuration_set | String | No | Use the specified SES Configuration Set when sending an email. If not specified, the value specified by the SES_CONFIGURATION_SET realm configuration parameter is used. |
| from | String | No | The email address that is sending the email. This email address must be either individually verified with Amazon SES, or from a domain that has been verified with Amazon SES. If not specified, the value specified by the SES_FROM realm configuration parameter is used. A value must be specified by one of these mechanisms. |
| region | String | No | The AWS region name for the SES service. If not specified, the value specified by the SES_REGION realm configuration parameter is used, which itself defaults to us-east-1. |
| subtype | String | No | Either ses or missing. |
Subtype: smtp¶
The smtp subtype uses standard SMTP to send email. SMTP over TLS is also
supported.
The following fields are specific to the smtp subtype.
| Field | Type | Required | Description |
|---|---|---|---|
| host | String | Yes | The SMTP server host DNS name or IP address. |
| password | String | Sometimes | The name of an encrypted SSM parameter containing the SMTP server password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. This field is required if the user field is specified. |
| port | Number | No | The SMTP port number. If not specified, the default is 25 without TLS and 465 with TLS. Note that Gmail requires TLS on port 587. |
| subtype | String | Yes | smtp |
| tls | Boolean | No | If true, use SMTP over TLS. Default is false. |
| user | String | No | SMTP server user name. If specified, the password field must also be specified. If not specified, the connection will be unauthenticated. |
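The defaulting rules in the table above (port 25 without TLS, 465 with TLS, and a password whenever a user is given) can be sketched as follows. The function name and the shape of the returned dictionary are hypothetical; this shows the documented rules rather than lava's implementation.

```python
def smtp_settings(spec):
    """Resolve effective SMTP settings from an smtp connection spec."""
    tls = bool(spec.get("tls", False))
    # Default port depends on whether TLS is in use.
    port = spec.get("port", 465 if tls else 25)
    user = spec.get("user")
    if user is not None and "password" not in spec:
        raise ValueError("password is required when user is specified")
    return {
        "host": spec["host"],
        "port": port,
        "tls": tls,
        "authenticated": user is not None,
    }
```

An explicit port, such as 587 for Gmail, always takes precedence over the defaults.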
Using the Email Connector¶
The email connector provides two distinct interfaces:
Python Interface for Email Connectors¶
Python scripts can directly access the underlying Python interface of an email
connector. In this case, the connector returns a lava.lib.email.Emailer
object as described in the lava API documentation.
As an example, consider an exe job specification that looks something like this:
{
"job_id": "...",
"parameters": {
"connections": {
"email": "email-connection-id"
}
},
"payload": "my-payload.py ..."
}
A Python program can use the email connector like this:
import os
from lava.connection import get_email_connection

# If running as a lava exe/pkg/docker, get some info provided by lava in the
# environment. Assume our connector is labeled `email` in the job spec.
realm = os.environ['LAVA_REALM']
conn_id = os.environ['LAVA_CONNID_EMAIL']

# We can use the email connection as a context manager
with get_email_connection(conn_id, realm) as emailer:
    emailer.send(
        subject='Oh no',
        message='Your oscillation overthruster has malfunctioned',
        to='Buckaroo.Banzai@dimension8.com',
        cc=[
            'Professor.Hikita@dimension8.com',
            'Sidney.Zweibel@dimension8.com'
        ]
    )
Executable Interface for Email Connectors¶
When used with exe, pkg and docker job types (e.g. shell scripts), the
connection is implemented by the lava-email command.
When used as a connection script within a lava job, the -r REALM and
-c CONN_ID arguments don't need to be provided by the job as these are
provided by lava in the connection script.
Also, values for the --from and --reply-to options will be provided by lava
if it has values available from the connection specification or other
configuration data. These values can be overridden by providing the appropriate
options to the connection script.
As an example, consider an exe job specification that looks something like this:
{
"job_id": "...",
"parameters": {
"connections": {
"email": "email-connection-id"
}
},
"payload": "my-payload.sh ..."
}
Note the email connection. This will provide the job with an environment
variable LAVA_CONN_EMAIL which points to the executable handling the
connection.
If the job payload is a shell script, the connector would be invoked thus:
# Send an email with a text message body.
$LAVA_CONN_EMAIL --to Buckaroo.Banzai@dimension8.com --subject "Oh no" <<!
Dear Buckaroo,
Your oscillation overthruster has malfunctioned.
-- John Bigbooté
!
# But wait -- we can do HTML as well
$LAVA_CONN_EMAIL --to Buckaroo.Banzai@dimension8.com --subject "Oh no" <<!
<HTML>
<BODY>
<P>Dear Buckaroo,</P>
<P>Your oscillation overthruster has malfunctioned</P>
<P>-- John Bigbooté</P>
</BODY>
</HTML>
!
Connector type: generic¶
The generic connector provides a general purpose mechanism to group a set of associated attributes together and have them made available to lava jobs at run-time. Lava doesn't actually connect to any external resources other than to obtain attribute values.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| attributes | Map[String,*] | Yes | A map comprising the attributes for the connector. The keys are the attribute names and the values are either simple scalars or another map specifying how to obtain the value. See below for more information. |
| type | String | Yes | generic. |
Specifying Generic Connector Attribute Values¶
The attributes field of the generic connector specifies the names of the
connector attributes and how the attribute values are obtained. The following
variants are supported.
Simple Scalar Attributes¶
Simple scalar attributes are specified thus:
{
"attributes": {
"name": "value"
}
}
In addition to string values, integer and float values are also supported.
Local Parameters¶
This is an alternative syntax to the simple scalar attribute syntax described above.
{
"attributes": {
"name": {
"type": "local",
"value": "value"
}
}
}
SSM Parameters¶
Values from SSM parameters are specified thus:
{
"attributes": {
"name": {
"type": "ssm",
"parameter": "SSM parameter name"
}
}
}
Lava will obtain the value from the SSM parameter store, decrypting as required.
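The three attribute variants can be resolved with a single loop, as in the sketch below. The SSM lookup is passed in as a callable (`fetch_ssm`, a hypothetical parameter) so the example runs without AWS access. This illustrates the documented behaviour and is not lava's implementation.

```python
def resolve_attributes(attributes, fetch_ssm):
    """Resolve generic connector attributes to plain values."""
    resolved = {}
    for name, spec in attributes.items():
        if not isinstance(spec, dict):
            # Simple scalar attribute: string, integer or float.
            resolved[name] = spec
        elif spec.get("type") == "local":
            resolved[name] = spec["value"]
        elif spec.get("type") == "ssm":
            resolved[name] = fetch_ssm(spec["parameter"])
        else:
            raise ValueError(f"unsupported attribute type for {name}")
    return resolved
```

In practice `fetch_ssm` would wrap a call to the SSM parameter store with decryption enabled.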
Example Generic Connector Specification¶
{
"conn_id": "widget-conn-id",
"description": "Sample generic connector",
"enabled": true,
"type": "generic",
"attributes": {
"a": "a string",
"b": {
"type": "local",
"value": 30
},
"c": {
"type": "ssm",
"parameter": "/lava/<REALM>/my_var"
}
}
}
Using the Generic Connector¶
The generic connector provides two distinct interfaces:
Python Interface for Generic Connectors¶
Python scripts can directly access the underlying Python interface of a generic connector. In this case, the connector returns a dictionary of resolved attribute values.
As an example, consider an exe job specification that looks something like this:
{
"job_id": "...",
"parameters": {
"connections": {
"widget": "widget-connection-id"
}
},
"payload": "my-payload.py ..."
}
A Python program can use the generic connector like this:
import os
from lava.connection import get_generic_connection
# If running as a lava exe/pkg/docker, get some info provided by lava in the
# environment. Assume our connector is labeled `widget` in the job spec.
realm = os.environ['LAVA_REALM']
conn_id = os.environ['LAVA_CONNID_WIDGET']
attributes = get_generic_connection(conn_id, realm)
The attributes dictionary would then look like:
{
'a': 'a string',
'b': 30,
'c': 'Value of SSM parameter /lava/<REALM>/my_var'
}
Executable Interface for Generic Connectors¶
When used with exe, pkg and docker job types (e.g. shell scripts), the connection is implemented by a simple script that can be used to obtain the value of individual attributes.
As an example, consider an exe job specification that looks something like this:
{
"job_id": "...",
"parameters": {
"connections": {
"widget": "widget-connection-id"
}
},
"payload": "my-payload.sh ..."
}
Note the widget connection. This will provide the job with an environment
variable LAVA_CONN_WIDGET which points to the executable handling the
connection.
If the job payload is a shell script, the connector would be invoked thus:
# Get the values of the attributes
ATTR_A=$($LAVA_CONN_WIDGET a)
ATTR_B=$($LAVA_CONN_WIDGET b)
ATTR_C=$($LAVA_CONN_WIDGET c)
Connector type: git¶
The git connector manages access to Git repositories by providing support for managing SSH private keys.
When used with exe and pkg jobs, it provides an environment variable pointing to a script that will run the Git CLI with SSH keys managed in the background.
Note that only SSH access to repositories is supported. HTTPS is not supported.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| ssh_key | String | Yes | The name of an encrypted SSM parameter containing the SSH private key. There must not be any passphrase on the key. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. Refer to the ssh connector for more information on how to prepare and store the key. |
| ssh_options | List[String] | No | A list of SSH options as per ssh_config(5). e.g. StrictHostKeyChecking=no |
| type | String | Yes | git. |
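To give a feel for how a list of ssh_options might end up on an SSH command line, the sketch below builds a value suitable for Git's `GIT_SSH_COMMAND` environment variable. This is an assumption about one possible approach, not a description of how the lava git connector works internally; the function name and the key file path are hypothetical.

```python
import shlex


def git_ssh_command(key_file, ssh_options=()):
    """Build an ssh command line using a key file and -o options."""
    argv = ["ssh", "-i", key_file]
    for option in ssh_options:
        # Each entry is passed as a separate -o option, as in ssh_config(5).
        argv += ["-o", option]
    # shlex.join quotes any argument that needs it for the shell.
    return shlex.join(argv)
```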
Connector type: mariadb-rds¶
This is currently a synonym for mysql.
It has been defined in the event of future feature differences between conventional MySQL and AWS RDS MariaDB.
Connector type: mariadb¶
This is a synonym for mysql.
Connector type: mssql¶
The mssql connector handles connections to Microsoft SQL Server databases.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| database | String | Yes* | The name of the database within the database server. |
| driver | String | No | The ODBC driver specification. This must correspond to the name of a section in /etc/odbcinst.ini. The default is FreeTDS. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password | String | Yes* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| port | Number | Yes* | The database port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. See Database Authentication Using AWS Secrets Manager. |
| subtype | String | No | Specifies the underlying DBAPI 2.0 driver. The default and only allowed value is pyodbc. |
| timeout | Integer | No | Connection timeout in seconds. If not specified, no timeout is applied. |
| type | String | Yes | mssql. |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for more information.
SSL connections are not currently supported.
When used with exe and pkg job types, the connection is implemented by the lava-sql CLI.
Note
There are some MSSQL CLI tools that come with the TDS or unixODBC packages.
None of them are wonderful so for now lava-sql will have to do. Also not
wonderful but what do you expect for free?
Implementation Notes¶
The current implementation requires the following components be installed and configured on the lava worker:
Configuring unixODBC with Free TDS
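Because the driver field must match a section name in /etc/odbcinst.ini, it can be handy to confirm the section exists on a worker. This is a rough sketch using the Python standard library; the function name and the sample file contents (including the library path) are illustrative assumptions, not taken from a real worker.

```python
import configparser


def odbc_driver_defined(odbcinst_text: str, driver: str = "FreeTDS") -> bool:
    """Return True if `driver` is a section name in the given odbcinst.ini text."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(odbcinst_text)
    # Section names are matched case-sensitively by configparser.
    return parser.has_section(driver)


# Illustrative contents only. On a worker, read /etc/odbcinst.ini instead.
sample = """\
[FreeTDS]
Description = FreeTDS driver
Driver = /usr/lib/libtdsodbc.so
"""

print(odbc_driver_defined(sample))
```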
Connector type: mysql-aurora¶
The mysql-aurora connector handles connections to AWS RDS Aurora MySQL database clusters. This is almost a synonym for mysql. Key differences are:
- The db_from_s3 job can take advantage of an AWS facility to load data directly from S3.
- Database authentication using IAM credential generation is supported.
| Field | Type | Required | Description |
|---|---|---|---|
| ca_cert | String | No | The name of a file containing the CA certificate for the database server. Ignored unless ssl is true. |
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| database | String | Yes* | The name of the database (schema) within the database server. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password | String | No* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. If not specified, the worker will attempt to generate temporary IAM user credentials. |
| port | Number | Yes* | The database port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| ssl | Boolean | No | Set to true to enable SSL. Default is false. |
| type | String | Yes | mysql-aurora |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
When used with exe and
pkg job types, the connection is implemented by
the mysql CLI. Apart from the connection parameters, it is invoked with the
following options:
mysql --batch --connect-timeout=10
Creating Temporary IAM User Credentials for AWS RDS Aurora MySQL¶
If the password field is not present in the connection specification, lava
will attempt to
generate temporary IAM credentials using the generate-db-auth-token mechanism.
The specified user must already exist in the database. Enable IAM authentication for a user thus:
CREATE USER a_user IDENTIFIED WITH AWSAuthenticationPlugin AS 'RDS';
The IAM policy attached to the worker will need to contain an element something like this:
"Statement": [
{
"Sid": "GetRdsCreds",
"Effect": "Allow",
"Action": "rds-db:connect",
"Resource": [
"arn:aws:rds-db:ap-southeast-2:123456789012:dbuser:db-JMH2...6KW6Q/a_user"
]
}
]
The DB resource ID (the DbiResourceId, not the instance identifier) for use in the IAM policy can be obtained thus:
aws rds describe-db-instances --db-instance-identifier 'DB_ID' \
--query 'DBInstances[0].DbiResourceId' --output text
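Once the resource ID is known, the policy element shown earlier can be assembled mechanically. The sketch below is one way to do that; the function name is hypothetical and `db-EXAMPLE` is a placeholder for whatever the command above returns.

```python
import json


def rds_connect_statement(region: str, account_id: str,
                          dbi_resource_id: str, db_user: str) -> dict:
    """Build an IAM statement allowing rds-db:connect for one database user."""
    arn = (
        f"arn:aws:rds-db:{region}:{account_id}:"
        f"dbuser:{dbi_resource_id}/{db_user}"
    )
    return {
        "Sid": "GetRdsCreds",
        "Effect": "Allow",
        "Action": "rds-db:connect",
        "Resource": [arn],
    }


# Placeholder values. Substitute the real region, account and resource ID.
statement = rds_connect_statement(
    "ap-southeast-2", "123456789012", "db-EXAMPLE", "a_user"
)
print(json.dumps({"Statement": [statement]}, indent=2))
```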
Info
SSL is mandatory when using temporary IAM user credentials.
Connector type: mysql-rds¶
This is currently a synonym for mysql-aurora.
It has been defined in the event of future feature differences between conventional AWS RDS Aurora MySQL and AWS RDS MySQL.
Connector type: mysql¶
The mysql connector handles connections to MySQL compatible databases.
| Field | Type | Required | Description |
|---|---|---|---|
| ca_cert | String | No | The name of a file containing the CA certificate for the database server. Ignored unless ssl is true. |
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| database | String | Yes* | The name of the database (schema) within the database server. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password | String | Yes* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| port | Number | Yes* | The database port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| ssl | Boolean | No | Set to true to enable SSL. Default is false. |
| type | String | Yes | mysql. |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
When used with exe and
pkg job types, the connection is implemented by
the mysql CLI, either the MySQL Community version, or the MariaDB version,
depending on the variant installed on the worker. These have some minor CLI
parameter differences which lava manages for the connection parameters. Apart
from the connection parameters, it is invoked with the following options:
mysql --batch --connect-timeout=10
Connector type: oracle-rds¶
This is currently a synonym for oracle.
It has been defined in the event of future feature differences between conventional Oracle and AWS RDS Oracle.
Connector type: oracle¶
The oracle connector handles connections to Oracle databases.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| database | String | No* | A deprecated synonym for sid. |
| description | String | No | Description. |
| edition | String | No | Oracle version for compatibility in the form x.y[.z]. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password | String | Yes* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| port | Number | Yes* | The database port number. |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| service_name | String | No* | The Oracle database service name. Generally exactly one of service_name or sid must be specified. |
| sid | String | No* | The Oracle System Identifier of the database. Generally exactly one of service_name or sid must be specified. |
| type | String | Yes | oracle. |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
When used with exe and pkg job types, the connection is implemented by
the SQL*Plus CLI, sqlplus. Apart from the connection parameters, it is
invoked with the following options:
sqlplus -NOLOGINTIME -L -S -C <version>
The SQL*Plus CLI is a particularly contrary beast. It is important to explicitly
exit the CLI using an EXIT command at the end of any session or else it will
drop into interactive mode and sit there waiting for further commands until the
job reaches its timeout and is killed by lava. A safer approach is to send
commands to the connector via stdin, thus:
# Assume our conn_id is ora
$LAVA_CONN_ORA <<!
SELECT whatever FROM whichever;
!
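The same stdin approach can be used from a Python payload. The following is a sketch only: the helper name is hypothetical, and a small stand-in command that echoes its input is used so the example runs outside lava. Inside a job, the command would be the script named by the LAVA_CONN_ORA environment variable.

```python
import subprocess
import sys


def run_via_stdin(command: list, sql: str, timeout: int = 60) -> str:
    """Feed `sql` to `command` on stdin and return whatever it prints."""
    result = subprocess.run(
        command,
        input=sql,            # stdin is closed after this text is sent
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,           # raise if the command exits non-zero
    )
    return result.stdout


# Inside a lava job: command = [os.environ["LAVA_CONN_ORA"]]
# Stand-in that simply echoes stdin, so the sketch runs anywhere.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
output = run_via_stdin(echo_cmd, "SELECT whatever FROM whichever;\n")
print(output, end="")
```

Because the input is finite, the CLI sees end-of-input once the SQL has been consumed rather than waiting for more commands.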
When used with sql jobs, do not terminate the SQL with a semi-colon or a syntax error results.
When used with sqlc jobs, SQL commands must be terminated with a semi-colon or either a syntax error or no output will result.
Security Warnings¶
Oracle CLI clients, including sqlplus, do not provide any means to automate
login to the database without specifying the password on the command line. This
means the password is exposed in a process listing. Do not use the
oracle command line connector on any worker that has multi-user access.
The oracle connector does not currently support SSL/TLS.
Connector type: postgres-aurora¶
This connector supports AWS RDS Aurora PostgreSQL clusters. This is almost a synonym for postgres. Key differences are:
- The db_from_s3 job can take advantage of an AWS facility to load data directly from S3.
- Database authentication using IAM credential generation is supported.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| database | String | Yes* | The name of the database within the database server. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password | String | No* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. If not specified, the worker will attempt to generate temporary IAM user credentials. |
| port | Number | Yes* | The database port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| ssl | Boolean | No | Set to true to enable SSL. Default is false. |
| subtype | String | No | Specifies the underlying DBAPI 2.0 driver. The default is pg8000 which should be used wherever possible. The pygresql driver is also available. |
| type | String | Yes | postgres-aurora. |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
When used with exe and
pkg job types, the connection is implemented by
the psql CLI. Apart from the connection parameters, it is invoked with the
following options:
psql --no-psqlrc --quiet --set ON_ERROR_STOP=on --pset footer=off
Creating Temporary IAM User Credentials for AWS RDS Aurora PostgreSQL¶
If the password field is not present in the connection specification, lava
will attempt to
generate temporary IAM credentials using the generate-db-auth-token mechanism.
The specified user must already exist in the database. Enable IAM authentication for a user thus:
CREATE USER a_user;
GRANT rds_iam TO a_user;
Info
SSL is mandatory when using temporary IAM user credentials.
Psql CLI Password Limitations¶
The psql CLI will not accept passwords in a PGPASS file (or entered interactively) that are longer than a certain (undocumented) length. IAM based authentication for RDS involves temporary passwords that are much longer than this limit.
To work around this limitation, lava has to put long passwords into an environment variable. While this is not ideal from a security perspective, at least the passwords are short lived.
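As an illustration of the general technique (not of lava's actual implementation), a password can be handed to a child process through its environment rather than a file or prompt. The sketch assumes the standard libpq variable PGPASSWORD; this guide does not say which variable lava uses, and the function name is hypothetical.

```python
import os


def env_with_password(password: str, base_env=None) -> dict:
    """Return a copy of the environment with the password added.

    Assumption: PGPASSWORD, the standard libpq variable, is the one used.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PGPASSWORD"] = password
    return env


# A long token similar in size to an IAM-generated password (dummy value).
long_token = "x" * 2000
child_env = env_with_password(long_token, {"PATH": "/usr/bin"})
print(len(child_env["PGPASSWORD"]))
```

The resulting dictionary would be passed as the `env` argument of `subprocess.run` so that only the child process sees the password.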
Connector type: postgres-rds¶
This is currently a synonym for postgres-aurora.
It has been defined in the event of future feature differences between conventional AWS RDS Aurora PostgreSQL and AWS RDS PostgreSQL.
Connector type: postgres¶
The postgres connector handles connections to Postgres compatible databases.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| database | String | Yes* | The name of the database within the database server. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password | String | Yes* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| port | Number | Yes* | The database port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| ssl | Boolean | No | Set to true to enable SSL. Default is false. |
| subtype | String | No | Specifies the underlying DBAPI 2.0 driver. The default is pg8000 which should be used wherever possible. The pygresql driver is also available. |
| type | String | Yes | psql or postgres. |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
When used with exe and
pkg job types, the connection is implemented by
the psql CLI. Apart from the connection parameters, it is invoked with the
following options:
psql --no-psqlrc --quiet --set ON_ERROR_STOP=on --pset footer=off
Connector type: psql¶
This is a synonym for postgres.
Connector type: redshift-serverless¶
This is the connector for Redshift Serverless clusters.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| database | String | Yes* | The name of the database within the Redshift Serverless namespace. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| external_id | String | No | Name of an SSM parameter containing an external ID to use when assuming the IAM role specified by role_arn when generating temporary IAM user credentials. While AWS does not consider this to be a sensitive security parameter, it is stored in the SSM parameter store for ease of management. It is still recommended to use a secure parameter. Can't hurt. |
| host | String | Yes* | The Redshift serverless workgroup endpoint address. |
| password_duration | String | No | The password duration when generating temporary IAM user credentials in the form nnX where nn is a number and X is s (seconds), m (minutes) or h (hours). If not specified, the default worker configuration is used. Limits imposed by the Redshift Serverless GetCredentials API apply. |
| password | String | No* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. If not specified, the worker will attempt to generate temporary IAM user credentials. |
| port | Number | Yes* | The Redshift serverless workgroup port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| role_arn | String | No | The ARN of an IAM role that will be assumed when generating temporary IAM user credentials. |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| ssl | Boolean | No | Set to true to enable SSL. Default is false. |
| subtype | String | No | Specifies the underlying DBAPI 2.0 driver. See Redshift Serverless Connector Subtypes below. |
| type | String | Yes | redshift-serverless. |
| user | String | Yes* | Database user name. |
| workgroup | String | No | The name of the workgroup associated with the database. This is used when generating temporary IAM user credentials. If required and not specified, the first component of the host field is used. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
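Two of the fields above follow simple rules that can be sketched in a few lines: password_duration uses the nnX form, and workgroup defaults to the first component of host. The function names are hypothetical, the host name is made up, and treating the unit letter as lower case only is an assumption.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def duration_seconds(value: str) -> int:
    """Convert an nnX duration such as '15m' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if not match:
        raise ValueError(f"Bad duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def default_workgroup(host: str) -> str:
    """Return the first dot-separated component of the host name."""
    return host.split(".", 1)[0]


print(duration_seconds("15m"))
print(default_workgroup("wg1.123456789123.ap-southeast-2.redshift-serverless.amazonaws.com"))
```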
Redshift Serverless Connector Subtypes¶
The subtype field of the connection specification allows selection of different
database drivers.
| Subtype | Description |
|---|---|
| pg8000 | Pg8000 is the default if no subtype is specified. |
| redshift | This is the AWS Redshift connector. |
Creating Temporary IAM User Credentials for Redshift Serverless¶
Note
The AWS documentation on this leaves a lot to be desired.
If a password is not obtained from the password field or secrets manager, lava
will attempt to use the Redshift Serverless
GetCredentials
API to generate temporary IAM-based database user credentials.
Unlike the Redshift provisioned GetClusterCredentials API, the Redshift Serverless GetCredentials API does not allow the target database user name to be specified. The username is derived automatically from the IAM principal as follows:
- For IAM users, the database username is IAM:<IAM-USER-NAME>.
- For IAM roles, the database username is IAMR:<IAM-ROLE-NAME>.
If the user does not already exist in the database, it will be automatically created and given access to the public schema. This is daft but that's how it is. The user can be created manually or given additional database permissions via the normal GRANT mechanism, as required.
This can be very limiting in terms of fine grained access control from lava to
Redshift. To provide some flexibility, the Redshift Serverless connector can
assume a different IAM role prior to generating database access credentials by
specifying the role_arn (and optional external_id) elements in the
connection specification. The assumed role is then the one that will determine
the database user name.
For example, assume the lava worker normally operates under the IAM role
lava-dev-worker-core. If no role_arn is specified, the database user will
be IAMR:lava-dev-worker-core.
If role_arn is arn:aws:iam::123456789123:role/rs01, the database user will
be IAMR:rs01.
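The naming rule can be expressed as a small function, which is useful when working out which database user to GRANT permissions to. This is a sketch of the rule as described in this section, not lava code; the function name is hypothetical, and taking only the last path component of the IAM name is an assumption.

```python
def serverless_db_user(principal_arn: str) -> str:
    """Derive the Redshift Serverless database user for an IAM user or role ARN."""
    # The sixth colon-separated field looks like "role/rs01" or "user/fred".
    resource = principal_arn.split(":", 5)[5]
    kind, _, name = resource.partition("/")
    name = name.rsplit("/", 1)[-1]  # assumption: any IAM path is dropped
    if kind == "role":
        return f"IAMR:{name}"
    if kind == "user":
        return f"IAM:{name}"
    raise ValueError(f"Not an IAM user or role ARN: {principal_arn}")


print(serverless_db_user("arn:aws:iam::123456789123:role/rs01"))
```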
The IAM policy attached to the lava-dev-worker-core role will need to contain
something like this:
"Statement": [
{
"Sid": "AssumeRoleForRedshiftServerlessAccess",
"Effect": "Allow",
"Action": "sts:AssumeRole",
"Resource": [
"arn:aws:iam::123456789123:role/rs01"
]
}
]
The IAM policy attached to the rs01 role will need to contain something like
this:
"Statement": [
{
"Sid": "GetRedshiftServerlessCreds",
"Effect": "Allow",
"Action": "redshift-serverless:GetCredentials",
"Resource": [
"arn:aws:redshift-serverless:ap-southeast-2:123456789123:workgroup/3741886a-223d-446f-a77c-a5d0e7b5ad32"
]
}
]
The trust policy for the rs01 role will need to contain the elements necessary
to allow it to be assumed by lava-dev-worker-core.
Note
Lava currently does not cache temporary credentials. Watch out for throttling
on the GetCredentials API.
Connector type: redshift¶
This is the connector for Redshift provisioned clusters. It can also be used for Redshift Serverless clusters except when IAM generated user credentials are used. In that case, the redshift-serverless connector must be used.
This connector is similar to postgres. Note
that some operations are specific to Redshift and are not supported on
conventional Postgres databases (e.g. the COPY and UNLOAD commands).
| Field | Type | Required | Description |
|---|---|---|---|
| cluster_id | String | No | The Redshift cluster identifier. If required and not specified, the first component of the host name is used. |
| conn_id | String | Yes | Connection identifier. |
| database | String | Yes* | The name of the database within the database server. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes* | The database host DNS name or IP address. |
| password_duration | String | No | The password duration when generating temporary IAM user credentials in the form nnX where nn is a number and X is s (seconds), m (minutes) or h (hours). If not specified, the default worker configuration is used. Limits imposed by the GetClusterCredentials API apply. |
| password | String | No* | The name of an encrypted SSM parameter containing the password. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. If not specified, the worker will attempt to generate temporary IAM user credentials. |
| port | Number | Yes* | The database port number. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| secret_id | String | No | Obtain missing fields from AWS Secrets Manager. More information. |
| ssl | Boolean | No | Set to true to enable SSL. Default is false. |
| subtype | String | No | Specifies the underlying DBAPI 2.0 driver. See Redshift Connector Subtypes below. |
| type | String | Yes | redshift. |
| user | String | Yes* | Database user name. |
Info
Fields with a Required column marked with * can have a value provided
directly in the connection specification or indirectly via AWS Secrets
Manager using the secret_id field. See Database Authentication Using AWS
Secrets Manager for
more information.
Redshift Connector Subtypes¶
The subtype field of the connection specification allows selection of different
database drivers.
| Subtype | Description |
|---|---|
| pg8000 | Pg8000 is the default if no subtype is specified. |
| redshift | This is the AWS Redshift connector. |
Info
As of version 8.1 (Kīlauea), the Redshift connector no longer supports PyGreSQL. This is not a lava change. PyGreSQL just doesn't work with Redshift any more.
Creating Temporary IAM User Credentials for Redshift¶
If the password field is not present in the connection specification, lava
will attempt to use the Redshift
GetClusterCredentials
API to generate temporary IAM-based database user credentials.
The specified user must already exist in the database as lava (deliberately)
does not support AutoCreate of users.
Lava will specify the target cluster ID, database and target user in the credentials request. This means that the IAM policy attached to the worker will need to contain an element something like this:
"Statement": [
{
"Sid": "GetRedshiftCreds",
"Effect": "Allow",
"Action": "redshift:GetClusterCredentials",
"Resource": [
"arn:aws:redshift:ap-southeast-2:123456789012:dbuser:cluster_id/target_user",
"arn:aws:redshift:ap-southeast-2:123456789012:dbname:cluster_id/mydb"
]
}
]
Info
Lava currently does not cache temporary credentials. Watch out for
throttling on the GetClusterCredentials API.
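The two resource ARNs in the policy above follow directly from the connection fields. The sketch below assembles them, falling back to the first component of host when cluster_id is not given, as described in the field table. The function name and the host name are illustrative assumptions.

```python
def cluster_credentials_resources(region, account_id, host, user, database,
                                  cluster_id=None):
    """Build the dbuser and dbname ARNs needed for GetClusterCredentials."""
    # Fall back to the first component of the host name for the cluster ID.
    cluster = cluster_id or host.split(".", 1)[0]
    base = f"arn:aws:redshift:{region}:{account_id}"
    return [
        f"{base}:dbuser:{cluster}/{user}",
        f"{base}:dbname:{cluster}/{database}",
    ]


resources = cluster_credentials_resources(
    "ap-southeast-2",
    "123456789012",
    "mycluster.abc123.ap-southeast-2.redshift.amazonaws.com",  # made-up host
    "target_user",
    "mydb",
)
print("\n".join(resources))
```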
Connector type: ses¶
Warning
This is a legacy implementation. It is now deprecated and will be removed in a future release. Use the email connector instead.
The ses connector provides access to the AWS Simple Email Service (SES).
It can be used only with exe and pkg jobs. It provides an environment variable pointing to a script that will run the AWS CLI with appropriate parameters to access the SES service.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| from | String | No | The email address that is sending the email. This email address must be either individually verified with Amazon SES, or from a domain that has been verified with Amazon SES. If not specified, the value specified by the SES_FROM realm configuration parameter is used. A value must be specified by one of these mechanisms. |
| region | String | No | The AWS region name for the SES service. If not specified, the value specified by the SES_REGION realm configuration parameter is used, which itself defaults to us-east-1. |
| reply_to | String or List[String] | No | The reply-to email address(es) for messages. |
| return_path | String | No | The email address that bounces and complaints will be forwarded to when feedback forwarding is enabled. |
In an exe or pkg job, the job specification will look something like this:
{
"job_id": "...",
"parameters": {
"connections": {
"email": "email-connection-id"
}
},
"payload": "my-payload.sh ..."
}
The connections entry labelled email refers to the email connection. This
will provide the job with an environment variable LAVA_CONN_EMAIL, which
points to the executable handling the connection.
If the job payload is a shell script, the connector would be invoked thus:
# Send an email with a text message body.
$LAVA_CONN_EMAIL --to fred@somewhere.com --subject "Hello Fred" --text msg.txt
# But wait -- we can do HTML as well
$LAVA_CONN_EMAIL --to fred@somewhere.com --subject "Hello Fred" --html msg.html
# Or read from stdin. The connector will look for <HTML> at start of message
# to determine if message is text or HTML.
$LAVA_CONN_EMAIL --to fred@somewhere.com --subject "Hello Fred" < msg.xxx
The connector script accepts the following arguments:
- --to email ..., --cc email ..., --bcc email ...: One or more recipient email addresses.
- --subject text: Message subject.
- --text filename: File containing the text body of the message. Optional.
- --html filename: File containing the HTML body of the message. Optional.
If neither --text nor --html options are specified, the message body is
read from stdin. If the content begins with <HTML> (case insensitive), the
connector will send it as HTML otherwise as text.
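The stdin detection rule amounts to a case-insensitive prefix test. The following sketch shows the idea; it is not the connector's code, the function name is hypothetical, and it takes "begins with" literally (no leading whitespace is skipped), which may or may not match the connector's behaviour.

```python
def looks_like_html(body: str) -> bool:
    """Return True if the message body begins with <HTML>, ignoring case."""
    marker = "<html>"
    return body[:len(marker)].lower() == marker


print(looks_like_html("<HTML><body>Hello Fred</body></HTML>"))
print(looks_like_html("Hello Fred"))
```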
Connector type: sharepoint¶
The sharepoint connector manages connections to SharePoint sites.
It is possible for Microsoft to have made this process more complex and unwieldy, but it is not obvious how.
| Field | Type | Required | Description |
|---|---|---|---|
| client_id | String | Yes | The Application ID that the SharePoint registration portal assigned your app. This resembles a UUID. |
| client_secret | String | Yes | Name of the SSM parameter containing the client secret. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| https_proxy | String | No | HTTPS proxy to use for accessing the SharePoint API endpoints. If not specified, the HTTPS_PROXY environment variable is used, if set. |
| org_base_url | String | Yes | The hostname component of the organisation's SharePoint base URL. e.g. acme.sharepoint.com. |
| password | String | Yes | Name of the SSM parameter containing the password for authenticating to SharePoint. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| site_name | String | Yes | The SharePoint site name. |
| tenant | String | Yes | The Azure AD registered domain ID. This resembles a UUID. |
| type | String | Yes | sharepoint. |
| user | String | No | User name for authenticating to SharePoint. |
The connector supports the sharepoint_get_doc, sharepoint_get_list, sharepoint_put_doc, sharepoint_put_list and sharepoint_get_multi_doc job types.
Using SharePoint Connectors¶
The sharepoint connector provides two distinct interfaces:
Python Interface for SharePoint Connectors¶
The sharepoint connector can be used with Python based
exe and pkg
jobs that invoke the lava connection manager directly. In this case, the
connector returns a lava.lib.sharepoint.Sharepoint object as described in the
lava API documentation. In summary, this class has the following methods:
delete_all_list_items(list_id, list_name)
get_doc(lib_name, path, out_file)
get_list(list_name, out_file, system_columns=None, data_columns=None,
header=True, **csv_writer_args)
put_doc(lib_name, path, src_file, title=None)
put_list(list_name, src_file, mode='append', error_missing=False,
data_columns=None, **csv_reader_args)
get_multi_doc(lib_name, path, out_path, glob=None)
close()
If the SharePoint connector key in the job's connections map is spoint,
typical usage would be something like:
import os
from lava.connection import get_sharepoint_connection
# Get a lava.lib.sharepoint.Sharepoint instance
sp_conn = get_sharepoint_connection(
conn_id=os.environ['LAVA_CONNID_SPOINT'],
realm=os.environ['LAVA_REALM']
)
# Get a list from SharePoint and store it locally.
row_count = sp_conn.get_list('postcodes', 'postcodes.csv', delimiter=',')
# Close the connection
sp_conn.close()
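If an operation raises an exception, the explicit close() above is skipped. One way to guard against that is `contextlib.closing`, sketched below. The wrapper function is hypothetical, and a stand-in class replaces the real lava.lib.sharepoint.Sharepoint object so the example runs without lava installed.

```python
from contextlib import closing


def download_list(sp_conn, list_name, out_file, **csv_writer_args):
    """Fetch a SharePoint list to a local file, closing the connection afterwards."""
    with closing(sp_conn):  # close() is called even if get_list() raises
        return sp_conn.get_list(list_name, out_file, **csv_writer_args)


class FakeSharepoint:
    """Stand-in used only to make this sketch runnable without lava."""

    def __init__(self):
        self.closed = False

    def get_list(self, list_name, out_file, **csv_writer_args):
        return 0  # the real method appears to return a row count

    def close(self):
        self.closed = True


fake = FakeSharepoint()
rows = download_list(fake, "postcodes", "postcodes.csv", delimiter=",")
print(rows, fake.closed)
```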
Executable Interface for SharePoint Connectors¶
When used with exe,
pkg and
docker job types (e.g. shell scripts), the
connection is implemented by the lava-sharepoint command.
This is a somewhat higher level interface to the connector in that it can also handle moving data in and out of S3. Jinja rendering is handled as per the sharepoint_get_list, sharepoint_put_list, sharepoint_get_doc, sharepoint_put_doc and sharepoint_get_multi_doc job types.
If the SharePoint connector key in the job's connections map is spoint,
usage is:
Usage for the get-doc sub-command:
Usage for the get-list sub-command:
Usage for the put-list sub-command:
Usage for the put-doc sub-command:
Usage for the get-multi-doc sub-command:
The following examples show how to use the connector in an exe job using bash:
#!/bin/bash
# Copy a list from S3 to SharePoint, replacing existing contents.
$LAVA_CONN_SPOINT put-list --replace s3://my-bucket/data.csv My-List
# Get list back from SharePoint and place in S3. Include a header
$LAVA_CONN_SPOINT get-list -k alias/data --delimiter "," \
My-List s3://my-bucket/data.csv
# Copy a document from S3 to SharePoint.
$LAVA_CONN_SPOINT put-doc s3://my-bucket/lava.docx "Lava Docs:/Lava/User Guide.docx"
# Get a document from SharePoint and place in S3.
$LAVA_CONN_SPOINT get-doc "Lava Docs:/Lava/User Guide.docx" s3://my-bucket/lava.docx
# Get all docx files from SharePoint path and place in S3 base-prefix.
$LAVA_CONN_SPOINT get-multi-doc "Lava Docs:/Lava/" s3://my-bucket/base-prefix '*.docx'
# Get all files from SharePoint path and place in S3 base-prefix.
$LAVA_CONN_SPOINT get-multi-doc "Lava Docs:/Lava/" s3://my-bucket/base-prefix
Connector type: slack¶
The slack connector uses Slack webhooks to send messages to Slack channels. The target Slack workspace and channel are specified in Slack itself when the webhook is created.
| Field | Type | Required | Description |
|---|---|---|---|
| colour | String | No | Default colour for the sidebar for Slack messages sent using attachment style. This can be any hex colour code or one of the Slack special values good, warning or danger. If not specified a default value is used. |
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| from | String | No | An arbitrary source identifier for display in Slack messages. If not specified, a default value is constructed when required. |
| preamble | String | No | Default preamble at the start of Slack messages. Useful values include things such as <!here> and <!channel> which will cause Slack to insert @here and @channel alert tags respectively. If not specified, no preamble is used. |
| style | String | No | Display style for Slack messages. Options are block (default), attachment and plain. The first two use the corresponding block or attachment message construction mechanism provided by Slack to make messages more presentable. |
| type | String | Yes | slack. |
| webhook_url | String | Yes | The webhook URL provided by Slack for sending messages. |
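To make the effect of the style, colour and preamble fields more concrete, the following is a minimal sketch of how a webhook body could be assembled for each style. It is not lava's implementation: the function name `build_slack_payload` and the fallback colour `good` are assumptions made for illustration. The payload shapes are the standard Slack ones (a `text` field, legacy `attachments` and Block Kit `blocks`).

```python
def build_slack_payload(message, subject=None, style="block",
                        colour="good", preamble=None):
    """Build a Slack webhook JSON body (as a dict) for one display style.

    Hypothetical helper for illustration only. The default colour is an
    assumption, not the default used by lava.
    """
    # The preamble (e.g. "<!here>") is placed ahead of the message text.
    text = f"{preamble} {message}" if preamble else message

    if style == "plain":
        # Plain style: a single text field, subject first if there is one.
        lines = [f"*{subject}*"] if subject else []
        return {"text": "\n".join(lines + [text])}

    if style == "attachment":
        # Attachment style: the colour sets the sidebar colour.
        attachment = {"color": colour, "text": message}
        if subject:
            attachment["title"] = subject
        payload = {"attachments": [attachment]}
        if preamble:
            payload["text"] = preamble
        return payload

    if style == "block":
        # Block style: an optional header block followed by a section block.
        blocks = []
        if subject:
            blocks.append({
                "type": "header",
                "text": {"type": "plain_text", "text": subject},
            })
        blocks.append({
            "type": "section",
            "text": {"type": "mrkdwn", "text": text},
        })
        return {"blocks": blocks}

    raise ValueError(f"Unknown style: {style}")
```

The resulting dict would be serialised to JSON and posted to the webhook_url. Note that colour only has a visible effect in the attachment style.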
Using the Slack Connector¶
The slack connector provides two distinct interfaces: a Python interface and an executable interface.
Python Interface for Slack Connectors¶
Python scripts can directly access the underlying Python interface of a slack
connector. In this case, the connector returns a lava.lib.slack.Slack
object as described in the lava API documentation.
As an example, consider an exe job specification that looks something like this:
{
  "job_id": "...",
  "parameters": {
    "connections": {
      "slack": "slack-connection-id"
    }
  },
  "payload": "my-payload.py ..."
}
A Python program can use the slack connector like this:
import os
from lava.connection import get_slack_connection
# If running as a lava exe/pkg/docker, get some info provided by lava in the
# environment. Assume our connector is labeled `slack` in the job spec.
realm = os.environ['LAVA_REALM']
conn_id = os.environ['LAVA_CONNID_SLACK']
# Get a slack connection
slacker = get_slack_connection(conn_id, realm)
# Send a formatted message
slacker.send(
    subject='Oh no',
    message='Your oscillation overthruster has malfunctioned',
    style='attachment',  # Overrides value in connection spec.
    colour='#ff0000'  # Nice bright red. Overrides value in connection spec.
)
Executable Interface for Slack Connectors¶
When used with exe,
pkg and
docker job types (e.g. shell scripts), the
connection is implemented by the lava-slack command.
When used as a connection script within a lava job, the -r REALM and
-c CONN_ID arguments don't need to be provided by the job as these are
provided by lava in the connection script.
Also, values for the --bar-colour, --from, --preamble and --style
options will be supplied from the connection specification where possible.
These values can be overridden by providing the appropriate options to the
connection script.
As an example, consider an exe job specification that looks something like this:
{
  "job_id": "...",
  "parameters": {
    "connections": {
      "slack": "slack-connection-id"
    }
  },
  "payload": "my-payload.sh ..."
}
Note the slack connection. This will provide the job with an environment
variable LAVA_CONN_SLACK which points to the executable handling the
connection.
If the job payload is a shell script, the connector would be invoked thus:
# Send a Slack message
$LAVA_CONN_SLACK --subject "Oh no" <<!
Dear Buckaroo,
Your oscillation overthruster has malfunctioned.
-- John Bigbooté
!
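A payload written in Python can drive the same connection script through `subprocess` instead of using the Python interface. The sketch below assumes the connector is labelled `slack` in the job spec, so that lava sets `LAVA_CONN_SLACK`. The helper names `slack_argv` and `send_via_connector` are hypothetical. Only options mentioned in this section are used.

```python
import os
import subprocess


def slack_argv(conn_script, subject, style=None, sender=None):
    """Build the argument list for the Slack connection script.

    Hypothetical helper. Options that are not given are left to the
    values in the connection specification.
    """
    argv = [conn_script, "--subject", subject]
    if style:
        argv += ["--style", style]
    if sender:
        argv += ["--from", sender]
    return argv


def send_via_connector(message, subject, **overrides):
    """Send a message by piping it to the connection script on stdin."""
    conn_script = os.environ["LAVA_CONN_SLACK"]
    subprocess.run(
        slack_argv(conn_script, subject, **overrides),
        input=message,
        text=True,
        check=True,  # Raise CalledProcessError if the script fails.
    )
```

Passing the arguments as a list avoids shell quoting problems when the subject contains spaces or special characters.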
Connector type: smb¶
The smb connector manages connections to SMB file shares.
Info
The smb connector underwent a significant upgrade in v8.0 (Incahuasi) to support the smbprotocol SMB implementation as well as the existing pysmb. The former has a number of advantages (e.g. DFS support). An effort has been made to retain backward compatibility for lava jobs, even though the two implementations have significantly different interfaces. Be warned, though, that some more esoteric usage patterns may still experience backward compatibility issues.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| domain | String | No | The network domain. Defaults to an empty string. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| encrypt | Boolean | No | Whether to encrypt the connection between Lava and the SMB server. Only available with the smbprotocol connection subtype. Default false. |
| host | String | Yes | DNS name or IP address of the SMB host. |
| is_direct_tcp | Boolean | No | If false, use NetBIOS over TCP/IP. If true use SMB over TCP/IP. Default false. |
| my_name | String | No | Local NetBIOS machine name that will identify the origin of connections. If not specified, defaults to the first 15 characters of lava-<REALM>. |
| password | String | Yes | Name of the SSM parameter containing the password for authenticating to the SMB server. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| port | Integer | No | Connection port number. If not specified, 139 is used if is_direct_tcp is false and 445 otherwise. |
| remote_name | String | Yes | NetBIOS machine name of the remote server. |
| subtype | String | No | Which connection type to use, smbprotocol or the default pysmb. To use encryption or DFS for the connection use the smbprotocol subtype. |
| type | String | Yes | smb. |
| use_ntlm_v2 | Boolean | No | Indicates whether pysmb should use the NTLMv1 or NTLMv2 authentication algorithm. Default is true (NTLMv2). |
| user | String | Yes | User name for authenticating to the SMB server. |
The connector supports the smb_get and smb_put job types.
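The defaulting rules in the table above interact, particularly for port and my_name. As a rough illustration, they could be resolved as in the following sketch. The function `smb_effective_settings` is hypothetical and is not part of lava. It only restates the defaults documented in the table.

```python
def smb_effective_settings(spec, realm):
    """Apply the documented defaults to an smb connection spec.

    Hypothetical helper for illustration. `spec` is the connection
    specification as a dict and `realm` is the lava realm name.
    """
    is_direct_tcp = spec.get("is_direct_tcp", False)
    return {
        "subtype": spec.get("subtype", "pysmb"),
        "domain": spec.get("domain", ""),
        "is_direct_tcp": is_direct_tcp,
        # 139 for NetBIOS over TCP/IP, 445 for SMB over TCP/IP.
        "port": spec.get("port", 445 if is_direct_tcp else 139),
        # NetBIOS names are limited to 15 characters.
        "my_name": spec.get("my_name", f"lava-{realm}"[:15]),
        "use_ntlm_v2": spec.get("use_ntlm_v2", True),
        "encrypt": spec.get("encrypt", False),
    }
```

For example, a spec that sets only is_direct_tcp to true would resolve to port 445, while an explicit port value always takes precedence.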
Use with Python-based Executable Jobs¶
The connector can also be used with Python based
exe and pkg
jobs that invoke the lava connection manager directly. In this case, the
connector returns a lava.lib.smb.LavaSMBConnection which provides a basic,
common interface to the different subtypes.
The lava.lib.smb.LavaSMBConnection interface class provides enough
functionality for most common use-cases (list path, put file, get file etc.).
The concrete implementation is handled by one of two subclasses (depending on
the subtype given in the connection spec):
- a lava.lib.smb.PySMBConnection which implements LavaSMBConnection using the Python package pysmb. This is the default if no connection subtype is given.
- a lava.lib.smb.SMBProtocolConnection which implements LavaSMBConnection using the Python package smbprotocol.
Note that this is the low level connector. It does not handle moving files in or out of S3 or Jinja rendering of parameters. It is up to the caller to do that as required.
If the SMB connector key in the job's connections map is fserver, typical
usage would be something like:
import os
from lava.connection import get_smb_connection
# Get a lava.lib.smb.LavaSMBConnection instance
smb_conn = get_smb_connection(
    conn_id=os.environ['LAVA_CONNID_FSERVER'],
    realm=os.environ['LAVA_REALM']
)
# Get a file from share 'Public' and store locally
with open('local.txt', 'wb') as fp:
    attributes, size = smb_conn.retrieve_file('Public', 'some_file.txt', fp)
smb_conn.close()
Use with Other Executable Jobs¶
When used with other exe and
pkg job types (e.g. shell scripts), the
connection is implemented by the lava-smb command.
This is a somewhat higher level interface to the connector in that it can also handle moving data in and out of S3. Jinja rendering is handled as per the smb_get and smb_put job types.
If the SMB connector key in the job's connections map is fserver, usage is:
Usage for the get sub-command:
Usage for the put sub-command:
For example, the following code in an exe
job would transfer files between S3 and the Public share on an SMB server:
#!/bin/bash
# Copy file from S3 to SMB
$LAVA_CONN_FSERVER put --mkdir \
s3://my-bucket/data.csv Public:/a/path/data.csv
# Copy file from SMB to S3
$LAVA_CONN_FSERVER get --kms-key-id alias/data \
Public:/a/path/data.csv s3://my-bucket/data.csv
Connector type: sqlite3¶
The sqlite3 connector handles connections to SQLite3 file based databases.
Its use in general lava jobs is pretty marginal at best. It is mostly present to facilitate testing of lava itself.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| host | String | Yes | The name of the file containing the SQLite3 database. If it starts with s3://, the file will be copied from S3 when the connection is created and returned to S3 when the connection is closed if it has been modified. |
| port | Number | Yes* | A value is required but is ignored. |
| preserve_case | Boolean | No | If true, don't fold database object names to lower case when quoting them for use in db_from_s3 jobs. The default is false (i.e. case folding is enabled). |
| type | String | Yes | sqlite3. |
| user | String | Yes* | A value is required but is ignored. |
Info
Fields with a Required column marked with * must be present but the
value is ignored. This is an unfortunate interface idiosyncrasy resulting
from the need to maintain some internal compatibility with the other
database connectors.
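The copy-in, copy-back behaviour described for an s3:// host value can be pictured with the following sketch. It is not lava's implementation. The `fetch` and `store` callables stand in for the S3 download and upload steps, and detecting modification by comparing content digests is an assumption made for illustration.

```python
import contextlib
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def _digest(path):
    """Return a SHA-256 hex digest of the file contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


@contextlib.contextmanager
def staged_sqlite(fetch, store):
    """Open a SQLite3 database staged from remote storage.

    fetch(local_path) must copy the remote database to local_path.
    store(local_path) must copy local_path back to remote storage.
    Both are hypothetical stand-ins for S3 transfers.
    """
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / "db.sqlite3"
        fetch(local)
        before = _digest(local)
        conn = sqlite3.connect(local)
        try:
            yield conn
        finally:
            conn.close()
            # Only copy the file back if its contents changed.
            if _digest(local) != before:
                store(local)
```

A job that only reads from the database would therefore never trigger the copy back, while one that commits changes would trigger it when the connection is closed.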
When used with exe and
pkg job types, the connection is implemented by
the sqlite3 CLI. It is invoked with the following options:
sqlite3 -bail -batch DATABASE-FILE
Connector type: ssh, scp, sftp¶
This group of connectors provides support for the SSH family of clients.
When used with exe and pkg jobs, each connector provides an environment variable pointing to a script that will run the corresponding CLI with SSH keys managed in the background.
| Field | Type | Required | Description |
|---|---|---|---|
| conn_id | String | Yes | Connection identifier. |
| description | String | No | Description. |
| enabled | Boolean | Yes | Whether or not the connection is enabled. |
| ssh_key | String | Yes | The name of an encrypted SSM parameter containing the SSH private key. There must not be any passphrase on the key. For a given <REALM>, the SSM parameter name must be of the form /lava/<REALM>/... and the value must be a secure string encrypted using the lava-<REALM>-sys KMS key. |
| ssh_options | List[String] | No | A list of SSH options as per ssh_config(5). e.g. StrictHostKeyChecking=no |
| type | String | Yes | ssh, sftp or scp. |
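As a rough picture of what the generated connection script does with these fields, the sketch below builds a client command line from a key file and the ssh_options list. The function `ssh_argv` is hypothetical and the real script may differ. It relies only on the standard `-i` and `-o` options, which ssh, scp and sftp all accept.

```python
def ssh_argv(client, key_file, ssh_options=(), extra_args=()):
    """Build a command line for ssh, scp or sftp.

    Hypothetical helper for illustration. `key_file` is a local file
    holding the private key retrieved from the SSM parameter store.
    """
    if client not in ("ssh", "scp", "sftp"):
        raise ValueError(f"Unsupported client: {client}")
    argv = [client, "-i", str(key_file)]
    # Each entry in ssh_options becomes a separate -o option.
    for option in ssh_options:
        argv += ["-o", option]
    return argv + list(extra_args)
```

For example, an ssh_options value of `["StrictHostKeyChecking=no"]` would add `-o StrictHostKeyChecking=no` ahead of whatever arguments the job itself supplies.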
The process for saving an SSH private key in the SSM parameter store using the AWS CLI looks like this:
# Create a new SSH key with no passphrase
ssh-keygen -f mykey -N ""
# Upload the private key to the SSM parameter store. Here realm name is "dev"
aws ssm put-parameter --name "/lava/dev/ssh01/ssh-key" \
--description "SSH key for ssh01" \
--type SecureString \
--value "$(cat mykey)" \
--key-id alias/lava-dev-sys