Docma Template Rendering¶
The document rendering phase combines a compiled docma template with run-time specified parameters and dynamically generated content to produce a final output document.
The rendering process is slightly different for PDF and HTML outputs.
Rendering for PDF Outputs¶
The main steps in the process for PDF production are:
- Marshal the rendering parameters.
- Collect the list of documents to be incorporated into the final output PDF.
- Render HTML documents in the component list using Jinja to inject the rendering parameters.
- Convert the HTML documents to PDF using WeasyPrint. This process will also generate any dynamic content from specifications embedded in the source HTML.
- Assemble all of the components (generated PDFs and any listed static PDFs) into a single PDF document.
- Add any requested watermarking or stamping to the document.
- Jinja render any required metadata specified in the template configuration file and add it to the PDF.
- Optionally, compress the PDF using lossless compression. Depending on the PDF contents, compression may, or may not, help.
Rendering for HTML Outputs¶
The main steps in the process for HTML production are:
- Marshal the rendering parameters.
- Collect the list of documents to be incorporated into the final output HTML.
- Render HTML documents in the component list using Jinja to inject the rendering parameters.
- Process <IMG> tags in the HTML to generate and embed any dynamic content from specifications embedded in the source HTML. Static images may also be embedded.
- Assemble all of the component HTML documents into a single HTML document.
- Jinja render any required metadata specified in the template configuration file and add it to the HTML.
Docma Parameter Validation¶
Docma supports the use of JSON Schema to validate
rendering parameters at run-time. Parameters are validated against a schema
provided in the parameters->schema key in the
template configuration file prior to generating
the output document. Failing validation will halt the production process.
Tip
Provision, and hence use, of a parameter validation schema is optional, but highly recommended to reduce the risk of generating an important document incorrectly or with nonsensical values.
All of the normal facilities of JSON Schema are
available, except for external schema referencing with $ref directives. Like
the JSON Schema built-in string
formats,
docma provided format checkers can be used
in a schema specification with the format attribute of string objects.
The following sample schema fragment shows how these are used:
type: object
properties:
  customer_email:
    type: string
    format: email  # This is a JSON schema built-in format checker
  customer_abn:
    type: string
    format: au.ABN  # This is a docma provided format checker
  target_consumption:
    type: number
    minimum: 0
  consumption_unit:
    type: string
    format: energy_unit  # This is a docma provided format checker
  start_date:
    type: string
    format: date.dmy  # This is a docma provided format checker
See also Docma Format Checkers.
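Conceptually, a format checker is a predicate applied to a string value. As a rough illustration of the kind of test a checker such as au.ABN might perform, the sketch below implements the published Australian Business Number checksum in plain Python. The function name `is_valid_abn` is hypothetical, and it is an assumption that docma's own checker behaves this way (for example, in how it treats spaces).

```python
# Illustrative only: docma's own au.ABN checker may be implemented differently.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(value: str) -> bool:
    """Return True if value passes the published ABN checksum."""
    # Assumption: spaces are permitted as separators and nothing else is.
    compact = value.replace(" ", "")
    if len(compact) != 11 or not all(c in "0123456789" for c in compact):
        return False
    digits = [int(c) for c in compact]
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    weighted_sum = sum(d * w for d, w in zip(digits, ABN_WEIGHTS))
    return weighted_sum % 89 == 0


print(is_valid_abn("51 824 753 556"))
print(is_valid_abn("not an abn"))
```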
Dynamic Content Generation¶
When docma converts HTML into PDF or stand-alone HTML, it needs to resolve all
URLs in the source HTML in things such as <img src="..."> tags. It does this
via a custom URL fetcher that allows content requests to be intercepted and
the resulting content generated dynamically. In this way, docma can generate
dynamic content, such as charts, for inclusion in the final output document.
Note
There are some differences in this process depending on whether the final output is PDF or HTML. See Dynamic Content Generation Differences Between PDF and HTML Output.
All URLs are constituted thus:
Docma determines which custom URL fetcher to apply based on the URL scheme (i.e.
the first part before the colon). The URL fetchers handle a range of non-standard,
docma specific schemes, as well as the standard http and https schemes.
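The URL components mentioned in this section (scheme, netloc, path, query) can be inspected with Python's standard `urllib.parse` module. The following is only a sketch of how URLs like those used in this guide break down. It does not show docma's internal dispatch code, and the helper name `describe` is hypothetical.

```python
from urllib.parse import urlsplit


def describe(url: str) -> tuple[str, str, str, str]:
    """Split a URL into (scheme, netloc, path, query)."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path, parts.query


for url in (
    "docma:qrcode?text=Hello%20world",
    "file:resources/logo.png",
    "s3://my-bucket/some/path/logo.png",
    "https://host/do/something?x=20",
):
    scheme, netloc, path, query = describe(url)
    print(f"{scheme=} {netloc=} {path=} {query=}")
```

The docma and file URLs have an empty netloc because they contain no `//`, whereas the s3 URL carries the bucket name in the netloc position.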
Docma currently handles the following non-standard schemes:
| Scheme | Description |
|---|---|
| docma | Interface to docma dynamic content generators of various types. |
| file | Interface to access files contained within the compiled document template. |
| s3 | Interface to access files from AWS S3. |
Note
The docma URL fetcher interface is easily expandable to handle other schemes. See URL Fetchers.
Dynamic Content Generation Differences Between PDF and HTML Output¶
PDF generation from HTML is performed by WeasyPrint, which will invoke a custom
URL fetcher for any URL it needs to access during the conversion process.
This includes, but is not limited to, <IMG> tags.
For standalone HTML output, the process of invoking a custom URL fetcher is done
by docma itself. It is only applied to the src attribute of <IMG> tags
under specific circumstances. When it is done, the src attribute is replaced
in the <IMG> tag with the actual content returned by the URL fetcher. i.e.
the data is embedded within the standalone HTML output.
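A common way to embed fetched content directly in an HTML attribute is a base64 `data:` URI. The sketch below shows that general technique using only the standard library. Whether docma uses exactly this encoding is an assumption, and the helper name `to_data_uri` is hypothetical.

```python
import base64


def to_data_uri(content: bytes, media_type: str) -> str:
    """Encode raw bytes as a data: URI suitable for an <IMG> src attribute."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# Placeholder bytes stand in for whatever a URL fetcher returned.
src = to_data_uri(b"hello", "text/plain")
print(f'<IMG src="{src}">')
```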
In practice, these differences suit the final viewing environment of each output: a PDF is static and must contain everything it displays, whereas HTML is viewed by a client that can fetch some resources itself at display time.
By default, in HTML outputs, <IMG> tags have the content embedded
in place of the src attribute in the following circumstances:
- The src URL is not http(s):// (i.e. any of the docma custom schemes described below); or
- The src URL is http(s)://, has no query component ?..., and the content size is between 100 bytes and 1MB.
For the http(s):// URLs, it is possible to override the default behaviour by
adding the data-docma-embed attribute to the <IMG> tag.
For images that are not embedded, it is assumed that the client (e.g. an email client or web browser) will fetch the images as required at display time.
<!-- Force the image to be embedded -->
<IMG src="http://host/img.png" data-docma-embed="true">
<!-- Prevent the image from being embedded -->
<IMG src="http://host/img.png" data-docma-embed="false">
<!-- This will not be embedded due to size unless we force it -->
<IMG src="http://host/multi-mega-byte-img.png">
<!-- This will not be embedded due to size unless we force it -->
<IMG src="http://host/one-pixel-img.png">
<!-- This will not be embedded due to query component unless we force it -->
<IMG src="http://host/do/something?x=20">
<!-- This will always be embedded and cannot be prevented -->
<IMG src="s3://my-bucket/corporate-logo.png">
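The default embedding rules and the data-docma-embed override can be summarised as a small decision function. This is a sketch of the rules as described in this section, not docma's code. The name `should_embed`, the reading of "1MB" as 1,000,000 bytes, and the inclusive size bounds are all assumptions.

```python
from typing import Optional
from urllib.parse import urlsplit

MIN_EMBED_SIZE = 100        # bytes
MAX_EMBED_SIZE = 1_000_000  # assumption: "1MB" taken as 10**6 bytes


def should_embed(src: str, content_size: int, override: Optional[bool] = None) -> bool:
    """Decide whether an <IMG> src would be embedded in HTML output.

    override models the data-docma-embed attribute (None when absent).
    """
    parts = urlsplit(src)
    if parts.scheme not in ("http", "https"):
        # Custom schemes are always embedded and this cannot be prevented.
        return True
    if override is not None:
        return override
    return not parts.query and MIN_EMBED_SIZE <= content_size <= MAX_EMBED_SIZE


print(should_embed("http://host/img.png", 5_000))
print(should_embed("http://host/do/something?x=20", 5_000))
print(should_embed("s3://my-bucket/corporate-logo.png", 5_000_000, override=False))
```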
Scheme: docma¶
URLs of the following form are intercepted by docma and used to invoke a dynamic content generator.
Note that for these docma URLs, there is no netloc component and hence no //
in the URL.
For example, this will generate a QR code:
<IMG style="height: 40px"
     src="docma:qrcode?text=Hello%20world&fg=white&bg=red">
The URL should be properly URL encoded. This can be fiddly, but Jinja can help here. The example above could also have been written in dictionary format thus:
<IMG style="height: 40px" src="docma:qrcode?{{
  {
    'text': 'Hello world',
    'fg': 'white',
    'bg': 'red'
  } | urlencode
}}">
It could also have been written as a sequence of tuples:
<IMG style="height: 40px" src="docma:qrcode?{{
  (
    ('text', 'Hello world'),
    ('fg', 'white'),
    ('bg', 'red')
  ) | urlencode
}}">
Info
The sequence format is required if any of the parameters needs to be used more than once.
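The difference between the two formats is easy to see outside a template. The sketch below uses Python's standard `urllib.parse.urlencode` as a stand-in for the Jinja filter (the two are related but not identical). It shows that a dictionary and a sequence of pairs produce the same query string, and that only the sequence form can repeat a parameter name. The parameter values `first` and `second` are placeholders.

```python
from urllib.parse import quote, urlencode

# quote_via=quote encodes spaces as %20 rather than "+".
as_dict = urlencode({"text": "Hello world", "fg": "white", "bg": "red"}, quote_via=quote)
as_pairs = urlencode(
    (("text", "Hello world"), ("fg", "white"), ("bg", "red")),
    quote_via=quote,
)

# A dict cannot hold the same key twice, so repeats need the sequence form.
repeated = urlencode([("data", "first"), ("data", "second")], quote_via=quote)

print(as_dict)
print(as_pairs)
print(repeated)
```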
Available content generators are:
| Name | Description |
|---|---|
| qrcode | Generate a QR code. |
| swatch | Generate a colour swatch as graphic placeholder. |
| vega | Generate a chart based on the Vega-Lite declarative syntax for specifying charts / graphs. |
Note
The dynamic content generator interface is readily extensible to add new types of content. See Content Generators.
Generating QR Codes¶
The QR code dynamic generator accepts the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| bg | String | No | Background colour of the QR code (e.g. blue or #0000ff). Default is white. |
| border | Integer | No | Number of boxes thick for the border. Default is the minimum allowed value of 4. |
| box | Integer | No | Number of pixels for each box in the QR code. Default is 10. |
| fg | String | No | Foreground colour of the QR code. Default is black. |
| text | String | Yes | Content to be encoded in the QR code. |
Examples:
<IMG style="height: 40px"
     src="docma:qrcode?text=Hello%20world&fg=white&bg=red">
<IMG style="height: 40px" src="docma:qrcode?{{
  {
    'text': 'Hello world',
    'fg': 'white',
    'bg': 'red'
  } | urlencode
}}">
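To make the parameter table concrete, the following sketch parses a docma:qrcode URL with the standard library and applies the documented defaults. It only models how the query string maps to options. It is not docma's generator, the name `qr_options` is hypothetical, and it makes no attempt to reject unknown parameter names.

```python
from urllib.parse import parse_qs, urlsplit

# Defaults taken from the parameter table above.
QR_DEFAULTS = {"bg": "white", "border": 4, "box": 10, "fg": "black"}
INTEGER_PARAMS = ("border", "box")


def qr_options(url: str) -> dict:
    """Return the effective QR code options for a docma:qrcode URL."""
    query = parse_qs(urlsplit(url).query)
    if "text" not in query:
        raise ValueError("the text parameter is required")
    options = dict(QR_DEFAULTS)
    for name, values in query.items():
        value = values[-1]  # assumption: the last occurrence wins
        options[name] = int(value) if name in INTEGER_PARAMS else value
    return options


print(qr_options("docma:qrcode?text=Hello%20world&fg=white&bg=red"))
```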
Generating Charts and Graphs¶
Docma supports the Vega-Lite declarative syntax for specifying charts / graphs. Vega-Lite specifies a mapping between source data and visual representations of the data. Docma provides mechanisms for specifying and accessing various data sources and feeding this data through a Vega-Lite specification to generate charts and graphs.
This is a large topic and more information is provided in Charts and Graphs in Docma. To whet your appetite, check out the Vega-Lite sample gallery.
This section just summarises the parameters for the vega content generator for
reference:
| Parameter | Type | Required | Description |
|---|---|---|---|
| data | String | No | A docma data source specification. This argument can be repeated if multiple data sources are required. If not specified, the file referenced by the spec parameter must contain all of the required data. |
| format | String | No | Either svg (the default) or png. Stick to svg if at all possible. |
| spec | String | Yes | The name of the file in the compiled document template that contains the Vega-Lite specification for the chart. The contents can be either YAML or JSON. |
| ppi | Integer | No | (png format only) Pixels-per-inch resolution of the generated image. Default 72. |
| scale | Float | No | (png format only) Scale the chart by the specified factor. Default is 1.0. Generally, it's better to control display size in the HTML but increasing the scale here can improve resolution. |
| params | JSON string | No | A string containing a JSON encoded object containing additional rendering parameters used when rendering the chart specification and any associated query specifications. |
Examples:
<IMG style="width: 5cm;"
src="docma:vega?spec=charts/my-chart.yaml&data=...">
<IMG style="width: 10cm;" src="docma:vega?{{
  (
    ( 'spec', 'charts/my-chart.yaml' ),
    ( 'data', 'file;data/my-data.csv' ),
    ( 'params', { 'extra_rendering_param': 1234 } | tojson)
  ) | urlencode
}}">
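The params argument is a JSON document carried inside a URL query string, so it is encoded twice: once as JSON, then URL encoded. The sketch below builds an equivalent URL with the Python standard library and decodes it again to show that a repeated data argument and the JSON object both survive the round trip. The second data source, `data/other-data.csv`, is hypothetical and included only to show repetition.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

pairs = [
    ("spec", "charts/my-chart.yaml"),
    ("data", "file;data/my-data.csv"),
    ("data", "file;data/other-data.csv"),  # hypothetical second data source
    ("params", json.dumps({"extra_rendering_param": 1234})),
]
url = "docma:vega?" + urlencode(pairs, quote_via=quote)
print(url)

# Decode again: repeated names come back as lists, params as a JSON string.
query = parse_qs(urlsplit(url).query)
params = json.loads(query["params"][0])
print(query["data"])
print(params)
```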
Generating Graphic Placeholders (Swatches)¶
The swatch generator produces a simple coloured rectangle with an optional text message. It's not intended to be useful in final documents, Mondrian notwithstanding. It has two purposes:
-
As a simple code sample for dynamic content generators that can be copied and modified for new requirements.
-
As a temporary placeholder when developing the structure of a docma template that will be replaced subsequently by a real piece of content (e.g. a chart).
| Parameter | Type | Required | Description |
|---|---|---|---|
| color | String | No | Fill colour of the swatch. Default is a light grey. |
| font | String | No | Font file name for the text. Default is Arial. If the specified font is not available, a platform specific default is used. |
| font_size | Integer | No | Font size. Default is 18. |
| height | Integer | Yes | Swatch height in pixels. |
| text | String | No | Text to centre in the swatch. No effort is made to manipulate it to fit. |
| text_color | String | No | Colour for text. Default is black. |
| width | Integer | Yes | Swatch width in pixels. |
Colour or color?
The code and docma templates stick with color, because, well, that battle
is lost. The user guide uses colour in descriptive text. Blame Webster for
messing it up, not me.
Examples:
<IMG src="docma:swatch?width=150&height=150&color=seagreen">
<IMG src="docma:swatch?{{ {
'width': 150,
'height': 150,
'color': '#0080ff',
'text': 'Hello world',
'text_color': 'yellow',
'font_size': 24
} | urlencode }}"
>
Scheme: file¶
URLs in HTML files of the form file:... are intercepted by docma and the
content is extracted from a file within the compiled document template. As the
file is local to the template, there is no network location so the URL will be
like so:
<IMG src="file:resources/logo.png" alt="logo">
Warning
Do not include // after file:. It will not work.
Scheme: s3¶
URLs in HTML files of the form s3://... are intercepted by docma and the
content is extracted from AWS S3. A typical usage would be something like:
<IMG src="s3://my-bucket/some/path/logo.png" alt="logo">
Info
Files are limited to 10MB in size.
Watermarking¶
Docma supports the ability to watermark and stamp PDF documents using the concept of document overlays.
Info
Overlays are not supported for HTML output documents.
An overlay is a PDF document, generated by docma, that can be used as either a watermark or a stamp.
A watermark is content merged into every page of the final PDF under the main document content.
A stamp is content merged into every page of the final PDF over the main document content.
Overlays are defined in the template configuration file using the following structure:
overlays:
  my-overlay-1:
    # We can have HTML files that will be rendered like other docs
    - a4-portrait.html
    # ... or static PDFs
    - a4-landscape.pdf
  # or ...
  my-overlay-2: a4-portrait.html
Each overlay is a named list of documents (or a single document). When docma
is requested to add a watermark (or stamp), it is provided with the name of one
or more of the overlays (e.g. my-overlay-1).
It will then render each of the files in each overlay list in the same way as the main document, including rendering with dynamic run-time parameters.
Each page in the main document is then merged with the first page of the first overlay document in each list that has (approximately) the same page dimensions.
Info
The process will abort if a matching overlay page cannot be found for a main document page.
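The page matching rule can be sketched as a simple search over the first-page sizes of the documents in one overlay list. This is an illustration of the described behaviour only. The function names, the use of PDF points as units, and the 1 point tolerance standing in for "approximately" are all assumptions.

```python
from typing import Sequence, Tuple

Size = Tuple[float, float]  # (width, height) in PDF points

A4_PORTRAIT: Size = (595.276, 841.89)
A4_LANDSCAPE: Size = (841.89, 595.276)


def same_size(a: Size, b: Size, tolerance: float = 1.0) -> bool:
    """True if two page sizes match to within the tolerance (an assumed value)."""
    return abs(a[0] - b[0]) <= tolerance and abs(a[1] - b[1]) <= tolerance


def pick_overlay_index(page: Size, overlay_first_pages: Sequence[Size]) -> int:
    """Return the index of the first overlay document whose first page fits."""
    for index, first_page in enumerate(overlay_first_pages):
        if same_size(page, first_page):
            return index
    # Mirrors the documented behaviour: no match aborts the process.
    raise LookupError(f"no overlay page matches page size {page}")


overlay = [A4_PORTRAIT, A4_LANDSCAPE]
print(pick_overlay_index((595.0, 842.0), overlay))
print(pick_overlay_index((842.0, 595.0), overlay))
```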
The presence of the overlays section in the configuration file does not itself
enable watermarking / stamping. This has to be explicitly requested.
Watermarking / stamping can be requested using the --watermark / --stamp
CLI options. If using the Python API, the watermark / stamp parameters to the
render_template() function are used.
It is possible to have both watermarking and stamping used on a single document, as well as having multiple overlays applied to a single document.
Info
A simple grid overlay is provided as part of the basic template created by
the docma new command. This can be
handy when adjusting page layout. To add the grid, the docma CLI rendering
command would be docma pdf --stamp grid .... Grid size and colour are
adjustable in the parameter defaults in the template config file.
Document Metadata¶
Docma allows the template to control some of the metadata added to the final PDF or HTML and enforces some values of its own.
PDF and HTML documents have slightly different conventions regarding metadata naming and formatting. Docma handles these variations.
In HTML, the metadata fields are added into the <HEAD> of the final document
in this form:
<meta content="Fred Nurk" name="author"/>
<meta content="A document about stuff" name="title"/>
<meta content="DRAFT, Top-Secret" name="keywords"/>
<meta content="2024-11-21T00:04:38.699978+00:00" name="creation_date"/>
In PDF, the metadata fields are used to populate the standard metadata elements recognised by common PDF readers.
| HTML Naming | PDF Naming | Controlled by | Comments |
|---|---|---|---|
| author | /Author | Template | From the metadata->author key in config.yaml |
| creation_date | /CreationDate | Docma | Document production datetime |
| creator | /Creator | Docma | Based on template id, version and docma version |
| keywords | /Keywords | Template | From the metadata->keywords key in config.yaml |
| subject | /Subject | Template | From the metadata->subject key in config.yaml |
| title | /Title | Template | From the metadata->title key in config.yaml |
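The naming correspondence in the table can be expressed as a lookup, with the HTML form generated by escaping each value into a meta tag. The sketch below is illustrative only. The function names are hypothetical, and it does not attempt the PDF-specific date formatting that docma handles.

```python
from html import escape

# HTML name -> PDF name, as listed in the table above.
PDF_NAMES = {
    "author": "/Author",
    "creation_date": "/CreationDate",
    "creator": "/Creator",
    "keywords": "/Keywords",
    "subject": "/Subject",
    "title": "/Title",
}


def html_meta_tags(metadata: dict) -> list[str]:
    """Build <meta> tags in the form shown for HTML output."""
    return [
        f'<meta content="{escape(str(value), quote=True)}" name="{escape(name, quote=True)}"/>'
        for name, value in metadata.items()
    ]


def pdf_metadata(metadata: dict) -> dict:
    """Rename keys to their PDF equivalents (values are left as strings)."""
    return {PDF_NAMES[name]: str(value) for name, value in metadata.items()}


sample = {"author": "Fred Nurk", "title": "A document about stuff"}
print("\n".join(html_meta_tags(sample)))
print(pdf_metadata(sample))
```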
Batch Rendering¶
Docma supports the ability to generate a batch of output documents from a
single document template using the pdf-batch (PDF) and html-batch (HTML)
sub-commands of the docma CLI.
The document template needs to anticipate the need for batch rendering by including some Jinja controlled content that will be varied for each document produced via document specific parameters. The source for the document specific batch parameters is a docma data loader. Data returned by the data loader is merged in with the fixed rendering parameters, a row at a time, and docma produces an output document using that combination. The source data for the batch parameters is specified using a docma data source specification.
Note
The following describes the process for PDF document batches. The process is similar for HTML batches.
This is how a batch rendering is invoked:
# Long form arguments
docma pdf-batch --template my-template.zip \
    --file static-params.yaml \
    --data-source-spec 'postgres;pglocal;queries/batch.yaml' \
    --output 'whatever-{{id}}-{{familyname|lower}}.pdf'
# Short form arguments
docma pdf-batch -t my-template.zip \
    -f static-params.yaml \
    -d 'postgres;pglocal;queries/batch.yaml' \
    -o 'whatever-{{id}}-{{familyname|lower}}.pdf'
Let's examine this bit by bit.
The docma pdf-batch sub-command is invoked specifying the compiled document
template:
docma pdf-batch --template my-template.zip
Rendering parameters are specified exactly as for the single document rendering process. These parameters are the same for every document in the rendering batch:
--file static-params.yaml \
The docma data source specification tells docma how to obtain rows of data to control the batch rendering. Each row is a set of key/value pairs that will be merged into the static rendering parameters and used to render one PDF document:
--data-source-spec 'postgres;pglocal;queries/batch.yaml' \
The docma data source specification is interpreted within the context of the document template.
As docma will be producing a series of PDF documents, it needs a mechanism to
provide each document with a unique name that corresponds to the batch data
entry that was used to produce it. This is done using the --output option with
an argument that is Jinja rendered to construct the filename. In this example,
it is assumed that the batch data contains id and familyname elements and
that these are a unique combination to avoid filename clashes:
--output 'whatever-{{id}}-{{familyname|lower}}.pdf'
Note
There are some strict constraints on the filename rendering process for safety reasons.
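The row-at-a-time merge at the heart of batch rendering can be sketched in a few lines of Python. Here an in-memory CSV stands in for the data loader and an f-string stands in for the Jinja rendered --output argument. The sample data, the function name `batch_documents`, and the choice that row values win over static parameters on a name clash are all assumptions for illustration.

```python
import csv
import io

# Stand-ins for static-params.yaml and the rows returned by a data loader.
static_params = {"company": "Acme", "familyname": "UNKNOWN"}
batch_csv = "id,familyname\n1,Nurk\n2,Bloggs\n"


def batch_documents(static: dict, rows):
    """Yield (filename, parameters) for each row of batch data."""
    for row in rows:
        # Assumption: values from the batch row override static parameters.
        params = {**static, **row}
        # Stand-in for rendering 'whatever-{{id}}-{{familyname|lower}}.pdf'.
        filename = f"whatever-{params['id']}-{params['familyname'].lower()}.pdf"
        yield filename, params


docs = list(batch_documents(static_params, csv.DictReader(io.StringIO(batch_csv))))
for filename, params in docs:
    print(filename, params)
```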