Extra Inputs
Both vtlengine.run() and vtlengine.semantic_analysis()
accept two optional parameters, value_domains and
external_routines, which extend a VTL script with
membership checks and SQL-based transformations, respectively.
This page documents the definition format, input options, VTL syntax, and validation for each feature.
See also
Example 5: Run with multiple Value Domains and External Routines as dictionaries. — Walkthrough example using both features as dictionaries
Example 6: Run using Paths — Walkthrough example using both features as Path objects
VTL 2.1 Reference Manual — Full VTL specification
Value Domains
A Value Domain is a named set of unique values that share
a common data type. Value domains are used with the in
and not_in operators to perform membership checks
in VTL scripts.
Definition Format
Each value domain is a JSON object with three required fields:
| Field | Type | Description |
|---|---|---|
| `name` | string | Unique identifier referenced in the VTL script. |
| `type` | string | Data type of the values. See supported types below. |
| `setlist` | array | List of unique values belonging to the domain. Items must match the declared `type`. |
Example:
{
"name": "Countries",
"type": "String",
"setlist": ["DE", "FR", "IT", "ES"]
}
Multiple value domains can be provided as a JSON array:
[
{
"name": "Countries",
"type": "String",
"setlist": ["DE", "FR", "IT"]
},
{
"name": "Thresholds",
"type": "Integer",
"setlist": [10, 20, 50, 100]
}
]
Supported Types
| Type | JSON representation |
|---|---|
| `Integer` | JSON integer (e.g. `10`) |
| `Number` | JSON number (e.g. `10.5`) |
| `String` | JSON string (e.g. `"DE"`) |
| `Boolean` | JSON boolean (`true` or `false`) |
| `Date` | JSON string in date format (e.g. `"2023-01-15"`) |
| `Time_Period` | JSON string in time period format (e.g. `"2023Q1"`) |
| `Time` | JSON string as ISO 8601 interval (e.g. `"2023-01-01/2023-12-31"`) |
| `Duration` | JSON string in duration format (e.g. `"P1Y"`) |
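The declared type constrains what JSON values are accepted in the setlist. A minimal sketch of that constraint for a few of the types is shown below; this is illustrative only, and the library's own validation logic may differ:

```python
from datetime import date

def _is_date(v):
    """True if v is an ISO 8601 date string (e.g. "2023-01-15")."""
    try:
        return isinstance(v, str) and bool(date.fromisoformat(v))
    except ValueError:
        return False

# Illustrative mapping of VTL scalar types to JSON-value checks.
TYPE_CHECKS = {
    "Integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "Number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "String": lambda v: isinstance(v, str),
    "Boolean": lambda v: isinstance(v, bool),
    "Date": _is_date,
}

def setlist_matches_type(domain: dict) -> bool:
    """Check every setlist item against the check for the declared type."""
    check = TYPE_CHECKS[domain["type"]]
    return all(check(v) for v in domain["setlist"])
```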
Input Options
The value_domains parameter accepts the following formats:
- **Dictionary**: A single value domain as a Python dict.
- **Path to a JSON file**: A `Path` pointing to a `.json` file containing one or more value domain definitions.
- **Path to a directory**: A `Path` pointing to a directory; all `.json` files in the directory are loaded.
- **List**: A list mixing any of the above formats.
from pathlib import Path
# Single dict
value_domains = {
"name": "Countries",
"type": "String",
"setlist": ["DE", "FR", "IT"],
}
# Path to file
value_domains = Path("data/value_domains.json")
# Path to directory (loads all .json files)
value_domains = Path("data/value_domains/")
# List of mixed formats
value_domains = [
{"name": "Countries", "type": "String", "setlist": ["DE", "FR"]},
Path("data/extra_domains.json"),
]
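The accepted formats could be normalized into a flat list of definition dicts along these lines. This is a standard-library sketch of the idea, not vtlengine's actual loader:

```python
import json
from pathlib import Path

def normalize_definitions(source):
    """Flatten a dict, Path (file or directory), or mixed list of both
    into a list of definition dicts. Illustrative sketch only."""
    if isinstance(source, dict):
        return [source]
    if isinstance(source, Path):
        if source.is_dir():
            defs = []
            for f in sorted(source.glob("*.json")):
                defs.extend(normalize_definitions(f))
            return defs
        loaded = json.loads(source.read_text())
        # A file may contain a single object or an array of objects.
        return loaded if isinstance(loaded, list) else [loaded]
    if isinstance(source, list):
        return [d for item in source for d in normalize_definitions(item)]
    raise TypeError(f"Unsupported input type: {type(source).__name__}")
```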
VTL Usage
Value domains are referenced in VTL scripts using the in
and not_in operators. These can be used in both scalar
and component contexts.
/* Component membership: Me_2 is a Boolean measure */
DS_r <- DS_1 [calc Me_2 := Me_1 in Countries];
/* Negated membership */
DS_r <- DS_1 [calc Me_2 := Me_1 not_in Countries];
/* Filter rows where a component belongs to a domain */
DS_r <- DS_1 [filter Me_1 in Thresholds];
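Conceptually, the calc and filter snippets above evaluate membership once per data point. The semantics correspond to this plain-Python sketch (names mirror the VTL examples):

```python
# The value domain and one measure column, mirroring the VTL snippets above.
countries = {"DE", "FR", "IT"}
me_1 = ["DE", "US", "FR"]

# calc Me_2 := Me_1 in Countries  ->  one Boolean per data point
me_2 = [v in countries for v in me_1]  # [True, False, True]

# filter Me_1 in Countries  ->  keep only the matching rows
filtered = [v for v in me_1 if v in countries]  # ["DE", "FR"]
```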
Example
import pandas as pd
from vtlengine import run
script = """
DS_r <- DS_1 [calc Me_2 := Me_1 in Countries];
"""
data_structures = {
"datasets": [
{
"name": "DS_1",
"DataStructure": [
{
"name": "Id_1",
"type": "Integer",
"role": "Identifier",
"nullable": False,
},
{
"name": "Me_1",
"type": "String",
"role": "Measure",
"nullable": True,
},
],
}
]
}
datapoints = {
"DS_1": pd.DataFrame(
{"Id_1": [1, 2, 3], "Me_1": ["DE", "US", "FR"]}
),
}
value_domains = {
"name": "Countries",
"type": "String",
"setlist": ["DE", "FR", "IT"],
}
result = run(
script=script,
data_structures=data_structures,
datapoints=datapoints,
value_domains=value_domains,
)
print(result["DS_r"])
Validation
Use vtlengine.validate_value_domain() to validate
the JSON structure of value domains before execution:
from vtlengine import validate_value_domain
value_domains = {
"name": "Countries",
"type": "String",
"setlist": ["DE", "FR", "IT"],
}
# Raises an exception if the structure is invalid
validate_value_domain(value_domains)
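If you want a quick structural pre-check without invoking the library, the required fields can be inspected manually. This is a minimal sketch based on the definition format above; validate_value_domain() itself may perform stricter checks:

```python
def check_value_domain_shape(vd: dict) -> list[str]:
    """Return a list of structural problems; an empty list means the
    shape looks OK. Illustrative only, not vtlengine's validator."""
    problems = []
    for field, expected in (("name", str), ("type", str), ("setlist", list)):
        if field not in vd:
            problems.append(f"missing field: {field}")
        elif not isinstance(vd[field], expected):
            problems.append(f"{field} must be {expected.__name__}")
    setlist = vd.get("setlist")
    if isinstance(setlist, list) and len(set(map(repr, setlist))) != len(setlist):
        problems.append("setlist values must be unique")
    return problems
```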
External Routines
External Routines allow VTL scripts to execute SQL queries
through the eval() operator. Queries are executed in
a sandboxed DuckDB environment.
Note
Currently, only SQL external routines are supported.
The language parameter in the eval() call must
be set to "SQL".
Definition Format
Each external routine is a JSON object with two required fields:
| Field | Type | Description |
|---|---|---|
| `name` | string | Identifier referenced in the VTL `eval()` call. |
| `query` | string | SQL query to execute. Table names in the query must match the dataset names passed as operands. |
JSON format:
{
"name": "SQL_1",
"query": "SELECT Id_1, SUM(Me_1) AS Me_1 FROM DS_1 GROUP BY Id_1;"
}
Multiple routines can be provided as a JSON array:
[
{
"name": "SQL_1",
"query": "SELECT Id_1, COUNT(*) AS Me_1 FROM DS_1 GROUP BY Id_1;"
},
{
"name": "SQL_2",
"query": "SELECT Id_1, Me_1 FROM DS_1 WHERE Me_1 > 10;"
}
]
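vtlengine executes such routines in DuckDB; to get a feel for what a routine like SQL_1 computes, the same aggregation can be exercised against any SQL engine. Here is an illustrative run using the standard library's sqlite3, which is not what vtlengine does internally:

```python
import sqlite3

# In-memory table standing in for the DS_1 operand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DS_1 (Id_1 INTEGER, Me_1 REAL)")
conn.executemany(
    "INSERT INTO DS_1 VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 20.0)],
)

# Same aggregation as the SQL_1 routine above; ORDER BY is added here
# only to make the result deterministic for printing.
rows = conn.execute(
    "SELECT Id_1, SUM(Me_1) AS Me_1 FROM DS_1 GROUP BY Id_1 ORDER BY Id_1"
).fetchall()
print(rows)  # [(1, 15.0), (2, 20.0)]
```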
Input Options
The external_routines parameter accepts the following formats:
- **Dictionary**: A single routine as a Python dict.
- **Path to a file**: A `Path` pointing to a `.json` file.
- **Path to a directory**: A `Path` pointing to a directory; all `.json` files in the directory are loaded.
- **List**: A list mixing any of the above formats.
from pathlib import Path
# Single dict
external_routines = {
"name": "SQL_1",
"query": "SELECT Id_1, COUNT(*) AS Me_1 FROM DS_1 GROUP BY Id_1;",
}
# Path to file
external_routines = Path("data/SQL_1.json")
# Path to directory (loads all .json files)
external_routines = Path("data/routines/")
# List of mixed formats
external_routines = [
{"name": "SQL_1", "query": "SELECT * FROM DS_1;"},
Path("data/SQL_2.json"),
]
VTL Syntax
The eval() operator invokes an external routine:
DS_r := eval(
SQL_NAME(DS_1, DS_2)
language "SQL"
returns dataset {
identifier<integer> Id_1,
measure<number> Me_1
}
);
- `SQL_NAME`: Name matching the external routine definition.
- Operands: Input datasets passed to the SQL query (`DS_1`, `DS_2`, etc.).
- `language`: Must be `"SQL"`.
- `returns dataset`: Defines the output structure with component roles (`identifier`, `measure`, `attribute`) and types.
Note
The column names in the SQL query result must match the
component names declared in the returns dataset clause.
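This alignment can be sanity-checked outside vtlengine by comparing the query's result columns to the declared components. The sketch below uses sqlite3's cursor description for illustration; DuckDB exposes the same information through its own API:

```python
import sqlite3

# Components declared in the returns dataset clause of the eval() call.
declared = ["Id_1", "Me_1"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DS_1 (Id_1 INTEGER, Me_1 REAL)")
cur = conn.execute("SELECT Id_1, Me_1 * 2 AS Me_1 FROM DS_1")

# cursor.description holds one entry per result column; index 0 is its name.
result_columns = [d[0] for d in cur.description]
print(result_columns == declared)  # True
```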
Security
External routines are executed in a sandboxed DuckDB in-memory database with the following restrictions:
- `INSTALL` and `LOAD` commands are forbidden.
- URLs (`http://`, `https://`) in `FROM` clauses are forbidden.
- External file access is disabled.
- Extension loading is disabled.
- Configuration is locked after initialization.
- Results are checked for infinite values.
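A rough approximation of the first two restrictions can be expressed as pattern checks. This is illustrative only; vtlengine's sandbox enforces these restrictions at the DuckDB configuration level and may use entirely different logic:

```python
import re

# Hypothetical patterns approximating the first two restrictions above.
FORBIDDEN_PATTERNS = [
    re.compile(r"\bINSTALL\b", re.IGNORECASE),
    re.compile(r"\bLOAD\b", re.IGNORECASE),
    re.compile(r"\bFROM\s+['\"]?https?://", re.IGNORECASE),
]

def looks_forbidden(query: str) -> bool:
    """True if the query matches any of the sketched forbidden patterns."""
    return any(p.search(query) for p in FORBIDDEN_PATTERNS)
```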
Examples
Using a dictionary
import pandas as pd
from vtlengine import run
script = """
DS_r <- eval(
SQL_1(DS_1)
language "SQL"
returns dataset {
identifier<integer> Id_1,
measure<number> Me_1
}
);
"""
data_structures = {
"datasets": [
{
"name": "DS_1",
"DataStructure": [
{
"name": "Id_1",
"type": "Integer",
"role": "Identifier",
"nullable": False,
},
{
"name": "Me_1",
"type": "Number",
"role": "Measure",
"nullable": True,
},
],
}
]
}
datapoints = {
"DS_1": pd.DataFrame(
{"Id_1": [1, 2, 3, 4, 5], "Me_1": [10, 20, 30, 40, 50]}
),
}
external_routines = {
"name": "SQL_1",
"query": "SELECT Id_1, Me_1 * 2 AS Me_1 FROM DS_1;",
}
result = run(
script=script,
data_structures=data_structures,
datapoints=datapoints,
external_routines=external_routines,
)
print(result["DS_r"])
Validation
Use vtlengine.validate_external_routine() to validate
the JSON structure and SQL syntax before execution:
from vtlengine import validate_external_routine
external_routines = {
"name": "SQL_1",
"query": "SELECT Id_1, SUM(Me_1) AS Me_1 FROM DS_1 GROUP BY Id_1;",
}
# Raises an exception if the structure or SQL is invalid
validate_external_routine(external_routines)