SDMX Inputs
If you work mostly with SDMX, the engine has you covered. It reads
SDMX-ML, SDMX-JSON and SDMX-CSV files directly, understands the
pysdmx structure objects (Schema, DataStructureDefinition,
Dataflow) and PandasDataset bundles, and can produce SDMX
TransformationScheme objects on the way out.
This page collects the SDMX-specific patterns you’re likely to run
into beyond the basics: loading SDMX files through vtlengine.run(),
binding one Dataflow to two VTL datasets, feeding the engine a
registered TransformationScheme, and aliasing SDMX dataflows to the
names your script uses.
For the basic case (a script plus a list of PandasDataset objects
from pysdmx.io.get_datasets), start with the Run SDMX section of
10 minutes to VTL Engine instead. This page picks up where that one leaves off.
Run with SDMX files
If you have SDMX files on disk and you’d rather not convert them
yourself, hand them straight to vtlengine.run() — there’s no need
to go through vtlengine.run_sdmx() for that. The engine picks the
right loader from each file’s extension and translates the contents to
its internal VTL representation behind the scenes.
Accepted SDMX formats for data_structures:
- SDMX-ML structure files (.xml)
- SDMX-JSON structure files (.json)
- pysdmx objects (Schema, DataStructureDefinition, Dataflow)
Accepted SDMX formats for datapoints:
- SDMX-ML data files (.xml)
- SDMX-CSV data files (.csv), auto-detected
Note
SDMX-JSON is supported for structures, not for data —
pysdmx only parses SDMX-JSON in its Structure and Reference
Metadata variants.
For the full list of supported SDMX formats and versions, see pysdmx’s Formats and versions supported — the engine inherits its parsing support from pysdmx.
The engine routes each file based on its extension. For CSV files in particular, it first tries to parse them as SDMX-CSV and falls back to plain CSV if that doesn’t work — so you can mix the two without thinking about it.
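The CSV fallback described above can be pictured as a two-step check. The sketch below uses only the standard library and is not the engine's actual code: `classify_datapoints_file` and the header heuristic are assumptions made for this illustration (SDMX-CSV 1.0 files start with a `DATAFLOW` column, SDMX-CSV 2.0 files with `STRUCTURE`).

```python
import csv
import io
from pathlib import PurePath

# Hypothetical heuristic, for illustration only: the real routing lives
# inside vtlengine/pysdmx and may rely on different checks.
SDMX_CSV_FIRST_COLUMNS = {"DATAFLOW", "STRUCTURE"}


def classify_datapoints_file(filename: str, head: str = "") -> str:
    """Return a label for the loader a datapoints file would be routed to.

    `head` is the first few lines of the file, passed in as text so the
    sketch does not need to touch the disk.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".xml":
        return "sdmx-ml"
    if suffix == ".csv":
        # Try the SDMX-CSV reading first, then fall back to plain CSV.
        header = next(csv.reader(io.StringIO(head)), [])
        if header and header[0].strip().upper() in SDMX_CSV_FIRST_COLUMNS:
            return "sdmx-csv"
        return "plain-csv"
    raise ValueError(f"Unsupported datapoints extension: {suffix!r}")


print(classify_datapoints_file("obs.csv", "DATAFLOW,FREQ,TIME_PERIOD,OBS_VALUE\n"))
print(classify_datapoints_file("obs.csv", "Id_1,Me_1\n"))
```

The point of the sketch is the ordering: the stricter format is attempted first, and plain CSV is the fallback rather than an error.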
One detail that catches users out: the dataset name in the structure
file (the DSD ID) often differs from the name in the data file (the
Dataflow reference). When that happens, use the sdmx_mappings
argument to alias the data file’s URN to whatever name your script
uses:
from pathlib import Path
from vtlengine import run
# Using SDMX structure and data files directly
structure_file = Path("path/to/structure.xml") # SDMX-ML structure
data_file = Path("path/to/data.xml") # SDMX-ML data
# Map the data file's Dataflow URN to the structure's DSD name
mapping = {"Dataflow=AGENCY:DATAFLOW_ID(1.0)": "DSD_NAME"}
script = "DS_r <- DSD_NAME [calc Me_2 := OBS_VALUE * 2];"
result = run(
    script=script,
    data_structures=structure_file,
    datapoints=data_file,
    sdmx_mappings=mapping,
)
You can also use sdmx_mappings to give datasets custom names in your
VTL script:
from pathlib import Path
from vtlengine import run
structure_file = Path("path/to/structure.xml")
data_file = Path("path/to/data.xml")
script = "DS_r <- MY_DATASET [calc Me_2 := OBS_VALUE * 2];"
# Map SDMX URN to VTL dataset name
mapping = {"Dataflow=MD:TEST_DF(1.0)": "MY_DATASET"}
result = run(
    script=script,
    data_structures=structure_file,
    datapoints=data_file,
    sdmx_mappings=mapping,
)
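Both mappings above use the same key shape, `Dataflow=AGENCY:ID(VERSION)`. If you build these keys from parts rather than typing them, a small helper keeps them consistent. This is a minimal sketch; `short_urn` is a hypothetical name introduced here, not a vtlengine or pysdmx function.

```python
def short_urn(agency: str, dataflow_id: str, version: str) -> str:
    """Build a mapping key in the Dataflow=AGENCY:ID(VERSION) shape.

    Hypothetical helper for illustration; not part of vtlengine or pysdmx.
    """
    for label, value in (("agency", agency), ("id", dataflow_id), ("version", version)):
        if not value:
            raise ValueError(f"Missing {label} for the dataflow reference")
    return f"Dataflow={agency}:{dataflow_id}({version})"


# Reproduces the key used in the example above.
mapping = {short_urn("MD", "TEST_DF", "1.0"): "MY_DATASET"}
print(mapping)  # {'Dataflow=MD:TEST_DF(1.0)': 'MY_DATASET'}
```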
You can also mix VTL JSON structures with SDMX structures and plain CSV datapoints with SDMX data files:
from pathlib import Path
from vtlengine import run
# Mix of VTL JSON and SDMX structures
vtl_structure = {"datasets": [{"name": "DS_1", "DataStructure": [...]}]}
sdmx_structure = Path("path/to/sdmx_structure.xml")
# Mix of plain CSV and SDMX data
datapoints = {
    "DS_1": Path("path/to/plain_data.csv"),  # Plain CSV
    "DS_2": Path("path/to/sdmx_data.xml"),  # SDMX-ML
}
# Illustrative script: replace it with your own statements
script = "DS_r1 <- DS_1; DS_r2 <- DS_2;"
result = run(
    script=script,
    data_structures=[vtl_structure, sdmx_structure],
    datapoints=datapoints,
)
TransformationScheme as the script
If your VTL script already lives in an SDMX repository as a
TransformationScheme, you don’t have to extract the text and pass
it as a string — run_sdmx accepts the object directly. Each
Transformation inside the scheme contributes one statement to the
script the engine executes.
When you don’t pass a mappings argument, the script must reference
a single input dataset, and the data file you load must contain just
one dataset too — the engine has to figure out the pairing
unambiguously.
from pathlib import Path
from pysdmx.io import get_datasets
from pysdmx.model.vtl import TransformationScheme, Transformation
from vtlengine import run_sdmx
data = Path("docs/_static/data.xml")
structure = Path("docs/_static/metadata.xml")
datasets = get_datasets(data, structure)
script = TransformationScheme(
    id="TS1",
    version="1.0",
    agency="MD",
    vtl_version="2.1",
    items=[
        Transformation(
            id="T1",
            uri=None,
            urn=None,
            name=None,
            description=None,
            expression="DS_1 [calc Me_4 := OBS_VALUE];",
            is_persistent=True,
            result="DS_r1",
            annotations=(),
        ),
        Transformation(
            id="T2",
            uri=None,
            urn=None,
            name=None,
            description=None,
            expression="DS_1 [rename OBS_VALUE to Me_5];",
            is_persistent=True,
            result="DS_r2",
            annotations=(),
        ),
    ],
)
run_sdmx(script, datasets=datasets)
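To see what "one statement per Transformation" amounts to, the sketch below assembles script text from the three fields that matter here. It is an illustration under stated assumptions, not the engine's conversion code: `Step` is a stand-in dataclass rather than the pysdmx class, and the sketch assumes persistent results use VTL's `<-` assignment while non-persistent ones use `:=`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Step:
    """Stand-in for the Transformation fields this sketch needs."""

    result: str
    expression: str
    is_persistent: bool


def to_vtl_script(steps: Iterable[Step]) -> str:
    """Join steps into script text, one assignment statement per step."""
    lines = []
    for step in steps:
        # Assumption: persistent -> "<-", non-persistent -> ":=".
        operator = "<-" if step.is_persistent else ":="
        # Drop a trailing semicolon so each statement ends with exactly one.
        body = step.expression.strip().rstrip(";").rstrip()
        lines.append(f"{step.result} {operator} {body};")
    return "\n".join(lines)


script_text = to_vtl_script(
    [
        Step("DS_r1", "DS_1 [calc Me_4 := OBS_VALUE];", True),
        Step("DS_r2", "DS_1 [rename OBS_VALUE to Me_5];", True),
    ]
)
print(script_text)
```

Both statements read from DS_1, which is why this scheme satisfies the single-input rule when no mappings are passed.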
Mapping SDMX dataflows to VTL aliases
Sometimes the name your script uses for a dataset doesn’t match the
SDMX dataflow’s short-URN — maybe the script was written first, maybe
the SDMX names are too unwieldy to drop into VTL expressions, or maybe
you just want a friendlier handle. Pass a mappings argument to
bridge the two.
You can express the mapping as a plain dict or as a pysdmx
VtlDataflowMapping object — pick whichever fits your code:
from pathlib import Path
from pysdmx.io import get_datasets
from pysdmx.model.vtl import TransformationScheme, Transformation
from pysdmx.model.vtl import VtlDataflowMapping
from vtlengine import run_sdmx
data = Path("docs/_static/data.xml")
structure = Path("docs/_static/metadata.xml")
datasets = get_datasets(data, structure)
script = TransformationScheme(
    id="TS1",
    version="1.0",
    agency="MD",
    vtl_version="2.1",
    items=[
        Transformation(
            id="T1",
            uri=None,
            urn=None,
            name=None,
            description=None,
            expression="DS_1 [calc Me_4 := OBS_VALUE]",
            is_persistent=True,
            result="DS_r",
            annotations=(),
        ),
    ],
)
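# Option 1: mapping as a VtlDataflowMapping object
mapping = VtlDataflowMapping(
    dataflow="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=MD:TEST_DF(1.0)",
    dataflow_alias="DS_1",
    id="VTL_MAP_1",
)
# Option 2: the same mapping as a plain dictionary. The two forms are
# alternatives; this assignment replaces the object above, so keep only
# the one you need.
mapping = {
    "Dataflow=MD:TEST_DF(1.0)": "DS_1"
}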
run_sdmx(script, datasets, mappings=mapping)
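The two mapping forms above identify the same dataflow: the object carries the full URN, the dictionary the short one. The sketch below shows one way such a mapping could be normalized and applied to dataset names. It is a stdlib-only illustration, not the engine's code; `to_short_urn` and `apply_aliases` are hypothetical names, and keeping unmapped datasets under their short URN is an assumption of this sketch.

```python
from typing import Dict, Mapping

FULL_URN_PREFIX = "urn:sdmx:org.sdmx.infomodel.datastructure."


def to_short_urn(urn: str) -> str:
    """Trim the full SDMX URN prefix, leaving Dataflow=AGENCY:ID(VERSION)."""
    if urn.startswith(FULL_URN_PREFIX):
        return urn[len(FULL_URN_PREFIX):]
    return urn


def apply_aliases(
    datasets: Mapping[str, object], mappings: Mapping[str, str]
) -> Dict[str, object]:
    """Rename datasets keyed by short URN to the aliases the script uses."""
    normalized = {to_short_urn(urn): alias for urn, alias in mappings.items()}
    unknown = sorted(set(normalized) - set(datasets))
    if unknown:
        raise KeyError(f"No dataset loaded for mapping(s): {unknown}")
    # Assumption: datasets without a mapping keep their short URN as name.
    return {normalized.get(urn, urn): data for urn, data in datasets.items()}


loaded = {"Dataflow=MD:TEST_DF(1.0)": "<dataset placeholder>"}
full = {"urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=MD:TEST_DF(1.0)": "DS_1"}
short = {"Dataflow=MD:TEST_DF(1.0)": "DS_1"}

print(apply_aliases(loaded, full) == apply_aliases(loaded, short))  # True
```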
Files used in the examples can be found here: