SDMX Inputs

If you work mostly with SDMX, the engine has you covered. It reads SDMX-ML, SDMX-JSON and SDMX-CSV files directly, understands the pysdmx structure objects (Schema, DataStructureDefinition, Dataflow) and PandasDataset bundles, and can produce SDMX TransformationScheme objects on the way out.

This page collects the SDMX-specific patterns you’re likely to run into beyond the basics: loading SDMX files through vtlengine.run(), binding one Dataflow to two VTL datasets, feeding the engine a registered TransformationScheme, and aliasing SDMX dataflows to the names your script uses.

For the basic case (a script plus a list of PandasDataset objects from pysdmx.io.get_datasets), start with the “Run SDMX” section of “10 minutes to VTL Engine” instead. This page picks up where that leaves off.

Run with SDMX files

If you have SDMX files on disk and you’d rather not convert them yourself, hand them straight to vtlengine.run() — there’s no need to go through vtlengine.run_sdmx() for that. The engine picks the right loader from each file’s extension and translates the contents to its internal VTL representation behind the scenes.

Accepted SDMX inputs for data_structures:

SDMX-ML structure files (.xml)
SDMX-JSON structure files (.json)
pysdmx objects (Schema, DataStructureDefinition, Dataflow)

Accepted SDMX formats for datapoints:

SDMX-ML data files (.xml)
SDMX-CSV data files (.csv) — auto-detected

Note

SDMX-JSON is supported for structures, not for data — pysdmx only parses SDMX-JSON in its Structure and Reference Metadata variants.

For the full list of supported SDMX formats and versions, see “Formats and versions supported” in the pysdmx documentation; the engine inherits its parsing support from pysdmx.

CSV files get one extra step after the extension-based routing: the engine first tries to parse them as SDMX-CSV and falls back to plain CSV if that fails, so you can mix the two in the same call.
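The routing and CSV fallback described above can be pictured with a small sketch. This is not the engine's code (the engine delegates parsing to pysdmx); looks_like_sdmx_csv and pick_loader are hypothetical names, and the header check assumes the SDMX-CSV convention that the first column is DATAFLOW (version 1.0) or STRUCTURE (version 2.0).

```python
import csv
import io


# Hypothetical helper: not part of vtlengine or pysdmx.
def looks_like_sdmx_csv(text: str) -> bool:
    """Guess whether CSV text is SDMX-CSV by inspecting its header row."""
    header = next(csv.reader(io.StringIO(text)), [])
    if not header:
        return False
    first_column = header[0].strip().upper()
    # Assumption: SDMX-CSV 1.0 starts with DATAFLOW, 2.0 with STRUCTURE.
    return first_column in {"DATAFLOW", "STRUCTURE"}


# Hypothetical helper: returns a label, it does not load anything.
def pick_loader(filename: str, text: str) -> str:
    """Choose a loader label from the file extension, then the CSV header."""
    suffix = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if suffix == "xml":
        return "sdmx-ml"
    if suffix == "csv":
        return "sdmx-csv" if looks_like_sdmx_csv(text) else "plain-csv"
    raise ValueError(f"Unsupported datapoints file: {filename}")
```

A real implementation would attempt a full parse rather than peek at the header, but the order of decisions is the same: extension first, then SDMX-CSV, then plain CSV.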

One detail that catches users out: the dataset name in the structure file (the DSD ID) often differs from the name in the data file (the Dataflow reference). When that happens, use the sdmx_mappings argument to alias the data file’s URN to whatever name your script uses:

from pathlib import Path

from vtlengine import run

# Using SDMX structure and data files directly
structure_file = Path("path/to/structure.xml")  # SDMX-ML structure
data_file = Path("path/to/data.xml")            # SDMX-ML data

# Map the data file's Dataflow URN to the structure's DSD name
mapping = {"Dataflow=AGENCY:DATAFLOW_ID(1.0)": "DSD_NAME"}

script = "DS_r <- DSD_NAME [calc Me_2 := OBS_VALUE * 2];"

result = run(
    script=script,
    data_structures=structure_file,
    datapoints=data_file,
    sdmx_mappings=mapping
)

You can also use sdmx_mappings to give datasets custom names in your VTL script:

from pathlib import Path

from vtlengine import run

structure_file = Path("path/to/structure.xml")
data_file = Path("path/to/data.xml")

script = "DS_r <- MY_DATASET [calc Me_2 := OBS_VALUE * 2];"

# Map SDMX URN to VTL dataset name
mapping = {"Dataflow=MD:TEST_DF(1.0)": "MY_DATASET"}

result = run(
    script=script,
    data_structures=structure_file,
    datapoints=data_file,
    sdmx_mappings=mapping
)

You can also mix VTL JSON structures with SDMX structures and plain CSV datapoints with SDMX data files:

from pathlib import Path

from vtlengine import run

script = "..."  # VTL script that uses DS_1 and DS_2

# Mix of VTL JSON and SDMX structures
vtl_structure = {"datasets": [{"name": "DS_1", "DataStructure": [...]}]}
sdmx_structure = Path("path/to/sdmx_structure.xml")

# Mix of plain CSV and SDMX data
datapoints = {
    "DS_1": Path("path/to/plain_data.csv"),          # Plain CSV
    "DS_2": Path("path/to/sdmx_data.xml"),           # SDMX-ML
}

result = run(
    script=script,
    data_structures=[vtl_structure, sdmx_structure],
    datapoints=datapoints
)

TransformationScheme as the script

If your VTL script already lives in an SDMX repository as a TransformationScheme, you don’t have to extract the text and pass it as a string — run_sdmx accepts the object directly. Each Transformation inside the scheme contributes one statement to the script the engine executes.

When you don’t pass a mappings argument, the script must reference a single input dataset, and the data file you load must contain just one dataset too — the engine has to figure out the pairing unambiguously.

from pathlib import Path

from pysdmx.io import get_datasets
from pysdmx.model.vtl import TransformationScheme, Transformation

from vtlengine import run_sdmx

data = Path("docs/_static/data.xml")
structure = Path("docs/_static/metadata.xml")
datasets = get_datasets(data, structure)
script = TransformationScheme(
    id="TS1",
    version="1.0",
    agency="MD",
    vtl_version="2.1",
    items=[
        Transformation(
            id="T1",
            uri=None,
            urn=None,
            name=None,
            description=None,
            expression="DS_1 [calc Me_4 := OBS_VALUE];",
            is_persistent=True,
            result="DS_r1",
            annotations=(),
        ),
        Transformation(
            id="T2",
            uri=None,
            urn=None,
            name=None,
            description=None,
            expression="DS_1 [rename OBS_VALUE to Me_5];",
            is_persistent=True,
            result="DS_r2",
            annotations=(),
        )
    ],
)
run_sdmx(script, datasets=datasets)
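To make "one statement per Transformation" concrete, here is a minimal sketch of how a scheme could be flattened into VTL text. It is an illustration only, not the engine's or pysdmx's implementation: SimpleTransformation is a stand-in for pysdmx's Transformation reduced to three fields, and to_vtl_script is a hypothetical helper.

```python
from dataclasses import dataclass


# Stand-in for pysdmx's Transformation, reduced to the fields used here.
@dataclass
class SimpleTransformation:
    result: str
    expression: str
    is_persistent: bool


# Hypothetical helper: not part of vtlengine or pysdmx.
def to_vtl_script(items: list[SimpleTransformation]) -> str:
    """Join transformations into VTL text, one statement per line."""
    lines = []
    for item in items:
        # VTL uses "<-" for persistent and ":=" for temporary assignment.
        operator = "<-" if item.is_persistent else ":="
        # Drop a trailing semicolon so each statement ends with exactly one.
        expression = item.expression.strip().rstrip(";").rstrip()
        lines.append(f"{item.result} {operator} {expression};")
    return "\n".join(lines)


script_text = to_vtl_script(
    [
        SimpleTransformation("DS_r1", "DS_1 [calc Me_4 := OBS_VALUE];", True),
        SimpleTransformation("DS_r2", "DS_1 [rename OBS_VALUE to Me_5];", True),
    ]
)
print(script_text)
```

Both statements read from DS_1, which is why this scheme satisfies the single-input rule for calls without mappings.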

Mapping SDMX dataflows to VTL aliases

Sometimes the name your script uses for a dataset doesn’t match the SDMX dataflow’s short-URN — maybe the script was written first, maybe the SDMX names are too unwieldy to drop into VTL expressions, or maybe you just want a friendlier handle. Pass a mappings argument to bridge the two.

You can express the mapping as a plain dict or as a pysdmx VtlDataflowMapping object — pick whichever fits your code:

from pathlib import Path

from pysdmx.io import get_datasets
from pysdmx.model.vtl import (
    Transformation,
    TransformationScheme,
    VtlDataflowMapping,
)

from vtlengine import run_sdmx

data = Path("docs/_static/data.xml")
structure = Path("docs/_static/metadata.xml")
datasets = get_datasets(data, structure)
script = TransformationScheme(
    id="TS1",
    version="1.0",
    agency="MD",
    vtl_version="2.1",
    items=[
        Transformation(
            id="T1",
            uri=None,
            urn=None,
            name=None,
            description=None,
            expression="DS_1 [calc Me_4 := OBS_VALUE]",
            is_persistent=True,
            result="DS_r",
            annotations=(),
        ),
    ],
)
# Option 1: mapping as a VtlDataflowMapping object
mapping = VtlDataflowMapping(
    dataflow="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=MD:TEST_DF(1.0)",
    dataflow_alias="DS_1",
    id="VTL_MAP_1",
)

# Option 2: the same mapping as a plain dict (replaces the object above)
mapping = {"Dataflow=MD:TEST_DF(1.0)": "DS_1"}

run_sdmx(script, datasets, mappings=mapping)
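The two forms above spell the same dataflow differently: the VtlDataflowMapping uses the full URN, while the dict key uses the short URN. If your URNs come from a registry in full form, a small helper can normalize them before you build the dict. The sketch below is an assumption-laden illustration: to_short_urn and build_mappings are hypothetical names, and the prefix is taken from the full URN shown in the example.

```python
# Prefix that distinguishes the full URN from the short URN in the example.
FULL_URN_PREFIX = "urn:sdmx:org.sdmx.infomodel.datastructure."


# Hypothetical helper: not part of vtlengine or pysdmx.
def to_short_urn(urn: str) -> str:
    """Strip the SDMX information-model prefix, if present."""
    if urn.startswith(FULL_URN_PREFIX):
        return urn[len(FULL_URN_PREFIX):]
    return urn


# Hypothetical helper: builds a dict in the shape used for mappings above.
def build_mappings(aliases: dict[str, str]) -> dict[str, str]:
    """Return a short-URN -> VTL alias dict from full or short URNs."""
    return {to_short_urn(urn): alias for urn, alias in aliases.items()}


mapping = build_mappings(
    {"urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=MD:TEST_DF(1.0)": "DS_1"}
)
print(mapping)
```

URNs that are already short pass through unchanged, so the helper is safe to apply to mixed input.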

The files used in the examples (data.xml and metadata.xml) are referenced under docs/_static, as in the paths shown above.