Pandas Engine
The pandas engine is the default execution backend used by vtlengine.run() and
vtlengine.run_sdmx(). It runs every VTL operation against in-memory
pandas DataFrames and is selected whenever use_duckdb
is left at its default value of False.
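As a minimal sketch of what "default" means in practice, the function below calls vtlengine.run() without passing use_duckdb, so the pandas engine is used. The dataset name, component names, and values are illustrative only, and the shapes of data_structures and datapoints follow the project's basic usage pattern as an assumption rather than a full specification.

```python
def run_on_pandas_engine():
    """Run a one-line VTL script on the default (pandas) engine."""
    # Imported inside the function so the sketch can be loaded
    # even where pandas or vtlengine are not installed.
    import pandas as pd
    from vtlengine import run

    # Illustrative script: multiply every measure of DS_1 by 10.
    script = "DS_A <- DS_1 * 10;"

    # Illustrative structure: one identifier and one measure.
    data_structures = {
        "datasets": [
            {
                "name": "DS_1",
                "DataStructure": [
                    {"name": "Id_1", "type": "Integer",
                     "role": "Identifier", "nullable": False},
                    {"name": "Me_1", "type": "Number",
                     "role": "Measure", "nullable": True},
                ],
            }
        ]
    }

    # Input data is handed over as an ordinary in-memory DataFrame.
    datapoints = {
        "DS_1": pd.DataFrame({"Id_1": [1, 2, 3], "Me_1": [10, 20, 30]})
    }

    # use_duckdb is not passed, so it keeps its default of False.
    return run(
        script=script,
        data_structures=data_structures,
        datapoints=datapoints,
    )
```

Calling run_on_pandas_engine() requires both pandas and vtlengine to be installed.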
Note
Execution engines only apply to vtlengine.run() and vtlengine.run_sdmx().
vtlengine.semantic_analysis() performs validation only and does not execute
operators against data, so it is engine-agnostic.
Overview
Default: nothing to opt into; calls to vtlengine.run() and vtlengine.run_sdmx() use it out of the box.
In-memory: every dataset is materialised as a pandas.DataFrame, and operators apply DataFrame transformations, joins, and groupbys.
Stable surface: the result of every operation is itself a DataFrame and can be inspected, debugged, or post-processed with the full pandas API.
When to use it
Datasets fit comfortably in RAM (single-node, no spill-to-disk requirements).
You want full interoperability with the pandas ecosystem — pass DataFrames in, receive DataFrames back, plug into pandas-based pipelines downstream.
You are running smaller scripts or interactive exploration where startup time matters more than raw throughput.
Limitations
The entire dataset is held in memory; very large inputs (multi-GB) can exhaust available RAM since there is no spill-to-disk.
No native support for S3 URIs in datapoints or output_folder; pass local paths or DataFrames, or switch to the DuckDB engine.
Always single-threaded: VTL operators run sequentially on a single thread. Whatever vectorisation pandas or NumPy expose internally is the only available parallelism. Use the DuckDB engine when multi-threaded query execution matters.
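The first two limitations can be checked before a run. The helper below is a hypothetical pre-flight check, not part of vtlengine: pandas_engine_warnings and ram_budget_bytes are names introduced here for illustration. It flags s3:// paths and compares the total on-disk size of local files against a budget chosen by the caller. On-disk size is only a rough proxy, since a loaded DataFrame can be larger or smaller than its source file.

```python
import os
from typing import Iterable, List


def pandas_engine_warnings(paths: Iterable[str], ram_budget_bytes: int) -> List[str]:
    """Hypothetical pre-flight check for the pandas engine's limitations."""
    warnings: List[str] = []
    local_total = 0

    for path in paths:
        if path.startswith("s3://"):
            # S3 URIs are not supported natively by the pandas engine.
            warnings.append(f"{path}: S3 URI, use a local path or the DuckDB engine")
        elif os.path.isfile(path):
            # Sum on-disk sizes as a rough stand-in for memory use.
            local_total += os.path.getsize(path)

    if local_total > ram_budget_bytes:
        warnings.append(
            f"local inputs total {local_total} bytes, "
            f"above the budget of {ram_budget_bytes} bytes"
        )
    return warnings
```

An empty list means neither limitation was detected by this heuristic; it does not guarantee that the run will fit in memory.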
Configuration
The pandas engine respects the number-handling environment variables shared with the rest of the engine:
OUTPUT_NUMBER_SIGNIFICANT_DIGITS: precision for arithmetic and CSV output.
COMPARISON_ABSOLUTE_THRESHOLD: tolerance for Number comparison operators.
See Environment Variables for full details.
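One way to set these variables from Python is through os.environ, as sketched below. The helper name configure_number_handling is hypothetical, and the values in the usage line are placeholders: this page does not state the accepted ranges or defaults, so check them on the Environment Variables page. Setting the variables before importing vtlengine is a conservative assumption about when they are read.

```python
import os


def configure_number_handling(significant_digits: int, comparison_threshold: int) -> None:
    """Hypothetical helper: export the two number-handling variables."""
    # Environment variables are always strings.
    os.environ["OUTPUT_NUMBER_SIGNIFICANT_DIGITS"] = str(significant_digits)
    os.environ["COMPARISON_ABSOLUTE_THRESHOLD"] = str(comparison_threshold)


# Placeholder values; consult the Environment Variables page for valid ones.
configure_number_handling(significant_digits=10, comparison_threshold=10)
```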
When to switch to DuckDB
Consider the DuckDB Engine if you need any of the following:
Datasets that approach or exceed available RAM (DuckDB can spill to disk via VTL_TEMP_DIRECTORY or run on a file-backed database via VTL_USE_IN_MEMORY_DB=0).
Reading from or writing to S3 URIs (s3://bucket/key).
Multi-threaded query execution (VTL_THREADS); the pandas engine is always single-threaded.
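The three variables named above can be collected in one place before switching engines. The sketch below is a hypothetical helper (duckdb_settings is not a vtlengine function) that builds a dictionary of settings. Only the value VTL_USE_IN_MEMORY_DB=0 comes from this page; treating VTL_THREADS as an integer and VTL_TEMP_DIRECTORY as a plain directory path are assumptions to verify against the DuckDB Engine page.

```python
from typing import Dict, Optional


def duckdb_settings(
    temp_dir: Optional[str] = None,
    threads: Optional[int] = None,
    file_backed: bool = False,
) -> Dict[str, str]:
    """Hypothetical helper: build environment settings for the DuckDB engine."""
    settings: Dict[str, str] = {}
    if temp_dir is not None:
        # Directory DuckDB can spill to (path format is an assumption).
        settings["VTL_TEMP_DIRECTORY"] = temp_dir
    if file_backed:
        # "0" selects a file-backed database, as stated above.
        settings["VTL_USE_IN_MEMORY_DB"] = "0"
    if threads is not None:
        # Integer thread count is an assumption.
        settings["VTL_THREADS"] = str(threads)
    return settings
```

The result can be applied with os.environ.update(...) before calling vtlengine.run() with use_duckdb set to True.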