Environment Variables
VTL Engine uses environment variables to configure number handling, the DuckDB execution engine, and S3 connectivity. All variables are optional and have sensible defaults.
Number Handling
These variables control how VTL Engine handles floating-point precision in numeric operations, comparison operators, and output formatting.
Important
IEEE 754 float64 guarantees 15 significant decimal digits (DBL_DIG = 15). The valid range of 6 to 15 significant digits for these variables reflects the practical precision limits of double-precision floating point.
COMPARISON_ABSOLUTE_THRESHOLD
Controls the significant digits used for Number comparison operations (=, <>, >=, <=, between).
| Value | Behavior |
|---|---|
| Not defined | Uses the default value of 15 significant digits |
| 6 to 15 | Uses the specified number of significant digits |
| -1 | Disables tolerance (uses Python's default exact comparison) |
The tolerance is calculated as 0.5 * 10^(-(N-1)), where N is the number of significant digits. For the default of 15, this gives a relative tolerance of 5e-15, which filters out floating-point arithmetic artifacts while preserving meaningful differences.
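The formula can be illustrated with a short Python sketch. It assumes the tolerance is applied relative to the operands' magnitude, in the style of math.isclose. The helper names comparison_tolerance and numbers_equal are hypothetical and not part of the VTL Engine API.

```python
import math
import os
from typing import Optional


def comparison_tolerance(n_digits: int) -> Optional[float]:
    """Return the relative tolerance for N significant digits, or None if disabled."""
    if n_digits == -1:
        return None  # -1 disables tolerance: fall back to exact comparison
    if not 6 <= n_digits <= 15:
        raise ValueError("COMPARISON_ABSOLUTE_THRESHOLD must be -1 or between 6 and 15")
    # 0.5 * 10^(-(N-1)); for N = 15 this is 5e-15
    return 0.5 * 10 ** (-(n_digits - 1))


def numbers_equal(a: float, b: float, n_digits: Optional[int] = None) -> bool:
    """Compare two floats using the tolerance described above (illustrative only)."""
    if n_digits is None:
        n_digits = int(os.environ.get("COMPARISON_ABSOLUTE_THRESHOLD", "15"))
    tol = comparison_tolerance(n_digits)
    if tol is None:
        return a == b
    # Assumption: relative comparison, as math.isclose does with rel_tol
    return math.isclose(a, b, rel_tol=tol)


print(numbers_equal(0.1 + 0.2, 0.3, n_digits=15))  # True
print(numbers_equal(0.1 + 0.2, 0.3, n_digits=-1))  # False
```

The classic 0.1 + 0.2 artifact differs from 0.3 by about 5.6e-17. That is well inside the default tolerance but is still visible to an exact comparison.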
OUTPUT_NUMBER_SIGNIFICANT_DIGITS
Controls the significant digits used for:
- Numeric operations: precision of arithmetic operations (+, -, *, /, mod, power, etc.), set through the Decimal context precision.
- CSV output: formatting of Number values when writing to CSV files.
- DuckDB DECIMAL scale: number of decimal places used by the DuckDB engine (paired with VTL_DUCKDB_DECIMAL_WIDTH for the precision).
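Setting the Decimal context precision can be sketched with Python's decimal module. This assumes the engine relies on the standard context precision, as the list above describes. The helper divide is hypothetical and uses a local context so the global context is left untouched.

```python
import os
from decimal import Decimal, localcontext
from typing import Optional


def divide(a: str, b: str, digits: Optional[int] = None) -> Decimal:
    """Divide a by b with the Decimal context precision set to `digits` significant digits."""
    if digits is None:
        digits = int(os.environ.get("OUTPUT_NUMBER_SIGNIFICANT_DIGITS", "15"))
    with localcontext() as ctx:  # changes are undone when the block exits
        if digits != -1:  # -1: keep the current context precision (28 by default)
            ctx.prec = digits
        return Decimal(a) / Decimal(b)


print(divide("1", "3", 10))  # 0.3333333333
print(divide("2", "3", 6))   # 0.666667
```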
| Value | Behavior |
|---|---|
| Not defined | Uses the default value of 15 significant digits (pandas); DuckDB uses a default scale of 10 |
| 6 to 15 | Uses the specified number of significant digits |
| -1 | Disables precision limiting (uses Python/pandas defaults; DuckDB falls back to a maximum scale of 15) |
For output formatting, this variable sets the float_format parameter of pandas to_csv. It uses the general format specifier (e.g. %.15g), which switches automatically between fixed and exponential notation.
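The effect of the %g specifier can be seen without pandas. The same format string is what pandas receives, for example df.to_csv(path, float_format="%.15g"). The helper format_number below is hypothetical. Its -1 branch uses Python's repr only to illustrate "no precision limiting".

```python
def format_number(value: float, digits: int = 15) -> str:
    """Format a float as float_format='%.<digits>g' would; digits=-1 means no limit."""
    if digits == -1:
        return repr(value)  # shortest string that round-trips to the same float
    return f"%.{digits}g" % value


print(format_number(0.1 + 0.2))          # 0.3
print(format_number(0.1 + 0.2, 17))      # 0.30000000000000004
print(format_number(1234567.891, 6))     # 1.23457e+06
print(format_number(1e20))               # 1e+20
```

With 15 digits the arithmetic artifact in 0.1 + 0.2 disappears from the CSV. At 17 digits it reappears. Large magnitudes switch to exponential notation automatically.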
DuckDB Engine
These variables tune the DuckDB execution engine used when use_duckdb=True. They have no effect on the default pandas backend.
VTL_MEMORY_LIMIT
Maximum memory the DuckDB engine may consume.
| Value | Behavior |
|---|---|
| Not defined | Uses the default value of 80% of system RAM |
| N% | Percentage of system RAM |
| NGB / NMB / NKB (e.g. 16GB) | Absolute size in GB / MB / KB |
| Integer | Absolute size in bytes |
When DuckDB exceeds this limit it spills to VTL_TEMP_DIRECTORY.
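The accepted value formats can be summarized as a small parser. This is a hypothetical helper, not the engine's code. It assumes 1024-based units for KB/MB/GB, and the engine's actual conversion may differ. Total RAM is passed in explicitly so the sketch stays portable.

```python
import os
from typing import Optional

# Assumption: binary (1024-based) multipliers; the real engine may use another convention.
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def memory_limit_bytes(value: Optional[str], total_ram_bytes: int) -> int:
    """Translate a VTL_MEMORY_LIMIT-style string into a byte count (hypothetical helper)."""
    if value is None or not value.strip():
        value = "80%"  # documented default: 80% of system RAM
    value = value.strip().upper()
    if value.endswith("%"):
        return int(total_ram_bytes * float(value[:-1]) / 100)
    for suffix, factor in _UNITS.items():
        if value.endswith(suffix):
            return int(float(value[: -len(suffix)]) * factor)
    return int(value)  # bare integer: absolute size in bytes


# Example with a machine assumed to have 16 GiB of RAM
print(memory_limit_bytes(os.environ.get("VTL_MEMORY_LIMIT"), total_ram_bytes=16 * 1024**3))
```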
VTL_THREADS
Number of worker threads DuckDB may use during query execution.
| Value | Behavior |
|---|---|
| Not defined | Uses the default value of 1 |
| Integer | Uses exactly that many threads |
VTL_TEMP_DIRECTORY
Directory where DuckDB writes spill files when memory is exceeded, and where the file-backed database lives when VTL_USE_IN_MEMORY_DB is disabled. When unset, the engine falls back to Python’s tempfile.gettempdir(), which resolves in this order:
1. The directory named by the TMPDIR environment variable
2. The directory named by the TEMP environment variable
3. The directory named by the TMP environment variable
4. A platform-specific location:
   - Linux/POSIX: /tmp, /var/tmp, /usr/tmp (in that order)
   - Windows: C:\TEMP, C:\TMP, \TEMP, \TMP (in that order)
   - macOS: the system normally sets TMPDIR to a per-user directory under /var/folders/, so step 1 usually applies
5. The current working directory, as a last resort
The engine creates a unique sub-directory per session under the resolved location and removes it when the connection closes.
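The resolution and per-session cleanup described above can be mimicked with the standard tempfile module. This is a sketch, not the engine's implementation. The helper session_temp_dir and the vtl_duckdb_ prefix are hypothetical.

```python
import os
import shutil
import tempfile


def session_temp_dir() -> str:
    """Create a unique per-session spill directory (hypothetical helper)."""
    # VTL_TEMP_DIRECTORY wins; otherwise tempfile applies the search order listed above
    base = os.environ.get("VTL_TEMP_DIRECTORY") or tempfile.gettempdir()
    os.makedirs(base, exist_ok=True)
    # mkdtemp creates a uniquely named directory, so concurrent sessions never collide
    return tempfile.mkdtemp(prefix="vtl_duckdb_", dir=base)


spill_dir = session_temp_dir()
print(spill_dir)
# ... DuckDB would spill into spill_dir while the connection is open ...
shutil.rmtree(spill_dir, ignore_errors=True)  # remove it when the connection closes
```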
VTL_MAX_TEMP_DIRECTORY_SIZE
Caps the total disk space DuckDB may use for spill-to-disk.
| Value | Behavior |
|---|---|
| Not defined / empty | No cap; DuckDB may use all available disk space in the temp directory |
| Size (e.g. 200GB) | Absolute size cap; queries that would exceed it fail |
VTL_USE_IN_MEMORY_DB
Selects the DuckDB storage backend.
| Value | Behavior |
|---|---|
| Not defined / enabled | Use an in-memory database (default) |
| 0 | Use a file-backed database under VTL_TEMP_DIRECTORY |
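The backend choice amounts to picking the database argument for duckdb.connect(), where ":memory:" denotes an in-memory database. In this sketch only 0 is confirmed by the examples as disabling. Treating false and no the same way is an assumption, as are the helper duckdb_database and the file name vtl_engine.duckdb.

```python
import os
import tempfile
from typing import Mapping, Optional


def duckdb_database(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the database argument one might pass to duckdb.connect() (hypothetical)."""
    env = os.environ if env is None else env
    flag = env.get("VTL_USE_IN_MEMORY_DB", "").strip().lower()
    # Assumption: "0" is documented; "false" and "no" are guesses
    if flag in ("0", "false", "no"):
        base = env.get("VTL_TEMP_DIRECTORY") or tempfile.gettempdir()
        return os.path.join(base, "vtl_engine.duckdb")  # hypothetical file name
    return ":memory:"


print(duckdb_database({}))                              # :memory:
print(duckdb_database({"VTL_USE_IN_MEMORY_DB": "0"}))
```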
VTL_DUCKDB_DECIMAL_WIDTH
Total number of digits (precision) used for the DuckDB DECIMAL type. The decimal scale is controlled separately by OUTPUT_NUMBER_SIGNIFICANT_DIGITS.
| Value | Behavior |
|---|---|
| Not defined | Uses the default value of 28 |
| Integer | Uses the specified precision |
| -1 | Disables precision limiting (uses DuckDB's maximum precision of 38) |
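Combining width and scale into a DuckDB DECIMAL(width, scale) type can be sketched as follows. The defaults (28 and 10) and the -1 fallbacks (38 and 15) come from the tables above. The helper duckdb_decimal_type is hypothetical. DuckDB requires 1 <= width <= 38 and 0 <= scale <= width.

```python
import os
from typing import Mapping, Optional

DUCKDB_MAX_WIDTH = 38  # DuckDB's maximum DECIMAL precision


def duckdb_decimal_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a DECIMAL(width,scale) type name from the two variables (hypothetical)."""
    env = os.environ if env is None else env
    width = int(env.get("VTL_DUCKDB_DECIMAL_WIDTH", "28"))
    scale = int(env.get("OUTPUT_NUMBER_SIGNIFICANT_DIGITS", "10"))
    if width == -1:
        width = DUCKDB_MAX_WIDTH  # no precision limiting
    if scale == -1:
        scale = 15  # documented fallback scale for the DuckDB backend
    if not 1 <= width <= DUCKDB_MAX_WIDTH or not 0 <= scale <= width:
        raise ValueError(f"invalid DECIMAL({width},{scale})")
    return f"DECIMAL({width},{scale})"


print(duckdb_decimal_type({}))                                   # DECIMAL(28,10)
print(duckdb_decimal_type({"VTL_DUCKDB_DECIMAL_WIDTH": "38"}))   # DECIMAL(38,10)
```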
VTL_SKIP_LOAD_VALIDATION
Skips the post-load validation that the DuckDB engine runs after each table is created (no-duplicates, temporal column format, DWI cardinality). Intended for benchmarking; do not use in production.
| Value | Behavior |
|---|---|
| Not defined / empty | Validation runs (default) |
| Any non-empty value | Validation is skipped |
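As an illustration of what the flag turns off, the sketch below reproduces only the no-duplicates check, computed in Python over identifier tuples rather than inside DuckDB. It assumes any non-empty value means "skip". The helpers validation_enabled and duplicate_keys are hypothetical.

```python
import os
from collections import Counter
from typing import Iterable, List, Tuple


def validation_enabled() -> bool:
    """Validation runs unless VTL_SKIP_LOAD_VALIDATION is set to a non-empty value."""
    return not os.environ.get("VTL_SKIP_LOAD_VALIDATION", "").strip()


def duplicate_keys(rows: Iterable[Tuple]) -> List[Tuple]:
    """Return identifier tuples that occur more than once (the no-duplicates check)."""
    counts = Counter(rows)
    return [key for key, n in counts.items() if n > 1]


identifiers = [("ES", 2020), ("FR", 2020), ("ES", 2020)]
if validation_enabled():
    print(duplicate_keys(identifiers))  # [('ES', 2020)]
```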
S3 Configuration
S3 URIs (s3://...) are read by DuckDB's built-in httpfs extension when use_duckdb=True; the AWS environment variables below are picked up automatically.
AWS_ACCESS_KEY_ID
The access key ID for AWS authentication.
AWS_SECRET_ACCESS_KEY
The secret access key for AWS authentication.
AWS_SESSION_TOKEN
(Optional) Session token for temporary AWS credentials.
AWS_DEFAULT_REGION
(Optional) Default AWS region for S3 operations.
AWS_ENDPOINT_URL
(Optional) Custom endpoint URL for S3-compatible storage services (e.g., MinIO, LocalStack).
For more details on AWS configuration, see the boto3 documentation.
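Before calling run with s3:// paths, it can help to confirm that the variables above are present. This sketch treats the key pair as required and the rest as optional, mirroring the (Optional) markers in the list. Credentials supplied through other mechanisms are not considered. The helper names are hypothetical, and only variable names are printed, never values.

```python
import os
from typing import List

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
OPTIONAL_VARS = ("AWS_SESSION_TOKEN", "AWS_DEFAULT_REGION", "AWS_ENDPOINT_URL")


def missing_aws_credentials() -> List[str]:
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]


def configured_optional_vars() -> List[str]:
    """Return the optional AWS variables that are set (names only)."""
    return [name for name in OPTIONAL_VARS if os.environ.get(name)]


missing = missing_aws_credentials()
if missing:
    print("Missing before reading s3:// paths:", ", ".join(missing))
else:
    print("AWS credentials found; optional settings:", configured_optional_vars())
```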
Examples
Setting comparison threshold
```bash
# Use 10 significant digits for more lenient comparisons (tolerance ~5e-10)
export COMPARISON_ABSOLUTE_THRESHOLD=10
# Use maximum precision (default, tolerance ~5e-15)
export COMPARISON_ABSOLUTE_THRESHOLD=15
# Disable tolerance-based comparison (exact floating-point comparison)
export COMPARISON_ABSOLUTE_THRESHOLD=-1
```
Controlling numeric precision
```bash
# Use 10 significant digits for arithmetic and output
export OUTPUT_NUMBER_SIGNIFICANT_DIGITS=10
# Use maximum precision (default)
export OUTPUT_NUMBER_SIGNIFICANT_DIGITS=15
# Disable precision limiting (use Python/pandas defaults)
export OUTPUT_NUMBER_SIGNIFICANT_DIGITS=-1
```
Tuning the DuckDB engine
```bash
# Cap memory at 16 GB and use 4 threads
export VTL_MEMORY_LIMIT=16GB
export VTL_THREADS=4
# Use a file-backed database under a custom spill directory, capped at 200 GB
export VTL_USE_IN_MEMORY_DB=0
export VTL_TEMP_DIRECTORY=/var/lib/vtlengine/duckdb-spill
export VTL_MAX_TEMP_DIRECTORY_SIZE=200GB
# Increase DECIMAL precision to 38 digits (max)
export VTL_DUCKDB_DECIMAL_WIDTH=38
```
Using S3 with environment variables
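```bash
# Set AWS credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=eu-west-1
```

```python
from pathlib import Path

from vtlengine import run

# Placeholder: path to the JSON file describing DS_1's structure (adjust to your project)
data_structures = Path("data_structures/DS_1.json")

result = run(
    script="DS_r := DS_1;",
    data_structures=data_structures,
    datapoints="s3://my-bucket/input/DS_1.csv",  # read through DuckDB's httpfs extension
    output_folder="s3://my-bucket/output/",
    use_duckdb=True,  # S3 paths are handled by the DuckDB engine
)
```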
Using a custom S3 endpoint
```bash
# For S3-compatible services (MinIO, LocalStack)
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
```