Environment Variables

VTL Engine uses environment variables to configure number handling, the DuckDB execution engine, and S3 connectivity. All variables are optional and have sensible defaults.

Number Handling

These variables control how VTL Engine handles floating-point precision in numeric operations, comparison operators, and output formatting.

Important

IEEE 754 double precision (float64) guarantees 15 significant decimal digits (DBL_DIG = 15). The accepted range of 6 to 15 significant digits for the variables below reflects the practical precision limits of double-precision floating point.

COMPARISON_ABSOLUTE_THRESHOLD

Controls the number of significant digits used by Number comparison operators (=, <>, >=, <=, between).

Value         Behavior
------------  -----------------------------------------------------------
Not defined   Uses the default of 15 significant digits
6 to 15       Uses the specified number of significant digits
-1            Disables tolerance (uses Python’s default exact comparison)

The tolerance is calculated as 0.5 * 10^(-(N-1)), where N is the number of significant digits.

For the default of 15, this gives a relative tolerance of 5e-15, which filters floating-point arithmetic artifacts while preserving meaningful differences.
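To make the formula concrete, here is a minimal sketch of a tolerance-based comparison. It assumes the tolerance is applied relative to the larger operand, the way math.isclose applies rel_tol; the helper names comparison_tolerance, number_eq and number_le are hypothetical and not part of the VTL Engine API, and the engine's actual implementation may differ.

```python
import math

def comparison_tolerance(significant_digits: int) -> float:
    """Tolerance 0.5 * 10^(-(N-1)) for N significant digits."""
    return 0.5 * 10 ** (-(significant_digits - 1))

def number_eq(a: float, b: float, significant_digits: int = 15) -> bool:
    """Hypothetical '=' check; -1 disables the tolerance."""
    if significant_digits == -1:
        return a == b  # exact floating-point comparison
    # Assumption: relative tolerance, as math.isclose does with rel_tol.
    return math.isclose(a, b, rel_tol=comparison_tolerance(significant_digits))

def number_le(a: float, b: float, significant_digits: int = 15) -> bool:
    """Hypothetical '<=' check: strictly less, or equal within tolerance."""
    return a < b or number_eq(a, b, significant_digits)

print(number_eq(0.1 + 0.2, 0.3))      # True: the arithmetic artifact is filtered out
print(number_eq(0.1 + 0.2, 0.3, -1))  # False: exact comparison
print(number_le(0.1 + 0.2, 0.3))      # True
```

With a purely relative tolerance, any comparison against exactly 0 is effectively exact, because the allowed difference scales with the operands' magnitude.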

OUTPUT_NUMBER_SIGNIFICANT_DIGITS

Controls the number of significant digits used for:

  1. Numeric operations: sets the Decimal context precision, which determines the precision of arithmetic operations (+, -, *, /, mod, power, etc.).

  2. CSV output: Formatting Number values when writing to CSV files.

  3. DuckDB DECIMAL scale: Number of decimal places used by the DuckDB engine (paired with VTL_DUCKDB_DECIMAL_WIDTH for the precision).

Value         Behavior
------------  -----------------------------------------------------------
Not defined   Uses the default of 15 significant digits (pandas backend) or a DECIMAL scale of 10 (DuckDB engine)
6 to 15       Uses the specified number of significant digits
-1            Disables precision limiting (uses Python/pandas defaults; the DuckDB engine falls back to its maximum supported scale of 15)

For output formatting, this variable controls the float_format parameter in pandas to_csv, using the general format specifier (e.g., %.15g) which automatically switches between fixed and exponential notation.
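The two pandas-side effects can be sketched with the standard library alone: a Decimal context whose precision is N, and the printf-style %.Ng format that pandas applies through float_format. The helpers divide and format_number are hypothetical illustrations, not VTL Engine functions.

```python
from decimal import Decimal, localcontext

def divide(a: str, b: str, significant_digits: int) -> Decimal:
    """Divide two decimals with the context precision set to N significant digits."""
    with localcontext() as ctx:
        ctx.prec = significant_digits  # affects only this block
        return Decimal(a) / Decimal(b)

def format_number(value: float, significant_digits: int) -> str:
    """Format a float the way float_format='%.{N}g' does in pandas to_csv."""
    return "%.*g" % (significant_digits, value)

print(divide("1", "3", 10))              # 0.3333333333
print(format_number(0.1 + 0.2, 15))      # 0.3
print(format_number(123456789012.0, 6))  # 1.23457e+11 (switches to exponential notation)
```

In pandas itself the same format string is passed as df.to_csv("out.csv", float_format="%.15g").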

DuckDB Engine

These variables tune the DuckDB execution engine (see DuckDB Engine), which is used when use_duckdb=True. They have no effect on the default pandas backend.

VTL_MEMORY_LIMIT

Maximum memory the DuckDB engine may consume.

Value                            Behavior
-------------------------------  ----------------------------------------
Not defined                      Uses the default of 80% of system RAM
"80%"                            Percentage of system RAM
"8GB" / "8192MB" / "8388608KB"   Absolute size in GB / MB / KB
integer                          Absolute size in bytes

When a query needs more memory than this limit, DuckDB spills intermediate data to VTL_TEMP_DIRECTORY.
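The accepted value formats can be summarized as a small parser that converts every form to bytes. This is a hypothetical helper (memory_limit_bytes is not an engine function) and it assumes binary multipliers, which is consistent with the table's equivalence 8GB = 8192MB = 8388608KB; this page does not say whether the engine or DuckDB itself interprets the suffixes.

```python
import re
from typing import Optional

# Binary multipliers, consistent with 8GB = 8192MB = 8388608KB
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

def memory_limit_bytes(value: Optional[str], total_ram_bytes: int) -> int:
    """Hypothetical parser for VTL_MEMORY_LIMIT values, returning bytes."""
    if value is None or not value.strip():
        return int(total_ram_bytes * 0.8)  # default: 80% of system RAM
    text = value.strip().upper()
    if text.endswith("%"):
        return int(total_ram_bytes * float(text[:-1]) / 100)
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", text)
    if match:
        return int(match.group(1)) * _UNITS[match.group(2)]
    return int(text)  # bare integer: absolute size in bytes

print(memory_limit_bytes("8GB", 0))      # 8589934592
print(memory_limit_bytes("80%", 1000))   # 800
```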

VTL_THREADS

Number of worker threads DuckDB may use during query execution.

Value          Behavior
-------------  ---------------------------------
Not defined    Uses the default of 1 thread
integer >= 1   Uses exactly that many threads

VTL_TEMP_DIRECTORY

Directory where DuckDB writes spill files when memory is exceeded, and where the file-backed database lives when VTL_USE_IN_MEMORY_DB is disabled. When unset, the engine falls back to Python’s tempfile.gettempdir(), which resolves in this order:

  1. The directory named by the TMPDIR environment variable

  2. The directory named by the TEMP environment variable

  3. The directory named by the TMP environment variable

  4. A platform-specific location:

    • Linux/POSIX (including macOS): /tmp, /var/tmp, /usr/tmp (in that order)

    • Windows: C:\TEMP, C:\TMP, \TEMP, \TMP (in that order)

    • macOS usually sets TMPDIR to a per-user directory under /var/folders/, so step 1 normally applies there

  5. The current working directory as a last resort

The engine creates a unique sub-directory per session under the resolved location and removes it when the connection closes.
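The resolution and per-session directory described above can be sketched with Python's tempfile module. The function name session_temp_dir and the "vtl_duckdb_" prefix are hypothetical, and treating an empty VTL_TEMP_DIRECTORY as unset is an assumption of this sketch.

```python
import os
import shutil
import tempfile

def session_temp_dir() -> str:
    """Create a unique per-session directory under the resolved temp location."""
    # VTL_TEMP_DIRECTORY wins; otherwise tempfile.gettempdir() applies the
    # TMPDIR -> TEMP -> TMP -> platform default -> cwd order.
    base = os.environ.get("VTL_TEMP_DIRECTORY") or tempfile.gettempdir()
    # mkdtemp creates a new, uniquely named directory and returns its path.
    return tempfile.mkdtemp(prefix="vtl_duckdb_", dir=base)  # prefix is hypothetical

session_dir = session_temp_dir()
try:
    print("spill files go to", session_dir)
finally:
    # Mirror the documented clean-up when the connection closes.
    shutil.rmtree(session_dir, ignore_errors=True)
```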

VTL_MAX_TEMP_DIRECTORY_SIZE

Caps the total disk space DuckDB may use for spill-to-disk.

Value                 Behavior
--------------------  -----------------------------------------------------
Not defined / empty   No cap; DuckDB may use all available disk space in VTL_TEMP_DIRECTORY
"100GB" / "500MB"     Absolute size cap; queries that would exceed it fail

VTL_USE_IN_MEMORY_DB

Selects the DuckDB storage backend.

Value                        Behavior
---------------------------  ------------------------------------------------
Not defined / "1" / "true"   Uses an in-memory database (default)
"0" / any other value        Uses a file-backed database under VTL_TEMP_DIRECTORY (recommended for very large datasets that approach available RAM)
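The backend choice can be sketched as a function that returns the database argument for duckdb.connect(), where ":memory:" is DuckDB's name for an in-memory database. The helper names and the "vtl.duckdb" file name are hypothetical, and the exact, case-sensitive match on "1" and "true" is an assumption taken literally from the table.

```python
import os
from typing import Mapping

def use_in_memory_db(env: Mapping[str, str] = os.environ) -> bool:
    """Unset, "1" or "true" selects in-memory; any other value selects file-backed."""
    value = env.get("VTL_USE_IN_MEMORY_DB")
    return value is None or value in ("1", "true")  # exact match assumed

def duckdb_database(session_dir: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the database argument to pass to duckdb.connect()."""
    if use_in_memory_db(env):
        return ":memory:"  # DuckDB's in-memory database
    return os.path.join(session_dir, "vtl.duckdb")  # hypothetical file name
```

A connection could then be opened with duckdb.connect(duckdb_database(session_dir), config={"threads": 4, "memory_limit": "16GB"}). DuckDB has settings named memory_limit, threads, temp_directory and max_temp_directory_size; that the engine maps the VTL_* variables onto them one-to-one is an assumption.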

VTL_DUCKDB_DECIMAL_WIDTH

Total number of digits (precision) used for the DuckDB DECIMAL type. Decimal scale is controlled separately by OUTPUT_NUMBER_SIGNIFICANT_DIGITS.

Value         Behavior
------------  --------------------------------------------------------------------
Not defined   Uses the default of 28
6 to 38       Uses the specified precision
-1            Disables precision limiting (uses DuckDB’s maximum precision of 38)
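Width and scale combine into a single DuckDB type, DECIMAL(width, scale). The sketch below derives it from the two variables using the defaults and -1 fallbacks documented on this page; the function name is hypothetical, and rejecting a scale larger than the width (DuckDB requires scale <= width) is this sketch's choice, not documented engine behavior.

```python
import os
from typing import Mapping

def duckdb_decimal_type(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical: build DECIMAL(width,scale) from the two variables."""
    width = int(env.get("VTL_DUCKDB_DECIMAL_WIDTH", "28"))  # default precision 28
    if width == -1:
        width = 38  # DuckDB's maximum DECIMAL precision
    scale = int(env.get("OUTPUT_NUMBER_SIGNIFICANT_DIGITS", "10"))  # DuckDB default scale 10
    if scale == -1:
        scale = 15  # documented fallback scale
    if not 6 <= width <= 38:
        raise ValueError(f"VTL_DUCKDB_DECIMAL_WIDTH must be 6 to 38 or -1, got {width}")
    if not 6 <= scale <= 15:
        raise ValueError(f"OUTPUT_NUMBER_SIGNIFICANT_DIGITS must be 6 to 15 or -1, got {scale}")
    if scale > width:
        # DuckDB requires scale <= width; this sketch rejects the combination.
        raise ValueError(f"scale {scale} exceeds width {width}")
    return f"DECIMAL({width},{scale})"

print(duckdb_decimal_type({}))                                   # DECIMAL(28,10)
print(duckdb_decimal_type({"VTL_DUCKDB_DECIMAL_WIDTH": "-1"}))   # DECIMAL(38,10)
```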

VTL_SKIP_LOAD_VALIDATION

Skips the post-load validation that the DuckDB engine runs after each table is created (no-duplicates, temporal column format, DWI cardinality). Intended for benchmarking; do not use in production.

Value                  Behavior
---------------------  --------------------------
Not defined / empty    Validation runs (default)
"1" / "true" / "yes"   Validation is skipped
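A benchmarking script might check the flag the same way the table reads, warning loudly when validation is off. The helper skip_load_validation is hypothetical, and the exact, case-sensitive match (so "TRUE" or "Yes" would not count) is an assumption, since this page lists only the lowercase spellings.

```python
import os
from typing import Mapping

def skip_load_validation(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical check: "1", "true" or "yes" skips validation."""
    # Unset or empty means validation runs; exact, case-sensitive match assumed.
    return env.get("VTL_SKIP_LOAD_VALIDATION", "") in ("1", "true", "yes")

if skip_load_validation():
    print("warning: post-load validation disabled (benchmarking only)")
```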

S3 Configuration

S3 URIs (s3://...) are read by DuckDB’s built-in httpfs extension when use_duckdb=True; the AWS environment variables below are picked up automatically.

AWS_ACCESS_KEY_ID

The access key ID for AWS authentication.

AWS_SECRET_ACCESS_KEY

The secret access key for AWS authentication.

AWS_SESSION_TOKEN

(Optional) Session token for temporary AWS credentials.

AWS_DEFAULT_REGION

(Optional) Default AWS region for S3 operations.

AWS_ENDPOINT_URL

(Optional) Custom endpoint URL for S3-compatible storage services (e.g., MinIO, LocalStack).

For more details on AWS configuration, see the boto3 documentation.
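Before running a script against s3:// paths, a quick preflight check can report which of the variables above are set. This sketch assumes credentials come from environment variables, as this section describes; the helper missing_s3_vars and the split into required and optional names are hypothetical, with the optional set taken from the "(Optional)" labels above.

```python
import os
from typing import List, Mapping

REQUIRED_S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
OPTIONAL_S3_VARS = ("AWS_SESSION_TOKEN", "AWS_DEFAULT_REGION", "AWS_ENDPOINT_URL")

def missing_s3_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_S3_VARS if not env.get(name)]

missing = missing_s3_vars()
if missing:
    print("missing S3 credentials:", ", ".join(missing))
for name in OPTIONAL_S3_VARS:
    # Report only whether each optional variable is set, never its value.
    print(f"{name}: {'set' if os.environ.get(name) else 'not set'}")
```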

Examples

Setting comparison threshold

# Use 10 significant digits for more lenient comparisons (tolerance ~5e-10)
export COMPARISON_ABSOLUTE_THRESHOLD=10

# Use maximum precision (default, tolerance ~5e-15)
export COMPARISON_ABSOLUTE_THRESHOLD=15

# Disable tolerance-based comparison (exact floating-point comparison)
export COMPARISON_ABSOLUTE_THRESHOLD=-1

Controlling numeric precision

# Use 10 significant digits for arithmetic and output
export OUTPUT_NUMBER_SIGNIFICANT_DIGITS=10

# Use maximum precision (default)
export OUTPUT_NUMBER_SIGNIFICANT_DIGITS=15

# Disable precision limiting (use Python/pandas defaults)
export OUTPUT_NUMBER_SIGNIFICANT_DIGITS=-1

Tuning the DuckDB engine

# Cap memory at 16 GB and use 4 threads
export VTL_MEMORY_LIMIT=16GB
export VTL_THREADS=4

# Use a file-backed database under a custom spill directory, capped at 200 GB
export VTL_USE_IN_MEMORY_DB=0
export VTL_TEMP_DIRECTORY=/var/lib/vtlengine/duckdb-spill
export VTL_MAX_TEMP_DIRECTORY_SIZE=200GB

# Increase DECIMAL precision to 38 digits (max)
export VTL_DUCKDB_DECIMAL_WIDTH=38

Using S3 with environment variables

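# Set AWS credentials
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=eu-west-1

Then run the script from Python:

from pathlib import Path

from vtlengine import run

# Hypothetical path to the JSON file that defines the structure of DS_1
data_structures = Path("data_structures/DS_1.json")

result = run(
    script="DS_r := DS_1;",
    data_structures=data_structures,
    datapoints="s3://my-bucket/input/DS_1.csv",
    output_folder="s3://my-bucket/output/",
    use_duckdb=True,
)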

Using a custom S3 endpoint

# For S3-compatible services (MinIO, LocalStack)
export AWS_ENDPOINT_URL=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin