API

The API package contains the methods to handle VTL scripts: performing validations, executing them over data, and related functionality.

vtlengine.generate_sdmx(script: str | Path, agency_id: str, id: str, version: str = '1.0') → TransformationScheme

Function that generates a TransformationScheme object from a VTL script.

The TransformationScheme object is the SDMX representation of the VTL script. For more details, please check the SDMX IM VTL objects (line 2266).

Parameters:
  • script – A string or Path with the VTL script.

  • agency_id – The Agency ID used in the generated TransformationScheme object.

  • id – The given id of the generated TransformationScheme object.

  • version – The Version used in the generated TransformationScheme object. (default: “1.0”)

Returns:

The generated TransformationScheme object.

vtlengine.prettify(script: str | TransformationScheme | Path) → str

Function that prettifies the given VTL script.

Parameters:

script – VTL script as a string, a TransformationScheme object, or a Path to the VTL script file.

Returns:

A str with the prettified VTL script.

vtlengine.run(script: str | TransformationScheme | Path, data_structures: Dict[str, Any] | Path | List[Dict[str, Any]] | List[Path], datapoints: Dict[str, DataFrame] | str | Path | List[Dict[str, Any]] | List[Path], value_domains: Dict[str, Any] | Path | None = None, external_routines: Path | str | None = None, time_period_output_format: str = 'vtl', return_only_persistent: bool = True, output_folder: Path | str | None = None) → Dict[str, Dataset]

run is the main function of the API; its mission is to execute the VTL operations over the data.

Concepts you may need to know:

  • VTL script: the script that defines the set of operations to be executed.

  • Data structure: a JSON file that defines the name and structure of each dataset (and/or scalar), specifying for every component its data type (String, Integer or Number), its role (Identifier, Attribute or Measure) and whether it is nullable.

  • Data points: a Pandas DataFrame that holds the data of a dataset.

  • Value domains: a collection of unique values of the same data type.

  • External routines: an SQL query used to transform a dataset.
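As a sketch of the data structure concept, the dictionary below shows the general shape described above; the key names and the dataset/component names are illustrative and may need adjusting to the engine's exact schema:

```python
# Hypothetical data structure for a dataset named "DS_1": one Identifier
# and one nullable Measure, each with an explicit data type and role.
data_structure = {
    "datasets": [
        {
            "name": "DS_1",
            "DataStructure": [
                {"name": "Id_1", "type": "Integer", "role": "Identifier", "nullable": False},
                {"name": "Me_1", "type": "Number", "role": "Measure", "nullable": True},
            ],
        }
    ]
}
```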

Important

The data structure and the data points must have the same dataset name to be loaded correctly.

Important

If pointing to a Path or an S3 URI, dataset_name will be taken from the file name. Example: If the path is ‘path/to/data.csv’, the dataset name will be ‘data’.

Important

If using an S3 URI, the path must be in the format:

s3://bucket-name/path/to/data.csv

The following environment variables must be set (from the AWS account):

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

For more details, see s3fs documentation.

Before the execution, the DAG analysis reviews whether the VTL script is a directed acyclic graph.

Parameters:
  • script – VTL script as a string, a Transformation Scheme object or Path with the VTL script.

  • data_structures – Dict, Path or a List of Dicts or Paths with the data structures.

  • datapoints – Dict of DataFrames, Path, S3 URI, or a List of Paths or S3 URIs with the data.

  • value_domains – Dict or Path of the value domains JSON files. (default: None)

  • external_routines – String or Path of the external routines SQL files. (default: None)

  • time_period_output_format – String with the possible values (“sdmx_gregorian”, “sdmx_reporting”, “vtl”) for the representation of the Time Period components. (default: “vtl”)

  • return_only_persistent – If True, the run function returns only the results of Persistent Assignments. (default: True)

  • output_folder – Path or S3 URI to the output folder. (default: None)

Returns:

A dictionary with the resulting datasets. If the output folder is defined, the datasets are returned without data, as the data is written to the output folder instead.

Raises:

Exception – If the files have the wrong format, do not exist, or their Paths are invalid.

vtlengine.run_sdmx(script: str | TransformationScheme | Path, datasets: Sequence[PandasDataset], mappings: VtlDataflowMapping | Dict[str, str] | None = None, value_domains: Dict[str, Any] | Path | None = None, external_routines: Path | str | None = None, time_period_output_format: str = 'vtl', return_only_persistent: bool = True, output_folder: Path | str | None = None) → Dict[str, Dataset]

Executes a VTL script using a list of pysdmx PandasDataset objects.

This function prepares the required VTL data structures and datapoints from the given list of pysdmx PandasDataset objects. It validates that each PandasDataset uses a valid Schema instance as its structure. Each Schema is converted to the appropriate VTL JSON data structure, and the pandas DataFrame is extracted.

Important

We recommend using this function in combination with the pysdmx get_datasets method.

Important

The mapping between pysdmx PandasDataset and VTL datasets is done using the Schema instance of the PandasDataset. The Schema ID is used as the dataset name.

DataStructure=MD:TEST_DS(1.0) -> TEST_DS

The function then calls the run function with the provided VTL script and prepared inputs.

Before the execution, the DAG analysis reviews whether the generated VTL script is a directed acyclic graph.

Parameters:
  • script – VTL script as a string, a Transformation Scheme object or Path with the VTL script.

  • datasets – A list of PandasDataset.

  • mappings – A dictionary or VtlDataflowMapping object that maps the dataset names.

  • value_domains – Dict or Path of the value domains JSON files. (default: None)

  • external_routines – String or Path of the external routines SQL files. (default: None)

  • time_period_output_format – String with the possible values (“sdmx_gregorian”, “sdmx_reporting”, “vtl”) for the representation of the Time Period components. (default: “vtl”)

  • return_only_persistent – If True, the run function returns only the results of Persistent Assignments. (default: True)

  • output_folder – Path or S3 URI to the output folder. (default: None)

Returns:

A dictionary with the resulting datasets. If the output folder is defined, the datasets are returned without data, as the data is written to the output folder instead.

Raises:

SemanticError – If any dataset does not contain a valid Schema instance as its structure.

vtlengine.semantic_analysis(script: str | TransformationScheme | Path, data_structures: Dict[str, Any] | Path | List[Dict[str, Any]] | List[Path], value_domains: Dict[str, Any] | Path | None = None, external_routines: Dict[str, Any] | Path | None = None) → Dict[str, Dataset]

Checks whether the VTL script and its related data structures are valid. For compatibility with the pysdmx library, the script can be a TransformationScheme object, which is serialized to a string VTL script before analysis.

Concepts you may need to know:

  • VTL script: the script that defines the set of operations to be executed.

  • Data structure: a JSON file that defines the name and structure of each dataset (and/or scalar), specifying for every component its data type (String, Integer or Number), its role (Identifier, Attribute or Measure) and whether it is nullable.

  • Value domains: a collection of unique values of the same data type.

  • External routines: an SQL query used to transform a dataset.

Parameters:
  • script – VTL script as a string, a TransformationScheme object, or a Path to the VTL script.

  • data_structures – Dict or Path (file or folder), or List of Dicts or Paths with the data structures JSON files.

  • value_domains – Dict or Path of the value domains JSON files. (default: None)

  • external_routines – String or Path of the external routines SQL files. (default: None)

Returns:

The computed datasets.

Raises:

Exception – If the files have the wrong format, do not exist, or their Paths are invalid.