API

The API package contains all the methods needed to load data into the VTL engine. It provides one function to check whether a VTL operation can be performed (semantic_analysis) and another to prepare and execute it (run).

vtlengine.run(script: str | Path, data_structures: Dict[str, Any] | Path | List[Dict[str, Any] | Path], datapoints: Dict[str, Any] | str | Path | List[str | Path], value_domains: Dict[str, Any] | Path | None = None, external_routines: Path | str | None = None, time_period_output_format: str = 'vtl', return_only_persistent: bool = False, output_folder: Path | str | None = None) → Any

Run is the main function of the API, whose mission is to ensure the VTL operation is ready to be performed. When the VTL expression is given, an AST object is created. The VTL script can be given as a string, or as a path to the file or folder that contains it. At the same time, the data structures are loaded with their datapoints.

The data structure information is contained in the given JSON file, which establishes the datatype (String, Integer or Number) and the role each component is going to have (Identifier, Attribute or Measure). It can be a dictionary, or a path to the JSON file or to the folder that contains it.
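For illustration, a data structure given as a dictionary could look like the following sketch. The schema keys used here ("datasets", "DataStructure") are assumptions for illustration; check them against the package documentation:

    # Hypothetical data structure for a dataset DS_1 with one Identifier
    # and one Measure. Key names are illustrative assumptions.
    data_structure = {
        "datasets": [
            {
                "name": "DS_1",
                "DataStructure": [
                    {"name": "Id_1", "type": "Integer", "role": "Identifier", "nullable": False},
                    {"name": "Me_1", "type": "Number", "role": "Measure", "nullable": True},
                ],
            }
        ]
    }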

Moreover, a CSV file with the data to operate on is loaded. It can be given as a dictionary (dataset name: pandas DataFrame), or as a path or S3 URI to the CSV file or to the folder that contains the data.
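For example, the datapoints could be supplied in memory as a dictionary of pandas DataFrames keyed by dataset name (a sketch that reuses the hypothetical DS_1 structure above):

    import pandas as pd

    # Column names must match the components declared in the data structure.
    datapoints = {
        "DS_1": pd.DataFrame({"Id_1": [1, 2, 3], "Me_1": [10.0, 20.0, 30.0]}),
    }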

Important

The data structure and the data points must have the same dataset name to be loaded correctly.

Important

If pointing to a Path or an S3 URI, dataset_name will be taken from the file name. Example: If the path is ‘path/to/data.csv’, the dataset name will be ‘data’.

Important

If using an S3 URI, the path must be in the format:

s3://bucket-name/path/to/data.csv

The following environment variables must be set (from the AWS account):

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

For more details, see s3fs documentation.
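As a sketch, the credentials could be set from Python before pointing the datapoints at an S3 URI (the bucket, key and credential values below are placeholders):

    import os

    # Placeholder credentials taken from your AWS account.
    os.environ["AWS_ACCESS_KEY_ID"] = "YOUR_ACCESS_KEY_ID"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "YOUR_SECRET_ACCESS_KEY"

    # The dataset name is taken from the file name, here "data".
    datapoints = "s3://bucket-name/path/to/data.csv"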

Before the execution, the DAG analysis checks that the VTL script forms a directed acyclic graph.

If value domain data or external routines are required, the function loads this information and integrates it into the Interpreter class.

Moreover, if any dataset has a Time Period component, the chosen external representation is passed to the Interpreter class.
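Putting these pieces together, a minimal call could look like the sketch below, reusing the hypothetical data_structure and datapoints defined above:

    import vtlengine

    # A one-line VTL script: multiply the measures of DS_1 by 10.
    script = "DS_r := DS_1 * 10;"

    result = vtlengine.run(
        script=script,
        data_structures=data_structure,  # dict sketched above
        datapoints=datapoints,           # dict of DataFrames sketched above
        time_period_output_format="vtl",
    )
    print(result)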

Concepts you may need to know:

  • VTL script: The expression that shows the operation to be done.

  • Data Structure: JSON file that defines, for each dataset (and/or scalar), its name and structure: the datatype (String, Integer or Number), the role (Identifier, Attribute or Measure) and the nullability of each component.

  • Data point: Pointer to the data. It will be loaded as a pandas DataFrame.

  • Value domains: Collection of unique values that have the same datatype (see the sketch after this list).

  • External routines: SQL query used to transform a dataset.
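As an illustration, a value domain could be passed as a dictionary like the sketch below. Its field names ("name", "type", "setlist") are assumptions for illustration, not a guaranteed schema:

    # Hypothetical value domain: a named collection of String values.
    value_domains = {
        "name": "COUNTRIES",
        "type": "String",
        "setlist": ["AT", "BE", "ES", "FR"],
    }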

This function has the following params:

Parameters:
  • script – String or Path with the VTL expression.

  • data_structures – Dict, Path or a List of Dicts or Paths with the data structures.

  • datapoints – Dict, Path, S3 URI or List of S3 URIs or Paths with data.

  • value_domains – Dict or Path of the value domains JSON files. (default: None)

  • external_routines – String or Path of the external routines SQL files. (default: None)

  • time_period_output_format – String with the possible values (“sdmx_gregorian”, “sdmx_reporting”, “vtl”) for the representation of the Time Period components. (default: “vtl”)

  • return_only_persistent – If True, run function will only return the results of Persistent Assignments. (default: False)

  • output_folder – Path or S3 URI to the output folder. (default: None)

Returns:

The computed datasets. If an output folder is defined, the results are written there and the datasets are returned without data.

Raises:

Exception – If the files have the wrong format, do not exist, or their Paths are invalid.
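For instance, writing the results to disk instead of keeping the data in memory could look like this sketch (all paths are placeholders, and the script uses a persistent assignment, <-, so that return_only_persistent has something to return):

    import vtlengine

    result = vtlengine.run(
        script="DS_r <- DS_1 * 10;",           # persistent assignment
        data_structures="path/to/structures/",  # placeholder folder of JSON files
        datapoints="path/to/DS_1.csv",          # placeholder; dataset name becomes "DS_1"
        return_only_persistent=True,
        output_folder="path/to/output/",        # results are written here
    )
    # With output_folder set, the returned datasets carry no data.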

vtlengine.semantic_analysis(script: str | Path, data_structures: Dict[str, Any] | Path | List[Dict[str, Any] | Path], value_domains: Dict[str, Any] | Path | None = None, external_routines: Dict[str, Any] | Path | None = None) → Any

Checks if the VTL operation can be performed. To do that, it generates the AST from the given VTL script and reviews whether the given data structures fit with it.

The VTL script can be a string with the actual expression, or a filepath to the folder that contains the VTL file.

Moreover, the data structure can be a dictionary or a filepath to the folder that contains it.

If there are any value domains or external routines, this data is taken into account. Both can be loaded in the same way as data structures or VTL scripts are.

Finally, the Interpreter class takes all of this information and checks it against the generated AST to return the semantic analysis result.
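For example, a semantic check needs only the script and its data structures, not the datapoints (reusing the hypothetical data_structure sketched earlier):

    import vtlengine

    analysis = vtlengine.semantic_analysis(
        script="DS_r := DS_1 * 10;",
        data_structures=data_structure,  # dict sketched earlier
    )
    print(analysis)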

Concepts you may need to know:

  • VTL script: The expression that shows the operation to be done.

  • Data Structure: JSON file that defines, for each dataset (and/or scalar), its name and structure: the datatype (String, Integer or Number), the role (Identifier, Attribute or Measure) and the nullability of each component.

  • Value domains: Collection of unique values that have the same datatype.

  • External routines: SQL query used to transform a dataset.

This function has the following params:

Parameters:
  • script – String or Path with the VTL expression.

  • data_structures – Dict or Path (file or folder), or List of Dicts or Paths with the data structures JSON files.

  • value_domains – Dict or Path of the value domains JSON files. (default: None)

  • external_routines – String or Path of the external routines SQL files. (default: None)

Returns:

The computed datasets.

Raises:

Exception – If the files have the wrong format, do not exist, or their Paths are invalid.