RTV Tutorial on Configuration Files

Quick Reference:

To execute a validation scenario defined in an RTV configuration file, run rtv from the command line and pass it the path to the configuration file:

rtv /path/to/config/file

Currently supported formats:

  • yaml

  • json

NOTE: Most examples in this tutorial use the yaml format.

Structure

A valid configuration file for RTV should have two main sections:

  • definitions - this section holds a list of the framework entities that will be used in the validation scenario.

  • actions - this section holds a list of actions that will be performed during the validation scenario execution.

Minimal example:

yaml:

definitions:
    - name: csv_reader
      class: CSVReader
      delimiter: "|"

actions:
    - read:
        reader: csv_reader
        source: vector.csv
        output_name: vector_data

json:

{
    "definitions": [
        {
            "name": "csv_reader",
            "class": "CSVReader",
            "delimiter": "|"
        }
    ],
    "actions": [
        {
            "read": {
                "reader": "csv_reader",
                "source": "vector.csv",
                "output_name": "vector_data"
            }
        }
    ]
}
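Because both versions describe the same structure, a config can be sanity-checked with a few lines of code before it is passed to rtv. The Python sketch below parses the JSON version of the minimal example with the standard json module and checks that both sections are present and hold lists. The check_structure helper is hypothetical and not part of RTV; rtv itself may apply stricter checks.

```python
import json

# Hypothetical helper (not part of RTV): verify that a parsed config has
# the two top-level sections, each holding a list.
def check_structure(config: dict) -> None:
    for section in ("definitions", "actions"):
        if section not in config:
            raise ValueError(f"missing required section: {section!r}")
        if not isinstance(config[section], list):
            raise ValueError(f"section {section!r} must be a list")

# The JSON version of the minimal example, embedded as a string
MINIMAL_JSON = """
{
    "definitions": [
        {"name": "csv_reader", "class": "CSVReader", "delimiter": "|"}
    ],
    "actions": [
        {"read": {"reader": "csv_reader", "source": "vector.csv",
                  "output_name": "vector_data"}}
    ]
}
"""

config = json.loads(MINIMAL_JSON)
check_structure(config)  # passes silently for the minimal example
```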

Definitions

Each entry in the definitions section of the configuration file must have the following required fields:

  • name: An alias, similar to a variable name, that you can use later in the config to reference the defined entity.

  • class: The name of the class used to construct the entity.

All remaining fields of a definition are parameters specific to that entity. In the previous example, the delimiter field is a parameter of CSVReader.

NOTE: You can find a list of available entities/classes and their parameters in the following sections of this tutorial.
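The split between the required fields and the entity-specific parameters can be sketched in Python as follows. The parse_definitions helper and the (class name, parameters) tuples it returns are assumptions made for illustration; they do not describe how RTV actually stores its entities.

```python
# Hypothetical sketch (not RTV's loader): separate each definition into its
# required fields (name, class) and its entity-specific parameters.
def parse_definitions(definitions: list[dict]) -> dict[str, tuple[str, dict]]:
    registry: dict[str, tuple[str, dict]] = {}
    for entry in definitions:
        missing = {"name", "class"} - set(entry)
        if missing:
            raise ValueError(f"definition {entry!r} is missing {sorted(missing)}")
        # Every field other than name and class is treated as a parameter
        params = {k: v for k, v in entry.items() if k not in ("name", "class")}
        registry[entry["name"]] = (entry["class"], params)
    return registry

registry = parse_definitions(
    [{"name": "csv_reader", "class": "CSVReader", "delimiter": "|"}]
)
print(registry["csv_reader"])  # ('CSVReader', {'delimiter': '|'})
```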

Actions

Each entry in the actions section has the following general structure:

actions:
    - <action_type>:
        - <action_param>: ...
          # ...

        - <action_param>: ...
          # ...

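The set of <action_param> fields is specific to each action type. The parameters of an action can be given either as a single mapping, as in the minimal example above, or as a list of mappings, with one mapping per invocation of the action.

Example with the read <action_type>: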

actions:
    - read:
        - reader: csv_reader
          source: vector.csv
          output_name: vector_table_data

        - reader: txt_reader
          source: vector.txt
          output_name: vector_text_data
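In the examples in this tutorial, an action's parameters appear either as a single mapping or as a list of mappings. The Python sketch below shows how both forms can be normalized into one sequence of invocations, assuming the config has already been parsed into dictionaries and lists. The iter_invocations generator is hypothetical, not an RTV function.

```python
# Hypothetical sketch (not RTV's API): yield one (action_type, params) pair
# per invocation, whether params is a single mapping or a list of mappings.
def iter_invocations(actions: list[dict]):
    for entry in actions:
        # Each actions entry is a one-key mapping: {<action_type>: <params>}
        if len(entry) != 1:
            raise ValueError(f"expected exactly one action type in {entry!r}")
        action_type, params = next(iter(entry.items()))
        param_sets = params if isinstance(params, list) else [params]
        for param_set in param_sets:
            yield action_type, param_set

actions = [
    {"read": [
        {"reader": "csv_reader", "source": "vector.csv", "output_name": "vector_table_data"},
        {"reader": "txt_reader", "source": "vector.txt", "output_name": "vector_text_data"},
    ]},
]
for action_type, params in iter_invocations(actions):
    print(action_type, params["output_name"])
# read vector_table_data
# read vector_text_data
```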

NOTE: You will find information on the available <action_type> values and their related <action_param> fields in the following section of this tutorial.

During a validation run, actions are executed in the order in which they are defined in the config, so the following example leads to an error:

actions:
    - transform:
        input: vector_data
        output_name: transformed_vector_data
        transformers: vector_transposer

    - read:
        reader: csv_reader
        source: vector.csv
        output_name: vector_data

The transform action will raise an exception when it tries to access the vector_data entry, because that entry only becomes available after the read action has executed successfully.
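The ordering rule can be illustrated with a toy data store. In the Python sketch below, DataStore, run, read_step and transform_step are hypothetical stand-ins for RTV's data store and actions; the point is only that each step can read entries produced by earlier steps, so running the transform step before the read step fails with a KeyError.

```python
class DataStore:
    """Toy stand-in for a scenario's data store (hypothetical)."""

    def __init__(self) -> None:
        self._entries: dict[str, object] = {}

    def put(self, name: str, value: object) -> None:
        self._entries[name] = value

    def get(self, name: str) -> object:
        # Fail loudly when no earlier step has produced the entry
        if name not in self._entries:
            raise KeyError(f"data entry {name!r} is not available")
        return self._entries[name]


def run(steps, store: DataStore) -> None:
    # Steps execute strictly in list order, like actions in a config
    for step in steps:
        step(store)


def read_step(store: DataStore) -> None:
    # Hypothetical read: produces vector_data
    store.put("vector_data", [1, 2, 3])


def transform_step(store: DataStore) -> None:
    # Hypothetical transform: reverses vector_data into a new entry
    store.put("transformed_vector_data", list(reversed(store.get("vector_data"))))


ok_store = DataStore()
run([read_step, transform_step], ok_store)  # read first: succeeds

try:
    run([transform_step, read_step], DataStore())  # transform first: fails
except KeyError as err:
    print("failed as expected:", err)
```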

Available actions

read

Used to read data from one or more sources, convert it to RTV’s internal data representation and save it to the current scenario’s data store.

Fields:

  • reader: The name of the defined Reader entity to use for the action execution.

  • source: The path to the source.

  • output_name: A name, unique within the current scenario, that will be used to store and reference the action’s result.

  • pattern: Optional field, a regex pattern used to match more than one source file. If this field is provided, source should be a path to a directory containing the source files to match against the pattern. Defaults to an empty string.

  • prefix_key: Optional field, a prefix string to prepend to every key of the resulting data entry. Defaults to an empty string.

Example:

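Read the reference.csv source file and every file in the iterations/ directory whose name matches the pattern (iter_001.csv, iter_002.csv, ...), and save the resulting data as reference and target respectively. Every key of the reference data entry is prefixed with ref: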

definitions:
    - name: csv_reader
      class: CSVReader

actions:
    - read:
        - reader: csv_reader
          source: reference.csv
          output_name: reference
          prefix_key: ref

        - reader: csv_reader
          source: iterations/
          pattern: 'iter_(\d+)\.csv'
          # will match: iter_001.csv, iter_002.csv...
          output_name: target
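The Python sketch below shows how a pattern like the one above selects files. It assumes the regex has to match the whole file name (re.fullmatch). This tutorial does not state RTV's actual matching rule, which may differ, for example by searching for the pattern anywhere in the name. match_sources and match_directory are hypothetical helpers.

```python
import os
import re

# Hypothetical sketch: keep the file names that fully match the pattern.
# ASSUMPTION: the whole file name must match (re.fullmatch).
def match_sources(file_names, pattern: str) -> list[str]:
    regex = re.compile(pattern)
    return sorted(name for name in file_names if regex.fullmatch(name))

def match_directory(directory: str, pattern: str) -> list[str]:
    # Return full paths of the matching entries in the directory
    names = match_sources(os.listdir(directory), pattern)
    return [os.path.join(directory, name) for name in names]

names = ["iter_001.csv", "iter_002.csv", "iter_x.csv", "iter_003.csv.bak", "reference.csv"]
print(match_sources(names, r"iter_(\d+)\.csv"))
# ['iter_001.csv', 'iter_002.csv']
```

An unescaped . in a regex matches any character, so iter_(\d+).csv would also accept a name such as iter_001_csv; escaping it as \. restricts the match to a literal dot.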

write

Used to write a data entry to an output destination using a Writer entity.

Fields:

  • input: The name of the data entry to write.

  • writer: The name of the defined Writer entity to use for the action execution.

  • output: The output destination for the data entry. Its actual type depends on the writer implementation.

Example:

Write the result data entry to a JSON file named validation_result.json using a JSONWriter entity.

definitions:
    # ...
    - name: json_writer
      class: JSONWriter
    # ...

actions:
    # ...
    - write:
        input: result
        writer: json_writer
        output: validation_result
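As a rough illustration of what a JSON writer does with a data entry, here is a Python sketch built on the standard json module. write_json is hypothetical, and appending the .json extension to output is an assumption based on the example above, where output: validation_result is described as producing validation_result.json. RTV's JSONWriter may format its output differently.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical sketch of a JSON writer (not RTV's JSONWriter).
def write_json(data: dict, output: str) -> Path:
    # ASSUMPTION: ".json" is appended when the destination lacks it
    path = Path(output if output.endswith(".json") else output + ".json")
    # Serialize the data entry as indented, human-readable JSON
    path.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return path

# Usage: write a made-up data entry into a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    written = write_json({"example_key": [1, 2, 3]}, os.path.join(tmp, "validation_result"))
    written_name = written.name
    written_content = json.loads(written.read_text(encoding="utf-8"))

print(written_name)  # validation_result.json
```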

transform

Used to transform a data entry using one or more Transformer entities and save the result as a new data entry.

Fields:

  • input: The name of the data entry to transform.

  • transformers: The name (or a list of names) of the Transformer entities to use for the action execution.

  • output_name: A unique name that will be used to store and later reference the result of the action.

Example:

Transform the result data entry using inverse_transformer and save the transformed result as the result_transformed data entry.

definitions:
    # ...
    - name: inverse_transformer
      class: InverseTransformer
    # ...

actions:
    # ...
    - transform:
        input: result
        transformers: inverse_transformer
        output_name: result_transformed
    # ...
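When transformers lists several names, one natural reading is that they are applied one after another. The Python sketch below assumes exactly that: each transformer receives the previous one's output, in list order. transpose and negate are hypothetical transformers chosen for illustration; they are not RTV classes, and this tutorial does not describe what InverseTransformer actually computes.

```python
from functools import reduce
from typing import Callable

Matrix = list[list[float]]
Transformer = Callable[[Matrix], Matrix]

def transpose(m: Matrix) -> Matrix:
    # Hypothetical transformer: swap rows and columns
    return [list(row) for row in zip(*m)]

def negate(m: Matrix) -> Matrix:
    # Hypothetical transformer: flip the sign of every value
    return [[-x for x in row] for row in m]

def apply_transformers(data: Matrix, transformers: list[Transformer]) -> Matrix:
    # ASSUMPTION: transformers run left to right, each on the previous output
    return reduce(lambda acc, t: t(acc), transformers, data)

store = {"result": [[1.0, 2.0], [3.0, 4.0]]}
store["result_transformed"] = apply_transformers(store["result"], [transpose, negate])
print(store["result_transformed"])  # [[-1.0, -3.0], [-2.0, -4.0]]
```

The input entry is left untouched, since each transformer builds new lists; the result is stored under a new name, as output_name does.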

validate

Used to validate a target data entry against a reference data entry using one or more Validation entities.

Fields:

  • reference: The name of the data entry to use as the reference.

  • target: The name of the data entry to use as the target.

  • validations: The name (or a list of names) of the Validation entities to use for the action execution.

  • output_name: A unique name that will be used to store and reference the result of the action.

Example:

Validate data entry a against data entry b using validation v1, and store the resulting data entry as result.

definitions:
    # ...
    - name: mae
      class: MeanAbsoluteError
      threshold: 0.5

    - name: v1
      class: StrategyValidation
      strategies: mae
      keys: all
    # ...

actions:
    # ...
    - validate:
        reference: b
        target: a
        validations: v1
        output_name: result
    # ...
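To make the mae definition above more concrete, here is a Python sketch of a threshold-based mean absolute error check. It rests on several assumptions that this tutorial does not confirm: data entries map keys to equal-length numeric sequences, keys: all means every key of the reference entry, and a key passes when its MAE does not exceed threshold. mean_absolute_error and validate_mae are hypothetical helpers, not RTV's MeanAbsoluteError implementation.

```python
# Hypothetical sketch of a threshold-based MAE validation (not RTV's code).
def mean_absolute_error(reference, target) -> float:
    if len(reference) != len(target):
        raise ValueError("reference and target must have the same length")
    if not reference:
        raise ValueError("cannot compute MAE of empty sequences")
    return sum(abs(r - t) for r, t in zip(reference, target)) / len(reference)

def validate_mae(reference: dict, target: dict, threshold: float) -> dict:
    # ASSUMPTION: "keys: all" means every key of the reference entry
    results = {}
    for key in reference:
        mae = mean_absolute_error(reference[key], target[key])
        results[key] = {"mae": mae, "passed": mae <= threshold}
    return results

b = {"x": [1.0, 2.0, 3.0], "y": [0.0, 0.0]}  # reference entry (made-up data)
a = {"x": [1.5, 2.0, 2.5], "y": [1.0, 1.0]}  # target entry (made-up data)
result = validate_mae(b, a, threshold=0.5)
print(result["x"]["passed"], result["y"]["passed"])  # True False
```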

free

Used to remove data entries from the current scenario’s data store.

Fields:

  • targets: A list of names of the data entries to remove.

Example:

Remove data entries a and b.

actions:
    - free:
        targets: [a,b]
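With a plain dictionary standing in for the data store, the free action above corresponds roughly to the following Python sketch. The free function and the choice to raise KeyError for an unknown name are assumptions; RTV may handle missing entries differently.

```python
# Hypothetical sketch: remove the named entries from a dict-based store.
def free(store: dict, targets: list[str]) -> None:
    for name in targets:
        # ASSUMPTION: freeing an unknown name is an error (raises KeyError)
        del store[name]

store = {"a": [1, 2], "b": [3, 4], "c": [5, 6]}
free(store, ["a", "b"])
print(store)  # {'c': [5, 6]}
```

After the entries are freed, any later action that references a or b fails in the same way as the out-of-order example in the Actions section.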