Currently implemented data quality checks

A list of available checks is shown below:

| Check | Description | Flag | Arguments | Check function | Target |
|---|---|---|---|---|---|
| Missing values | Checks for missing values in the data. | missing | completeness* | missing_values, missing_values_data* | Constant, series** and dataseries* |
| Outlier values | Checks for outlier values in the data. | outliers | outliers_method, outliers_nstd, outliers_niqr | outlier_values | Constant and data |
| Series range | Checks if the series is inside a range. | series_range | series_range_values | series_range | Series |
| Series monotony | Checks if the series is monotonically increasing. | series_monotony | | series_monotony | Series |
| Series increment type | Checks the series increment type. | series_increment | series_increment_type | series_increment_type | Series |

* The completeness argument is only used for dataseries, where missing_values_data is called.

** The check for missing values is always run on the series values, because missing values in the series dimension have to be removed before running the other tests.
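The order described in the second footnote (remove missing series values first, then run the series checks) can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names `drop_missing`, `in_range`, `is_monotonically_increasing` and `has_linear_increment` are hypothetical, missing values are assumed to be encoded as NaN, and "monotonically increasing" is assumed to mean strictly increasing.

```python
import math

def drop_missing(series):
    """Remove NaN entries so later checks only see valid series values."""
    return [x for x in series if not math.isnan(x)]

def in_range(series, low=-math.inf, high=math.inf):
    """True if every series value lies inside [low, high]."""
    return all(low <= x <= high for x in series)

def is_monotonically_increasing(series):
    """True if each value is strictly greater than the previous one (assumption)."""
    return all(b > a for a, b in zip(series, series[1:]))

def has_linear_increment(series, rel_tol=1e-9):
    """True if all consecutive differences are equal within a relative tolerance."""
    steps = [b - a for a, b in zip(series, series[1:])]
    return all(math.isclose(s, steps[0], rel_tol=rel_tol) for s in steps)

raw_series = [0.0, 1.0, float("nan"), 3.0, 4.0]
series = drop_missing(raw_series)                 # missing values removed first
range_ok = in_range(series, 0.0, 10.0)            # all values inside [0, 10]
monotony_ok = is_monotonically_increasing(series) # values keep growing
linear_ok = has_linear_increment(series)          # steps are 1, 2, 1: not linear
```

Running the NaN removal first matters here: comparisons against NaN are always false, so the range and monotony helpers would fail on the raw series even though its valid values are well behaved.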

Information about each check argument is shown in the table below:

| Argument | Check | Description | Possible values | Default |
|---|---|---|---|---|
| completeness | Missing values | If set to ‘any’, the check fails if any data value is missing for any series value. If set to ‘all’, the check fails only if all the data values are missing for a given series value (column). It only has an effect when the data is a matrix (2 or more dimensions). | ‘any’ or ‘all’ | ‘any’ |
| outliers_method | Outlier values | The method to be used. Can be ‘std’ for the standard deviation method or ‘iqr’ for the interquartile range method. | ‘std’ or ‘iqr’ | ‘std’ |
| outliers_nstd | Outlier values | For the ‘std’ method, the number of standard deviations used to define outliers. | float > 0 | 2 |
| outliers_niqr | Outlier values | For the ‘iqr’ method, the number of interquartile ranges used to define outliers. | float > 0 | 1.5 |
| series_range_values | Series range | The minimum and maximum value of the series. | [float, float] | [-inf, inf] |
| series_increment_type | Series increment type | The series distribution. If set to ‘linear’, the check verifies that the series increments linearly. | ‘linear’ | ‘linear’ |
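The following sketch shows one common reading of the completeness, outliers_method, outliers_nstd and outliers_niqr arguments. It is not the library's code: the function names `failing_columns` and `outlier_flags` are hypothetical, missing values are assumed to be NaN, the ‘std’ rule is assumed to use the population standard deviation around the mean, and the ‘iqr’ rule is assumed to use the usual fences placed niqr interquartile ranges below the first quartile and above the third.

```python
import math
import statistics

def failing_columns(rows, completeness="any"):
    """Flag each column (one per series value) that fails the missing-values check."""
    if completeness not in ("any", "all"):
        raise ValueError("completeness must be 'any' or 'all'")
    reducer = any if completeness == "any" else all
    return [reducer(math.isnan(x) for x in column) for column in zip(*rows)]

def outlier_flags(values, method="std", nstd=2.0, niqr=1.5):
    """Flag each value considered an outlier under the chosen method."""
    if method == "std":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population std: an assumption
        return [abs(x - mean) > nstd * std for x in values]
    if method == "iqr":
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        return [x < q1 - niqr * iqr or x > q3 + niqr * iqr for x in values]
    raise ValueError("method must be 'std' or 'iqr'")

nan = float("nan")
matrix = [[1.0, nan, nan],
          [2.0, 5.0, nan]]
any_fail = failing_columns(matrix, "any")  # columns with at least one NaN
all_fail = failing_columns(matrix, "all")  # columns that are entirely NaN

values = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 50.0]
std_flags = outlier_flags(values, method="std")
iqr_flags = outlier_flags(values, method="iqr")
```

With this sample, both methods flag only the final value (50.0). The two methods can disagree on other data, since a single extreme value inflates the standard deviation more than it shifts the quartiles.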