hookeai.utilities.fit_data_scalers.fit_data_scaler_from_dataset

fit_data_scaler_from_dataset(dataset, features_type, n_features, scaling_type='mean-std', scaling_parameters={})

Fit the data scaler of a given features type from a data set.

The data scaler normalization tensors are fitted from the given data set and override any provided data scaling parameters.

Parameters:
  • dataset (torch.utils.data.Dataset) – Time series data set. Each sample is stored as a dictionary that maps each feature (key, str) to its data, a torch.Tensor(2d) of shape (sequence_length, n_features).

  • features_type (str) – Features type for which the data scaler is fitted (e.g., ‘features_in’, ‘features_out’). Must be directly available from the data set samples.

  • n_features (int) – Number of features (dimensionality).

  • scaling_type ({'min-max', 'mean-std'}, default='mean-std') – Type of data scaling. Min-Max scaling (‘min-max’) or standardization (‘mean-std’).

  • scaling_parameters (dict, default={}) – Data scaling parameters (item, dict) for each features type (key, str). For ‘min-max’ data scaling, the parameters are the ‘minimum’ and ‘maximum’ features normalization tensors, as well as the ‘norm_minimum’ and ‘norm_maximum’ normalization bounds. For ‘mean-std’ data scaling, the parameters are the ‘mean’ and ‘std’ features normalization tensors.
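
The input layout described above can be illustrated with a small sketch. It assumes only PyTorch; the class ToyTimeSeriesDataset, the tensor sizes, and the values are hypothetical. The nesting of scaling_parameters (one dictionary per features type) follows the parameter description as read here and may differ from what the library actually expects.

```python
import torch
from torch.utils.data import Dataset


class ToyTimeSeriesDataset(Dataset):
    """Hypothetical data set: each sample is a dict of 2D feature tensors."""

    def __init__(self, n_samples=4, sequence_length=5, n_in=3, n_out=2):
        generator = torch.Generator().manual_seed(0)
        # Each tensor has shape (sequence_length, n_features)
        self._samples = [
            {
                'features_in': torch.rand(
                    sequence_length, n_in, generator=generator),
                'features_out': torch.rand(
                    sequence_length, n_out, generator=generator),
            }
            for _ in range(n_samples)
        ]

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        return self._samples[index]


dataset = ToyTimeSeriesDataset()

# Assumed layout of 'min-max' scaling parameters for 'features_in':
# the normalization bounds are set by the user, while the 'minimum' and
# 'maximum' tensors are placeholders that fitting would override.
scaling_parameters = {
    'features_in': {
        'minimum': torch.zeros(3),
        'maximum': torch.ones(3),
        'norm_minimum': -1.0,
        'norm_maximum': 1.0,
    },
}
```

Here n_features would be 3 for ‘features_in’ and 2 for ‘features_out’, matching the second dimension of the corresponding sample tensors.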

Returns:

data_scaler – Data scaler.

Return type:

{TorchStandardScaler, TorchMinMaxScaler}
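
To make "fitting" concrete, the following is a minimal sketch of how per-feature normalization tensors could be computed from samples with the layout described above. It is not the hookeai implementation: the helper names fit_mean_std and fit_min_max are hypothetical, and the choice of the unbiased standard deviation (the torch.std default) is an assumption.

```python
import torch


def fit_mean_std(samples, features_type, n_features):
    """Per-feature mean and std over all samples and time steps."""
    # Concatenate every (sequence_length, n_features) tensor along time
    data = torch.cat(
        [sample[features_type].reshape(-1, n_features) for sample in samples],
        dim=0)
    # Assumption: unbiased standard deviation (torch.std default)
    return data.mean(dim=0), data.std(dim=0)


def fit_min_max(samples, features_type, n_features):
    """Per-feature minimum and maximum over all samples and time steps."""
    data = torch.cat(
        [sample[features_type].reshape(-1, n_features) for sample in samples],
        dim=0)
    return data.min(dim=0).values, data.max(dim=0).values


# Two hypothetical samples with sequence_length = 2 and n_features = 2
samples = [
    {'features_in': torch.tensor([[0.0, 10.0], [2.0, 30.0]])},
    {'features_in': torch.tensor([[4.0, 50.0], [6.0, 70.0]])},
]

mean, std = fit_mean_std(samples, 'features_in', n_features=2)
minimum, maximum = fit_min_max(samples, 'features_in', n_features=2)

# Standardization ('mean-std') applied to the first sample
standardized = (samples[0]['features_in'] - mean) / std
```

For this toy data the fitted mean is (3, 40), the minimum is (0, 10), and the maximum is (6, 70). A ‘min-max’ scaler would additionally map the interval between minimum and maximum onto the interval between ‘norm_minimum’ and ‘norm_maximum’.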