hookeai.miscellaneous.dataset_processing.read_csv_response_dataset.split_dataset

split_dataset(dataset, split_sizes, is_copy_dataset=False, is_save_subsets=False, subsets_directory=None, subsets_basename=None, seed=None)

Randomly split data set into non-overlapping subsets.

Parameters:
  • dataset (torch.utils.data.Dataset) – Time series data set. Each sample is stored as a dictionary that maps each feature name (key, str) to its data (item, torch.Tensor(2d)) of shape (sequence_length, n_features).

  • split_sizes (dict) – Size (item, float) of each data subset (key, str), where each size is a fraction between 0 and 1. The sum of all sizes must equal 1.

  • is_copy_dataset (bool, default=False) – If True, then the subsets are disconnected from the original parent data set.

  • is_save_subsets (bool, default=False) – If True, then save data subsets to files.

  • subsets_directory (str, default=None) – Directory where the data subset files are stored.

  • subsets_basename (str, default=None) – Subset file base name.

  • seed (int, default=None) – Seed for random data set split generator.

Returns:

dataset_split – Data subsets (item, torch.utils.data.Subset) stored by subset name (key, str).

Return type:

dict
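
Example:

The sketch below illustrates the idea behind a seeded, non-overlapping split driven by a split_sizes dictionary. It is not the hookeai implementation: the helper name split_indices, the rounding rule for leftover samples, and the subset names are assumptions made for illustration, and only the Python standard library is used. In PyTorch, index lists like these could be wrapped with torch.utils.data.Subset(dataset, indices) to obtain the subsets.

```python
import math
import random


def split_indices(n_samples, split_sizes, seed=None):
    """Hypothetical helper: partition range(n_samples) into named, disjoint index lists."""
    # Each size must be a fraction between 0 and 1, and all sizes must sum to 1
    if any(not 0.0 <= size <= 1.0 for size in split_sizes.values()):
        raise ValueError("Each split size must be between 0 and 1.")
    if not math.isclose(sum(split_sizes.values()), 1.0):
        raise ValueError("The sum of all split sizes must equal 1.")

    # Shuffle the sample indices with a dedicated, seeded generator
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Floor each subset size, then hand leftover samples to the first subsets
    # (assumed rounding rule, chosen only for this illustration)
    names = list(split_sizes)
    counts = [math.floor(n_samples * split_sizes[name]) for name in names]
    for i in range(n_samples - sum(counts)):
        counts[i % len(counts)] += 1

    # Slice the shuffled indices into consecutive, non-overlapping chunks
    split, start = {}, 0
    for name, count in zip(names, counts):
        split[name] = indices[start:start + count]
        start += count
    return split


# Subset names are arbitrary keys chosen for this example
split = split_indices(10, {"training": 0.7, "validation": 0.2, "testing": 0.1}, seed=0)
```

Because the subsets are consecutive slices of a single shuffled index list, every sample lands in exactly one subset, and passing the same seed reproduces the same split.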