graphorge.gnn_base_model.data.graph_dataset.GNNGraphDataset

class GNNGraphDataset(dataset_directory, dataset_sample_files, dataset_basename='graph_dataset', is_store_dataset=False)[source]

Bases: Dataset

Graph Neural Network graph data set.

_dataset_directory

Directory where the Graph Neural Network graph data set is stored (all data set samples files).

Type:

str

_dataset_sample_files

Graph Neural Network graph data set samples files paths. Each sample file contains a torch_geometric.data.Data object describing a homogeneous graph.

Type:

list[str]

_dataset_samples

Graph Neural Network graph data set samples data. Each sample is stored as a torch_geometric.data.Data object describing a homogeneous graph. Only populated if _is_store_dataset is True; otherwise it is an empty list.

Type:

list

_is_store_dataset

If True, then the Graph Neural Network graph data set samples are loaded and stored in attribute _dataset_samples. If False, the data set object holds only the sample file paths and loads the corresponding file when a given sample is accessed.

Type:

bool, default=False

_dataset_basename

Data set file base name.

Type:

str

__len__(self)

Return size of data set (number of samples).

__getitem__(self, index)[source]

Return data set sample from corresponding index.

get_dataset_directory(self)[source]

Get directory where the Graph Neural Network graph data set is stored.

get_dataset_sample_files(self)[source]

Get Graph Neural Network graph data set samples files paths.

set_dataset_basename(self, dataset_basename)[source]

Set data set file base name.

get_dataset_basename(self)[source]

Get data set file base name.

save_dataset(self)[source]

Save Graph Neural Network graph data set to file.

load_dataset(dataset_file_path)[source]

Load PyTorch data set.

_update_dataset_directory(self, dataset_directory, is_reload_data=False)[source]

Update directory where Graph Neural Network graph data set is stored.

update_dataset_file_internal_directory(dataset_file_path, new_directory, is_reload_data=False)[source]

Update internal directory of stored data set in provided file.

Constructor.

Parameters:
  • dataset_directory (str) – Directory where the Graph Neural Network graph data set is stored (all data set samples files).

  • dataset_sample_files (list[str]) – Graph Neural Network graph data set samples file paths. Each sample file contains a torch_geometric.data.Data object describing a homogeneous graph.

  • dataset_basename (str, default='graph_dataset') – Data set file base name.

  • is_store_dataset (bool, default=False) – If True, then the Graph Neural Network graph data set samples are loaded and stored in attribute _dataset_samples. If False, the data set object holds only the sample file paths and loads the corresponding file when a given sample is accessed.
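
The two storage modes selected by is_store_dataset can be illustrated with a minimal sketch. SketchGraphDataset and write_toy_samples are hypothetical stand-ins, not the graphorge class: the samples here are plain dictionaries written with pickle, whereas the real sample files hold torch_geometric.data.Data objects, and the actual loading routine may differ.

```python
import os
import pickle
import tempfile


class SketchGraphDataset:
    """Hypothetical stand-in used for illustration only."""

    def __init__(self, dataset_directory, dataset_sample_files,
                 is_store_dataset=False):
        self._dataset_directory = dataset_directory
        self._dataset_sample_files = list(dataset_sample_files)
        self._is_store_dataset = is_store_dataset
        # Eager mode: read every sample file once, up front.
        # Lazy mode: keep an empty list and read files on access.
        self._dataset_samples = (
            [self._load(path) for path in self._dataset_sample_files]
            if is_store_dataset else [])

    @staticmethod
    def _load(path):
        # Assumption: samples are pickled Python objects.
        with open(path, 'rb') as sample_file:
            return pickle.load(sample_file)

    def __len__(self):
        # Data set size is the number of sample files.
        return len(self._dataset_sample_files)

    def __getitem__(self, index):
        if self._is_store_dataset:
            return self._dataset_samples[index]
        # Lazy mode: the file is read on every access.
        return self._load(self._dataset_sample_files[index])


def write_toy_samples(directory, n_sample):
    """Write n_sample toy graphs (plain dicts) and return their paths."""
    paths = []
    for i in range(n_sample):
        path = os.path.join(directory, f'sample_{i}.pkl')
        with open(path, 'wb') as sample_file:
            pickle.dump({'n_node': i + 2, 'edges': [(0, 1)]}, sample_file)
        paths.append(path)
    return paths
```

Storing the samples trades memory for faster repeated access; holding only the file paths keeps the object small at the cost of one file read per access.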

List of Public Methods

get_dataset_basename

Get data set file base name.

get_dataset_directory

Get directory where Graph Neural Network graph data set is stored.

get_dataset_sample_files

Get Graph Neural Network graph data set samples files paths.

load_dataset

Load PyTorch data set.

save_dataset

Save Graph Neural Network graph data set to file.

set_dataset_basename

Set data set file base name.

update_dataset_file_internal_directory

Update internal directory of stored data set in provided file.

Methods

__init__(dataset_directory, dataset_sample_files, dataset_basename='graph_dataset', is_store_dataset=False)[source]

Constructor.

Parameters:
  • dataset_directory (str) – Directory where the Graph Neural Network graph data set is stored (all data set samples files).

  • dataset_sample_files (list[str]) – Graph Neural Network graph data set samples file paths. Each sample file contains a torch_geometric.data.Data object describing a homogeneous graph.

  • dataset_basename (str, default='graph_dataset') – Data set file base name.

  • is_store_dataset (bool, default=False) – If True, then the Graph Neural Network graph data set samples are loaded and stored in attribute _dataset_samples. If False, the data set object holds only the sample file paths and loads the corresponding file when a given sample is accessed.

_update_dataset_directory(dataset_directory, is_reload_data=False)[source]

Update directory where GNN graph data set is stored.

The directory of the stored data set samples files paths is updated according to the new directory.

Parameters:
  • dataset_directory (str) – Directory where the Graph Neural Network graph data set is stored (all data set samples files).

  • is_reload_data (bool, default=False) – Reload and store data set samples in attribute _dataset_samples. Only effective if is_store_dataset=True.
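
One plausible reading of the path update is sketched below. rebase_sample_files is a hypothetical helper, and the rule it applies (keep each file name, replace its parent directory) is an assumption about the behavior rather than the library's confirmed implementation.

```python
import os


def rebase_sample_files(dataset_sample_files, new_directory):
    """Return sample file paths relocated into new_directory.

    Assumption: every sample file sits directly in the data set
    directory, so only its file name needs to be preserved.
    """
    return [os.path.join(new_directory, os.path.basename(path))
            for path in dataset_sample_files]
```

Under this assumption, a data set moved to another machine only needs its directory replaced, provided the sample file names are unchanged.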

get_dataset_basename()[source]

Get data set file base name.

Returns:

dataset_basename – Data set file base name.

Return type:

str

get_dataset_directory()[source]

Get directory where Graph Neural Network graph data set is stored.

Returns:

dataset_directory – Directory where the Graph Neural Network graph data set is stored (all data set samples files).

Return type:

str

get_dataset_sample_files()[source]

Get Graph Neural Network graph data set samples files paths.

Returns:

dataset_sample_files – Graph Neural Network graph data set samples files paths. Each sample file contains a torch_geometric.data.Data object describing a homogeneous graph.

Return type:

list[str]

static load_dataset(dataset_file_path)[source]

Load PyTorch data set.

Parameters:

dataset_file_path (str) – PyTorch data set file path.

Returns:

dataset – PyTorch data set.

Return type:

torch.utils.data.Dataset
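
Because save_dataset is documented as writing a pickle file, loading can be sketched as a guarded pickle read. load_pickled_dataset and its expected_type argument are hypothetical; the actual method may perform different checks, and it returns a torch.utils.data.Dataset rather than an arbitrary object.

```python
import os
import pickle
import tempfile


def load_pickled_dataset(dataset_file_path, expected_type):
    """Load a pickled object and check its type (illustrative only)."""
    if not os.path.isfile(dataset_file_path):
        raise RuntimeError(
            f'Data set file has not been found: {dataset_file_path}')
    with open(dataset_file_path, 'rb') as dataset_file:
        dataset = pickle.load(dataset_file)
    # Reject files that hold something other than the expected type.
    if not isinstance(dataset, expected_type):
        raise TypeError(
            f'Loaded object is not an instance of {expected_type.__name__}.')
    return dataset
```

As with any pickle file, only files from a trusted source should be loaded, since unpickling can execute arbitrary code.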

save_dataset(is_append_n_sample=True)[source]

Save Graph Neural Network graph data set to file.

Graph Neural Network graph data set is stored in dataset_directory as a pickle file named graph_dataset.pkl or graph_dataset_n<n_sample>.pkl.

Parameters:

is_append_n_sample (bool, default=True) – If True, then data set size (number of samples) is appended to Graph Neural Network graph data set filename.

Returns:

dataset_file_path – PyTorch data set file path.

Return type:

str
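
The naming rule can be sketched as follows. build_dataset_file_path is a hypothetical helper that mirrors the description above; it assumes the file name is formed from the data set base name, which is graph_dataset by default.

```python
import os


def build_dataset_file_path(dataset_directory, dataset_basename, n_sample,
                            is_append_n_sample=True):
    """Build the data set file path following the documented naming rule."""
    file_name = dataset_basename
    if is_append_n_sample:
        # Append the data set size, e.g. graph_dataset_n100.
        file_name += f'_n{n_sample}'
    return os.path.join(dataset_directory, file_name + '.pkl')
```

Appending the number of samples makes it possible to keep data sets of different sizes side by side in the same directory.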

set_dataset_basename(dataset_basename)[source]

Set data set file base name.

Parameters:

dataset_basename (str) – Data set file base name.

static update_dataset_file_internal_directory(dataset_file_path, new_directory, is_reload_data=False)[source]

Update internal directory of stored data set in provided file.

Update is only performed if the new directory does not match the internal directory of the stored data set.

Parameters:
  • dataset_file_path (str) – PyTorch data set file path.

  • new_directory (str) – PyTorch data set new directory.

  • is_reload_data (bool, default=False) – Reload and store data set samples in attribute _dataset_samples. Only effective if is_store_dataset=True.
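
The load, compare, update, and save sequence described above can be sketched with a plain dictionary standing in for the stored data set. update_stored_directory, the record keys, and the boolean return value are all hypothetical; the real method operates on a pickled GNNGraphDataset object.

```python
import os
import pickle
import tempfile


def update_stored_directory(dataset_file_path, new_directory):
    """Rewrite the stored directory if it differs from new_directory.

    Returns True if the file was rewritten, False otherwise.
    """
    with open(dataset_file_path, 'rb') as dataset_file:
        record = pickle.load(dataset_file)
    # Skip the update when the stored directory already matches.
    if record['dataset_directory'] == new_directory:
        return False
    record['dataset_directory'] = new_directory
    # Assumption: sample files keep their names in the new directory.
    record['dataset_sample_files'] = [
        os.path.join(new_directory, os.path.basename(path))
        for path in record['dataset_sample_files']]
    with open(dataset_file_path, 'wb') as dataset_file:
        pickle.dump(record, dataset_file)
    return True
```

Skipping the rewrite when the directories match keeps the operation safe to call repeatedly, for example at the start of every run.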