🔢 Not computing the node features in HungaryCPDataLoader
Was going through the methods present in the HungaryCPDataLoader during the dataset abstraction task and noticed the following in the _get_targets_and_features method
Our Version
def _get_targets_and_features(self):
stacked_target = np.array(self._dataset["FX"])
self._all_targets = np.array(
[stacked_target[i, :].T for i in range(stacked_target.shape[0])]
)
But inside the PyTorch Geometric Temporal version they are computing the features list
PyG-T Version
def _get_targets_and_features(self):
stacked_target = np.array(self._dataset["FX"])
self.features = [
stacked_target[i : i + self.lags, :].T
for i in range(stacked_target.shape[0] - self.lags)
]
self.targets = [
stacked_target[i + self.lags, :].T
for i in range(stacked_target.shape[0] - self.lags)
]
Need to confirm why we omitted this computation in our dataloader and add it back in if necessary.
Montevideo Bus
Something similar was also noticed in MontevideoBusDataLoader, where we are only calculating the target and not the features. We were also not considering the lags while calculating the target.
Our Version
def _get_targets(self, target_var: str = "y"):
targets = []
for node in self._dataset["nodes"]:
y = node.get(target_var)
targets.append(np.array(y))
stacked_targets = np.stack(targets).T
standardized_targets = (
stacked_targets - np.mean(stacked_targets, axis=0)
) / np.std(stacked_targets, axis=0)
self._all_targets = np.array([
standardized_targets[i, :].T
for i in range(len(standardized_targets))
])
PyG-T version
def _get_features(self, feature_vars: List[str] = ["y"]):
features = []
for node in self._dataset["nodes"]:
X = node.get("X")
for feature_var in feature_vars:
features.append(np.array(X.get(feature_var)))
stacked_features = np.stack(features).T
standardized_features = (
stacked_features - np.mean(stacked_features, axis=0)
) / np.std(stacked_features, axis=0)
self.features = [
standardized_features[i : i + self.lags, :].T
for i in range(len(standardized_features) - self.lags)
]
def _get_targets(self, target_var: str = "y"):
targets = []
for node in self._dataset["nodes"]:
y = node.get(target_var)
targets.append(np.array(y))
stacked_targets = np.stack(targets).T
standardized_targets = (
stacked_targets - np.mean(stacked_targets, axis=0)
) / np.std(stacked_targets, axis=0)
self.targets = [
standardized_targets[i + self.lags, :].T
for i in range(len(standardized_targets) - self.lags)
]
For the new dataloader that is being implemented #85 , I will be following the PyG-T version unless our previous version is a better choice.
PedalMe
Something similar was also noticed for PedalMe dataset in our version.
WikiMath
Noticed that there is a difference in our version when calculating the standardized_target array. We are also not using the lags parameter when calculating self._all_targets. We are also omitting the calculation of self._all_features
PyG-T Version
def _get_targets_and_features(self):
targets = []
for time in range(self._dataset["time_periods"]):
targets.append(np.array(self._dataset[str(time)]["y"]))
stacked_target = np.stack(targets)
standardized_target = (
stacked_target - np.mean(stacked_target, axis=0)
) / np.std(stacked_target, axis=0)
self.features = [
standardized_target[i : i + self.lags, :].T
for i in range(len(targets) - self.lags)
]
self.targets = [
standardized_target[i + self.lags, :].T
for i in range(len(targets) - self.lags)
]
Our Version
def _set_targets(self):
r"""Calculates and sets the target attributes"""
targets = []
for time in range(self.gdata["total_timestamps"]):
targets.append(np.array(self._dataset[str(time)]["y"]))
stacked_target = np.stack(targets)
standardized_target = (stacked_target - np.mean(stacked_target, axis=0)) / (
np.std(stacked_target, axis=0) + 10**-10
)
breakpoint()
self._all_targets = np.array(
[standardized_target[i, :].T for i in range(len(targets))]
)