STGraph icon indicating copy to clipboard operation
STGraph copied to clipboard

🔢 Not computing the node features in HungaryCPDataLoader

Open nithinmanoj10 opened this issue 2 years ago • 3 comments

Was going through the methods present in the HungaryCPDataLoader during the dataset abstraction task and noticed the following in the _get_targets_and_features method

Our Version

def _get_targets_and_features(self):
    stacked_target = np.array(self._dataset["FX"])
    self._all_targets = np.array(
        [stacked_target[i, :].T for i in range(stacked_target.shape[0])]
    )

But inside the PyTorch Geometric Temporal version they are computing the features list

PyG-T Version

def _get_targets_and_features(self):
    stacked_target = np.array(self._dataset["FX"])
    self.features = [
        stacked_target[i : i + self.lags, :].T
        for i in range(stacked_target.shape[0] - self.lags)
    ]
    self.targets = [
        stacked_target[i + self.lags, :].T
        for i in range(stacked_target.shape[0] - self.lags)
    ]

Need to confirm why we omitted this computation in our dataloader and add it back in if necessary.

nithinmanoj10 avatar Nov 05 '23 06:11 nithinmanoj10

Montevideo Bus

Something similar was also noticed in MontevideoBusDataLoader, where we are only calculating the target and not the features. We were also not considering the lags while calculating the target.

Our Version

def _get_targets(self, target_var: str = "y"):
    targets = []
    for node in self._dataset["nodes"]:
        y = node.get(target_var)
        targets.append(np.array(y))
    stacked_targets = np.stack(targets).T
    standardized_targets = (
        stacked_targets - np.mean(stacked_targets, axis=0)
    ) / np.std(stacked_targets, axis=0)
    self._all_targets = np.array([
        standardized_targets[i, :].T
        for i in range(len(standardized_targets))
    ])

PyG-T version

def _get_features(self, feature_vars: List[str] = ["y"]):
    features = []
    for node in self._dataset["nodes"]:
        X = node.get("X")
        for feature_var in feature_vars:
            features.append(np.array(X.get(feature_var)))
    stacked_features = np.stack(features).T
    standardized_features = (
        stacked_features - np.mean(stacked_features, axis=0)
    ) / np.std(stacked_features, axis=0)
    self.features = [
        standardized_features[i : i + self.lags, :].T
        for i in range(len(standardized_features) - self.lags)
    ]

def _get_targets(self, target_var: str = "y"):
    targets = []
    for node in self._dataset["nodes"]:
        y = node.get(target_var)
        targets.append(np.array(y))
    stacked_targets = np.stack(targets).T
    standardized_targets = (
        stacked_targets - np.mean(stacked_targets, axis=0)
    ) / np.std(stacked_targets, axis=0)
    self.targets = [
        standardized_targets[i + self.lags, :].T
        for i in range(len(standardized_targets) - self.lags)
    ]

For the new dataloader that is being implemented #85 , I will be following the PyG-T version unless our previous version is a better choice.

nithinmanoj10 avatar Nov 25 '23 10:11 nithinmanoj10

PedalMe

Something similar was also noticed for PedalMe dataset in our version.

nithinmanoj10 avatar Nov 26 '23 16:11 nithinmanoj10

WikiMath

Noticed that there is a difference in our version when calculating the standardized_target array. We are also not using the lags parameter when calculating self._all_targets. We are also omitting the calculation of self._all_features

PyG-T Version

 def _get_targets_and_features(self):

    targets = []
    for time in range(self._dataset["time_periods"]):
        targets.append(np.array(self._dataset[str(time)]["y"]))
    stacked_target = np.stack(targets)
    standardized_target = (
        stacked_target - np.mean(stacked_target, axis=0)
    ) / np.std(stacked_target, axis=0)
    self.features = [
        standardized_target[i : i + self.lags, :].T
        for i in range(len(targets) - self.lags)
    ]
    self.targets = [
        standardized_target[i + self.lags, :].T
        for i in range(len(targets) - self.lags)
    ]

Our Version

def _set_targets(self):
    r"""Calculates and sets the target attributes"""
    targets = []
    for time in range(self.gdata["total_timestamps"]):
        targets.append(np.array(self._dataset[str(time)]["y"]))
    stacked_target = np.stack(targets)
    standardized_target = (stacked_target - np.mean(stacked_target, axis=0)) / (
        np.std(stacked_target, axis=0) + 10**-10
    )
    breakpoint()
    self._all_targets = np.array(
        [standardized_target[i, :].T for i in range(len(targets))]
    )

nithinmanoj10 avatar Jan 06 '24 09:01 nithinmanoj10