
Problem I met when using resize


I had some problems using transforms.Resize: my network is designed for 3*32*32 images, so I tried transforms.Resize((32, 32)). But an error like this occurred:

TypeError: Unexpected type <class 'numpy.ndarray'>
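For reference, the same error can be reproduced outside MLclf whenever Resize receives a raw numpy array rather than a PIL Image or tensor (a minimal sketch; the 84*84 uint8 array here is a stand-in for the image data MLclf passes to the transform internally):

import numpy as np
from torchvision import transforms

# Stand-in for one raw image as MLclf hands it to the transform: (H, W, C) uint8.
img = np.random.randint(0, 256, (84, 84, 3), dtype=np.uint8)
transforms.Resize((32, 32))(img)  # TypeError: Unexpected type <class 'numpy.ndarray'>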

To solve this problem, I tried to define a custom Resize class of my own:

import numpy as np
from PIL import Image
from torchvision import transforms


class ResizeCustom(transforms.Resize):
    def __init__(self, size, interpolation=Image.BILINEAR):
        super(ResizeCustom, self).__init__(size, interpolation)

    def __call__(self, img):
        # Convert numpy arrays to PIL Images before the parent Resize runs.
        if isinstance(img, np.ndarray):
            img = Image.fromarray(img)
        return super(ResizeCustom, self).__call__(img)
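For completeness, this is roughly how I wired it into my transform (a sketch; this exact Compose is an assumption, not copied from my code):

self.train_transform = transforms.Compose([
    ResizeCustom((32, 32)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])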

However, this did not solve the problem; instead, a different error appeared:

  File "d:\Slef_Learning\MY_Project\WuYang\TDA_new_dataset\nets\net_out_tda.py", line 123, in images_to_matrix_lists
    trainset, validation_dataset, test_dataset = MLclf.miniimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2, seed_value=None, shuffle=True, transform=self.train_transform, save_clf_data=True)
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda\envs\PyTorchGpu\Lib\site-packages\MLclf\MLclf.py", line 319, in miniimagenet_clf_dataset
    data_feature_label_permutation_split = MLclf.miniimagenet_convert2classification(data_dir=data_dir, ratio_train=ratio_train, ratio_val=ratio_val, seed_value=seed_value, shuffle=shuffle, task_type='classical_or_meta', save_clf_data=save_clf_data, transform=transform)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda\envs\PyTorchGpu\Lib\site-packages\MLclf\MLclf.py", line 193, in miniimagenet_convert2classification
    data_feature_label['images'] = MLclf._feature_norm(data_feature_label['images'], transform=transform)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda\envs\PyTorchGpu\Lib\site-packages\MLclf\MLclf.py", line 295, in _feature_norm
    feature_output[i] = transform(feature_i)
    ~~~~~~~~~~~~~~^^^
RuntimeError: The expanded size of the tensor (84) must match the existing size (32) at non-singleton dimension 2.  Target sizes: [3, 84, 84].  Tensor sizes: [3, 32, 32]

It seems that there is some problem with the size of the pictures, but I don't understand where it occurs. In particular, where does this [3, 84, 84] come from? Isn't the original size of the picture 64*64?

I have also tried a few other things, but none of them had any effect. Is there any solution?

a-green-hand-jack, Jan 15 '24 02:01

Well, I found a way that might solve this problem. First, when using Resize, apply ToTensor first, so that the numpy array is converted to a tensor before Resize sees it:

train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((32, 32), antialias=True),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
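As a quick sanity check (a sketch, using a random 84*84 array as a stand-in for one MLclf image), this ordering accepts numpy input directly:

import numpy as np

# Stand-in for one mini-ImageNet image as MLclf provides it: (H, W, C) uint8.
dummy = np.random.randint(0, 256, (84, 84, 3), dtype=np.uint8)
out = train_transform(dummy)
print(out.shape)  # torch.Size([3, 32, 32])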

Then you may also need to modify the function _feature_norm in the MLclf source code.

I think the problem is this: feature_output is pre-allocated based on the size of the original, un-transformed images (the 84*84 mini-ImageNet images, which is where the [3, 84, 84] target size in the error comes from), but the images coming out of the transform have a different shape, so the element-wise assignment fails.

After making the following modification, Resize can be used normally.

@staticmethod
def _feature_norm(feature, transform=None):
    """
    This function transforms the dimension of feature from (batch_size, H, W, C) to (batch_size, C, H, W).
    :param feature: feature / mini-imagenet's images.
    :return: transformed feature.
    """
    if transform is None:
        # Convert a PIL image / ndarray (H*W*C) in range [0, 255] to a torch.Tensor (C*H*W) in range [0.0, 1.0].
        transform = transforms.Compose([transforms.ToTensor()])
        print('The argument transform is None, so only tensor conversion and normalization between [0,1] is done!')

    # Collect the transformed images in a list instead of writing into a
    # pre-allocated array, so the output shape follows the transformed images
    # rather than the original image size.
    feature_output = [transform(feature_i) for feature_i in feature]

    return torch.stack(feature_output).numpy()
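With the modified _feature_norm in place, the original call from the traceback should then go through, e.g. (train_transform being the Compose defined above):

from MLclf import MLclf

trainset, validation_dataset, test_dataset = MLclf.miniimagenet_clf_dataset(
    ratio_train=0.6, ratio_val=0.2, seed_value=None, shuffle=True,
    transform=train_transform, save_clf_data=True)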

Of course, under normal circumstances we may not need to use Resize at all, but if, like me, you have to use it, this method may be helpful.

a-green-hand-jack, Jan 15 '24 03:01