aihwkit Model Initialized outsize [w_min, w_max]: Pinpointing the bug in issue #604

Description

As discussed in #604, the model weights will sometimes fall outsize [w_min, w_max].

Bug Pinpoint

The bug happens because of the incorrect initialization of w_min_bound_ and w_max_bound_ (see the code). It seems the following code snippet is designed to deal with the situation where the share weights is deployed and PulsedDevice.perfect_bias is turned on. When the PulsedDevice.perfect_bias is on, the last dimension of the weights is incorrectly amplified by 100 times, yielding the incorrect active regions and weights.

// perfect bias
if ((par.perfect_bias) && (j == this->x_size_ - 1)) {
  w_scale_up_[i][j] = par.dw_min;
  w_scale_down_[i][j] = par.dw_min;
  w_min_bound_[i][j] = (T)100. * par.w_min; // essentially no bound
  w_max_bound_[i][j] = (T)100. * par.w_max; // essentially no bound
}

TODO

I was trying to fit the bug directly, but I found that I couldn't control shared_weight through the AnalogLinear initialization. I guess we should design a flag here to better control the shared weight behavior.

Apr 01 '24 23:04 Zhaoxian-Wu

Thanks for raising this issue. perfect_bias is indeed some "old" parameter setting, that should only be relevant for analog_bias. Since we have digital_bias now, it should actually be deleted. It has nothing to do with shared_weights so this is not relevant here.

Apr 02 '24 06:04 maljoras

I see. It seems that using digital_bias instead is a more natural solution. But what does shared_weights do? Does that mean multiple tiles share the same torch array?

Apr 02 '24 17:04 Zhaoxian-Wu

Shared weights is saying that the memory to the tile is handled from torch (and not from within C++). This means that also the backward etc is handled by torch. Note that the RPUCuda library is capable of handling the memory of the tile arrays and data internally (as it is a independent library that can be also used independently of pytorch)

Apr 02 '24 19:04 maljoras

Shared weights is saying that the memory to the tile is handled from torch (and not from within C++). This means that also the backward etc is handled by torch. Note that the RPUCuda library is capable of handling the memory of the tile arrays and data internally (as it is a independent library that can be also used independently of pytorch)

I see. Thanks for your kind explaination :D

Apr 08 '24 04:04 Zhaoxian-Wu

@maljoras do we need to remove the perfect_bias in the code flow when we are using digital bias? what do you suggest here? It looks that we have a bug we need to solve.

May 08 '24 15:05 kaoutar55

I think this could be moved to a new issue @kaoutar55 , since the issue was opened because of a problem that finally seemed to be a concept bug, we can open a discussion about the perfect_bias if you like @maljoras and close this issue because actually the issue was solved, or at least that was my impression correct me if I'm wrong @Zhaoxian-Wu

Aug 06 '24 10:08 Borjagodoy

@Zhaoxian-Wu please look at this and try it at your end with the suggested changes. Let us know if the issue is resolved.

Sep 18 '24 15:09 kaoutar55

Hello @Zhaoxian-Wu! Can you give us a heads up on this? Are you able to make any advancements or think that is need to address anything here even with the latests aihwkit version? If not let us know when you can please to close the issue, thanks!

Jan 13 '25 13:01 PabloCarmona

Hi @Borjagodoy @kaoutar55 @PabloCarmona . Thanks for your response. I believe it is a conceptual issue about what is "perfect_bias". We can move the issue to the discussion.

We can discuss the plan to solve it there, if it's needed. I believe it's worth fixing since it may lead to confusing results when I try to study the weight distribution of analog tiles.

Mar 11 '25 20:03 Zhaoxian-Wu

Moved to discussion https://github.com/IBM/aihwkit/discussions/604 that still on going

Nov 10 '25 17:11 PabloCarmona