Inconsistency observed between different functions
Hi @antoinedemathelin,
This is my setup for calling DANN.
Case 1: DANN
model_DANN = adapt.feature_based.DANN(encoder=get_model_encode(), task=get_model_task(), discriminator=get_model_descriminator(), Xt=data_target)
# train DANN
model_DANN.fit(data_source, epochs=1, batch_size=128)
I am trying to follow the same procedure to call the FineTuning and RegularTransferNN functions.
Case 2: FineTuning
finetunig = FineTuning(encoder=get_model_encode(), task=get_model_task())
finetunig.fit(data_source, epochs=1, batch_size=128)
Case 3: RegularTransferNN
src_model = RegularTransferNN(task=get_model_baseline(), lambdas=0., random_state=19)
src_model.fit(data_source, epochs=1, verbose=1, batch_size=128)
Observations:
- DANN works fine for the set of inputs I have.
- The encoder, task, discriminator and data source functions are exactly the same in both setups.
- The batch size is set to 128 in both cases, so each iteration of fit should receive 128x10x300x64 data.
- Overall I have 2000 data points, each of size 10x300x64. 2000 is not a multiple of 128!
Question: In case 2 I get an error when I use a batch size of 128, and I don't get an error when I use a batch size that is a divisor of 2000, say 100. I am unsure how DANN is able to handle this whereas FineTuning / RegularTransferNN cannot.
The error seen in FineTuning is mainly related to querying more data than is actually available in the last step of the epoch. Is it possible that DANN handles this well whereas FineTuning doesn't? In the case of RegularTransferNN I get a strange error like this: error_RegularTransferLearningNN.txt
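The batch arithmetic behind this question can be sketched in plain Python. This is only an illustration of the numbers above (2000 samples, batch sizes 128 and 100); `split_into_batches` is a hypothetical helper, not part of adapt or TensorFlow.

```python
def split_into_batches(n_samples, batch_size):
    """Return (number of full batches, size of the trailing partial batch)."""
    full_batches, remainder = divmod(n_samples, batch_size)
    return full_batches, remainder

# 2000 samples with batch_size=128: 15 full batches, then one batch of only 80
print(split_into_batches(2000, 128))  # (15, 80)

# 2000 samples with batch_size=100: 20 full batches, no partial batch
print(split_into_batches(2000, 100))  # (20, 0)
```

A generator that always tries to deliver 128 samples would therefore run out of data on the 16th step of the epoch.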
Hi @sreenivasaupadhyaya ,
Thank you for reporting these bugs. Indeed, there are a lot of difficulties to handle when the dataset size is not a multiple of the batch size. I think I know why DANN works and FineTuning does not, but I will need to investigate further.
The second error is strange indeed. Can you please send me the code of the function get_model_baseline? I will try to reproduce the error on my side with synthetic data.
@antoinedemathelin
Here is the code:
def get_model_baseline():
    # model definition
    spec_start = Input(shape=(10,300,64))
    # CNN
    spec_cnn = spec_start
    conv_layers = 1
    nb_cnn2d_filt = [16]
    for i in range(conv_layers):
        spec_cnn = Conv2D(filters=nb_cnn2d_filt[i], kernel_size=(3, 3), padding='same')(spec_cnn)
        spec_cnn = BatchNormalization()(spec_cnn)
        spec_cnn = Activation('relu')(spec_cnn)
        spec_cnn = MaxPooling2D(pool_size=(5, 4),data_format='channels_first', padding='same')(spec_cnn)
        spec_cnn = Dropout(0.05)(spec_cnn)
    spec_cnn = Permute((2, 1, 3))(spec_cnn)
    # Reshape
    spec_rnn = Reshape((60,-1 ))(spec_cnn)
    sed = spec_rnn
    sed = TimeDistributed(Dense(128, input_dim=256, activation='relu'))(sed)
    sed = TimeDistributed(Dense(7, activation='softmax'))(sed)
    model_task = Model(inputs=spec_start, outputs=sed)
    model_task.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy',metrics=['accuracy','categorical_crossentropy'])
    model_task.summary()
    return model_task
Hi @sreenivasaupadhyaya,
I tried to reproduce your bugs, but I could not. For FineTuning I tried different values of batch_size and it works fine. Are you using validation data? I think you can get this kind of bug when the size of your validation data is not a multiple of the batch_size. For RegularTransferNN I did not find the bug either. I did find another one, however: when I try to fit the network from get_model_baseline I get an error with MaxPooling2D using channels_first, and when I change it to channels_last everything works fine. The problem may come from this. I see that you use a Permute layer after MaxPooling2D; can't you apply it before, so that you can use channels_last instead of channels_first?
However, I ran the tests in a Python 3.8 environment with a more recent TensorFlow version than the one you use, so here is the code I used. Please try it on your side and tell me if you encounter a bug:
First, the setup:
import adapt
import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import SGD

# Fake data for the example
X_target = np.random.randn(32, 10, 300, 64)
X_source = np.random.randn(32, 10, 300, 64)
y_source = np.random.randn(32, 60, 7)
nb_of_target_data = 32
nb_of_source_data = 32

# Fake networks
def get_model_encoder():
    mod = tf.keras.Sequential()
    mod.add(tf.keras.layers.GlobalAvgPool2D(input_shape=(10, 300, 64)))
    return mod

def get_model_task():
    mod = tf.keras.Sequential()
    mod.add(tf.keras.layers.Dense(1, input_shape=(64,)))
    mod.add(tf.keras.layers.Dense(60*7))
    mod.add(tf.keras.layers.Reshape((60, 7)))
    return mod

def get_model_discriminator():
    mod = tf.keras.Sequential()
    mod.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    return mod

# Build your generators by yielding the data
def data_gen_target():
    for i in range(nb_of_target_data):
        yield X_target[i]

def data_gen_source():
    for i in range(nb_of_source_data):
        yield (X_source[i], y_source[i])

# Build tensorflow datasets
target_data = tf.data.Dataset.from_generator(data_gen_target,
                                             output_types=tf.float32,
                                             output_shapes=[10, 300, 64])
source_data = tf.data.Dataset.from_generator(data_gen_source,
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=([10, 300, 64], [60, 7]))
FineTuning:
finetuning = adapt.parameter_based.FineTuning(get_model_encoder(),
                                              get_model_task())
finetuning.fit(source_data,
               epochs=3,
               batch_size=10)
RegularTransferNN:
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Reshape, TimeDistributed
from tensorflow.keras.layers import Activation, MaxPooling2D, Dropout, Permute, Dense
from tensorflow.keras import Model
from tensorflow.keras.optimizers import Adam

def get_model_baseline():
    # model definition
    spec_start = Input(shape=(10,300,64))
    # CNN
    spec_cnn = spec_start
    conv_layers = 1
    nb_cnn2d_filt = [16]
    for i in range(conv_layers):
        spec_cnn = Conv2D(filters=nb_cnn2d_filt[i], kernel_size=(3, 3), padding='same')(spec_cnn)
        spec_cnn = BatchNormalization()(spec_cnn)
        spec_cnn = Activation('relu')(spec_cnn)
        spec_cnn = MaxPooling2D(pool_size=(5, 4),data_format='channels_last', padding='same')(spec_cnn)
        spec_cnn = Dropout(0.05)(spec_cnn)
    spec_cnn = Permute((2, 1, 3))(spec_cnn)
    spec_rnn = Reshape((60,-1 ))(spec_cnn)
    sed = spec_rnn
    sed = TimeDistributed(Dense(128, input_dim=256, activation='relu'))(sed)
    sed = TimeDistributed(Dense(7, activation='softmax'))(sed)
    model_task = Model(spec_start, sed)
    model_task.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy',metrics=['accuracy','categorical_crossentropy'])
    model_task.summary()
    return model_task

reg = adapt.parameter_based.RegularTransferNN(get_model_baseline(),
                                              lambdas=0.,
                                              random_state=19)
reg.fit(source_data, batch_size=10, epochs=3)
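As a side note on the channels_first / channels_last switch in get_model_baseline, the layer shapes can be checked by hand. The sketch below is arithmetic only and does not call Keras. It assumes the Keras defaults (Conv2D in channels_last with padding='same', pooling strides equal to pool_size), and `pool_same` and `baseline_shapes` are hypothetical helper names.

```python
import math

def pool_same(size, pool):
    """Output length of a pooling layer with padding='same' and stride == pool."""
    return math.ceil(size / pool)

def baseline_shapes(data_format):
    """Shapes (without the batch axis) after pooling, Permute and Reshape."""
    # Conv2D with 16 filters and padding='same' maps (10, 300, 64) -> (10, 300, 16)
    a, b, c = 10, 300, 16
    if data_format == "channels_last":
        # axes a and b are spatial, c is the channel axis
        pooled = (pool_same(a, 5), pool_same(b, 4), c)
    else:
        # channels_first: a is the channel axis, b and c are spatial
        pooled = (a, pool_same(b, 5), pool_same(c, 4))
    # Permute((2, 1, 3)) swaps the first two non-batch axes
    permuted = (pooled[1], pooled[0], pooled[2])
    # Reshape((60, -1)) requires the element count to be divisible by 60
    n_elements = permuted[0] * permuted[1] * permuted[2]
    assert n_elements % 60 == 0
    return pooled, permuted, (60, n_elements // 60)

print(baseline_shapes("channels_first"))  # ((10, 60, 4), (60, 10, 4), (60, 40))
print(baseline_shapes("channels_last"))   # ((2, 75, 16), (75, 2, 16), (60, 40))
```

Both variants end in a (60, 40) tensor, so the Reshape succeeds either way. With channels_first the 60 rows coincide with the pooled 300-long axis, while with channels_last they come from regrouping a 75-long axis, so the two models do not arrange the data identically.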
I tried to replicate the code you posted, and for RegularTransferNN I get the same error with your setup as well: error.pdf
Hi @sreenivasaupadhyaya, Which version of adapt and Tensorflow are you using?
I am using TensorFlow 2.3.1, Python 3.6.9 and adapt 0.4.1.
Hi @sreenivasaupadhyaya,
I found the issue for RegularTransferNN. The train_step function used was not compatible with TensorFlow 2.3.1. I opened a pull request to fix this: #64
For FineTuning, there seems to be no bug with the batch size, so I guess your bug comes from the use of validation_data?
The bug has been fixed, you can install the fixed version with:
pip install git+https://github.com/adapt-python/adapt.git
@antoinedemathelin thanks for the fix. However, I still see the same issue for RegularTransferNN: "TypeError: minimize() got an unexpected keyword argument 'tape'"
Is there any way to check whether I have picked up the changes after reinstalling adapt with the command you specified?
Hello @sreenivasaupadhyaya,
Yes, I think you should run pip uninstall adapt before the above command. It should work then.
pip install git+https://github.com/adapt-python/adapt.git
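One generic way to check what is actually installed after the reinstall is to ask Python for the package version and the location it imports from. The sketch below uses only the standard library. `describe_install` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or later (on Python 3.6, `pip show adapt` gives similar information).

```python
import importlib.util
from importlib import metadata  # standard library from Python 3.8

def describe_install(dist_name, module_name=None):
    """Return (version, file location) of a package, with None for missing parts."""
    module_name = module_name or dist_name
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    spec = importlib.util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return version, location

# Assumes the distribution and the module are both named "adapt"
print(describe_install("adapt"))
```

A git install may report the same version number as the released package, so the location (and the modification time of the files there) can be the more telling part.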
It works now! Yay. For FineTuning, I think it is a compatibility issue with my custom generator. I will keep you posted and close the issue!
Thank you @antoinedemathelin, I am closing this issue for now. Just as a closing comment, I found an interesting behaviour: each step of the batch data generation is called twice when the data size is not an integer multiple of the batch size.
Code:
import adapt
import numpy as np
import tensorflow as tf
from tensorflow.keras.optimizers import SGD

# Fake data for the example
X_target = np.random.randn(32, 10, 300, 64)
X_source = np.random.randn(32, 10, 300, 64)
y_source = np.random.randn(32, 60, 7)
X_val = np.random.randn(32, 10, 300, 64)
y_val = np.random.randn(32, 60, 7)
X_val2 = np.random.randn(32, 10, 300, 64)
y_val2 = np.random.randn(32, 60, 7)
nb_of_target_data = 32
nb_of_source_data = 32
nb_of_val_data = 32
nb_of_val_data2 = 32

# Fake networks
def get_model_encoder():
    mod = tf.keras.Sequential()
    mod.add(tf.keras.layers.GlobalAvgPool2D(input_shape=(10, 300, 64)))
    return mod

def get_model_task():
    mod = tf.keras.Sequential()
    mod.add(tf.keras.layers.Dense(1, input_shape=(64,)))
    mod.add(tf.keras.layers.Dense(60*7))
    mod.add(tf.keras.layers.Reshape((60, 7)))
    return mod

def get_model_discriminator():
    mod = tf.keras.Sequential()
    mod.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    return mod

# Build your generators by yielding the data
def data_gen_target():
    print("Target")
    for i in range(nb_of_target_data):
        print('Target:',i)
        yield X_target[i]

def data_gen_source():
    print("Enter")
    for i in range(nb_of_source_data):
        print('Source:',i)
        yield (X_source[i], y_source[i])

def data_gen_val():
    print("Enter val")
    for i in range(nb_of_val_data):
        print('Val:',i)
        yield (X_val[i], y_val[i])

def data_gen_val2():
    print("Enter val2")
    for i in range(nb_of_val_data2):
        print('Val2:',i)
        yield (X_val2[i], y_val2[i])

# Build tensorflow datasets
target_data = tf.data.Dataset.from_generator(data_gen_target, output_types=tf.float32, output_shapes=[10, 300, 64])
source_data = tf.data.Dataset.from_generator(data_gen_source, output_types=(tf.float32, tf.float32), output_shapes=([10, 300, 64], [60, 7]))
val_data = tf.data.Dataset.from_generator(data_gen_val, output_types=(tf.float32, tf.float32), output_shapes=([10, 300, 64], [60, 7]))
val_data2 = tf.data.Dataset.from_generator(data_gen_val2, output_types=(tf.float32, tf.float32), output_shapes=([10, 300, 64], [60, 7]))

ft = adapt.parameter_based.FineTuning(get_model_encoder(), get_model_task(), training=False, pretrain=False)
Execution:
Case 1:
ft.fit(source_data, epochs=1, batch_size=10)
Case 2:
ft.fit(source_data.batch(10), epochs=1)
Observation: in case 1, each step (iteration) of the data is somehow called twice in the last part of the logs below.
Computing src dataset size... Done!
Computing tgt dataset size... Done!
Enter
Source: 0 Source: 1 Source: 2 Source: 3 Source: 4 Source: 5 Source: 6 Source: 7 Source: 8 Source: 9 Source: 10 Source: 11 Source: 12 Source: 13 Source: 14 Source: 15 Source: 16 Source: 17 Source: 18 Source: 19 Source: 20 Source: 21 Source: 22 Source: 23 Source: 24 Source: 25 Source: 26 Source: 27 Source: 28 Source: 29 Source: 30 Source: 31
Enter
Source: 0 Source: 1 Source: 2 Source: 3 Source: 4 Source: 5 Source: 6 Source: 7 Source: 8 Source: 9 Source: 10 Source: 11 Source: 12 Source: 13 Source: 14 Source: 15 Source: 16 Source: 17 Source: 18 Source: 19 Source: 20 Source: 21 Source: 22 Source: 23 Source: 24 Source: 25 Source: 26 Source: 27 Source: 28 Source: 29 Source: 30 Source: 31
Enter Source: 0 Enter Source: 0 Source: 1 Source: 1 Source: 2 Source: 2 Source: 3 Source: 3 Source: 4 Source: 4 Source: 5 Source: 5 Source: 6 Source: 6 Source: 7 Source: 7 Source: 8 Source: 8
1/Unknown - 0s 243us/step - loss: 1.0305
Source: 9 Source: 9 Source: 10 Source: 10 Source: 11 Source: 11 Source: 12 Source: 12 Source: 13 Source: 13 Source: 14 Source: 14 Source: 15 Source: 15 Source: 16 Source: 16 Source: 17 Source: 17 Source: 18 Source: 18 Source: 19 Source: 19 Source: 20 Source: 20 Source: 21 Source: 21 Source: 22 Source: 22 Source: 23 Source: 23 Source: 24 Source: 24 Source: 25 Source: 25 Source: 26 Source: 26 Source: 27 Source: 27 Source: 28 Source: 28 Source: 29 Source: 29 Source: 30 Source: 30 Source: 31 Source: 31
4/4 - 0s 16ms/step - loss: 1.0187
<adapt.parameter_based._finetuning.FineTuning at 0x7f46b85c5780>
In case 2, each step (iteration) of the data is called only once in the last part of the logs below.
Logs:
Computing src dataset size... Done!
Computing tgt dataset size... Done!
Enter
Source: 0 Source: 1 Source: 2 Source: 3 Source: 4 Source: 5 Source: 6 Source: 7 Source: 8 Source: 9 Source: 10 Source: 11 Source: 12 Source: 13 Source: 14 Source: 15 Source: 16 Source: 17 Source: 18 Source: 19 Source: 20 Source: 21 Source: 22 Source: 23 Source: 24 Source: 25 Source: 26 Source: 27 Source: 28 Source: 29 Source: 30 Source: 31
Enter
Source: 0 Source: 1 Source: 2 Source: 3 Source: 4 Source: 5 Source: 6 Source: 7 Source: 8 Source: 9 Source: 10 Source: 11 Source: 12 Source: 13 Source: 14 Source: 15 Source: 16 Source: 17 Source: 18 Source: 19 Source: 20 Source: 21 Source: 22 Source: 23 Source: 24 Source: 25 Source: 26 Source: 27 Source: 28 Source: 29 Source: 30 Source: 31
4/4 - 0s 3ms/step - loss: 1.0109
<adapt.parameter_based._finetuning.FineTuning at 0x7f46b85c5780>
Hi @sreenivasaupadhyaya,
Yes, it's a good observation, but this is not a bug; it is done on purpose.
Having a last batch smaller than the batch size can be detrimental for some algorithms. We have observed that if the last batch is very small (e.g. of size 1, or 2 as here) it can strongly disturb the gradient descent. To avoid this, the dataset is resized to a multiple of the batch size (the last batch is filled with previous data from the dataset). See issue #48.
In the second case, when you pass an already batched dataset, the computed size is no longer 32 but 4 (= ceil(32 / 10)), so there is no need to resize it. That's why you don't see the duplication.
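The idea of resizing to a multiple of the batch size can be sketched with plain indices. This is a minimal illustration of the principle for the 32-sample, batch-size-10 example above, not adapt's actual implementation; `resized_indices` is a hypothetical helper.

```python
import math

def resized_indices(n_samples, batch_size):
    """Sample indices after padding the dataset up to a multiple of batch_size."""
    n_batches = math.ceil(n_samples / batch_size)
    target_size = n_batches * batch_size
    # Cycle through the dataset again to fill the last batch
    return [i % n_samples for i in range(target_size)]

indices = resized_indices(32, 10)
batches = [indices[i:i + 10] for i in range(0, len(indices), 10)]
print(len(indices))   # 40
print(batches[-1])    # [30, 31, 0, 1, 2, 3, 4, 5, 6, 7]
```

Without the padding, the last batch would contain only samples 30 and 31. A dataset that is already batched with .batch(10) has math.ceil(32 / 10) = 4 elements, one per batch.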