Transfer Learning Errors
First, the Transfer Learning documentation needs an update. It references:
`data = dp.Data("your_file.csv")`
`model_results = data_labeler.fit(x=data['samples'], y=data['labels'], validation_split=0.2, epochs=2, labels=labels)`
The dp.Data object itself isn't subscriptable, so data['samples'] fails. You need the data property of the BaseData child class to access the embedded dataframe.
e.g.:
data = dp.Data("your_file.csv")
data_frame = data.data
model_results = data_labeler.fit(x=data_frame['samples'], y=data_frame['labels'], validation_split=0.2, epochs=2, labels=labels)
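For anyone who wants code that tolerates either shape of input, here is a minimal sketch of the same workaround. It assumes only what is described above: a reader that isn't subscriptable exposes its table through a `.data` attribute. `select_columns` and `FakeReader` are hypothetical names used for illustration, not part of DataProfiler.

```python
def select_columns(source, sample_col, label_col):
    """Return (samples, labels) from a table or from a reader wrapping one.

    Assumption: a reader that is not subscriptable exposes its table
    through a `.data` attribute, as described for dp.Data above.
    """
    try:
        # Works when `source` is already a subscriptable table.
        return source[sample_col], source[label_col]
    except TypeError:
        # "object is not subscriptable" is a TypeError; fall back to `.data`.
        table = source.data
        return table[sample_col], table[label_col]


class FakeReader:
    """Hypothetical stand-in for a non-subscriptable reader object."""

    def __init__(self, table):
        self.data = table


table = {"samples": ["a", "b"], "labels": ["X", "Y"]}
samples, labels_col = select_columns(FakeReader(table), "samples", "labels")
```

With a real reader, the two returned columns would be passed as `x` and `y` to `fit`.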
Second, when running transfer learning per the documented process, the package returns the following error:
ValueError: The `default_label` of UNKNOWN must exist in the label mapping.
This makes sense when extending with a new label, but transfer learning doesn't work as documented.
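Going only by the error text, one thing to try is making sure the label list passed to `fit` contains the default label. The sketch below is plain Python and untested against DataProfiler. `with_default_label` is a hypothetical helper, and the library may impose other requirements on the label list.

```python
def with_default_label(labels, default_label="UNKNOWN"):
    """Return a copy of `labels` that is guaranteed to contain `default_label`.

    Assumption: the error above is raised because the new label list
    lacks the default label; this has not been verified against the library.
    """
    labels = list(labels)
    if default_label not in labels:
        labels.append(default_label)
    return labels


# Hypothetical label names, used only to show the behaviour.
new_labels = with_default_label(["NAME", "ADDRESS"])
```

The result would then be passed as `labels=new_labels` in the `fit` call.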
@FryLaurie I've been having the same issue. I noticed that on the examples page, they use different options to "retrain" the labeler: https://capitalone.github.io/DataProfiler/labeler.html#Transfer-Learning-a-Labeler
# this will use transfer learning to retrain the labeler on your new dataset and labels.
# Setting labels with a list of labels or label mapping will overwrite the existing labels with new ones
# Setting the reset_weights parameter to false allows transfer learning to occur
model_results = labeler.fit(x=df_data[0], y=df_data[1], validation_split=0.2,
                            epochs=10, labels=None, reset_weights=False)
Notice these 2 options: labels=None, reset_weights=False
A warning, however: when I reran my script to test it with those options, my laptop ran out of memory and the process was killed before it had a chance to complete, even with only 1 epoch.
That could be an indication that it is actually working and attempting to extend the base labeler/model. Their base model is probably just very large and my computer can't handle it.
So I can't prove if this actually works... but I figured I'd share anyway since no one answered your issue. And if you have access to stronger computers and can get it to run and it does work, let me know :D