Models for target with test data only
What are the models (are there any?) provided by adapt which use source data to predict target data but without any training data from the target?
Using the naming conventions from this chart (https://adapt-python.github.io/adapt/map.html), I believe this question concerns SrcOnly.
This is regression, by the way. Both source and target share the same features.
So for example, Xs has 10,000 samples with 100 features, Xs_y is 10,000 x 1.
The target contains labeled test data. For example, Xt has 100 samples x 100 features, Xt_y is 100 x 1.
The goal is to transfer knowledge from a model trained on Xs, Xs_y to predict Xt_y from Xt. As there is no training data in the target, the only available option would be to directly test a traditional ML model trained on Xs, Xs_y on the target test data.
Hi @davidshumway,
If you have no target data available, i.e. neither Xt nor yt, then yes, your best "transfer" option is to fit your ML model on the source data, i.e. "SrcOnly".
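A minimal sketch of such a source-only baseline, using the shapes from the question. The data here is synthetic and purely illustrative, and the closed-form ridge regressor written in NumPy is only a stand-in to keep the example self-contained; any regressor could take its place.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the shapes in the question (assumed data, not real).
n_features = 100
w_true = rng.normal(size=n_features)
Xs = rng.normal(size=(10_000, n_features))
ys = Xs @ w_true + 0.1 * rng.normal(size=10_000)
Xt = rng.normal(loc=0.5, size=(100, n_features))  # target inputs, shifted mean
yt = Xt @ w_true + 0.1 * rng.normal(size=100)     # target labels, test only


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# "SrcOnly": fit on source data alone, never touching the target during training.
w, b = fit_ridge(Xs, ys)

# The labeled target data is used purely for evaluation.
yt_pred = Xt @ w + b
mse_target = float(np.mean((yt_pred - yt) ** 2))
print(f"target MSE: {mse_target:.4f}")
```

The target error obtained this way is the reference number that any adaptation method would later have to beat.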
Transfer learning and domain adaptation algorithms assume that you have some information about your target: either a set of target inputs Xt, or a small set of both inputs and labels Xt, yt.
If you want to improve your model trained on the source without any information about the target, you can look at "generalization" algorithms and regularization techniques.
If your source dataset Xs, ys is composed of several different sources (Xs_1, ..., Xs_n), (ys_1, ..., ys_n) and you expect your target to resemble one of them, you can still use some of the adapt algorithms, such as DANN or DeepCORAL, to find a common representation space for (Xs_1, ..., Xs_n).
Best,
Very interesting. Thank you, @antoinedemathelin. In this case, perhaps as a baseline a traditional regression model could be trained on the source data and then tested on the target.
In a slightly different direction, another dataset might contain a lot of target data but only independent variables, i.e. no dependent variable (assuming one dependent variable in the source). For example, there might be 20 years of weather data for a particular city, but the city has no PM2 data, whereas other cities (sources) have both weather and PM2 data.
Hi @davidshumway, I am not sure I understand your problem. Which variable do you want to predict in your example? If, say, you want to predict PM2 from the weather data, then you can use unsupervised domain adaptation to match the source and target weather features. Best,
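To make "matching the source and target features" concrete, here is a rough sketch of one such approach, a CORAL-style alignment written in plain NumPy. It is not adapt's implementation: the helper names, the synthetic "weather" features, and the extra mean-matching step are all assumptions made for illustration. Only unlabeled target inputs are used.

```python
import numpy as np


def sym_matrix_power(C, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * vals ** power) @ vecs.T


def coral_align(Xs, Xt, reg=1e-3):
    """Re-color source features so their mean and covariance match the target.

    Hypothetical helper: whiten with the source covariance, re-color with the
    target covariance, then shift to the target mean (the mean shift is an
    addition for this sketch).
    """
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False) + reg * np.eye(d)
    Ct = np.cov(Xt, rowvar=False) + reg * np.eye(d)
    A = sym_matrix_power(Cs, -0.5) @ sym_matrix_power(Ct, 0.5)
    return (Xs - Xs.mean(axis=0)) @ A + Xt.mean(axis=0)


rng = np.random.default_rng(0)

# Synthetic stand-ins: 5 weather features, labeled source cities, unlabeled target city.
Xs = rng.normal(size=(2000, 5))
ys = Xs @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=2000)
Xt = rng.normal(size=(1500, 5)) * np.array([1.5, 0.5, 2.0, 1.0, 0.8]) + 1.0

# Align source inputs to the target distribution; no target labels are involved.
Xs_aligned = coral_align(Xs, Xt)

# Fit ordinary least squares (with intercept) on the aligned source data.
design = np.column_stack([Xs_aligned, np.ones(len(Xs_aligned))])
coef, *_ = np.linalg.lstsq(design, ys, rcond=None)

# Predict the missing dependent variable for the target city.
yt_pred = np.column_stack([Xt, np.ones(len(Xt))]) @ coef
```

Whether this kind of alignment actually helps depends on how the source and target cities differ, so it is worth comparing against the source-only model whenever some labeled target data exists for evaluation.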