transformers

Regression Models

Open vrunm opened this issue 2 years ago • 14 comments

Feature request

I am working on a regression problem and am looking forward to using Transformers for it, but before jumping into the implementation I am curious whether transformers can be used for a regression problem at all. I have around 90 features (floating points) and one target. I couldn’t find any paper on transformers for regression problems, so please let me know if any of you have used transformers for this purpose.

I am working on a problem with tabular data that has more than 90 features and one target, and all the features are integers (continuous). I want to use a pre-trained model such as BERT or GPT2, but the tokenizer expects its input as text. I can convert the integer data to text like this:

original_data = [1,2,3,4,5,…,94]
transformed_data = ["1,2,3,4,5,…,94"]
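For illustration only, the conversion above could be sketched in plain Python as follows. The helper name `row_to_text` and the stand-in values `1..94` are assumptions made for this example; the real feature values are not given in the post.

```python
def row_to_text(row, sep=","):
    """Join the numeric features of one row into a single string."""
    return sep.join(str(value) for value in row)


# Hypothetical stand-in for one row of 94 feature values.
original_data = list(range(1, 95))

# One string per row, ready to be handed to a text tokenizer.
transformed_data = [row_to_text(original_data)]

print(transformed_data[0][:20])
```

Note that a text tokenizer will split such a string into digit and punctuation pieces, so the numeric magnitude of each feature is not directly visible to the model.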

If I pass transformed_data to the tokenizer it will surely work, but I want to know whether anyone has tried to use transformers for this purpose and, if so, what the outcome was and what the results looked like.

How can I use the transformers library for this purpose? All the tokenizers are trained on text data, so I am kind of lost. Any help will be appreciated.

Motivation

The purpose of regression models is to predict a continuous output variable based on one or more input variables. Regression models are widely used in many fields such as finance, economics, engineering, and social sciences, where the goal is to understand the relationship between the input variables and the output variable and to make predictions based on that understanding.

In regression analysis, the focus is on building a model that captures the relationship between the input variables and the output variable. This model is then used to predict the values of the output variable for new input data. The model can also be used to identify the important input variables that have a significant impact on the output variable.

Regression models come in various types, such as linear regression, logistic regression, polynomial regression, and others. The choice of the regression model depends on the type of data, the type of relationship between the input and output variables, and the purpose of the analysis.

Your contribution

I can implement some of the code given in this Post:

vrunm avatar May 07 '23 07:05 vrunm

Hi @vrunm, I think the forums are a better place for this sort of discussion. Some helpful links are https://discuss.huggingface.co/t/tabular-classification-regression-pipeline/22030/2 and https://discuss.huggingface.co/t/how-to-set-up-trainer-for-a-regression/12994 (related to your post). You can also check the model documentation for Informer and Time Series Transformer. The blog post "Probabilistic Time Series Forecasting with 🤗 Transformers" might be helpful as well. There are also multiple papers and repos that use transformers for regression, which you can find by searching on Google.

ayubih avatar May 07 '23 07:05 ayubih

@hsuyab I did go through these links and searched online, but I want something very specific and customizable. I want something that can be used from Huggingface as a core function.

vrunm avatar May 07 '23 09:05 vrunm

Well, you can check https://pytorch-tabular.readthedocs.io/en/latest/ and https://pytorch-forecasting.readthedocs.io/, but I don't understand exactly what you mean by core functionality.

ayubih avatar May 07 '23 12:05 ayubih

@hsuyab I want to build a multivariate regression model and want to use a Huggingface class specifically designed for that, not a pipeline that does not allow you to train and fine-tune your model.

vrunm avatar May 07 '23 12:05 vrunm

@vrunm okay, you can try loading the modules and modifying the class functions yourself; however, creating this functionality separately wouldn't make sense imo. It's still better to use other libraries that are focused on this task, or best of all something like xgboost/lightgbm.

ayubih avatar May 07 '23 12:05 ayubih

@hsuyab Sure, I will try that, but do you have the code to modify the class functions, or should I create a PR for this?

vrunm avatar May 07 '23 12:05 vrunm

It's best you create a PR and use that.

ayubih avatar May 07 '23 15:05 ayubih

@hsuyab Can you share with me the outline of the classes to change to implement this functionality? I think asking the contributors will be a better choice.

vrunm avatar May 07 '23 17:05 vrunm

Is it possible to implement regression with a specific class of Huggingface transformers? What would be the outline of the classes to change to implement this as a PR?

vrunm avatar May 07 '23 17:05 vrunm

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jun 06 '23 15:06 github-actions[bot]

@hsuyab were you able to open a PR to add this functionality?

vrunm avatar Jun 06 '23 15:06 vrunm

No, imo regression is not something that's needed as a feature in transformers as of now, as there are other libraries focused on implementing it in a better way.

ayubih avatar Jun 06 '23 23:06 ayubih

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jul 01 '23 15:07 github-actions[bot]

@hsuyab Do you think it is necessary to implement this functionality for now? I would really like your comments on the classes to be implemented for this.

vrunm avatar Jul 01 '23 15:07 vrunm

@vrunm @hsuyab Thanks for discussing and raising this issue.

Questions about how to solve problems using transformers are best placed in our forums. We try to reserve the GitHub issues for feature requests and bug reports.

One thing to note is that regression is already possible with models like BERT if num_labels is set to 1 in the config. For example, see this line in the code: https://github.com/huggingface/transformers/blob/33aafc26ee68df65c7d9457259fc3d59f79eef4f/src/transformers/models/bert/modeling_bert.py#L1583C26-L1583C26
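As a rough sketch of what this could look like (not an official recipe), the snippet below builds a tiny, randomly initialised BERT with `num_labels=1` and passes float targets as `labels`. The config sizes, the dummy `input_ids` and the target values are all assumptions chosen so the example runs without downloading weights; a real setup would load a pre-trained checkpoint and tokenized inputs instead.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny illustrative config (assumed values, not recommendations).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,  # a single output turns the classification head into a regressor
)
model = BertForSequenceClassification(config)
model.eval()

# Dummy batch: 4 sequences of 16 token ids, with one float target each.
input_ids = torch.randint(0, config.vocab_size, (4, 16))
targets = torch.tensor([0.5, 1.2, -0.3, 2.0])

with torch.no_grad():
    outputs = model(input_ids=input_ids, labels=targets)

print(outputs.logits.shape)  # one predicted value per sequence
print(outputs.loss)          # mean squared error against the float targets
```

With `num_labels=1` and no explicit `problem_type`, the model treats the task as regression and computes a mean squared error loss, so the same class can be passed to `Trainer` with float labels.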

amyeroberts avatar Jul 11 '23 17:07 amyeroberts

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Aug 05 '23 15:08 github-actions[bot]