code2vec

Model for other task.

Open XuRuiAngel opened this issue 3 years ago • 2 comments

Thank you for providing the source code for your paper. I have a question: can this model be used to identify Java design patterns? As I roughly understand it, each line of the model's input data holds the path contexts of a single code block, whereas identifying design patterns requires analyzing whole Java files, or even several files at once. Can the code2vec model make predictions in this scenario?

XuRuiAngel, Oct 19 '22 03:10

Hi @XuRuiAngel,

By default, code2vec can only interpret single methods. However, other studies have tried to overcome this limitation. For example, Compton et al. applied several aggregation methods to the method embeddings to create class-level embeddings. You could experiment with aggregation methods yourself: combine the method embeddings across the relevant Java files, then use the resulting embeddings to train a classifier for your design-pattern task.
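A minimal sketch of that aggregation idea, assuming you have already exported one code vector per method from code2vec. The function names (`mean_pool`, `max_pool`, `class_embedding`, `pattern_embedding`) are hypothetical and not part of code2vec, and the pooling choices here are generic ones, not necessarily those used by Compton et al. Toy 3-dimensional vectors stand in for real code vectors.

```python
from statistics import fmean


def mean_pool(vectors):
    """Element-wise mean of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def max_pool(vectors):
    """Element-wise maximum of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def class_embedding(method_vectors, pool=mean_pool):
    """Collapse the method vectors of one class into a single vector."""
    if not method_vectors:
        raise ValueError("need at least one method vector")
    return pool(method_vectors)


def pattern_embedding(classes, pool=mean_pool):
    """Collapse several classes (possibly from different files) into one vector.

    `classes` is a list in which each item is the list of method vectors
    belonging to one class.
    """
    return pool([class_embedding(methods, pool) for methods in classes])


# Toy data: two methods in one class, one method in another class.
class_a = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
class_b = [[0.0, 2.0, 1.0]]

print(class_embedding(class_a))                  # mean over methods
print(class_embedding(class_a, pool=max_pool))   # max over methods
print(pattern_embedding([class_a, class_b]))     # mean over classes
```

The resulting fixed-length vectors can then be fed, together with design-pattern labels, to any off-the-shelf classifier. Which pooling works best for design patterns is an open question you would need to test on your own data.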

daveymathijssen, Oct 19 '22 09:10

Hi @XuRuiAngel, thank you for your interest in our work.

As @daveymathijssen said, code2vec currently works on a single function at a time. We have other recent work, such as PolyCoder (https://github.com/VHellendoorn/Code-LMs) and a CodeBERT model that we fine-tuned on Java, which you may be able to fine-tune for your task: https://huggingface.co/neulab/codebert-java
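As a rough starting point for such fine-tuning, the sketch below loads the checkpoint with a fresh classification head through the Hugging Face `transformers` library. This is an assumption about how one might set it up, not a recipe from the authors: the label set `PATTERNS` and the helper names are hypothetical, the classification head starts out randomly initialized, and inputs longer than 512 tokens are simply truncated, which may lose information for whole-file analysis.

```python
# Hypothetical label set; replace with the design patterns in your dataset.
PATTERNS = ["singleton", "factory", "observer", "none"]


def label_maps(labels):
    """Build the name<->index mappings stored in the model config."""
    label2id = {name: index for index, name in enumerate(labels)}
    id2label = {index: name for name, index in label2id.items()}
    return label2id, id2label


def load_classifier(labels, checkpoint="neulab/codebert-java"):
    """Download the checkpoint and attach an untrained classification head."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    label2id, id2label = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        label2id=label2id,
        id2label=id2label,
    )
    return tokenizer, model


def encode(tokenizer, java_source):
    """Tokenize one Java snippet, truncating to the model's input limit."""
    return tokenizer(
        java_source, truncation=True, max_length=512, return_tensors="pt"
    )
```

Calling `load_classifier(PATTERNS)` requires network access and the `transformers` and `torch` packages. The model it returns still has to be trained on labeled examples before its predictions mean anything.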

Let me know if you have any more questions. Best, Uri

urialon, Oct 19 '22 15:10