Egor

Results: 39 comments by Egor

Hi, I think I'm hitting the same issue. After following the installation instructions for tensorflow/fold, I got:
```
>>> import tensorflow; tensorflow.__version__
'1.2.1'
>>> import tensorflow_fold
---------------------------------------------------------------------------
NotFoundError...
```

Copy-paste of https://github.com/src-d/hercules/issues/188#issuecomment-461127952
About classification - it's possible to collect some labels from PRs in huge repositories like
* https://github.com/tensorflow/tensorflow/labels and [bug related](https://github.com/tensorflow/tensorflow/labels?utf8=%E2%9C%93&q=bug)
* https://github.com/pytorch/pytorch/labels and [bug related](https://github.com/pytorch/pytorch/labels?utf8=%E2%9C%93&q=bug)
* and...

About classification - it's possible to collect some labels from PRs in huge repositories like
* https://github.com/tensorflow/tensorflow/labels and [bug related](https://github.com/tensorflow/tensorflow/labels?utf8=%E2%9C%93&q=bug)
* https://github.com/pytorch/pytorch/labels and [bug related](https://github.com/pytorch/pytorch/labels?utf8=%E2%9C%93&q=bug)
* and so on
###...

Current pattern: `r"[^a-zA-Z|]ci\W|[\s-]ci\W|ci[\s-]|[\s-]ci[\s-]|bot$|pipeline|release|routing"`
Problems with regexp:
```
('cici jiayi shen', '[email protected]'),
('daniel adrian bohbot', '[email protected]'),
('horaci macias', '[email protected]'),
('melvindebot', '[email protected]'),
("daniel obot", "[email protected]")
```
Some French and Chinese names/surnames may...
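To make the false positives concrete, here is a minimal sketch that runs the pattern quoted above over the listed names. It assumes the pattern is applied with `re.search` to already lowercased author names; how the original pipeline applies it is not stated in the comment. The helper `matched_fragment` and the variable names are hypothetical.

```python
import re

# Pattern copied verbatim from the comment above.
BOT_PATTERN = re.compile(
    r"[^a-zA-Z|]ci\W|[\s-]ci\W|ci[\s-]|[\s-]ci[\s-]|bot$|pipeline|release|routing"
)

# Names listed above as problem cases (emails omitted).
HUMAN_NAMES = [
    "cici jiayi shen",
    "daniel adrian bohbot",
    "horaci macias",
    "melvindebot",
    "daniel obot",
]


def matched_fragment(name):
    """Return the substring that triggered the pattern, or None (hypothetical helper)."""
    match = BOT_PATTERN.search(name)
    return match.group(0) if match else None


# Assumption: names are matched with re.search after lowercasing.
false_positives = {name: matched_fragment(name) for name in HUMAN_NAMES}
for name, fragment in false_positives.items():
    print(f"{name!r} -> {fragment!r}")
```

Under these assumptions every listed name is flagged: the `ci[\s-]` branch fires on names whose first word ends in "ci", and `bot$` fires on surnames ending in "bot".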

Ideas:
* use regexp to find highly probable bots (19k found from 1300M rows `author.date, author.email, author.name, committer.date, committer.email, committer.name`)
* calculate authors/committer fraction - it may show that distributions...

### There are at least several problems that may affect the quality:
1. Noisy labels -
   * false positives from regexp - like: `abbot`, `julia jenkins` and so on
   *...

One possible way to solve it - add an option to return the result in a machine-readable format (e.g. JSON).
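A rough sketch of what such an option could look like, assuming a Python command-line tool. The `--format` flag, `build_parser`, `format_result`, and the sample result keys are all hypothetical and not taken from the tool under discussion.

```python
import argparse
import json


def build_parser():
    """Build a parser with a hypothetical --format option."""
    parser = argparse.ArgumentParser(description="hypothetical tool")
    parser.add_argument(
        "--format",
        choices=["text", "json"],
        default="text",
        help="output format; 'json' is meant for machine consumption",
    )
    return parser


def format_result(result, fmt):
    """Render a result dict either as JSON or as plain 'key: value' lines."""
    if fmt == "json":
        return json.dumps(result, sort_keys=True)
    return "\n".join(f"{key}: {value}" for key, value in result.items())


# Example run with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--format", "json"])
sample_result = {"files": 3, "status": "ok"}  # made-up data for illustration
print(format_result(sample_result, args.format))
```

Keeping the rendering in one function means the human-readable output stays the default and scripts can opt in to JSON without parsing free text.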

I will try, but it's still strange: several attempts succeed, and then it starts to fail. + It was possible to run the `hash` step on 300k files after increasing...

Yes, GitHub issues in this particular case.
> Could you provide an example of the case `CAT -> issues -> CAT`?

Example: https://github.com/src-d/code-annotation/issues/174 - there are several suspicious diffs/highlights. So...