codeprep issues

Bump joblib from 0.15.1 to 1.2.0

Bumps [joblib](https://github.com/joblib/joblib) from 0.15.1 to 1.2.0. Changelog Sourced from joblib's changelog. Release 1.2.0 Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported....

dependabot[bot]

dependencies

Bump pygments from 2.6.1 to 2.7.4

1

Bumps [pygments](https://github.com/pygments/pygments) from 2.6.1 to 2.7.4. Release notes Sourced from pygments's releases. 2.7.4 Updated lexers: Apache configurations: Improve handling of malformed tags (#1656) CSS: Add support for variables (#1633, #1666)...

dependabot[bot]

dependencies

Does codeprep work on JavaScript source code preprocessing?

3

It seems that it does not work with JavaScript. Also, is there a way to remove the end-of-token '\t' from the token sequence?

Kelin-hao

bug

question

Why use bytes instead of str for paths (Windows)?

https://github.com/giganticode/codeprep/blob/f5a35b68fab930e095a99dbd83e27f63c23552a4/codeprep/pipeline/to_repr.py#L60 I am working with this repository. On Windows, I find that when I use Unicode characters such as Chinese in a path like "./文档/", to_repr.py is likely to encode this string to...

lyksdu

PreprocessingMetadata enhancement

1

* Rename `PreprocessingMetadata` -> `PreppedTokenMetadata`
* Represent the `word_boundaries` field as a list of the number of subtokens in each token, e.g. [1, 3, 1, 2] instead of [0, 1, 4,...
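
The two `word_boundaries` representations mentioned above can be related with a short sketch. The function names here are hypothetical and not part of codeprep; the sketch assumes the old form stores cumulative subtoken offsets starting at 0, which matches the visible prefix `[0, 1, 4,` of the truncated example.

```python
from itertools import accumulate
from typing import List


def offsets_to_counts(offsets: List[int]) -> List[int]:
    """Convert cumulative subtoken offsets into per-token subtoken counts.

    Assumption: offsets start at 0 and end at the total number of subtokens.
    """
    return [end - start for start, end in zip(offsets, offsets[1:])]


def counts_to_offsets(counts: List[int]) -> List[int]:
    """Inverse conversion: per-token counts back to cumulative offsets."""
    return [0, *accumulate(counts)]


counts = [1, 3, 1, 2]                # proposed form: subtokens per token
offsets = counts_to_offsets(counts)  # assumed old form: cumulative offsets
print(offsets)                       # [0, 1, 4, 5, 7]
print(offsets_to_counts(offsets))    # [1, 3, 1, 2]
```

The count-based form keeps each entry local to its token, so slicing a sequence does not require re-basing offsets.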

hlibbabii

enhancement

By default use end-of-full-token character (</t>) instead of token boundaries (<w>, </w>) for all kinds of pre-processing for consistency

Currently:
```python
>>> api.basic("getName")
['<w>', 'get', 'Name', '</w>']
```
To be done:
```python
>>> api.basic("getName")
['get', 'Name', '</t>']
```
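
As a rough illustration of the change named in the title, the sketch below converts a sequence that uses token boundaries (`<w>`, `</w>`) into one that uses an end-of-full-token marker (`</t>`). The function name is hypothetical and this is not codeprep's implementation; it also assumes that a token appearing outside any `<w>`/`</w>` pair is a complete single-subtoken token and gets its own `</t>`.

```python
from typing import List


def to_end_of_token_style(tokens: List[str],
                          start: str = "<w>",
                          end: str = "</w>",
                          eot: str = "</t>") -> List[str]:
    """Replace <w> ... </w> wrapping with a single trailing </t> per full token."""
    result: List[str] = []
    inside_boundaries = False
    for token in tokens:
        if token == start:
            inside_boundaries = True      # drop the opening boundary
        elif token == end:
            result.append(eot)            # closing boundary becomes </t>
            inside_boundaries = False
        else:
            result.append(token)
            if not inside_boundaries:     # assumed: unwrapped token is complete
                result.append(eot)
    return result


print(to_end_of_token_style(["<w>", "get", "Name", "</w>"]))
# ['get', 'Name', '</t>']
```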

hlibbabii

enhancement

Create PreppedTokenSequence class to encapsulate getting full tokens from subtokens

The tasks for the new `PreppedTokenSequence` class are to encapsulate getting full tokens from subtokens (which is currently done by `FullTokenIterator` class) and at the same time provide transparent access...
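
A minimal sketch of the grouping part of this idea follows. The class name `PreppedTokenSequenceSketch`, its constructor arguments, and its methods are all hypothetical; the real `PreppedTokenSequence` and `FullTokenIterator` may look quite different, and the truncated remainder of the description is not modeled here. The sketch assumes the per-token subtoken counts are available as a list.

```python
from typing import Iterator, List


class PreppedTokenSequenceSketch:
    """Hypothetical container pairing subtokens with per-token subtoken counts."""

    def __init__(self, sub_tokens: List[str], counts: List[int]) -> None:
        if sum(counts) != len(sub_tokens):
            raise ValueError("counts must add up to the number of subtokens")
        self.sub_tokens = sub_tokens
        self.counts = counts

    def full_tokens(self) -> Iterator[List[str]]:
        """Yield the subtokens of each full token as one group."""
        start = 0
        for n in self.counts:
            yield self.sub_tokens[start:start + n]
            start += n


seq = PreppedTokenSequenceSketch(["get", "Name", "(", ")"], [2, 1, 1])
print(list(seq.full_tokens()))
# [['get', 'Name'], ['('], [')']]
```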

hlibbabii

enhancement

Enhance `ParsedToken` hierarchy

* Rename `SplitContainer` to `Identifier`
* Make `Identifier` abstract and extend it with `SingleWordIdentifier`, `TwoWordIdentifier`, `ThreeWordIdentifier`, `FourOrMoreWordIdentifier`
* Make other classes that have subclasses abstract

hlibbabii

enhancement
