cpuhrsch
find_match searches a list of strings and returns the first entry that partially or fully contains the given string match.
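A minimal sketch of the behaviour described above, assuming "partially or fully contains" means a plain substring test. The parameter names and the choice to return None when nothing matches are assumptions, not taken from the PR.

```python
from typing import Optional, Sequence


def find_match(match: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the first entry of `candidates` that contains `match` as a substring."""
    for candidate in candidates:
        # `in` is true for both a partial (substring) and a full (equal) match.
        if match in candidate:
            return candidate
    # Assumption: no match is reported as None rather than raising.
    return None
```

For example, calling it with "train" against a list of extracted file paths would pick the first path that mentions "train", regardless of directory prefix or extension.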
This PR changes the IMDB download to use the stored filename to detect whether the data has already been downloaded. This further prevents unnecessary querying of Google Drive.
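The check described above can be sketched roughly as follows. The helper name download_if_missing and the fetch callback are hypothetical stand-ins for whatever the dataset code actually calls; the only point illustrated is that the remote host is contacted only when the expected file is absent.

```python
import os
from typing import Callable


def download_if_missing(root: str, filename: str, fetch: Callable[[str], None]) -> str:
    """Return the local path for `filename`, calling `fetch(path)` only if it is absent."""
    path = os.path.join(root, filename)
    if os.path.exists(path):
        # The file is already on disk, so no request is sent to the remote host.
        return path
    os.makedirs(root, exist_ok=True)
    fetch(path)  # hypothetical callback that writes the downloaded data to `path`
    return path
```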
Edit the raw.translation dataset to return a RawTextIterableDataset, which uses worker information to restrict the underlying iterator to a subset so that DataLoader won't return duplicate entries, if given an instance...
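One common way to restrict an iterator per worker is round-robin sharding, sketched below with the standard library only. This is an illustration of the idea, not the RawTextIterableDataset implementation, which may partition differently. In PyTorch, the worker_id and num_workers values would typically be read from torch.utils.data.get_worker_info() inside the worker process; here they are passed in explicitly, and the function name shard_for_worker is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shard_for_worker(items: Iterable[T], worker_id: int, num_workers: int) -> Iterator[T]:
    """Yield every `num_workers`-th item, starting at offset `worker_id`."""
    # Worker k sees items k, k + num_workers, k + 2 * num_workers, ...
    # so the shards are disjoint and together cover the whole stream.
    return islice(items, worker_id, None, num_workers)
```

Because each worker skips the items owned by the others, the union of all shards contains every entry exactly once.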
This PR typedefs strings that are meant to be constant, i.e. read-only. They can then be optionally replaced by std::string_view. Also adds the ever so important "#pragma once" to the...
Create a single union regular expression that uses a lambda to query a dictionary of patterns for the correct replacement. This yields a significant speedup, but it is different from the...
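A rough sketch of the union-regex technique, under the simplifying assumption that the dictionary keys are literal strings rather than arbitrary patterns, so the matched text can be used directly as the lookup key. The function name build_replacer is hypothetical.

```python
import re
from typing import Callable, Dict


def build_replacer(replacements: Dict[str, str]) -> Callable[[str], str]:
    """Compile one alternation over all keys and replace every match in a single pass."""
    # Assumption: keys are literal strings, so they are escaped before joining.
    # Longer keys come first so that they win over shorter keys sharing a prefix.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def replace(text: str) -> str:
        # The lambda looks up the replacement for whichever alternative matched.
        return pattern.sub(lambda m: replacements[m.group(0)], text)

    return replace
```

Note that a single pass does not rescan replaced text, so the result can differ from applying the same substitutions one after another.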
Right now the default is sometimes "train", "test", "valid" and sometimes (more commonly) "train", "valid", "test". We should pick a single convention (this PR opts for the latter) to...
Language modeling datasets construct *all* datasets even if only a subset is requested. They also store the fully numericalized version of the dataset if it's stored as "a single line"...
a) The documentation doesn't clearly state that one factory function is meant to be used to construct a Vocabulary from a dataset (e.g. AG_NEWS) and another is meant to be...