
[FEATURE] Add glob to file loaders

Open robertofabrizi opened this issue 2 years ago • 2 comments

Describe the feature you'd like Replace every Loader instantiation with the constructor that also accepts an optional glob, so files can be filtered by file extension.

Additional context As seen in the source code here, Folder with Files creates a DirectoryLoader, passing the path and a mapping from each file extension to the loader for that extension (mostly TextLoaders, of course). That is fine in itself, but Folder with Files also takes an optional TextSplitter as input, which is then applied to every loaded file. To test this, I pointed it at a path containing both Java and Scala sources and connected a Java Code Splitter, and it used that splitter on the Scala sources too. Maybe with the simple addition of an extra input field in the UI we could use something like this

loader = DirectoryLoader('../', glob="**/*.md")

and keep only the file extensions that make sense for the input TextSplitter.
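To make the idea concrete, here is a minimal sketch of what a glob filter would do before any loader or splitter runs. It uses only Python's standard library, not Flowise or LangChain code; the helper name `select_files` and the sample file names are hypothetical and chosen for illustration.

```python
from pathlib import Path
import tempfile


def select_files(root: str, glob: str = "**/*") -> list[str]:
    """Return the sorted relative paths of files under root that match glob."""
    base = Path(root)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.glob(glob)
        if p.is_file()  # skip directories that the pattern also matches
    )


# Build a throwaway folder that mixes Java, Scala, and Markdown files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("src/Main.java", "src/Util.scala", "README.md"):
        path = Path(tmp, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("// placeholder\n")

    java_only = select_files(tmp, "**/*.java")  # only the Java source
    everything = select_files(tmp)              # default: no filtering
```

With a Java Code Splitter connected, passing `**/*.java` would keep the Scala and Markdown files away from the splitter entirely.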

robertofabrizi avatar Feb 01 '24 08:02 robertofabrizi

So does that mean the filtered files will be split by the connected splitter, while the rest will not?

HenryHengZJ avatar Feb 06 '24 17:02 HenryHengZJ

So does that mean the filtered files will be split by the connected splitter, while the rest will not?

That'd be my idea. It might not be the optimal solution, and maybe there is a better way, but I reckon it'd be better than parsing a git repo with multiple file extensions and splitting them all with just one type of splitter, even the files that don't match that splitter's tokens. Those end up as vectors in the db, adding noise and reducing accuracy.
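A rough sketch of that behaviour, assuming a lookup from splitter language to the extensions it suits. The `SPLITTER_EXTENSIONS` table and the `partition_for_splitter` helper are hypothetical and are not part of Flowise or LangChain.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from a splitter's language to the extensions it suits.
SPLITTER_EXTENSIONS = {
    "java": {".java"},
    "scala": {".scala", ".sc"},
    "markdown": {".md"},
}


def partition_for_splitter(paths, language):
    """Separate paths the splitter should handle from those it should skip."""
    wanted = SPLITTER_EXTENSIONS[language]
    to_split, untouched = [], []
    for p in paths:
        if PurePosixPath(p).suffix.lower() in wanted:
            to_split.append(p)
        else:
            untouched.append(p)
    return to_split, untouched


to_split, untouched = partition_for_splitter(
    ["src/Main.java", "src/Util.scala", "README.md"], "java"
)
```

Whether the untouched files are loaded unsplit or dropped altogether is a separate design decision; the sketch only shows the partitioning step.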

robertofabrizi avatar Feb 06 '24 17:02 robertofabrizi