glean
lightweight search engine for local text docs
For mainly English text corpora, using a Porter stemmer variant at index- and search-time might be a good idea. (If stemming, the terminal $ in the keyword search should be...
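A minimal sketch of the idea, nowhere near a full Porter stemmer (which has measure conditions and rewrite steps): strip a few common English suffixes in place, and apply the same function to words at both index time and query time so inflected forms collide. The function name and suffix list are assumptions for illustration.

```c
#include <string.h>

/* Hypothetical light stemmer: strips a few common English suffixes in
 * place. A real Porter stemmer has many more rules; this only shows
 * why the same stem function must run at index- and search-time. */
static void stem_light(char *w)
{
    static const char *suffixes[] = { "ing", "edly", "ed", "es", "s", NULL };
    size_t len = strlen(w);

    for (int i = 0; suffixes[i] != NULL; i++) {
        size_t sl = strlen(suffixes[i]);
        /* require a stem of at least 3 chars to avoid mangling short words */
        if (len > sl + 2 && strcmp(w + len - sl, suffixes[i]) == 0) {
            w[len - sl] = '\0';
            return;
        }
    }
}
```

With this, "searching" and "searched" both reduce to "search" and land in the same hash chain.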
Add agrep / nrgrep support in format_cmd in gln.c. Will need to set up a gunzip -c pipe for compressed token indexes. This will allow searching for keywords with a...
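One possible shape for this, assuming nothing about the real format_cmd signature in gln.c: build the command string with a gunzip -c pipe in front when the token index is compressed, then hand it to popen. The path layout, parameter names, and the choice of agrep's -# error flag are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of how format_cmd might build the search pipeline. For a
 * gzipped token index, the command becomes a gunzip -c | agrep pipe
 * instead of running agrep on the file directly. The resulting string
 * would be handed to popen(buf, "r"). */
static void format_cmd(char *buf, size_t buflen,
                       const char *index_path, const char *pattern,
                       int compressed, int max_errs)
{
    if (compressed)
        snprintf(buf, buflen, "gunzip -c '%s' | agrep -%d '%s'",
                 index_path, max_errs, pattern);
    else
        snprintf(buf, buflen, "agrep -%d '%s' '%s'",
                 max_errs, pattern, index_path);
}
```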
A second token table could be added for large files, with an additional value for which block(s) contain the token. Using grep to search whole files is slow for very...
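A rough sketch of what such an entry could look like, with invented names and sizes: a bitmap with one bit per fixed-size block, so the search can seek straight to candidate blocks instead of grepping the whole file.

```c
#include <stdint.h>

/* Hypothetical block table entry for large files: alongside the normal
 * token -> file mapping, record which fixed-size blocks contain the
 * token. Block size and the 64-block cap are arbitrary choices here. */
#define BLOCK_SIZE (64 * 1024)          /* bytes per block */

struct block_entry {
    uint32_t token_id;                  /* token this entry describes */
    uint32_t file_id;                   /* large file it occurs in */
    uint64_t blocks;                    /* bit i set => token in block i */
};

/* Called during indexing for each occurrence of the token. */
static void mark_block(struct block_entry *e, long byte_offset)
{
    long b = byte_offset / BLOCK_SIZE;
    if (b < 64)
        e->blocks |= (uint64_t)1 << b;
}

/* Called at search time to decide whether a block is worth grepping. */
static int block_has_token(const struct block_entry *e, int block)
{
    return (int)((e->blocks >> block) & 1);
}
```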
Anything that can reduce the token occurrence index size without greatly increasing complexity or lookup time is worth considering.
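One candidate that is cheap on both complexity and lookup time: store occurrence offsets as deltas, each delta in a 7-bit variable-length encoding (the "varint" scheme many indexers use), so small gaps cost one byte instead of four. Function names here are made up for the sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Write v as a little-endian base-128 varint; returns bytes written. */
static size_t varint_encode(uint32_t v, unsigned char *out)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (unsigned char)((v & 0x7f) | 0x80); /* + continuation bit */
        v >>= 7;
    }
    out[n++] = (unsigned char)v;
    return n;
}

/* Delta-encode a sorted list of occurrence offsets. */
static size_t encode_offsets(const uint32_t *offs, size_t count,
                             unsigned char *out)
{
    size_t n = 0;
    uint32_t prev = 0;
    for (size_t i = 0; i < count; i++) {
        n += varint_encode(offs[i] - prev, out + n);
        prev = offs[i];
    }
    return n;
}
```

Offsets {100, 130, 4000} become deltas {100, 30, 3870} and fit in 4 bytes instead of 12.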
Tested so far on OpenBSD/amd64, Linux (debian/i386), OS X. Testing on FreeBSD, NetBSD, Cygwin, etc. would be good. There are (hopefully) not a lot of portability issues - the main...
Set up configuration hooks\* for non-text files that nonetheless can be meaningfully indexed: Pass .mp3s through id3tag, PDFs through ps2ascii, .docs through antiword, etc., and index the output. (Optionally, cache...
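The hook table could be as simple as an extension-to-command mapping; at index time the command runs via popen and its stdout gets tokenized like any text file. The table contents and lookup function below are illustrative guesses, not a real config format.

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical filter hooks: map a file extension to a command that
 * writes indexable text on stdout. In practice these would be read
 * from a config file rather than compiled in. */
struct filter { const char *ext; const char *cmd; };

static const struct filter filters[] = {
    { ".pdf", "ps2ascii" },
    { ".doc", "antiword" },
    { ".mp3", "id3tag" },
    { NULL,   NULL }
};

/* Return the filter command for a path, or NULL for plain text. */
static const char *filter_for(const char *path)
{
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
        return NULL;
    for (int i = 0; filters[i].ext != NULL; i++)
        if (strcmp(dot, filters[i].ext) == 0)
            return filters[i].cmd;
    return NULL;
}
```

Indexing would then do something like popen("ps2ascii report.pdf", "r") and feed the stream to the tokenizer (and optionally cache the output, per the note above).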
glean should also work for non-ASCII text. It just needs a different hashing algorithm for hash_word in whash.c, a different word separator, and testing by people fluent in a whitespace-separated,...
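One drop-in candidate for hash_word that handles arbitrary UTF-8: FNV-1a over the raw bytes, which needs no decoding and spreads multi-byte sequences well. This is a suggestion, not the current whash.c implementation, and the word-separator question still stands.

```c
#include <stdint.h>

/* FNV-1a over the raw bytes of a NUL-terminated word. Works unchanged
 * for UTF-8 input since it never interprets the bytes. */
static uint32_t hash_word_utf8(const unsigned char *w, uint32_t nbuckets)
{
    uint32_t h = 2166136261u;           /* FNV-1a offset basis */
    for (; *w != '\0'; w++) {
        h ^= *w;
        h *= 16777619u;                 /* FNV prime */
    }
    return h % nbuckets;
}
```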
Rather than rebuilding the DB from scratch, add another table to the hash table chains in the DB files. When searching in gln.db, search all tables. Add a "merge"/"pack"/whatever command...
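In-memory shape of the idea, with invented struct names: each incremental index run appends a table to the chain, lookups walk every table newest-first, and the merge/pack command would collapse the chain back to one table.

```c
#include <string.h>
#include <stddef.h>

struct entry { const char *word; int file_id; };

/* One table per index run; later runs point at older tables. The real
 * on-disk layout in gln.db would use offsets rather than pointers. */
struct table {
    struct table *next;                 /* next (older) table in the chain */
    const struct entry *entries;
    size_t n;
};

/* Search every table in the chain; first (newest) match wins. */
static int lookup(const struct table *t, const char *word)
{
    for (; t != NULL; t = t->next)
        for (size_t i = 0; i < t->n; i++)
            if (strcmp(t->entries[i].word, word) == 0)
                return t->entries[i].file_id;
    return -1;
}
```

The cost is that lookups degrade linearly with the number of unmerged runs, which is what makes the pack command worth having.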
Add NEAR (alongside AND, OR, NOT); should be based on grep -C $NUM_CONTEXT_LINES.
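A rough cut of NEAR as a grep pipeline, with a made-up helper name: pull each match of the first keyword with $NUM_CONTEXT_LINES of context, then require the second keyword inside that window. (This is approximate; the second grep also matches when both words share a line, which is fine for NEAR.)

```c
#include <stdio.h>
#include <string.h>

/* Build "grep -C N 'w1' path | grep 'w2'" into buf. Quoting here is
 * naive; real code would have to escape the patterns. */
static void near_cmd(char *buf, size_t buflen, const char *w1,
                     const char *w2, int context, const char *path)
{
    snprintf(buf, buflen, "grep -C %d '%s' '%s' | grep '%s'",
             context, w1, path, w2);
}
```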