TextAnalysis.jl
Julia package for text analysis
Added a directional `coo_matrix()` version to `coom.jl` so that the directional (asymmetric) co-occurrence matrix can be built with `CooMatrix()`. The current version of `coo_matrix()` is bidirectional (symmetric) and...
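To illustrate the difference between a directional and a symmetric co-occurrence count, here is a minimal sketch in plain Julia. It does not use or reproduce the package's `coo_matrix()`; the helper `directional_cooc` and its `window` parameter are hypothetical names chosen for this example, and it counts only tokens that follow the current token within the window.

```julia
# Hypothetical helper, NOT part of TextAnalysis.jl: counts how often the
# column token appears within `window` positions AFTER the row token.
function directional_cooc(tokens::Vector{String}, window::Int)
    vocab = unique(tokens)                          # first-appearance order
    index = Dict(t => i for (i, t) in enumerate(vocab))
    counts = zeros(Int, length(vocab), length(vocab))
    for i in eachindex(tokens)
        for j in (i + 1):min(i + window, length(tokens))
            counts[index[tokens[i]], index[tokens[j]]] += 1
        end
    end
    return vocab, counts
end

tokens = ["a", "b", "a", "c"]
vocab, counts = directional_cooc(tokens, 1)

# A symmetric (bidirectional) count can be recovered by adding the transpose.
symmetric = counts + counts'
```

In this sketch `counts` is generally not equal to its transpose, which is what makes it directional; `symmetric` ignores the order of the two tokens.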
I'm trying to create a `StringDocument` from a string that contains UTF-8 characters, and all I'm getting is a `StringIndexError`. My code is as follows: ```julia str = "Lo...
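For context, a `StringIndexError` in Julia typically arises when a string is indexed at a byte position that falls inside a multi-byte UTF-8 character. The sketch below uses only Base Julia and a made-up example string (the string from the report is truncated and is not reproduced here); it shows the failure and one safe way to walk the string. Whether this is the exact cause of the reported error is an assumption.

```julia
# Hypothetical example string; "\u00f6" (ö) takes two bytes in UTF-8.
s = "L\u00f6we"

valid   = collect(eachindex(s))   # byte indices where a character starts
n_chars = length(s)               # number of characters
n_bytes = ncodeunits(s)           # number of bytes

# Byte 3 is the second byte of 'ö', so indexing there throws.
threw = try
    s[3]
    false
catch err
    err isa StringIndexError
end

# Safe traversal: advance with nextind instead of adding 1 to the index.
function chars_by_index(str::AbstractString)
    out = Char[]
    i = firstindex(str)
    while i <= lastindex(str)
        push!(out, str[i])
        i = nextind(str, i)
    end
    return out
end
```

Code that computes indices arithmetically (for example `i + 1`) is the usual place to look when this error appears on non-ASCII input.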
I was looking into https://github.com/JuliaText/TextAnalysis.jl/issues/149 and realized that the underlying problem is an assumption that `NGramDocument`s' n-grams are made up of individual tokens, but each n-gram is actually just a...
Stemming an `NGramDocument` stems only the last word of each n-gram. Notice below how `repository` is stemmed to `repositori` in one place but left intact in another. ``` julia> td...
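The behaviour described above can be reproduced in miniature. The sketch below assumes an n-gram is held as a single space-separated string, and it uses a toy function `toy_stem` as a stand-in for a real stemmer (it only rewrites a trailing "y" to "i"). None of these names come from TextAnalysis.jl.

```julia
# Toy stand-in for a real stemmer (assumption): trailing "y" becomes "i".
toy_stem(word::AbstractString) =
    endswith(word, "y") ? string(chop(word), "i") : String(word)

# Treating the whole n-gram as one word only changes its final word.
stem_whole(ngram::AbstractString) = toy_stem(ngram)

# Splitting on whitespace first stems every word, then rejoins them.
stem_each(ngram::AbstractString) = join(toy_stem.(split(ngram)), " ")

ngram = "repository history"
whole = stem_whole(ngram)   # only "history" is affected
each  = stem_each(ngram)    # both words are affected
```

Under these assumptions, a per-token approach such as `stem_each` would stem `repository` consistently regardless of its position in the n-gram.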
BinaryProvider has not been updated for arm64-apple-darwin. This also prevents TextAnalysis from working natively on Apple Silicon Macs.
Would be nice to include the case that tokens are `CorpusLoaders.TaggedWord`, for example. cc @aviks
Upon trying to remove sparse terms from a corpus via ```julia remove_sparse_terms!(corp, .05) ``` I run into the following error message: ```julia PCRE compilation error: regular expression is too large...
In the Preprocessing Documents section of the documentation: https://juliahub.com/docs/TextAnalysis/5Mwet/0.7.2/documents/#Preprocessing-Documents In line 384, prepare!(sd, strip_preposition) does not work but prepare!(sd, strip_prepositions) works. In line 389, prepare!(sd, strip_spares_terms) does not work but prepare!(sd, strip_sparse_terms)...
resolves: https://github.com/JuliaText/TextAnalysis.jl/issues/242 On testing, it raises `UndefVarError: BIO1 not defined`. I tried figuring out the reason but couldn't. Should I use something else instead of `Main`? I am new to...
I needed to calculate cosine similarity. My first attempt was a bare implementation based on a Wikipedia article, but I found that this was not as fast as desired...
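For reference, the textbook definition of cosine similarity, the dot product divided by the product of the Euclidean norms, can be written with Julia's standard `LinearAlgebra` library as below. This is only a baseline sketch, not the implementation proposed in the pull request; the function name `cosine_similarity` is chosen for this example.

```julia
using LinearAlgebra

# Baseline definition: dot(a, b) / (‖a‖ * ‖b‖).
# Returns NaN if either vector is all zeros (division by zero norm).
cosine_similarity(a::AbstractVector, b::AbstractVector) =
    dot(a, b) / (norm(a) * norm(b))

a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]   # same direction as a
c = [0.0, 0.0, 3.0]   # orthogonal to a

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)
```

Parallel vectors give a similarity of approximately 1 and orthogonal vectors give 0; when comparing many documents, precomputing the norms once per vector is a common way to avoid repeated work.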