Houfu Ang
I am not able to upgrade from 3.13.6 to 4.x.x. Pressing "Check for updates" in the web interface tells me that 3.13.6 is the latest version (hogwash! :anger:). I have...
I added it as a plugin in the env file > CKAN__PLUGINS="**activity** image_view text_view pdf_view recline_view resource_proxy webpage_view datastore datapusher envvars" Does it work for you?
Is this enabled: https://docs.ckan.org/en/2.10/maintaining/configuration.html#ckan-activity-streams-enabled Should be on by default though...
Might be something to do with the word tokeniser. Needs investigation.
@caryknoop To check my bearings... could you comment what you expect the answer to be for your reproduction? ```python text1 = f"This is a sentence\n\nexamples" text2 = f"This is a...
Investigators gonna be investigating: Table shows various tokenisers and their outputs and expected redlines.

| Method | Test sequence 1 | Test sequence 2 | Expected redline |
|--------|-----------------|-----------------|------------------|
| Original redlines...
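For anyone who wants to poke at this themselves, here is a minimal sketch of the kind of comparison involved. It is not the tokeniser the library actually uses; both functions and the `text2` value are hypothetical and only illustrate why a plain whitespace split cannot see a paragraph break, while a splitter that keeps `\n\n` as its own token can.

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    """Hypothetical baseline: split on any whitespace, discarding it."""
    return text.split()


def paragraph_aware_tokens(text: str) -> list[str]:
    """Hypothetical alternative: keep a blank-line paragraph break as a token."""
    return re.findall(r"\n\n|\S+", text)


# text1 is taken from the reproduction; text2 is an assumed counterpart
# that differs only by the missing paragraph break.
text1 = "This is a sentence\n\nexamples"
text2 = "This is a sentence examples"

for name, tokenise in [
    ("whitespace", whitespace_tokens),
    ("paragraph-aware", paragraph_aware_tokens),
]:
    same = tokenise(text1) == tokenise(text2)
    print(f"{name}: tokens identical for both texts? {same}")
```

With the whitespace split the two texts produce the same token list, so any diff built on top of it has nothing to mark. Whether a paragraph break should show up in the redline is the open question.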
In my mind this is probably a very important and big feature. What's the minimum feature set? Read and extract only the text (without formatting and pagination) and compare? 🤔...
@HRNPH The latest commit (#28) provides an example pipeline for files. Are you still interested in taking a stab at PDF files? Let me know your thoughts (including which PDF...
Now open to others to try before I do it myself lol.
Thanks for this! Let me take some time to read over it (looks good to merge though)