memorious
Lightweight web scraping toolkit for documents and structured data.
Bumps [servicelayer[amazon,google]](https://github.com/alephdata/servicelayer) from 1.19.0 to 1.20.1. Release notes Sourced from servicelayer[amazon,google]'s releases. 1.19.1 What's Changed Bump fakeredis from 1.7.1 to 1.8 by @dependabot in alephdata/servicelayer#66 Bump fakeredis from 1.8 to...
When using the `directory` method in the store stage of the pipeline, errors are thrown when memorious tries to save a file with a very long file path (as...
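One common way to avoid over-long paths is to shorten any file name that exceeds the filesystem's per-component limit and append a hash so that shortened names stay unique. The sketch below is a standalone helper, not part of memorious. `safe_filename` and `MAX_COMPONENT_BYTES` are hypothetical names, and the 255-byte limit is an assumption that matches common filesystems such as ext4.

```python
import hashlib
import os

# Assumed limit: many filesystems (e.g. ext4) cap a single path
# component at 255 bytes.
MAX_COMPONENT_BYTES = 255


def safe_filename(name: str, max_bytes: int = MAX_COMPONENT_BYTES) -> str:
    """Hypothetical helper: shorten an over-long file name deterministically.

    Names that already fit are returned unchanged. Longer names keep a
    truncated stem, a SHA-1 digest of the full name and the extension.
    """
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name
    stem, ext = os.path.splitext(name)
    digest = hashlib.sha1(encoded).hexdigest()  # 40 hex characters
    # Bytes left for the stem after the separator, digest and extension.
    budget = max_bytes - len(digest) - 1 - len(ext.encode("utf-8"))
    if budget < 0:
        # Pathologically long extension: fall back to the digest alone.
        return digest
    # Truncate on a byte boundary, dropping any partial multi-byte character.
    short_stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return f"{short_stem}-{digest}{ext}"
```

Because the digest is computed over the full original name, the same input always maps to the same stored file, and two long names that share a prefix still get different results.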
I'm running a crawler that does a lot of recursion, so much that at some point, the stored session information in Redis expires, and the scraper stops using the configured...
Currently, most of the testing in Memorious relies on mocking the interfaces of various functions. Often this doesn't test the expected output;...
This includes a tweak to the parse function so that it generates an entity id before creating an entity. There are two ways in which this can occur: 1. Supply...
The `parse_ftm` function should include the creation of an `entity_id` value. It should be possible to define this within the YAML config, but if it is not, then it should be...
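A stable entity id is often derived by hashing selected properties of a record, so that re-running a crawler produces the same id for the same data. The sketch below illustrates that idea in isolation. `make_entity_id` and its `id_keys` parameter (standing in for a list of property names read from the YAML config) are hypothetical. The issue's own fallback is cut off, so hashing all properties when no keys are configured is purely an assumption made for illustration.

```python
import hashlib
from typing import Any, Dict, Iterable, Optional


def make_entity_id(
    properties: Dict[str, Any],
    id_keys: Optional[Iterable[str]] = None,
    prefix: str = "",
) -> str:
    """Hypothetical helper: derive a deterministic id from record properties.

    id_keys stands in for property names configured in YAML. If it is not
    given, all properties are hashed in sorted key order (an assumption).
    """
    keys = list(id_keys) if id_keys else sorted(properties)
    digest = hashlib.sha1()
    used = 0
    for key in keys:
        value = properties.get(key)
        if value is None:
            continue  # missing values do not contribute to the id
        # NUL separators keep ("ab", "c") distinct from ("a", "bc").
        digest.update(str(key).encode("utf-8") + b"\x00")
        digest.update(str(value).encode("utf-8") + b"\x00")
        used += 1
    if used == 0:
        raise ValueError("no non-empty properties to derive an id from")
    return prefix + digest.hexdigest()
```

Hashing only the configured keys means the id ignores fields that change between crawls (such as a source URL), while the fallback keeps records identifiable when nothing is configured.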
Processing a listing of paged results should be built in:
* Paging by following 'next' links
* Paging by calculating a sequence from the number of results or the 'last'...
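The two paging strategies listed above can be sketched as plain functions, independent of any memorious stage. `pages_from_total` and `follow_next_links` are hypothetical names, and `get_next` is an assumed callback that fetches a page and returns the URL of its 'next' link, or `None` on the last page.

```python
import math
from typing import Callable, Iterator, Optional


def pages_from_total(total_results: int, page_size: int, first_page: int = 1) -> range:
    """Page numbers needed to cover total_results items, page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page_count = math.ceil(total_results / page_size)
    return range(first_page, first_page + page_count)


def follow_next_links(
    start_url: str,
    get_next: Callable[[str], Optional[str]],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield page URLs by following 'next' links until none is found.

    A set of visited URLs guards against sites whose 'next' link loops
    back, and max_pages caps runaway listings.
    """
    seen = set()
    url: Optional[str] = start_url
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        yield url
        url = get_next(url)
```

When a site exposes the 'last' page number directly rather than a result count, the sequence reduces to `range(first_page, last_page + 1)`.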
Sometimes databases restart for maintenance while a long-running crawler is running. Instead of immediately quitting with an error in that case, we should retry establishing the connection a few...
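A typical way to ride out a database restart is to retry the connection with exponential backoff and give up only after a fixed number of attempts. The sketch below is a generic helper, not memorious code. `connect_with_retry` is a hypothetical name, `connect` is any zero-argument function that opens a connection, and catching `ConnectionError` by default is an assumption. With a real driver you would pass its own exception type instead (for example SQLAlchemy's `sqlalchemy.exc.OperationalError`).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(), retrying with exponential backoff on transient errors.

    retries is the number of extra attempts after the first one; the
    sleep parameter is injectable so tests can record delays instead of waiting.
    """
    for attempt in range(retries + 1):
        try:
            return connect()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: re-raise the last connection error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("retries must be non-negative")
```

Doubling the delay gives a restarting database progressively more time to come back, while the retry cap still lets a crawler fail loudly if the outage is permanent.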
We have a recurring request from some editors to index project Confluence wikis into Aleph. The idea is to index all the reporters' notes from a given wiki space into...