Implement Kafka scanner

Open dgoldenberg1234 opened this issue 9 years ago • 0 comments

This ticket would add a scanner implementation that reads documents from a Kafka topic as a consumer. When documents are large, the item read is expected to be a pointer, and the FetchUrl processor is used to subsequently obtain the content. We should also support including a content hash in the item read from Kafka, since we will not be able to inspect the bytes of a document that is fetched further down the pipeline before deciding whether to process it.
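
To make the content-hash idea concrete, here is a minimal sketch of the decision the scanner would have to make for each item it reads. Everything in it is an assumption for illustration: the JSON message layout (`id`, `url`, `content_hash`), the names `DocPointer`, `parse_pointer`, `needs_processing`, and `scan`, and the in-memory dictionary standing in for whatever store remembers previously seen hashes. The Kafka consumer loop itself is left out; `messages` is just an iterable of raw message values.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class DocPointer:
    """Hypothetical pointer item: where the document lives, not its bytes."""
    doc_id: str
    url: str
    content_hash: Optional[str]  # supplied by the producer, may be absent


def parse_pointer(raw: bytes) -> DocPointer:
    # Assumed wire format: a UTF-8 JSON object with "id", "url" and an
    # optional "content_hash" field.
    fields = json.loads(raw.decode("utf-8"))
    return DocPointer(fields["id"], fields["url"], fields.get("content_hash"))


def needs_processing(pointer: DocPointer, known_hashes: Dict[str, str]) -> bool:
    # Without a hash there is nothing to compare, so process the document.
    if pointer.content_hash is None:
        return True
    # Process only if the hash differs from the one recorded for this id.
    return known_hashes.get(pointer.doc_id) != pointer.content_hash


def scan(messages: Iterable[bytes], known_hashes: Dict[str, str]) -> List[DocPointer]:
    """Return the pointers that should continue down the pipeline."""
    accepted = []
    for raw in messages:
        pointer = parse_pointer(raw)
        if needs_processing(pointer, known_hashes):
            accepted.append(pointer)
            if pointer.content_hash is not None:
                # Remember the hash so an unchanged document is skipped later.
                known_hashes[pointer.doc_id] = pointer.content_hash
    return accepted
```

In this sketch the scanner never touches the document bytes: the producer's hash is the only signal used to skip unchanged documents, and the accepted pointers would be handed to a fetch step such as FetchUrl.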

Apr 05 '16 23:04 dgoldenberg1234