jesterj
Implement Kafka scanner
This ticket would add a scanner implementation that reads documents from a Kafka topic as a consumer. When documents are large, the item read is expected to be a pointer, and a FetchUrl processor is used to subsequently obtain the content. We should also support including a content hash in the item read from Kafka, since we will not be able to inspect the bytes of a document that is fetched further down the pipeline before deciding whether to process it.
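
To make the pointer-plus-hash idea concrete, here is a minimal sketch of the decision the scanner could make for each record value it reads. Everything in it is an assumption for illustration: the tab-separated wire format (`docId`, `url`, `contentHash`), the class names `KafkaPointerSketch`, `PointerMessage` and `HashGate`, and the in-memory map of previously seen hashes. None of these are existing JesterJ or Kafka APIs, and the Kafka consumer loop itself is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; not part of JesterJ. */
final class KafkaPointerSketch {

    /** Assumed payload of one Kafka record: a pointer to the document plus a hash of its content. */
    static final class PointerMessage {
        final String docId;
        final String url;
        final String contentHash;

        PointerMessage(String docId, String url, String contentHash) {
            this.docId = docId;
            this.url = url;
            this.contentHash = contentHash;
        }

        /** Parses the assumed format "docId<TAB>url<TAB>contentHash"; empty if malformed. */
        static Optional<PointerMessage> parse(String recordValue) {
            String[] parts = recordValue.split("\t", -1);
            if (parts.length != 3) {
                return Optional.empty();
            }
            for (String part : parts) {
                if (part.isEmpty()) {
                    return Optional.empty();
                }
            }
            return Optional.of(new PointerMessage(parts[0], parts[1], parts[2]));
        }
    }

    /** Remembers the last hash seen per document id (in memory only, for illustration). */
    static final class HashGate {
        private final Map<String, String> lastHashById = new HashMap<>();

        /** Returns true when the document is new or its hash differs from the last one seen. */
        boolean shouldProcess(PointerMessage message) {
            // Note: the hash is recorded before any fetch happens; a real scanner
            // would likely record it only after the document was processed.
            String previous = lastHashById.put(message.docId, message.contentHash);
            return !message.contentHash.equals(previous);
        }
    }

    /** Producer-side helper: hex-encoded SHA-256 of the document bytes. */
    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, a producer would compute `sha256Hex` over the document bytes and publish it alongside the URL. The scanner would then call `HashGate.shouldProcess` on each parsed message and hand only changed or new pointers to the FetchUrl step, so unchanged large documents are never downloaded.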