Writing Data to Elasticsearch Storage Engine
Task Description
This task is currently in progress to provide Elasticsearch as a backend storage engine option for Sparkler. It builds upon the Factory Pattern outlined in Issue 218, where we abstract out the storage engine-specific implementation.
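For readers who haven't followed Issue 218, the rough shape of that abstraction might look like the sketch below. This is hypothetical: the trait and method names are assumptions loosely modeled on the storageProxy calls mentioned later in this issue, not the actual Sparkler code.

```scala
// Hypothetical sketch of the factory-pattern abstraction from Issue 218:
// callers depend only on a StorageProxy trait, and the concrete engine
// is chosen from configuration. Names are illustrative, not Sparkler's API.
trait StorageProxy {
  def addResourceDocs(docs: Iterator[Map[String, AnyRef]]): Unit
  def commitCrawlDb(): Unit
  def close(): Unit
}

class SolrProxy extends StorageProxy {
  override def addResourceDocs(docs: Iterator[Map[String, AnyRef]]): Unit =
    docs.foreach(d => println(s"[solr] index: $d")) // placeholder for SolrJ calls
  override def commitCrawlDb(): Unit = println("[solr] commit")
  override def close(): Unit = ()
}

class ElasticsearchProxy extends StorageProxy {
  override def addResourceDocs(docs: Iterator[Map[String, AnyRef]]): Unit =
    docs.foreach(d => println(s"[es] index: $d")) // placeholder for ES client calls
  override def commitCrawlDb(): Unit = println("[es] refresh") // ES has no commit; refresh is the closest analogue
  override def close(): Unit = ()
}

object StorageProxyFactory {
  def getProxy(engine: String): StorageProxy = engine.toLowerCase match {
    case "solr"          => new SolrProxy
    case "elasticsearch" => new ElasticsearchProxy
    case other           => throw new IllegalArgumentException(s"Unknown storage engine: $other")
  }
}
```

With this shape, Crawler.scala would only ever call the trait's methods, so adding Elasticsearch becomes a matter of supplying a new implementation plus a configuration switch.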
To achieve the final goal of writing Sparkler data into the Elasticsearch storage engine, the team envisions the following steps:
- Make sure the Elasticsearch storage engine is set up appropriately and ready to accept data
- Write simple data to Elasticsearch (a smoke-test sketch follows this list)
  a. Perhaps a simple visualization to prove functionality
- Reorganize Sparkler data into a format conducive to Elasticsearch indexing (a mapping sketch also follows this list)
- Write data into Elasticsearch
- Visualize data in Elasticsearch (this will likely be brought up in a future issue)
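As a rough illustration of the first two steps, a smoke test along these lines could confirm the cluster is reachable and accepts a document. This is a sketch using the Elasticsearch 7.x High Level REST Client; the host, port, index name (crawldb), and field names are all assumptions, not Sparkler's actual schema.

```scala
import org.apache.http.HttpHost
import org.elasticsearch.action.admin.cluster.health.ClusterHealthRequest
import org.elasticsearch.action.index.IndexRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.common.xcontent.XContentType

object EsSmokeTest {
  def main(args: Array[String]): Unit = {
    // Assumes a local single-node cluster; adjust host/port as needed.
    val client = new RestHighLevelClient(
      RestClient.builder(new HttpHost("localhost", 9200, "http")))
    try {
      // Step 1: verify the cluster is up and ready to accept data.
      val health = client.cluster().health(new ClusterHealthRequest(), RequestOptions.DEFAULT)
      println(s"Cluster status: ${health.getStatus}")

      // Step 2: write one simple document. Index and field names are
      // placeholders, not Sparkler's actual crawldb schema.
      val doc = """{"url": "http://example.com", "status": "UNFETCHED", "depth": 0}"""
      val response = client.index(
        new IndexRequest("crawldb").id("example-1").source(doc, XContentType.JSON),
        RequestOptions.DEFAULT)
      println(s"Indexed document, result: ${response.getResult}")
    } finally {
      client.close()
    }
  }
}
```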
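For the reorganization step, one plausible approach is to create the index with an explicit mapping up front, so Sparkler's crawl fields are indexed with appropriate types before any bulk writes begin. Again a sketch under the same assumptions; the field list is illustrative, not the real crawldb schema.

```scala
import org.apache.http.HttpHost
import org.elasticsearch.client.indices.CreateIndexRequest
import org.elasticsearch.client.{RequestOptions, RestClient, RestHighLevelClient}
import org.elasticsearch.common.xcontent.XContentType

object CreateCrawlDbIndex {
  def main(args: Array[String]): Unit = {
    val client = new RestHighLevelClient(
      RestClient.builder(new HttpHost("localhost", 9200, "http")))
    try {
      // Hypothetical mapping: field names and types are illustrative,
      // chosen to show the pattern rather than mirror Sparkler's schema.
      val mapping =
        """{
          |  "properties": {
          |    "url":        {"type": "keyword"},
          |    "status":     {"type": "keyword"},
          |    "depth":      {"type": "integer"},
          |    "fetch_time": {"type": "date"},
          |    "content":    {"type": "text"}
          |  }
          |}""".stripMargin
      val request = new CreateIndexRequest("crawldb").mapping(mapping, XContentType.JSON)
      val response = client.indices().create(request, RequestOptions.DEFAULT)
      println(s"Index created: ${response.isAcknowledged}")
    } finally {
      client.close()
    }
  }
}
```

Defining the mapping explicitly (rather than relying on dynamic mapping) keeps field types stable across crawl runs, which should make later querying and visualization more predictable.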
This is a WIP and updates will be posted here as we make progress.
@thammegowda @buggtb @lewismc We had a few questions about Crawler.scala while adding Elasticsearch support:
- How does the deep crawl differ from a "normal" crawl? It looks like the deep crawl runs only when the -dc flag is enabled, while the normal crawl always runs. Is that correct?
- What does the FairFetcher class do? Do we need to understand it, given that FairFetcher is not specific to Solr?
- Why is "storageProxy.commitCrawlDb()" called before the crawl, after the deep crawl, and then again after the normal crawl?