Resume bulkload on abort?
Just had a situation where the MapReduce job finished processing the RDF and saving the output to the /tmp folder. However, my HBase server then stopped working, and ./bulkload reported an error and quit.
This scenario is unlikely in a production cluster (redundant ZooKeeper + redundant HBase), but for local clusters it would be nice to have ./bulkload split into two phases:
- MapReduce the data into HBase table files (i.e., save them to the /tmp folder)
- Load the /tmp folder into the HBase table "bla"
Right now, if I want to resume ./bulkload, it simply throws an error stating that the /tmp folder already exists.
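One way to make the two-phase split resumable is for the driver to check, before starting, whether the MapReduce phase already completed. The sketch below is hypothetical (nothing in this thread says ./bulkload works this way): it uses the local filesystem via `java.nio` as a stand-in for HDFS, assumes the job's output committer writes Hadoop's default `_SUCCESS` marker into the HFile output folder when the job succeeds, and the class, enum, and default path `/tmp/hfiles` are illustrative names only.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical resume planner for a two-phase bulk load.
// Local paths stand in for HDFS; a real driver would use the Hadoop FileSystem API.
public class BulkLoadResumePlanner {

    enum Step {
        RUN_MAPREDUCE_THEN_LOAD, // no output folder yet: run both phases
        CLEAN_THEN_RERUN,        // output folder exists but the job never committed: delete it, run both phases
        LOAD_ONLY                // MapReduce committed successfully: only load the HFiles
    }

    // Marker file Hadoop's FileOutputCommitter writes on successful job commit
    // (assumed enabled, which is the Hadoop default).
    static final String SUCCESS_MARKER = "_SUCCESS";

    static Step plan(Path hfileDir) {
        if (!Files.exists(hfileDir)) {
            return Step.RUN_MAPREDUCE_THEN_LOAD;
        }
        if (Files.isRegularFile(hfileDir.resolve(SUCCESS_MARKER))) {
            return Step.LOAD_ONLY;
        }
        return Step.CLEAN_THEN_RERUN;
    }

    public static void main(String[] args) {
        // Hypothetical default location of the HFile output folder.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/hfiles");
        System.out.println(dir + " -> " + plan(dir));
    }
}
```

In the situation described above (MapReduce finished, then HBase went down), `plan` would return `LOAD_ONLY`, so the driver could skip straight to the load phase instead of failing because the folder already exists. Deleting output that has no success marker is a design choice: it avoids loading an incomplete set of HFiles at the cost of redoing the MapReduce work.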
Hi Ivan, yes, separating the two parts of the bulk load with a command-line switch or a separate command makes sense. However, if you still have the HFiles in the temporary HDFS folder, you can bulk load them into the target table with the following command:
hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles hdfs://storefileoutput <tablename>
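If the load phase is launched from a Java wrapper rather than typed by hand, the same tool can be started with `ProcessBuilder`. This is a minimal sketch, assuming the `hbase` launcher script is on the PATH; the class and method names are hypothetical, and the defaults are placeholders (`hdfs://storefileoutput` from the reply, `bla` from the question). LoadIncrementalHFiles expects the HFile directory and the target table name as its two arguments.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical wrapper that runs only the load phase of a bulk load.
public class LoadPhaseRunner {

    // Builds the argument list for the HBase bulk-load tool: <hfile dir> <table name>.
    static List<String> loadCommand(String hfileDir, String table) {
        return List.of("hbase", "org.apache.hadoop.hbase.tool.LoadIncrementalHFiles", hfileDir, table);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String hfileDir = args.length > 0 ? args[0] : "hdfs://storefileoutput";
        String table = args.length > 1 ? args[1] : "bla";
        Process p = new ProcessBuilder(loadCommand(hfileDir, table))
                .inheritIO() // stream the tool's output to this console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            System.err.println("Load phase failed with exit code " + exit);
        }
        System.exit(exit);
    }
}
```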
Merck/Halyard issue #27: https://github.com/Merck/Halyard/issues/27