Resume bulkload on abort?

Open earthquakesan opened this issue 8 years ago • 1 comments

I just hit a situation where the MapReduce job finished processing the RDF and saving it to the /tmp folder, but my HBase server then stopped working, so ./bulkload reported an error and quit.

This scenario is unlikely to happen in a production cluster (redundant ZooKeeper + redundant HBase); however, for local clusters it would be nice to have ./bulkload split into two phases:

1. MapReduce the data for the HBase tables (i.e. save it to the /tmp folder)
2. Load the /tmp folder into the HBase table "bla"

Right now, if I want to continue ./bulkload, it simply throws an error stating that the /tmp folder already exists.
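
To make the request concrete, here is a rough sketch of the decision a resumable ./bulkload could make. This is only an illustration in Python, not Halyard's actual code: `plan_phases`, the `resume` flag, and the phase names are hypothetical, a local directory stands in for the HDFS work folder, and the `_SUCCESS` check assumes the MapReduce job leaves Hadoop's usual success marker in its output folder.

```python
from pathlib import Path


def plan_phases(workdir: Path, resume: bool = False) -> list[str]:
    """Decide which bulk-load phases to run (hypothetical helper).

    workdir stands in for the temporary HFile output folder; a real
    implementation would query HDFS instead of the local filesystem.
    """
    if not workdir.exists():
        # Fresh run: generate the files, then load them.
        return ["mapreduce", "load"]
    if not resume:
        # Roughly the current behaviour: refuse to reuse an existing folder.
        raise FileExistsError(f"{workdir} already exists")
    if not (workdir / "_SUCCESS").exists():
        # Assumption: a missing marker means the MapReduce job did not finish,
        # so the folder may hold partial output and should not be loaded.
        raise RuntimeError(f"{workdir} has no _SUCCESS marker; rerun phase 1")
    # The MapReduce output is complete, so only the load phase is needed.
    return ["load"]
```

With something like this, rerunning after an HBase outage would skip straight to the load phase instead of failing on the existing folder.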

Oct 01 '17 12:10 earthquakesan

Hi Ivan,

yes, separating the two parts of the bulk load with a command-line switch or a separate command makes sense. However, if you still have the HFiles in the temp HDFS folder, you can bulk load them with the following command:

hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles hdfs://storefileoutput

Thanks,
Adam
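
For anyone scripting this workaround, a minimal Python sketch of assembling that command might look like the following. The helper name `build_load_command` is hypothetical, and the trailing table-name argument is an assumption based on the tool's usual `<hfile-dir> <table>` usage; the command quoted in this thread shows only the folder.

```python
import shlex

# Class name as given in this thread; it may differ between HBase versions.
LOADER_CLASS = "org.apache.hadoop.hbase.tool.LoadIncrementalHFiles"


def build_load_command(hfile_dir: str, table: str) -> list[str]:
    """Build the argv for the manual load step (hypothetical helper)."""
    if not hfile_dir or not table:
        raise ValueError("both the HFile folder and the table name are required")
    # Assumption: the loader takes the HFile folder followed by the table name.
    return ["hbase", LOADER_CLASS, hfile_dir, table]


def as_shell(argv: list[str]) -> str:
    """Render the argv as a copy-pasteable shell line."""
    return shlex.join(argv)
```

The resulting list could be passed to `subprocess.run(argv, check=True)` on a machine where the `hbase` launcher is on the PATH.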

Oct 01 '17 12:10 asotona