
allow poorly formatted events/data to be dropped

Open milleka2 opened this issue 10 years ago • 2 comments

I ran into an issue where some of the raw data going into Elasticsearch was malformed (fields didn't match the data mapping), so ES rejected it as part of the bulk insert. The Flume ES sink currently handles this by sending the same record over and over, hoping that ES will accept it later. Unfortunately, this creates a LOT of log traffic under the default ES log settings, AND it backed up our Flume channel, because the data doesn't get any better by blindly retrying it.

This patch allows users to choose between 3 options on what to do when bulk insert errors occur:

  1. retry until it somehow magically works (current default within apache flume)
  2. log the error message, then drop it
  3. drop it silently.
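
As a rough illustration of the three options above, here is a minimal sketch. All names in it (`BulkFailurePolicy`, `BulkFailureHandler`, `onBulkFailure`) are hypothetical and are not taken from the patch or from the Flume sink API; events are modeled as plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical names throughout: this is not the patch's code or the Flume API.
enum BulkFailurePolicy { RETRY, LOG_AND_DROP, DROP_SILENTLY }

final class BulkFailureHandler {
    private static final Logger LOG =
            Logger.getLogger(BulkFailureHandler.class.getName());

    private final BulkFailurePolicy policy;

    BulkFailureHandler(BulkFailurePolicy policy) {
        this.policy = policy;
    }

    /**
     * Decides what happens to events rejected by a bulk insert.
     * Returns the events that should be sent again; an empty list
     * means the rejected events were dropped.
     */
    List<String> onBulkFailure(List<String> rejectedEvents, String errorMessage) {
        switch (policy) {
            case RETRY:
                // Option 1: hand everything back so the caller retries it.
                return new ArrayList<>(rejectedEvents);
            case LOG_AND_DROP:
                // Option 2: record why the events were rejected, then drop them.
                LOG.warning("Dropping " + rejectedEvents.size()
                        + " event(s): " + errorMessage);
                return new ArrayList<>();
            case DROP_SILENTLY:
                // Option 3: drop without logging anything.
                return new ArrayList<>();
            default:
                throw new IllegalStateException("Unknown policy: " + policy);
        }
    }
}
```

With `RETRY` the caller gets the same events back and the channel stays blocked on them; with either drop policy the returned list is empty, so the caller can move on to the next batch.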

In our case, we just want to drop it, because losing a few records is worth it to keep our data flows moving. However, it would be better to have a more advanced option that can account for times when the ES server is down. Unfortunately, the ES client API doesn't expose the type of error, so this was the best option available at the time.

milleka2 avatar Nov 22 '15 19:11 milleka2

You may be able to add a fourth option: 4) log the error message, then write the rejected data to a file specified by the user. There would be high demand for this from data services that need a chance to reissue or retry the data later. (^__^)
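
A minimal sketch of what such a fourth option might look like, assuming each rejected event can be represented as a single line of text. The class name `RejectedEventFile` and its `append` method are hypothetical and are not part of Flume or of the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: appends rejected events to a user-specified file
// so they can be inspected or re-sent later.
final class RejectedEventFile {
    private final Path file;

    RejectedEventFile(Path file) {
        this.file = file;
    }

    /**
     * Appends one line per rejected event, creating the file if needed.
     * Assumes an event body contains no line breaks of its own.
     */
    void append(List<String> rejectedEvents) throws IOException {
        Files.write(file, rejectedEvents, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

A separate job could later read this file line by line and resubmit the events once the mapping problem is fixed.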

manzhizhen avatar Jan 15 '16 02:01 manzhizhen

Can one of the admins verify this patch?

asfgit avatar Aug 17 '18 13:08 asfgit

Support for Elasticsearch has been removed from the main Flume build since it is so out of date and its new license is problematic. Please work against the flume-search repo.

rgoers avatar Oct 08 '22 04:10 rgoers