servicecomb-pack

If the Alpha server fails while book-service is sending an event, book-service fails forever

Open wyzssw opened this issue 7 years ago • 3 comments

It reports this error every time the Alpha server fails, for example when its database is down:

{
	"timestamp": 1531398807451,
	"status": 500,
	"error": "Internal Server Error",
	"exception": "javax.transaction.TransactionalException",
	"message": "Failed to process subsequent requests because no alpha server is available",
	"path": "/booking/abc/3/3"
}

Why does RetryableMessageSender throw an OmegaException when sending a SagaStartedEvent?

  @Override
  public AlphaResponse send(TxEvent event) {
    // Fail fast: a saga-start event is rejected outright, even if an Alpha server is available again
    if (event.type() == SagaStartedEvent) {
      throw new OmegaException("Failed to process subsequent requests because no alpha server is available");
    }
    try {
      // Any other event blocks until a sender is available in availableMessageSenders
      return availableMessageSenders.take().send(event);
    } catch (InterruptedException e) {
      throw new OmegaException("Failed to send event " + event + " due to interruption", e);
    }
  }
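To see why the saga-start request keeps failing, here is a minimal, self-contained model of the guard quoted above. EventKind, Sender, NoAlphaException and FailFastFallbackSender are hypothetical stand-ins, not the real servicecomb-pack types; the sketch only assumes that the fallback sender checks the event type before it looks at its queue of available senders, as the snippet does.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-ins; these are NOT the servicecomb-pack classes.
enum EventKind { SAGA_STARTED, TX_STARTED }

class NoAlphaException extends RuntimeException {
  NoAlphaException(String message) {
    super(message);
  }
}

interface Sender {
  String send(EventKind kind);
}

// Models only the guard at the top of RetryableMessageSender.send().
class FailFastFallbackSender implements Sender {
  final BlockingQueue<Sender> availableSenders = new LinkedBlockingQueue<>();

  @Override
  public String send(EventKind kind) {
    // The event type is checked first, so sender availability is never consulted for saga starts.
    if (kind == EventKind.SAGA_STARTED) {
      throw new NoAlphaException("Failed to process subsequent requests because no alpha server is available");
    }
    try {
      // Any other event blocks until a sender is available.
      return availableSenders.take().send(kind);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new NoAlphaException("Failed to send event " + kind + " due to interruption");
    }
  }
}

class FailFastDemo {
  public static void main(String[] args) {
    FailFastFallbackSender fallback = new FailFastFallbackSender();
    fallback.availableSenders.add(kind -> "ok"); // an Alpha server is reachable again
    try {
      fallback.send(EventKind.SAGA_STARTED);
    } catch (NoAlphaException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    System.out.println(fallback.send(EventKind.TX_STARTED)); // ok
  }
}
```

Because the type check runs before availableSenders is touched, a recovered Alpha server makes no difference to saga-start events in this model.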
LoadBalancedClusterMessageSender.java

  @Override
  public AlphaResponse send(TxEvent event) {
    do {
      // If the first attempt fails, this sender is marked with a very large latency, so once every
      // sender has failed, fastestSender() returns the RetryableMessageSender for the next try
      MessageSender messageSender = fastestSender();
      try {
        long startTime = System.nanoTime();
        AlphaResponse response = messageSender.send(event);
        // Only a successful send writes a finite latency back into the senders map
        senders.put(messageSender, System.nanoTime() - startTime);

        return response;
      } catch (OmegaException e) {
        // OmegaException (e.g. from RetryableMessageSender) is rethrown without retrying
        throw e;
      } catch (Exception e) {
        LOG.error("Retry sending event {} due to failure", event, e);

        // very large latency on exception
        senders.put(messageSender, Long.MAX_VALUE);
      }
    } while (!Thread.currentThread().isInterrupted());

    throw new OmegaException("Failed to send event " + event + " due to interruption");
  }


  private MessageSender fastestSender() {
    return senders.entrySet()
        .stream()
        .filter(entry -> entry.getValue() < Long.MAX_VALUE)
        .min(Comparator.comparingLong(Entry::getValue))
        .map(Entry::getKey)
        .orElse(retryableMessageSender);
  }


wyzssw, Jul 12 '18 12:07

Because the Alpha server works as the coordinator (compensation relies on it), if Omega cannot talk to the coordinator it just keeps throwing the exception, following a fail-fast strategy.
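For contrast, a fail-fast sender could instead give up only after a bounded wait, using BlockingQueue.poll(timeout, unit), which returns null when nothing becomes available in time. This is a hypothetical alternative sketched for discussion, not the servicecomb-pack implementation; BoundedWaitSender, markAvailable and the timeout value are illustrative names, and senders are modeled as plain Function<String, String>.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical alternative to failing immediately: wait a bounded time for a sender.
class BoundedWaitSender {
  private final BlockingQueue<Function<String, String>> availableSenders = new LinkedBlockingQueue<>();
  private final long timeoutMillis;

  BoundedWaitSender(long timeoutMillis) {
    this.timeoutMillis = timeoutMillis;
  }

  // Called when a connection to an Alpha server becomes usable (e.g. after reconnecting).
  void markAvailable(Function<String, String> sender) {
    availableSenders.offer(sender);
  }

  String send(String event) {
    Function<String, String> sender;
    try {
      // poll(...) returns null if no sender becomes available within the timeout.
      sender = availableSenders.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for an alpha server", e);
    }
    if (sender == null) {
      throw new IllegalStateException("No alpha server available after " + timeoutMillis + " ms");
    }
    String response = sender.apply(event);
    // Put the sender back so later events can use it too (a sender that throws is not put back).
    availableSenders.offer(sender);
    return response;
  }
}

class BoundedWaitDemo {
  public static void main(String[] args) {
    BoundedWaitSender sender = new BoundedWaitSender(100);
    try {
      sender.send("SagaStartedEvent"); // no server yet: fails after about 100 ms
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    sender.markAvailable(event -> "sent " + event); // the Alpha server is back
    System.out.println(sender.send("SagaStartedEvent")); // sent SagaStartedEvent
  }
}
```

The trade-off: callers still fail reasonably fast (after timeoutMillis), but a server that comes back within the window is used instead of being ignored.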

WillemJiang, Jul 12 '18 22:07

The Alpha server is not down and the connection is normal, but Omega may consider it down forever. Even with an Alpha cluster of many servers, a single failure is enough to take a server out of use: senders.put(messageSender, Long.MAX_VALUE) marks it, and fastestSender() then treats that Alpha server as dead, although it may have failed only once.
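The effect described here can be reproduced with a small model of the latency map. StickyLatencyModel is a hypothetical simplification: senders are identified by strings, "retryable" stands in for RetryableMessageSender, and recordFailure/recordSuccess mirror the two senders.put(...) calls in LoadBalancedClusterMessageSender.send(). fastestSender() uses the same filter as the quoted method.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;

// Simplified, hypothetical model of the latency map in LoadBalancedClusterMessageSender.
class StickyLatencyModel {
  static final String FALLBACK = "retryable"; // stands in for RetryableMessageSender
  final Map<String, Long> senders = new LinkedHashMap<>();

  StickyLatencyModel(String... alphaAddresses) {
    for (String address : alphaAddresses) {
      senders.put(address, 0L);
    }
  }

  // Same selection rule as fastestSender(): senders marked Long.MAX_VALUE are ignored.
  String fastestSender() {
    return senders.entrySet()
        .stream()
        .filter(entry -> entry.getValue() < Long.MAX_VALUE)
        .min(Comparator.comparingLong(Entry::getValue))
        .map(Entry::getKey)
        .orElse(FALLBACK);
  }

  // Mirrors the catch block: one failure marks the sender with "infinite" latency.
  void recordFailure(String sender) {
    senders.put(sender, Long.MAX_VALUE);
  }

  // Mirrors the success path: the only place a finite latency is written back.
  void recordSuccess(String sender, long latencyNanos) {
    senders.put(sender, latencyNanos);
  }
}

class StickyLatencyDemo {
  public static void main(String[] args) {
    StickyLatencyModel model = new StickyLatencyModel("alpha-1", "alpha-2");
    model.recordFailure("alpha-1"); // one transient failure, e.g. MySQL briefly down
    System.out.println(model.fastestSender()); // alpha-2
    model.recordFailure("alpha-2");
    // Both servers may have recovered, but neither is ever selected again,
    // so recordSuccess is never reached for them.
    System.out.println(model.fastestSender()); // retryable
  }
}
```

In this model, one transient failure per server is enough to route every later event to the fallback sender permanently.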

I can reproduce it in the SpringCloud demo by stopping the MySQL database used by the Alpha server. In the meantime, trigger the request /booking/abc/3/3 in Postman: book-service responds with "Failed to process subsequent requests because no alpha server is available". Then start MySQL again and re-send the request: book-service still responds with "Failed to process subsequent requests because no alpha server is available". The only way to resolve this is to restart the book-service process, which means that every application must be restarted whenever the Alpha server is upgraded or MySQL has a hiccup.

wyzssw, Jul 13 '18 03:07

I get your point. The Alpha server should be stateless, so it doesn't make sense to throw an exception just because the event is a saga-start event. We also need to revisit the retry logic here. I just created SCB-745 for this issue.
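One possible direction for revisiting the retry logic is to exclude a failed sender only for a cool-down period instead of marking it with Long.MAX_VALUE forever. The sketch below is a hypothetical design, not the fix adopted for SCB-745; CoolDownSelector, the cool-down length and the injected clock (a LongSupplier, e.g. System::nanoTime in production) are all assumptions. An empty result means every sender is cooling down, at which point the caller would fall back to the retryable sender.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

// Hypothetical selector: a failed sender is skipped only for coolDownNanos, then retried.
class CoolDownSelector {
  private static final class Stats {
    long latencyNanos;  // last latency measured on a successful send
    boolean failed;     // true after a failure, until the next success
    long failedAtNanos; // clock reading at the last failure
  }

  private final Map<String, Stats> senders = new LinkedHashMap<>();
  private final long coolDownNanos;
  private final LongSupplier clock;

  CoolDownSelector(long coolDownNanos, LongSupplier clock, String... addresses) {
    this.coolDownNanos = coolDownNanos;
    this.clock = clock;
    for (String address : addresses) {
      senders.put(address, new Stats());
    }
  }

  // Picks the lowest-latency sender that is not cooling down; empty means "use the fallback".
  Optional<String> fastestSender() {
    long now = clock.getAsLong();
    String best = null;
    long bestLatency = Long.MAX_VALUE;
    for (Map.Entry<String, Stats> entry : senders.entrySet()) {
      Stats stats = entry.getValue();
      // Subtracting clock readings (rather than comparing them) stays correct with System.nanoTime().
      if (stats.failed && now - stats.failedAtNanos < coolDownNanos) {
        continue;
      }
      if (best == null || stats.latencyNanos < bestLatency) {
        best = entry.getKey();
        bestLatency = stats.latencyNanos;
      }
    }
    return Optional.ofNullable(best);
  }

  void recordFailure(String address) {
    Stats stats = senders.get(address);
    stats.failed = true;
    stats.failedAtNanos = clock.getAsLong();
  }

  void recordSuccess(String address, long latencyNanos) {
    Stats stats = senders.get(address);
    stats.failed = false;
    stats.latencyNanos = latencyNanos;
  }
}

class CoolDownDemo {
  public static void main(String[] args) {
    long[] now = {0L}; // fake clock so the example is deterministic
    CoolDownSelector selector = new CoolDownSelector(1_000L, () -> now[0], "alpha-1");
    selector.recordFailure("alpha-1");
    System.out.println(selector.fastestSender()); // Optional.empty: still cooling down
    now[0] = 1_000L;
    System.out.println(selector.fastestSender()); // Optional[alpha-1]: eligible again
  }
}
```

Unlike the sticky Long.MAX_VALUE marker, a sender here gets another chance after the cool-down, and a repeated failure simply restarts its cool-down.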

WillemJiang, Jul 13 '18 08:07