If the alpha server fails while book-service is sending an event, book-service fails forever.
It reports the following error every time the alpha server fails, for example when the alpha server's database is down:
```json
{
  "timestamp": 1531398807451,
  "status": 500,
  "error": "Internal Server Error",
  "exception": "javax.transaction.TransactionalException",
  "message": "Failed to process subsequent requests because no alpha server is available",
  "path": "/booking/abc/3/3"
}
```
Why does RetryableMessageSender throw an OmegaException when sending a SagaStartedEvent?
```java
@Override
public AlphaResponse send(TxEvent event) {
  if (event.type() == SagaStartedEvent) {
    throw new OmegaException("Failed to process subsequent requests because no alpha server is available");
  }

  try {
    return availableMessageSenders.take().send(event);
  } catch (InterruptedException e) {
    throw new OmegaException("Failed to send event " + event + " due to interruption", e);
  }
}
```
LoadBalancedClusterMessageSender.java:

```java
@Override
public AlphaResponse send(TxEvent event) {
  do {
    // After a failed send the sender's latency is Long.MAX_VALUE, so fastestSender()
    // skips it; once every sender has failed, the next try goes to RetryableMessageSender.
    MessageSender messageSender = fastestSender();
    try {
      long startTime = System.nanoTime();
      AlphaResponse response = messageSender.send(event);
      // Only a successful send can update the senders map.
      senders.put(messageSender, System.nanoTime() - startTime);
      return response;
    } catch (OmegaException e) {
      // e.g. thrown by RetryableMessageSender for a SagaStartedEvent: propagated immediately
      throw e;
    } catch (Exception e) {
      LOG.error("Retry sending event {} due to failure", event, e);
      // very large latency on exception
      senders.put(messageSender, Long.MAX_VALUE);
    }
  } while (!Thread.currentThread().isInterrupted());

  throw new OmegaException("Failed to send event " + event + " due to interruption");
}

private MessageSender fastestSender() {
  return senders.entrySet()
      .stream()
      .filter(entry -> entry.getValue() < Long.MAX_VALUE)
      .min(Comparator.comparingLong(Entry::getValue))
      .map(Entry::getKey)
      .orElse(retryableMessageSender);
}
```
As the alpha server works as the coordinator (compensation calls depend on it), if omega cannot talk to the coordinator it just keeps throwing the exception, following a fail-fast strategy.
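But the alpha server is not down: its connection stays normal, yet Omega may consider it down forever.
Even if the alpha cluster has many servers, a single failure can effectively remove a server from the cluster. `senders.put(messageSender, Long.MAX_VALUE);` marks the server, and `fastestSender()` then filters it out as dead. Because only a successful send can update the `senders` map, and a filtered-out server is never selected again, it never gets a chance to succeed, even though the alpha server may have failed just once.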
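One possible direction for revisiting the retry logic is sketched below. This is an assumption for discussion, not the SCB-745 change: instead of marking a failed sender with `Long.MAX_VALUE` forever, record when it failed and make it eligible again after a cooldown. `CooldownSenderSelector`, `register`, `onSuccess`, `onFailure` and the cooldown value are hypothetical names, senders are String ids, and the current time is passed in as a `System.nanoTime()`-style value so the behaviour can be checked deterministically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical selector: a failed sender is quarantined only for a cooldown period.
class CooldownSenderSelector {
  private final Map<String, Long> latencies = new ConcurrentHashMap<>(); // last good latency
  private final Map<String, Long> failedAt = new ConcurrentHashMap<>();  // time of last failure
  private final long cooldownNanos;
  private final String fallback;

  CooldownSenderSelector(long cooldownNanos, String fallback) {
    this.cooldownNanos = cooldownNanos;
    this.fallback = fallback;
  }

  void register(String sender) {
    latencies.put(sender, 0L);
  }

  // A success records the latency and clears any quarantine.
  void onSuccess(String sender, long latencyNanos) {
    latencies.put(sender, latencyNanos);
    failedAt.remove(sender);
  }

  // A failure (re)starts the cooldown but keeps the last known latency.
  void onFailure(String sender, long nowNanos) {
    failedAt.put(sender, nowNanos);
  }

  String fastestSender(long nowNanos) {
    return latencies.entrySet().stream()
        .filter(e -> isEligible(e.getKey(), nowNanos))
        .min(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .orElse(fallback);
  }

  // Subtraction-based comparison is safe for System.nanoTime() values.
  private boolean isEligible(String sender, long nowNanos) {
    Long failed = failedAt.get(sender);
    return failed == null || nowNanos - failed >= cooldownNanos;
  }

  public static void main(String[] args) {
    long sec = 1_000_000_000L;
    CooldownSenderSelector selector = new CooldownSenderSelector(5 * sec, "retryableMessageSender");
    selector.register("alpha-1");
    selector.onFailure("alpha-1", 0L);
    System.out.println(selector.fastestSender(1 * sec)); // retryableMessageSender (cooling down)
    System.out.println(selector.fastestSender(6 * sec)); // alpha-1 (eligible for a retry)
  }
}
```

Keeping the last known latency lets a recovered server compete normally once its cooldown expires; if it fails again on that retry, `onFailure` simply restarts the cooldown, so a MySQL restart or an alpha upgrade no longer requires restarting every application.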
I can reproduce it in the SpringCloud demo:
1. Stop the MySQL instance used by the alpha server.
2. Meanwhile, trigger the request /booking/abc/3/3 in Postman. book-service responds with "Failed to process subsequent requests because no alpha server is available".
3. Start MySQL again and retrigger the request. book-service still responds with "Failed to process subsequent requests because no alpha server is available".

The only way to resolve it is to restart the book-service process.
This means that if the alpha server is upgraded or its MySQL connection is briefly unstable, every application must be restarted.
I got your point: the alpha server should be stateless, so it doesn't make sense to throw an exception when the event is a saga-start event, and we need to revisit the retry logic here. I have just created SCB-745 for this issue.