WeBASE-Node-Manager
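During stress testing, a front node was temporarily demoted to an observer node because its block height differed from the group's, and it could not be restored to a consensus node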
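Background and symptoms:
- webase-node-manager version: v1.5.5
- FISCO BCOS version: v2.9.0
- Test scenario: three groups (group1, group2, group3). The stress test targets group3, which has 5 nodes, all of them consensus nodes.
- On-chain writes were stress-tested through front node A. During the test, node A's block height diverged from the group's highest block height (a temporary divergence may be permitted, or this may be a code bug), so node A was updated to an observer node.
Problem:
- After node A is updated to an observer node, it can never be restored to a consensus node.
Findings and preliminary root-cause analysis:
- For node A to be restored to a consensus node, its block height must first be updated correctly. The scheduled task that updates node A's height fails with the following error: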
2024-11-20 14:52:55.566 [node-mgr-task-12] INFO ChainService(ChainService.java:518) - Run task:[DeployType:0, isChainRunning:false]
2024-11-20 14:52:55.611 [node-mgr-task-12] ERROR FrontRestTools(FrontRestTools.java:382) - fail restTemplateExchange. frontList is empty groupId:3
2024-11-20 14:52:55.611 [node-mgr-task-12] ERROR NodeStatusMonitorTask(NodeStatusMonitorTask.java:103) - in checkNodeStatusByGroup checkAndUpdateNodeStatus error: []
com.webank.webase.node.mgr.base.exception.NodeMgrException: all front of group: 3 is stopped
at com.webank.webase.node.mgr.front.frontinterface.FrontRestTools.restTemplateExchange(FrontRestTools.java:383) ~[main/:?]
at com.webank.webase.node.mgr.front.frontinterface.FrontRestTools.getForEntity(FrontRestTools.java:343) ~[main/:?]
at com.webank.webase.node.mgr.front.frontinterface.FrontRestTools$$FastClassBySpringCGLIB$$a5c6faad.invoke(<generated>) ~[main/:?]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.26.jar:5.3.26]
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386) ~[spring-aop-5.3.26.jar:5.3.26]
at org.springframework.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85) ~[spring-aop-5.3.26.jar:5.3.26]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:704) ~[spring-aop-5.3.26.jar:5.3.26]
at com.webank.webase.node.mgr.front.frontinterface.FrontRestTools$$EnhancerBySpringCGLIB$$e50c39d8.getForEntity(<generated>) ~[main/:?]
at com.webank.webase.node.mgr.front.frontinterface.FrontInterfaceService.getConsensusStatus(FrontInterfaceService.java:411) ~[main/:?]
at com.webank.webase.node.mgr.node.NodeService.getPeerOfConsensusStatus(NodeService.java:327) ~[main/:?]
at com.webank.webase.node.mgr.node.NodeService.checkAndUpdateNodeStatus(NodeService.java:215) ~[main/:?]
at com.webank.webase.node.mgr.node.NodeService$$FastClassBySpringCGLIB$$2a65b731.invoke(<generated>) ~[main/:?]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.3.26.jar:5.3.26]
at org.springframework.aop.framework.CglibAopProxy.invokeMethod(CglibAopProxy.java:386) ~[spring-aop-5.3.26.jar:5.3.26]
at org.springframework.aop.framework.CglibAopProxy.access$000(CglibAopProxy.java:85) ~[spring-aop-5.3.26.jar:5.3.26]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:704) ~[spring-aop-5.3.26.jar:5.3.26]
at com.webank.webase.node.mgr.node.NodeService$$EnhancerBySpringCGLIB$$d27355a2.checkAndUpdateNodeStatus(<generated>) ~[main/:?]
at com.webank.webase.node.mgr.alert.task.NodeStatusMonitorTask.checkNodeStatusByGroup(NodeStatusMonitorTask.java:101) ~[main/:?]
at com.webank.webase.node.mgr.alert.task.NodeStatusMonitorTask.lambda$checkAllNodeStatusForAlert$0(NodeStatusMonitorTask.java:88) ~[main/:?]
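- The problem was eventually traced to the following code logic: once the front node is temporarily updated to an observer node, its information can no longer be retrieved, so there is no front left through which to communicate with the chain.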
public class FrontGroupMapCache {
    @Transactional(isolation = Isolation.READ_COMMITTED)
    public List<FrontGroup> getSealerOrObserverMap() {
        MapListParam param = new MapListParam();
        param.setType(ConsensusType.SEALER.getValue());
        List<FrontGroup> targetMap = null;
        targetMap = mapService.getList(param);
        log.debug("get sealer map:{} param:{}", targetMap, param);
        if (targetMap == null || targetMap.isEmpty()) {
            param.setType(ConsensusType.OBSERVER.getValue());
            targetMap = mapService.getList(param);
            log.debug("get observer map:{} param:{}", targetMap, param);
        }
        log.debug("getSealerOrObserverMap targetMap:{}", targetMap);
        return targetMap;
    }
}
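- The code above has a logic flaw. When the front nodes of group 1 and group 2 are consensus nodes, the sealer query returns a non-empty list, so the if (targetMap == null || targetMap.isEmpty()) branch is never entered. The front mapping for group 3 is therefore never retrieved, the block height of group 3's front node cannot be updated, and that node stays an observer node and never automatically recovers to a consensus node.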
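
To make the flaw concrete, here is a minimal, self-contained sketch that contrasts the global fallback described above with one possible alternative: applying the sealer-then-observer fallback to each group separately. Everything in it is a hypothetical stand-in for illustration (the FrontSelectionSketch class, its simplified FrontGroup and Role types, and the method names); it does not use the real WeBASE classes or mapService, and it assumes a single front joined to all three groups.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the real WeBASE classes.
class FrontSelectionSketch {
    enum Role { SEALER, OBSERVER }

    static final class FrontGroup {
        final int frontId;
        final int groupId;
        final Role role;

        FrontGroup(int frontId, int groupId, Role role) {
            this.frontId = frontId;
            this.groupId = groupId;
            this.role = role;
        }

        @Override
        public String toString() {
            return "front" + frontId + "/group" + groupId + "/" + role;
        }
    }

    // Mirrors the reported logic: observers are returned only when the
    // given list contains no sealer mapping at all.
    static List<FrontGroup> sealersElseObservers(List<FrontGroup> mappings) {
        List<FrontGroup> result = filter(mappings, Role.SEALER);
        if (result.isEmpty()) {
            result = filter(mappings, Role.OBSERVER);
        }
        return result;
    }

    // Possible alternative (an assumption, not the project's fix):
    // run the same fallback once per group instead of once globally.
    static List<FrontGroup> selectPerGroup(List<FrontGroup> mappings) {
        Map<Integer, List<FrontGroup>> byGroup = new LinkedHashMap<>();
        for (FrontGroup fg : mappings) {
            byGroup.computeIfAbsent(fg.groupId, k -> new ArrayList<>()).add(fg);
        }
        List<FrontGroup> result = new ArrayList<>();
        for (List<FrontGroup> members : byGroup.values()) {
            result.addAll(sealersElseObservers(members));
        }
        return result;
    }

    static List<FrontGroup> filter(List<FrontGroup> mappings, Role role) {
        List<FrontGroup> out = new ArrayList<>();
        for (FrontGroup fg : mappings) {
            if (fg.role == role) {
                out.add(fg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed setup: one front, sealer in groups 1 and 2, observer in group 3.
        List<FrontGroup> all = new ArrayList<>();
        all.add(new FrontGroup(1, 1, Role.SEALER));
        all.add(new FrontGroup(1, 2, Role.SEALER));
        all.add(new FrontGroup(1, 3, Role.OBSERVER));

        // Global fallback: group 3's mapping is missing from the result.
        System.out.println("global:    " + sealersElseObservers(all));
        // Per-group fallback: group 3's observer mapping is included.
        System.out.println("per group: " + selectPerGroup(all));
    }
}
```

With the global fallback, the group 3 mapping is dropped whenever any other group still has a sealer front, which matches the "frontList is empty groupId:3" error in the log. Whether a per-group fallback is the right fix for WeBASE-Node-Manager, or whether the cache should simply return both sealer and observer mappings, is a design decision for the maintainers.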