
[STORM-3713] fix race-condition by applying submitLock to leaderCallBack

Open RuiLi8080 opened this issue 5 years ago • 0 comments

What is the purpose of the change

Apply submitLock to leaderCallBack to avoid a race condition.
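
The idea can be illustrated with a minimal sketch. This is not the Nimbus code: the class `SubmitLockSketch`, its methods, and the event strings are hypothetical stand-ins, and the short sleep only plays the role of a slow sync inside the callback. It shows that when the leader callback and a kill request synchronize on the same lock object, the kill cannot interleave with a sync that is still in progress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for illustration only; not Storm's Nimbus class.
class SubmitLockSketch {
    private final Object submitLock = new Object();
    // Only accessed while holding submitLock (or after the worker thread is joined).
    private final List<String> events = new ArrayList<>();

    // Leader callback: holds submitLock for the whole (slow) sync.
    void leaderCallBack(CountDownLatch started, long sleepMillis) {
        synchronized (submitLock) {
            events.add("sync-start");
            started.countDown();            // signal that the sync is in progress
            try {
                Thread.sleep(sleepMillis);  // stands in for a slow sync step
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("sync-end");
        }
    }

    // Kill request: takes the same lock, so it waits for a running sync to finish.
    void killTopology(String topoId) {
        synchronized (submitLock) {
            events.add("kill:" + topoId);
        }
    }

    // Issues a kill while the callback is mid-sync and returns the observed order.
    static List<String> demo() {
        SubmitLockSketch sketch = new SubmitLockSketch();
        CountDownLatch started = new CountDownLatch(1);
        Thread leader = new Thread(() -> sketch.leaderCallBack(started, 200));
        leader.start();
        try {
            started.await();                // the callback now holds submitLock
            sketch.killTopology("wc");      // blocks until the callback releases it
            leader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.events;
    }
}
```

In this sketch, `SubmitLockSketch.demo()` records the kill only after the sync has ended, which mirrors the ordering in the Nimbus log below, where the topology state changes are processed after the sleep completes.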

How was the change tested

First, we reproduced the NPE by adding a 60s sleep right before this step: https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L222

Once the sleep started, we restarted ZooKeeper to trigger leader re-election and killed the test topology.

With the lock in place, the race condition no longer occurs, even with the 60s sleep. Note the gap in the timestamps caused by the sleep. Nimbus log:

2020-11-17 06:24:25.114 o.a.s.c.StormClusterStateImpl main-EventThread [INFO] syncRemoteAssignments sleeps for 60s
2020-11-17 06:24:36.126 o.a.s.d.n.Nimbus pool-34-thread-28 [INFO] TRANSITION: wc-1-1605594107 KILL null true
... 60s sleep ...
2020-11-17 06:25:26.704 o.a.s.d.n.Nimbus timer [INFO] TRANSITION: wc-1-1605594107 GAIN_LEADERSHIP null false
2020-11-17 06:25:26.742 o.a.s.d.n.Nimbus timer [INFO] Delaying event REMOVE for 30 secs for wc-1-1605594107
2020-11-17 06:25:55.149 o.a.s.d.n.Nimbus timer [INFO] TRANSITION: wc-1-1605594107 REMOVE null false
2020-11-17 06:25:55.154 o.a.s.d.n.Nimbus timer [INFO] Killing topology: wc-1-1605594107

Client console log:

-bash-4.2$ storm kill wc
Running: /home/y/share/yjava_jdk/java/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/home/y/lib64/storm/2.3.0.y -Dstorm.log.dir=/home/y/lib64/storm/2.3.0.y/logs -Djava.library.path=/home/y/lib64:/usr/local/lib64:/usr/lib64:/lib64: -Dstorm.conf.file= -cp /home/y/lib64/storm/2.3.0.y/*:/home/y/lib64/storm/2.3.0.y/lib/*:/home/y/lib64/storm/2.3.0.y/extlib/*:/home/y/lib64/storm/2.3.0.y/extlib-daemon/*:/home/y/lib64/storm/current/conf:/home/y/lib64/storm/2.3.0.y/bin org.apache.storm.command.KillTopology wc
06:24:35.567 [main] INFO  o.a.s.v.ConfigValidation - Will use [class org.apache.storm.DaemonConfig, class org.apache.storm.Config] for validation
06:24:35.715 [main] WARN  o.a.s.v.ConfigValidation - Field public static final java.lang.String org.apache.storm.DaemonConfig.STORM_RESOURCE_ISOLATION_PLUGIN does not have validator annotation
06:24:35.726 [main] WARN  o.a.s.v.ConfigValidation - topology.backpressure.enable is a deprecated config please see class org.apache.storm.Config.TOPOLOGY_BACKPRESSURE_ENABLE for more information.
06:24:35.868 [main] INFO  o.a.s.m.n.Login - Successfully logged in to context StormClient using /etc/grid-keytabs/jaas.conf
06:24:35.871 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT refresh thread started.
06:24:35.897 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT valid starting at:        Tue Nov 17 05:56:26 UTC 2020
06:24:35.897 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT expires:                  Wed Nov 18 05:56:26 UTC 2020
06:24:35.898 [Refresh-TGT] INFO  o.a.s.m.n.Login - TGT refresh sleeping until: Wed Nov 18 02:13:43 UTC 2020
06:24:36.077 [main] INFO  o.a.s.u.NimbusClient - Found leader nimbus : openstorm3blue-n4.blue.ygrid.yahoo.com:50560
... 60s sleep ...
06:25:25.181 [main] INFO  o.a.s.c.KillTopology - Killed topology: wc

RuiLi8080, Nov 17 '20 06:11