
[FLINK-28817] NullPointerException in HybridSource when restoring from checkpoint

Open zhongqishang opened this issue 3 years ago • 2 comments

What is the purpose of the change

This pull request fixes a NullPointerException in HybridSource when restoring from a checkpoint.

After recovery is triggered, only the source with sourceIndex = 1 is loaded into switchedSources in HybridSourceSplitEnumerator.

When a new SourceReaderFinishedEvent then triggers the load of a new source, the default is to get the source with sourceIndex = 0. That source is not in switchedSources after the restore, which triggers the NPE.
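To make the failure mode concrete, here is a minimal toy model of it. This is not Flink's code: `RestoreNpeSketch`, `ToySource`, and both helper methods are hypothetical names, and a plain `HashMap` stands in for switchedSources. It only illustrates how looking up an index that was never re-registered after restore yields null, and how dereferencing that null throws.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the failure; NOT Flink's actual code. All names are hypothetical. */
public class RestoreNpeSketch {

    /** Stand-in for a source tracked by the enumerator. */
    static final class ToySource {
        final String name;

        ToySource(String name) {
            this.name = name;
        }
    }

    /** Simulates state after restore: only the source with index 1 is registered. */
    static Map<Integer, ToySource> restoredSwitchedSources() {
        Map<Integer, ToySource> switchedSources = new HashMap<>();
        switchedSources.put(1, new ToySource("second-source"));
        return switchedSources;
    }

    /** Looks up a source by index and dereferences it without a null check. */
    static String sourceNameFor(Map<Integer, ToySource> switchedSources, int sourceIndex) {
        ToySource source = switchedSources.get(sourceIndex); // null if the index is absent
        return source.name; // throws NullPointerException when source is null
    }

    public static void main(String[] args) {
        Map<Integer, ToySource> sources = restoredSwitchedSources();
        System.out.println(sourceNameFor(sources, 1)); // index 1 is present
        try {
            sourceNameFor(sources, 0); // index 0 was never re-registered
        } catch (NullPointerException e) {
            System.out.println("NPE for sourceIndex = 0");
        }
    }
}
```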

Please correct me if there is a mistake.

Brief change log

For the new SourceReaderFinishedEvent, get an available source.
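One possible reading of "get an available source" is sketched below, under stated assumptions. It is not the code from this PR: `AvailableSourceSketch` and `availableSourceIndex` are hypothetical, a `TreeMap` stands in for switchedSources, and the fallback rule (use the lowest registered index when the requested one is missing) is an assumption chosen for illustration.

```java
import java.util.Optional;
import java.util.TreeMap;

/** Toy illustration of choosing a registered source; NOT the PR's actual change. */
public class AvailableSourceSketch {

    /**
     * Returns the requested index if it is registered, otherwise the lowest
     * registered index, or empty if nothing is registered at all.
     */
    static Optional<Integer> availableSourceIndex(
            TreeMap<Integer, String> switchedSources, int requestedIndex) {
        if (switchedSources.containsKey(requestedIndex)) {
            return Optional.of(requestedIndex);
        }
        if (switchedSources.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(switchedSources.firstKey());
    }

    public static void main(String[] args) {
        // State after restore in the toy model: only index 1 is registered.
        TreeMap<Integer, String> restored = new TreeMap<>();
        restored.put(1, "second-source");

        // Asking for index 0 falls back to index 1 instead of yielding null.
        System.out.println(availableSourceIndex(restored, 0));
    }
}
```

Returning an `Optional` keeps the "nothing registered" case explicit, so the caller has to handle it rather than dereference a null.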

Verifying this change

This change added tests and can be verified as follows:

  • Added testRestoreEnumeratorWith2ndSource in HybridSourceSplitEnumeratorTest

Does this pull request potentially affect one of the following parts:

  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes)

Documentation

  • Does this pull request introduce a new feature? (no)

zhongqishang avatar Aug 10 '22 07:08 zhongqishang

CI report:

  • 243e28f8ed3fffcce9eeb4ab5dda38addac59320 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build

flinkbot avatar Aug 10 '22 07:08 flinkbot

@zhongqishang thanks for the PR, I'm going to take a look soon.

tweise avatar Aug 11 '22 01:08 tweise

@tweise Thanks for your review. I have addressed all your comments.

zhongqishang avatar Aug 12 '22 02:08 zhongqishang

@flinkbot run azure

zhongqishang avatar Aug 12 '22 02:08 zhongqishang

@flinkbot run azure

zhongqishang avatar Aug 12 '22 08:08 zhongqishang

@zhongqishang, could you please add a test, modeled on HybridSourceTest, to verify whether this also fixes the bug in FLINK-26938?

SteNicholas avatar Aug 12 '22 09:08 SteNicholas

@SteNicholas, in fact I have encountered the situation described in FLINK-26938. I will try to add a test case as soon as possible.

zhongqishang avatar Aug 12 '22 10:08 zhongqishang

@zhongqishang, it's better to add the test case for the situation described in FLINK-26938. cc @tweise

SteNicholas avatar Aug 12 '22 10:08 SteNicholas

> @zhongqishang, it's better to add the test case for the situation described in FLINK-26938. cc @tweise

Let's address that in a separate PR, since it's a separate JIRA also.

tweise avatar Aug 12 '22 16:08 tweise

@zhongqishang thanks for the contribution! Can you please open a backport PR for release-1.15 also?

tweise avatar Aug 12 '22 23:08 tweise