openvino

Bugfix for dynamic-shape backedge with static input shape in Loop operator (Intel GPU)

Open timxu826 opened this issue 1 year ago • 5 comments

Fixed:

This PR fixes a bug in the Loop operator. The bug occurs when the Loop operation has a static input shape but receives inputs of different shapes on different iterations.

Cause:

The offending commit makes the function set_memory_in_body_network account for the memory predictor. When the shape predictor has over-allocated the memory buffer, the buffer's memory layout may have an unexpected shape. To handle this, the backedge memory copy for the next iteration reinterprets the memory layout according to the original layout.

However, this breaks TF_Faster_RCNN_Inception_ResNet_v2 with batch size 2. In that model the loop is not unrolled, and each iteration handles one batch, as shown in the image below. A Broadcast creates an array, and each iteration writes one part of it. Because of the reinterpretation in set_memory_in_body_network, the second iteration's input (the generated array) is truncated, which loses the first batch of data.

[image]
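The failure mode can be illustrated with a minimal NumPy sketch. It is a toy model, not the Intel GPU plugin's code. Everything in it is hypothetical and chosen only for illustration:

- the helper `reinterpret`
- the (1, 4) original layout
- the 12-element over-allocated buffer
- the (2, 4) array produced on the first iteration

The sketch views the buffer with the body input's original static layout. That view keeps only the leading elements, so part of the array written by the previous iteration is silently dropped.

```python
import numpy as np


def reinterpret(buffer: np.ndarray, shape: tuple) -> np.ndarray:
    """View the leading elements of a flat buffer with the given shape (no copy)."""
    count = int(np.prod(shape))
    return buffer[:count].reshape(shape)


original_layout = (1, 4)                 # static shape of the body input (hypothetical)
buffer = np.zeros(12, dtype=np.float32)  # over-allocated by a shape predictor (hypothetical size)

# Iteration 1: the backedge from-node produces a larger array than the original layout.
produced = np.arange(8, dtype=np.float32).reshape(2, 4)
buffer[:produced.size] = produced.ravel()

# Iteration 2: reinterpreting with the original layout truncates the data.
next_input = reinterpret(buffer, original_layout)
print(next_input.shape)                                  # (1, 4)
print(next_input.size, "of", produced.size, "elements kept")  # 4 of 8 elements kept
```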

Solution:

The fix has two parts: one in graph generation and one in the runtime.

In the graph generation phase, the shape of the input primitive takes into account the shape of the backedge's from-node, and the input is marked dynamic according to the from-node's shape. At runtime, set_memory_in_body_network considers the shapes on both sides of the backedge and compares them with the pre-allocated memory for backedges of SINGLE_SHARED type.
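The two-part fix can be modelled roughly as below. This is a Python sketch under stated assumptions, not the plugin's C++ implementation. The following names and conventions are hypothetical:

- `relax_input_shape` and `select_layout`
- the `BackedgeType` enum
- the convention that `None` marks a dynamic dimension

The rule that non-SINGLE_SHARED backedges keep the original layout is also an assumption, made to keep the sketch small.

```python
from enum import Enum
from math import prod
from typing import Optional, Tuple

Shape = Tuple[Optional[int], ...]  # None marks a dynamic dimension (hypothetical convention)


class BackedgeType(Enum):
    # Hypothetical stand-in for the plugin's backedge memory modes; only SINGLE_SHARED
    # is named in this PR, OTHER represents everything else.
    SINGLE_SHARED = "single_shared"
    OTHER = "other"


def relax_input_shape(input_shape: Shape, from_shape: Shape) -> Optional[Shape]:
    """Graph-generation step (sketch): mark dims dynamic where the from-node disagrees."""
    if len(input_shape) != len(from_shape):
        return None  # rank differs: treat the whole input shape as dynamic
    # Keep a dim static only if both sides agree on the same value.
    return tuple(a if a == b else None for a, b in zip(input_shape, from_shape))


def select_layout(from_shape: Tuple[int, ...], to_shape: Tuple[int, ...],
                  allocated_elems: int, edge_type: BackedgeType) -> Tuple[int, ...]:
    """Runtime step (sketch): choose the layout used to view the backedge memory."""
    if edge_type is not BackedgeType.SINGLE_SHARED or from_shape == to_shape:
        # Shapes agree (or not a shared backedge): keep the original layout, as before.
        return to_shape
    if prod(from_shape) > allocated_elems:
        # The shared buffer cannot hold the new shape; a real runtime would reallocate.
        raise ValueError("pre-allocated buffer too small for the from-node's shape")
    # Shapes differ and the buffer is large enough: follow the from-node, no truncation.
    return from_shape


print(relax_input_shape((1, 4), (None, 4)))                            # (None, 4)
print(select_layout((2, 4), (1, 4), 12, BackedgeType.SINGLE_SHARED))  # (2, 4)
```

With the fix, the (2, 4) output of the from-node is kept instead of being cut down to the static (1, 4) layout.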

A test case is added to cover this behavior.

This test case fails on the offending commit (236e1062b290e2d2345f1d1c319e78f15e0a311d) and passes with the changes in this PR.

Tickets:

  • CVS-143684

timxu826 • Aug 13 '24 12:08

1. Functional test passes after merge

When merged into the master branch (5a119fb2498f798571d58b0cb21bb8ede8bcf271), the functional test case added above passes.

2. Existing issues on the master commit

However, a new error occurs in the e2e test for Faster_RCNN.

In the FP32 test:

```
    status = comparators.report_statuses()
>   assert status, "inferred model results != reference results"
E   AssertionError: inferred model results != reference results
E   assert False

test.py:234: AssertionError
```

In the FP16 test:

```
    def apply(self, data):
        """Parse object detection data."""
        predictions = {}
        postprocessed = False
        target_layers = self.target_layers if self.target_layers else data.keys()
        dict_keys = ['class', 'prob', 'xmin', 'ymin', 'xmax', 'ymax']
        for layer in target_layers:
            predictions[layer] = []
            layer_data = np.squeeze(data[layer])

            # 1 detection leads to 0-d array after squeeze, which is not iterable
            if layer_data.ndim == 1:
                layer_data = np.expand_dims(layer_data, axis=0)
            assert len(layer_data.shape) <= 2, "Wrong data for postprocessing! Data length must be equal 2."
            for obj in layer_data:
                if type(obj) == np.float64:
                    log.debug(f"{obj} has type np.float64")
                    break
>               elif obj[0] == -1:
E               IndexError: index 0 is out of bounds for axis 0 with size 0

../utils/e2e/postprocessors/object_detection.py:63: IndexError
```
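The IndexError can be reproduced in isolation. The sketch below copies the squeeze / expand_dims / `obj[0]` steps of `apply` into a hypothetical helper, `check_first_column`. The input shapes are assumptions, because the thread does not say what shape the GPU plugin actually returned.

Here is how an empty output triggers the error:

1. The layer's output squeezes to an empty 1-D array.
2. `expand_dims` turns it into a (1, 0) array.
3. Each `obj` is therefore empty, so `obj[0]` raises the same IndexError.

```python
import numpy as np


def check_first_column(raw: np.ndarray) -> list:
    """Mimic the squeeze / expand_dims / obj[0] steps of the postprocessor's apply()."""
    layer_data = np.squeeze(raw)
    if layer_data.ndim == 1:
        layer_data = np.expand_dims(layer_data, axis=0)
    # apply() compares column 0 of each detection row against -1.
    return [bool(obj[0] == -1) for obj in layer_data]


# One well-formed detection row: squeezes to (7,), expands back to (1, 7).
print(check_first_column(np.full((1, 1, 1, 7), -1.0)))  # [True]

# An empty output: squeezes to (0,), expands to (1, 0), and obj[0] raises IndexError.
try:
    check_first_column(np.zeros((1, 1, 1, 0)))
except IndexError as err:
    print(err)  # index 0 is out of bounds for axis 0 with size 0
```

This suggests the FP16 failure comes from a detection output with a zero-sized last dimension rather than from the postprocessor itself, though the thread does not confirm the actual shape.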

3. e2e test on release/2024.3

Tests were also run on other commits. On release/2024.3, the FP32 test passes, but the FP16 test fails the accuracy check.

timxu826 • Sep 03 '24 07:09

The code has been cleaned up. The affected e2e test and the new test case both pass.

timxu826 • Sep 04 '24 16:09

This PR will be closed in a week because of 2 weeks of no activity.

github-actions[bot] • Oct 06 '24 00:10

Please don't close this PR. A merge with the latest master, including a new bug fix, is in progress.

timxu826 • Oct 08 '24 07:10

@timxu826, please rebase the code.

ahnyoung-paul • Oct 21 '24 04:10


Sorry for the late reply, will do ASAP. Thanks.

timxu826 • Oct 21 '24 07:10

Branch has been updated.

timxu826 • Oct 21 '24 11:10

This PR will be closed in a week because of 2 weeks of no activity.

github-actions[bot] • Nov 13 '24 00:11

[image: diagram of the functional test flow]

The functional test fails because of an optimization in the transformation pipeline. When b_data_braodcast is given a static shape (a scalar), the pipeline optimizes the input of b_mul2 (a Multiply op) to a scalar. This truncates the changed shape (a vector) and produces a wrong result. I inserted a Reshape op that forces the Multiply op to a broadcasted shape, which bypasses the optimization, and drew a diagram to illustrate the flow. The test now passes.

timxu826 • Nov 20 '24 13:11

build_jenkins

vladimir-paramuzov • Nov 22 '24 06:11