oneDNN icon indicating copy to clipboard operation
oneDNN copied to clipboard

Working with the Reorder

Open CodeCrafter18 opened this issue 1 year ago • 9 comments

Hello! I have a question regarding the usage of reorders. For example, let's consider ref_sum.hpp.

if (need_output_reorder()) {
    CHECK(reorder_primitive_desc_create(
		reorder_pds_[n_], engine, dst_acc_md(), dst_md()));
}

In this case, the following function will be called:

status_t reorder_primitive_desc_create(std::shared_ptr<primitive_desc_t> &pd,
        engine_t *engine, const memory_desc_t *src_md,
        const memory_desc_t *dst_md, const primitive_attr_t *attr) {
    return reorder_primitive_desc_create(
            pd, engine, src_md, engine, dst_md, engine, attr);
}

However, in this scenario, create_resource(...) is never called. If I want to write my own Reorder that overrides the create_resource(...) function, as it's done in ACL, then I'll get a runtime error when I call ctx.get_resource_mapper()...

How can I work around this problem if my implementation uses resource_mapper?

Also, I have a question about why the Sum, Reorder, and Concat operations have separate logic, for example, for obtaining the list of implementations?​​​​​​​​​​​​​​​​

Thanks.

CodeCrafter18 avatar Jun 27 '24 14:06 CodeCrafter18

Hi @CodeCrafter18,

First of all, primitive descriptor has nothing to do with create_resource(...) and resource_mapper. It is the primitive that utilizes those. The primitive_t::create_resource is always called at the primitive initialization stage. So you just need to override create_resource(...) in your primitive and then use the resource_mapper from the execution context during the execution step.

densamoilov avatar Jul 02 '24 19:07 densamoilov

Hi @densamoilov,

As I pointed out above in a specific example (ref_sum). The “create” function will be called, but the overridden “create_resource” is not. As a result, the “resource_mapper” is empty at the execution stage. One way to get around this is to create an instance again in "execute" in the case of an “empty” for “resource_mapper”, but this looks like some kind of “crutch”.

CodeCrafter18 avatar Jul 03 '24 03:07 CodeCrafter18

@CodeCrafter18, in the example above you mentioned primitive descriptor and not the primitive, that's why I'm still confused. One thing to keep in mind is that create_resource is ALWAYS called from the main primitive. If the main primitive has nested primitives then it is the main primitive's responsibility to call create_resource from the nested primitives. Another thing to keep in mind is that the resource mapper in the execution context object (exec_ctx_t) that is passed to the execute function of the main primitive should be passed down to the nested primitives. In order to do that you need to create a new execution context for the nested primitives using the execution context for the main primitive: https://github.com/oneapi-src/oneDNN/blob/8d457a8ce898a032183ace8c541a70252351d8e2/src/gpu/generic/ref_sum.hpp#L171

densamoilov avatar Jul 03 '24 07:07 densamoilov

Let's do it again.

We are looking at a specific example: (oneDNN/src/cpu/ref_sum.hpp)

The init(...) function line 55-58:

if (need_output_reorder()) {
    CHECK(reorder_primitive_desc_create(
		reorder_pds_[n_], engine, dst_acc_md(), dst_md()));
}

In this case, the Reorder is selected and created from the general list. My implementation uses create_resource. But in this test, I never call create_resource. Therefore, at the execute stage, I encounter an error.

For regular reorder launches, for example, via benchen, there is no such problem.

CodeCrafter18 avatar Jul 03 '24 07:07 CodeCrafter18

If you want to create a resource for your reorder implementation that is used in ref_sum.hpp then you will need to add ref_sum_t::create_resource(…) that would call create_resource from each element of the reorders_ vector.

densamoilov avatar Jul 03 '24 09:07 densamoilov

Well, thank you. Why hasn't it been done already? Now the ACL has its own implementation for the Reorder operation and can be called. If it's not difficult, could you provide a modification for test_sum from ctest?

And what about the second question, why is there different logic for Reorder Sum and Concat?

CodeCrafter18 avatar Jul 03 '24 09:07 CodeCrafter18

Why hasn't it been done already?

It's a bug in the case when ACL reorder is available. The ref_sum_t class needs to have this: https://github.com/oneapi-src/oneDNN/blob/8d457a8ce898a032183ace8c541a70252351d8e2/src/cpu/ref_deconvolution.hpp#L407-L413

+@jondea

densamoilov avatar Jul 03 '24 17:07 densamoilov

It looks like simple_layer_normalization has the same problem.

CodeCrafter18 avatar Jul 04 '24 10:07 CodeCrafter18

I've made an issue to look into this. We also have work in progress to remove the use for create_resource for the ACL implementations, which I think should fix this issue.

jondea avatar Jul 05 '24 12:07 jondea

Closed by https://github.com/oneapi-src/oneDNN/pull/2064

theComputeKid avatar Sep 04 '24 14:09 theComputeKid