6 comments by leasunhy

Here you are. It's a bit large... [profiler_log.zip](https://github.com/tensorflow/profiler/files/4509046/profiler_log.zip) Thanks!

I optimized my data pipeline and the kernel launch time dropped significantly, so my training code no longer suffers from performance issues. Thanks for your explanation, @ckluk!...

@eliasdorneles Thank you for your review! ;-) I know what you mean, but the issue in #363 is about "inlining selection logic between an operator and its reflected version", so...

Confirmed here. This happens because, in the generated Java, `cstu` inherits from `org.python.types.Object` rather than from `stu`, so when the `*` operator calls `__mul__` on a `cstu` object, the call resolves to `org.python.types.Object.__mul__`....
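For comparison, here is a minimal CPython sketch of the behaviour the translated code should reproduce. `cstu` defines no `__mul__` of its own, so `*` should fall through to `stu.__mul__` via the method resolution order (MRO). The class bodies are hypothetical stand-ins; only the names `stu` and `cstu` come from the report.

```python
# Hypothetical stand-ins for `stu` and `cstu`; the real definitions
# are not shown in the original comment.
class stu:
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # __mul__ is defined only on the base class.
        return stu(self.value * other)


class cstu(stu):
    # No __mul__ here, so Python must find it on `stu` via the MRO.
    pass


c = cstu(3)
result = c * 4

# CPython looks up __mul__ on type(c) and walks the MRO
# (cstu -> stu -> object), so stu.__mul__ handles the call.
print([k.__name__ for k in type(c).__mro__])  # ['cstu', 'stu', 'object']
print(type(c).__mul__ is stu.__mul__)         # True
print(result.value)                           # 12
```

If the Java class for `cstu` extends `org.python.types.Object` instead of the class generated for `stu`, that lookup skips `stu.__mul__` entirely, which matches the failure described in the comment.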

I'm very interested in this task. It would be great if I could be assigned to it.

Same here. Please use the tensors [here](https://github.com/leasunhy/fa2-curious-case) to reproduce the issue:

* 2.5.8: invalid memory access
* 2.4.3.post1: runs without issue