gpuweb

Profiling copyTextureToTexture() etc.

Open · munrocket opened this issue 8 months ago · 3 comments

The only way to profile these GPU<->GPU functions is to attach an external profiler, because writeTimestamp() was deprecated as unreliable. I'm not sure how many people want this, but here is the list of functions that currently can't be profiled:

copyBufferToBuffer(), copyBufferToTexture(), copyTextureToBuffer(), copyTextureToTexture(), copyExternalImageToTexture(), clearBuffer(), clearTexture().

munrocket · Jun 18 '25 08:06

Personally I'd look into using native tools for this. It appears there's an assumption that if you had writeTimestamp you could time these, but that assumption is false.

Example:

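// writeTimestamp(querySet, queryIndex) is the removed API; srcBuffer and size are placeholders
encoder.writeTimestamp(querySet, 0);
encoder.copyBufferToBuffer(srcBuffer, 0, someBuffer, 0, size);
encoder.writeTimestamp(querySet, 1);
p1 = encoder.beginRenderPass(...);  // does not use someBuffer
p1...
p1.end();
p2 = encoder.beginRenderPass(...);  // does use someBuffer
p2...
p2.end();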

There's no guarantee those writeTimestamp values are timing the copy. The copy can be handed off to the GPU/system to run asynchronously; as long as it completes before p2 starts, that's safe.

This is arguably why writeTimestamp was removed.
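A toy scheduling model makes the point concrete. It assumes (hypothetically, this is not how any particular GPU or driver schedules work) that timestamps and render passes run in order on one queue while the copy is offloaded to a separate engine that only has to finish before a pass reading someBuffer. The function name and the millisecond timings are made up for illustration.

```javascript
// Toy scheduling model, NOT how any particular GPU or driver works:
// timestamps and render passes run in order on one queue, while the copy is
// handed to a separate copy engine and only has to finish before a pass that
// reads its destination buffer (someBuffer).
function simulateWriteTimestampBracket({ copyMs, p1Ms, p2Ms }) {
  let queueTime = 0;

  const tsBefore = queueTime;            // writeTimestamp() before the copy
  const copyStart = queueTime;           // copy is offloaded; the queue does not wait
  const copyEnd = copyStart + copyMs;
  const tsAfter = queueTime;             // writeTimestamp() after the copy

  queueTime += p1Ms;                     // p1 does not use someBuffer, overlaps the copy
  const p2Start = Math.max(queueTime, copyEnd); // p2 uses someBuffer, waits for the copy
  const p2End = p2Start + p2Ms;

  return {
    measuredCopyMs: tsAfter - tsBefore,  // what the two timestamps would report
    actualCopyMs: copyEnd - copyStart,   // how long the copy really took
    p2End,
  };
}

const r = simulateWriteTimestampBracket({ copyMs: 3, p1Ms: 5, p2Ms: 2 });
console.log(r); // { measuredCopyMs: 0, actualCopyMs: 3, p2End: 7 }
```

In this model the two timestamps report 0 ms even though the copy takes 3 ms, because it runs entirely in parallel with p1 and is still done before p2 needs it.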

greggman · Jun 18 '25 17:06

Maybe this approach would give an estimate:

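p1 = encoder.beginRenderPass({ ..., timestampWrites: { querySet, beginningOfPassWriteIndex: 0, endOfPassWriteIndex: 1 } });
p1...
p1.end();

encoder.copyBufferToBuffer(srcBuffer, 0, someBuffer, 0, size); // srcBuffer, size: placeholders

p2 = encoder.beginRenderPass({ ..., timestampWrites: { querySet, beginningOfPassWriteIndex: 2, endOfPassWriteIndex: 3 } });
p2...
p2.end();

<...>

// times: BigUint64Array read back from the resolved querySet (nanoseconds)
copy_and_p2_binding_time = times[2] - times[1]; // p2 begin - p1 end; % BigInt(0xffffffff);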
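Reading the p1-end and p2-begin values back means resolving the query set into a buffer and mapping it. A sketch, assuming the device was requested with the 'timestamp-query' feature, that querySet was created with device.createQuerySet({ type: 'timestamp', count: 4 }), and that p1 wrote its begin/end timestamps to indices 0 and 1 and p2 to indices 2 and 3. readTimestamps and gapBetweenPassesNs are hypothetical helper names, not part of WebGPU.

```javascript
// Resolve the four pass timestamps and map them back to the CPU.
// Assumes 'timestamp-query' is enabled and querySet has count 4.
async function readTimestamps(device, encoder, querySet) {
  const size = 4 * 8; // four 64-bit timestamps
  const resolveBuffer = device.createBuffer({
    size,
    usage: GPUBufferUsage.QUERY_RESOLVE | GPUBufferUsage.COPY_SRC,
  });
  const readBuffer = device.createBuffer({
    size,
    usage: GPUBufferUsage.MAP_READ | GPUBufferUsage.COPY_DST,
  });
  encoder.resolveQuerySet(querySet, 0, 4, resolveBuffer, 0);
  encoder.copyBufferToBuffer(resolveBuffer, 0, readBuffer, 0, size);
  device.queue.submit([encoder.finish()]);

  await readBuffer.mapAsync(GPUMapMode.READ);
  // Copy out of the mapped range before unmap() detaches it.
  const times = new BigUint64Array(readBuffer.getMappedRange().slice(0));
  readBuffer.unmap();
  return times;
}

// Gap between the end of p1 (index 1) and the start of p2 (index 2), in ns.
// Returns null if the values are out of order, since the gap is then meaningless.
function gapBetweenPassesNs(times) {
  const p1End = times[1];
  const p2Begin = times[2];
  return p2Begin >= p1End ? p2Begin - p1End : null;
}

// Usage (inside an async function, after recording p1, the copy, and p2):
//   const times = await readTimestamps(device, encoder, querySet);
//   console.log(gapBetweenPassesNs(times));
```

Returning null for out-of-order values is a precaution rather than something the thread discusses; the gap, when present, is only the p2-begin minus p1-end interval and still subject to the overlap caveats below.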

munrocket · Jun 19 '25 08:06

That might work if both p1 and p2 depend on someBuffer. If p1 does not depend on someBuffer then the copy can happen in parallel with p1. And similarly, if p2 does not depend on someBuffer then p2 can happen in parallel with the copy.
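The four dependency cases can be checked with another toy model. It assumes, hypothetically, that p1 and p2 run back to back on one engine while the copy runs on another, and that a dependency on someBuffer is the only thing forcing ordering. It ignores any p2 setup or binding cost that real pass timestamps would also include.

```javascript
// Toy model (hypothetical, not any real GPU's scheduler): p1 and p2 run back
// to back on one engine, the copy runs on another. A dependency on someBuffer
// forces ordering; without one, the work may overlap.
function gapBetweenPassesMs({ copyMs, p1Ms, p1UsesBuffer, p2UsesBuffer }) {
  const p1End = p1Ms; // p1 starts at t = 0
  // The copy is recorded after p1 but only has to wait for p1 if p1 uses someBuffer.
  const copyStart = p1UsesBuffer ? p1End : 0;
  const copyEnd = copyStart + copyMs;
  // p2 only has to wait for the copy if p2 uses someBuffer.
  const p2Begin = p2UsesBuffer ? Math.max(p1End, copyEnd) : p1End;
  return p2Begin - p1End; // what (p2 begin - p1 end) would report
}

for (const p1UsesBuffer of [true, false]) {
  for (const p2UsesBuffer of [true, false]) {
    const gap = gapBetweenPassesMs({ copyMs: 3, p1Ms: 5, p1UsesBuffer, p2UsesBuffer });
    console.log(`p1 uses: ${p1UsesBuffer}, p2 uses: ${p2UsesBuffer} -> gap ${gap} ms`);
  }
}
```

Only when both passes use someBuffer does the gap equal the 3 ms copy. In the other cases it reads 0 ms, or, if the copy outlasts p1 (say copyMs: 8 with only p2 depending on it), just the 3 ms of the copy that sticks out past p1.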

greggman · Jun 19 '25 19:06

While writeTimestamp or similar APIs can seem able to time individual operations, as @greggman mentioned, there is a lot of overlap between operations. The number you get could be representative of real performance or completely uncorrelated with it, with no way of knowing which case you're in. This kind of fine-grained profiling is definitely best done with vendor-specific tools that show the occupancy of the various hardware units.

Kangz · Oct 09 '25 07:10