Consider an alternative strategy for media that does not rely on media elements directly
To allow range responses (206 HTTP status code), ORB remembers URLs of earlier responses that sniffed as audio/video. This works under the assumption that middle-of-resource range responses are always preceded by a first-bytes response.
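To make the discussion concrete, the remember-then-allow behaviour described above can be sketched roughly as follows. This is a toy model, not Chrome's implementation or the spec algorithm: `OrbState`, `sniffs_as_media`, and `media_range_path_allows` are hypothetical names, and sniffing is reduced to checking for an MP4 `ftyp` box at offset 4.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class OrbState:
    """Hypothetical per-instance state: URLs whose first bytes sniffed as media."""
    safelisted_media_urls: Set[str] = field(default_factory=set)


def sniffs_as_media(first_bytes: bytes) -> bool:
    # Toy stand-in for real audio/video sniffing: MP4 files carry "ftyp" at offset 4.
    return first_bytes[4:8] == b"ftyp"


def media_range_path_allows(state: OrbState, url: str, status: int,
                            range_start: int, body: bytes) -> bool:
    """Return True if the (simplified) media/range path lets the response through.

    False only means "not allowed by this path"; other ORB checks are not modelled.
    """
    if range_start == 0:
        # First-bytes response: sniff it and remember the URL if it looks like media.
        if sniffs_as_media(body):
            state.safelisted_media_urls.add(url)
            return True
        return False
    if status == 206:
        # Middle-of-resource range response: it cannot be sniffed,
        # so the decision relies on what was remembered earlier.
        return url in state.safelisted_media_urls
    return False
```

The last branch is where the assumption matters: if the first-bytes response never passed through this particular `OrbState`, a later 206 for the same URL is rejected.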
I am opening this bug as a heads-up, to point out that we've seen telemetry reports from Chrome indicating that our earlier implementation of ORB blocked some 206 responses with a video/mp4 MIME type. The only explanation I can think of for such reports is that they represent middle-of-resource range responses whose URLs hadn't been marked earlier as allowed. The nature of the telemetry makes it difficult to quantify these reports: they were relatively low in volume, but non-zero (and sufficiently high to show up in our reports; this was the top case where ORB blocked a response and CORB didn't).
For now, I'll tweak Chrome's ORB implementation to allow any response with an audio/video MIME type. I understand that this is undesirable in the long-term (given the discussion in https://github.com/annevk/orb/issues/3#issuecomment-974334651), but in the short-term I want to minimize the risk of shipping ORB v0.1 in Chrome.
In principle I suppose a server could reply to a range 0- request with a non-first-byte response and only give more data upon subsequent requests. If we can enforce the MIME type for such responses it might not be too bad however.
The current hypothesis for the observed data is that there are 2 same-origin frames, each with a <video> element pointing to the same video file. Each of the 2 frames will have a separate URLLoaderFactory where ORB state is stored (e.g. storing a set of safe URLs that have been previously sniffed as video files). OTOH, the 2 frames will share some network caches (since the frames are same-origin + part of the same page). Given this, the following sequence of steps can cause one ORB instance to miss the request for the first few bytes of the video:
- Frame1 requests first bytes of the video resource.
  - This stores the video URL in the 1st instance of ORB.
  - This also populates network caches (some of which are checked before going through URLLoaderFactory).
- Frame2 requests first bytes of the video resource.
  - This request is fulfilled from the network cache, without going through URLLoaderFactory and/or the 2nd instance of ORB.
- Frame2 requests middle bytes of the video resource.
  - This request goes through URLLoaderFactory and the 2nd instance of ORB.
  - ORB will block this request because the 2nd instance of ORB didn't store the video URL as safe. We are guessing that this is the step that Chromium's telemetry would report as a 206, video/mp4 response that gets blocked by ORB.
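The sequence above can be reproduced in a small simulation. This is only a sketch of the hypothesis, under the assumptions that each frame owns one ORB instance (standing in for its URLLoaderFactory), that the cache is shared between the frames, and that the first bytes always sniff as video. `Orb`, `Frame`, and `request` are hypothetical names, not Chromium classes.

```python
from typing import Set


class Orb:
    """Hypothetical per-frame ORB state (lives next to the frame's URLLoaderFactory)."""

    def __init__(self) -> None:
        self.safe_urls: Set[str] = set()


class Frame:
    def __init__(self, shared_cache: Set[str]) -> None:
        self.orb = Orb()                  # one ORB instance per frame
        self.shared_cache = shared_cache  # cache shared by same-origin frames

    def request(self, url: str, range_start: int) -> str:
        if range_start == 0 and url in self.shared_cache:
            # Served from the shared cache: this frame's ORB never sees the response.
            return "cache-hit"
        if range_start == 0:
            # First-bytes response goes through ORB and is assumed to sniff as video.
            self.orb.safe_urls.add(url)
            self.shared_cache.add(url)
            return "allowed"
        # Middle-of-resource range request: allowed only if this ORB saw the first bytes.
        return "allowed" if url in self.orb.safe_urls else "blocked"


cache: Set[str] = set()
frame1, frame2 = Frame(cache), Frame(cache)
url = "https://example.test/video.mp4"

results = [
    frame1.request(url, 0),     # goes through ORB #1, populates the cache
    frame2.request(url, 0),     # fulfilled from the cache, ORB #2 is skipped
    frame2.request(url, 4096),  # ORB #2 has no record of the URL
]
print(results)
```

In this model the third request is the one that would surface in telemetry as a blocked 206 response.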
I see, that does seem tricky to resolve. The safelisted URLs would have to be cached somehow in the ORB layer. And perhaps you can "refcount" the media elements by incrementing and decrementing a number in that cache to ensure cleanup. (This essentially goes back to our earlier design.)
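A minimal sketch of what such a refcounted safelist could look like, assuming media elements call `acquire` when they start using a URL and `release` when they are done. The class and method names are hypothetical and do not correspond to any existing implementation.

```python
from typing import Dict


class RefCountedSafelist:
    """Hypothetical cache of safelisted media URLs with per-URL reference counts."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def acquire(self, url: str) -> None:
        # A media element starts relying on this URL being safelisted.
        self._counts[url] = self._counts.get(url, 0) + 1

    def release(self, url: str) -> None:
        # A media element is done with the URL; drop the entry once unused.
        count = self._counts.get(url, 0)
        if count <= 1:
            self._counts.pop(url, None)
        else:
            self._counts[url] = count - 1

    def is_safe(self, url: str) -> bool:
        return url in self._counts
```

The entry disappears once the last media element releases it, which is the cleanup property the refcount is meant to provide.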
(We could make ORB operate on HTTP cached resources as well, but I suppose the "memory cache" might come into play here as well at which point that would not be a solution as that's in the same process as the attacker.)
cc @farre
@anforowicz what do you think about the idea? Is that in line with the changes you were thinking of making here?
I haven't really thought much about what can be implemented beyond ORB v0.1 in Chrome.
In terms of the spec, we can either:
- Ignore this problem as an implementation detail (e.g. ignore Spectre and/or compromised content/renderer processes and just say that the media element keeps track of its requests - this is what we've discussed earlier in https://github.com/annevk/orb/pull/16#issuecomment-934264249).
- Suggest that each NIK has only a single store of safe/validated media URLs. I am not really sure how feasible this is implementation-wise (in Chrome, or in Firefox).
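For illustration, the second option might look something like the sketch below. It assumes a NIK can be modelled as a (top-frame site, frame site) pair and that all ORB instances can reach one shared store; `PerNikMediaStore` and its methods are hypothetical, and nothing here reflects how Chrome or Firefox would actually implement it.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

# Assumption: a NIK is represented as (top_frame_site, frame_site).
Nik = Tuple[str, str]


class PerNikMediaStore:
    """Hypothetical single store of validated media URLs, partitioned by NIK."""

    def __init__(self) -> None:
        self._safe: DefaultDict[Nik, Set[str]] = defaultdict(set)

    def mark_safe(self, nik: Nik, url: str) -> None:
        # Called when a first-bytes response for `url` sniffed as audio/video.
        self._safe[nik].add(url)

    def is_safe(self, nik: Nik, url: str) -> bool:
        # .get() avoids creating empty entries for NIKs that were never seen.
        return url in self._safe.get(nik, ())
```

With a store like this, two same-origin frames on the same page would share one entry, so a first-bytes response validated for Frame1 would also cover Frame2's later range requests.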
@anforowicz I'm confused, doesn't this issue suggest we cannot use the first of these options?
> doesn't this issue suggest we cannot use the first of these options?
If we trust the media element to correctly report initial-VS-subsequent media request state, then everything seems to work fine. In particular, I don't see any step in the current ORB algorithm that would misbehave just because a certain request was missed by ORB (e.g. because it was fulfilled from a cache).
Missing the initial media request seems to be an issue only if an ORB implementation doesn't trust the media element and wants to verify the initial-VS-subsequent media request state itself (e.g. by tracking URLs that sniffed as media when their first bytes were requested).
Maybe I am missing something?
Yeah, I think you're right. Thanks!