Confusion surrounding use of the word "aperture"
So, I have a question. How do we want to deal with "aperture"?
In the book as it stands, the aperture is the diameter of the (virtual) lens element. In photography, the "aperture stop" or "f-stop" is focal_length / entrance_pupil, where the entrance pupil is the diameter of the lens opening as perceived when looking through the front of the lens (which is not the same as the physical diameter of the aperture). Unfortunately, the f-stop is basically never referred to by its full name, and is colloquially shortened to just "aperture".
And so we have an overloading of a term here.
Photographers' use of the word "aperture" here is (and I'm a photographer!)... stupid. Their "aperture" varies inversely with the actual lens aperture ::face palm:: We, on the other hand, use the word correctly: our "aperture" means the diameter of the aperture.
So, for photographers, an "increase" in "aperture" is associated with lesser defocus blur. Whereas in the book an "increase" in "aperture" is associated with greater defocus blur.
(So, for photographers, an "increase" in "aperture" is associated with greater depth-of-field. Whereas in the book an "increase" in "aperture" is associated with lesser depth-of-field.)
And then there is the matter of the utility of an f-stop. Any decent photographer can map what a specific f-stop looks like. Because it scales with focal length, the defocus blur of any lens is largely determined by its f-stop: all f/1.4 lenses will have similar defocus blur (likewise for f/2.0, f/2.8, and so on).
And, ultimately, most photographers have a patchwork of wrong understandings of why these things work the way they do (see the entrance pupil above), so it wouldn't hurt to tread lightly here. I don't know that big rewrites for the benefit of any photographers reading the books are a good idea (this may have only collided for me because of my experience in photography), but if a reader wants to continue down the path of filmic ray tracing, they're eventually going to have to understand f-stop and focal length as photography defines them (a good understanding of T-stop might also help). I could probably write a ~500-word appendix converting the lens to its 35mm-sensor equivalent.
I go back and forth with formalizing our rendering setup through a camera class (and possibly a renderer class). In the end, I keep abandoning this as it feels like a net increase in complexity, and because we're aiming to illustrate a minimal -- rather than good -- raytracer. Leaving some meat on the bone, as it were.
That said, if we had a camera class, then we open the door to an addendum to the series on constructing a photographic camera, which would be pretty cool. That would just need to swap out the camera type.
Similarly, if we bundled up the renderer, then we could open the door to a drop in replacement with a different sampling mechanism. This one's a bit tougher, because sampling is really coded deeply into materials and other classes.
This is also all related to adding command-line parameterization, which would make my life a lot easier, but again adds a ton of code that is not raytracing.
These three areas all exist on a continuum, so we can be deliberate in any changes we make. We just need to keep the overall goal of simplicity and minimalism in mind.
I did this when I went through the tutorial: I've created camera classes for 3D (side-by-side) photo renders, as well as a couple of different 360 VR renderers (with and without 3D; supporting 3D in 360 VR where you can turn around was a fun problem to solve). It's pretty nice to be able to do. I'd also love to have camera implementations for fish-eye, 35mm film equivalent, etc., so I can mimic different things, but both my free time and my knowledge of cameras are limited, so I've had to put that on ice :)
Wow, this one's been simmering for a while. Now we do have a camera class. That said, this feels to me like just another example of why I don't like nomenclature like "camera" and "eye" for computer graphics. My own pet peeve is that people conflate the projection point with the "eye" or "the position of the viewer", neither of which is correct, and both of which lead to bad models and associated errors. For example, you're much better off thinking of the image plane as the camera/viewer location, with the projection point moving around freely as a purely geometric point to construct a particular projection. Indeed, an orthographic "camera" has a definite location, but no "eye point" or projection point.
Anyway, I find the issue with the term "aperture" to be both fussy and completely relatable. I think that I'll keep the term "camera", as it's a useful term that clearly communicates the purpose of the class. However, I'd be happy to adopt other terms where confusion is possible.
So:
- ray origin disk?
- ray origin diameter?
- projection point disk?
- defocus / defocus diameter?
- blur / blur diameter?
Writing these down, I like the non-geometric terms, such as blur or defocus. The geometric construct itself is then limited to the few lines of implementing code.
How about something like angular resolution?
"Angular resolution" to me means something like Apple's "retina" naming — the angular field of a single pixel. It's quite common to specify resolution (the ability to distinguish two points or two lines) across an angular field of view. In this case, however, we're talking about the disk from which rays emanate. There's no real "resolving" going on here.
Hm, yes... It's more like a single sensor element's physical area.
Actually, the "single sensor element's physical area" corresponds to the pixel sample area: the random points around the pixel location through which we fire rays.
In this case, we're using "aperture" to talk about the diameter of the disk around the ray origin, which gives us defocus blur. Each ray begins at some random location on a disk around the projection point, and fires through a random location in a box or disk around the pixel location. So there are two random regions we sample to construct a single ray, further compounding the opportunity for confusion.
Ah, yes of course, pardon my confusion.
The book mentions the "thin lens approximation". In this model, the ray-origin disk could be named the "aperture", as the early text says the aperture effectively alters the size of the lens. Or you could just say that the flat disk is the thin/flat lens.
The current code refers to defocus angle, defocus disk, and defocus radius. So we have the following possibilities for naming:
- defocus angle / aperture angle / lens angle
- defocus disk / aperture / lens
- defocus radius / aperture radius / lens radius
For now, I'm proceeding with "defocus". I don't like "aperture", since that's an opening through which things pass, which doesn't match our model. I don't like "lens", since that's all about bending light rays, which we don't do in the camera.
Just noticed that our images name this the "lens" instead of the "aperture". We should change either the figures or the code to match.
Additionally, we flip from talking about the viewport to talking about the "focus plane" / "image plane", without mentioning that the viewport is placed on this plane. Will fix that.