Guideline for how to declare variables in a range-based for loop
A question which comes up every now and then is what type of loop variable you should use for range-based for, particularly when no mutation takes place.
// case (1) - cheap to copy
for (int x : container) // disallowed in this form by Con.1, should be 'const int'
// vs
for (const int &x : container)
// case (2) - expensive to copy
for (string x : container) // disallowed in this form by ES.71, should be 'const string &x'
// vs
for (const string &x : container)
I think we should recommend to always declare the loop variable as a reference, never as a value, and here's why:
- The rule is simple and consistent.
- When mutating a
T&variable in the loop, you are already forced to follow it by the language. - If you aren't, and you have case (2), then ES.71 already tells you to.
- Otherwise, if you are following Con.1, then
const T xis only one character away fromconst T &x, so it costs very little effort to follow this rule. - It is possible and relatively easy to enforce with automatic tooling.
- It prevents bugs related to taking the address of the local variable, instead of taking the address of the object within the container.
- Even for weird iterators like
std::ranges::iota_view::iterator, where theirreferenceis actually a value, this method is robust becauseconst&allows for temporary materialization.
In essence, I don't see any benefit to non-reference loop variables, and the alternative is consistent, easy, correct by default, and already recommended in part.
Consider the following issues with using by-value range-for variables:
// Scenario (1): the author does modify 'x' within the loop, so they intentionally make 'x' mutable.
// This conforms with CppCoreGuidelines.
for (int x : container) {
x = ...; // POSSIBLE BUG, did they mean to modify the 'x' within the container?
int y = x;
y = ...; // OK, intent is clear :)
// It is obvious that the author meant to modify a local copy.
}
// Conclusion: the author should have used 'const int' or 'const int&' as a loop variable,
// and a mutable variable in the body.
// Scenario (2): the author takes the address of a local variable in the loop.
for (const int x : container) {
use_address(&x); // POSSIBLE BUG, did they mean to take the address of the 'x' within the container?
const int y = x;
use_address(&y); // OK, intent is clear :)
// It is obvious that the author meant to take the address of a local variable.
}
// Conclusion: the author should have used 'const int&' as a loop variable,
// and a 'const' variable in the body.
In both of these examples, if the author had consistently used const int&, their intent would have been conveyed much more clearly. Both these examples look suspiciously like a bug when working with by-value loop variables.
So i just browsed through the guidelines because I thought I remembered reading some pipeline specifically about for (auto& a : b). I distinctly remember getting confused over which type a would end up with, at one point. It turns out that this construct only turns up in examples. But maybe those can be used as a precedence?
Most of those examples (e.g. P.3, ES.42, ES.55 and ES.71) use either const auto& or auto&. The only example which did not use a reference as in ES.85, and that's a "bad" example.
There are also people who default to for (auto&& a : b) in all cases, to avoid one potential corner case.
related #1011
I was looking through the guidelines and noticed that there wasn't much advice in general about using auto. The closest one is ES.11 which says to use auto but not how. Maybe that rule can be expanded with special cases mentioned for for loops?
I think it will be a difficult rule to write because there are many different coding styles and corner cases. To be consistent maybe T vs const T& should match the guidance in F.16 in params?
Using auto* for pointers is common too. Many static analysis tools flag that (clang's readability-qualified-auto). Think that should be recommended?