Incorrect (or at least inconsistent) T cost for a TwoBitCSwap gate.
To summarize a discussion I had with @tanujkhattar, @mpharrigan, and @NoureldinYosri:
We currently count the T and Clifford costs for a TwoBitCSwap (Fredkin) gate using the construction from https://arxiv.org/pdf/1308.4134. However, we can implement this gate using two CNOT gates and a Toffoli gate, which has a four T-gate construction using ancilla qubits (https://arxiv.org/pdf/1212.5069).
I think we should default to the ancilla-assisted construction for the Fredkin gate.
The simplest solution might be to no longer treat a Fredkin gate as a leaf node at all and instead decompose it into a Toffoli gate and two CNOT gates. If not, then we should at least be consistent in how we treat the resource counts for Fredkin and Toffoli gates.
This is connected to #1121, which is related to the observation that we can do compute / uncompute pairs for Toffoli and Fredkin gates more cheaply than a naive implementation.
As a first step, I'd recommend writing a simple circuit as a bespoke method like
https://github.com/quantumlib/Qualtran/blob/b3000cb3a4f10f99d260f487c21792cc1b9a0ab1/qualtran/bloqs/basic_gates/swap.py#L156
and updating the references section to highlight the disparity.
As a leaf bloq, we can change the T cost of a cswap on the fly. See the ts_per_cswap argument to GateCounts.total_t_count and GateCounts.to_legacy_t_complexity. Those default to 7. It can be changed to 4. Although maybe the "legacy" method should keep using 7 since it's been that way for a while