itertools

Integrate `joinable`?

Open • aeshirey opened this issue 3 years ago • 2 comments

I wrote a small crate, `joinable`, to do relational joins between two iterables. Someone suggested it might be a good addition to itertools, so I'd like to ask: is this functionality you'd like me to roll in?

Example usage:

use joinable::Joinable;
let joined = customers
    .iter()
    .inner_join(&orders[..], |cust, ord| cust.id.cmp(&ord.customer_id))
    .map(|(cust, ords)| {
        // Translate from (&Customer, Vec<&Order>)
        (
            &cust.name,
            ords.iter().map(|ord| ord.amount_usd).sum::<f32>(),
        )
    })
    .collect::<Vec<_>>();

aeshirey, Apr 08 '22 02:04

Hmm, Itertools has similar functionality, but it groups it differently.

For example, it has `merge_join_by(...).filter_map(EitherOrBoth::both)`, which is essentially an inner join, albeit one that handles cardinality differently from SQL (in order to avoid cloning) and that needs ordered input for efficiency. (But it looks like your `inner_join` isn't quite what I'd expect from SQL either, since a SQL inner join would give an `Iterator<Item = (&Customer, &Order)>`; it giving a `Vec` means it's also doing a GROUP BY.)
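To make the cardinality point concrete, here is a rough std-only sketch. It is not the itertools implementation: `merge_inner_join` and `sql_inner_join` are made-up helper names, and the one-to-one pairing only imitates how a merge join over sorted input consumes one item from each side when the keys are equal.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: one-to-one merge join over two *sorted* slices.
/// On equal keys it pairs one left item with one right item and advances both.
fn merge_inner_join<'a, L, R>(
    left: &'a [L],
    right: &'a [R],
    cmp: impl Fn(&L, &R) -> Ordering,
) -> Vec<(&'a L, &'a R)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        match cmp(&left[i], &right[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push((&left[i], &right[j]));
                i += 1;
                j += 1;
            }
        }
    }
    out
}

/// Hypothetical helper: SQL-style inner join, one row per matching pair.
/// Needs no ordering, but revisits each left item for every matching right item.
fn sql_inner_join<'a, L, R>(
    left: &'a [L],
    right: &'a [R],
    cmp: impl Fn(&L, &R) -> Ordering,
) -> Vec<(&'a L, &'a R)> {
    let mut out = Vec::new();
    for l in left {
        for r in right {
            if cmp(l, r) == Ordering::Equal {
                out.push((l, r));
            }
        }
    }
    out
}

fn main() {
    // Sorted customer ids and sorted order customer ids; customer 2 has two orders.
    let customer_ids = [1, 2];
    let order_customer_ids = [2, 2];

    let merged = merge_inner_join(&customer_ids, &order_customer_ids, |c, o| c.cmp(o));
    let sql = sql_inner_join(&customer_ids, &order_customer_ids, |c, o| c.cmp(o));

    // The merge pairs customer 2 with only one of its orders; SQL semantics give both rows.
    assert_eq!(merged.len(), 1);
    assert_eq!(sql.len(), 2);
    println!("merge: {merged:?}, sql-style: {sql:?}");
}
```

In the merge version the second order for customer 2 is left without a partner, which is roughly the case that `filter_map(EitherOrBoth::both)` would drop.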

And it has `.into_grouping_map().sum()` for the "gather everything with a key and sum the values".
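For readers unfamiliar with that pattern, a minimal std-only sketch of "gather by key and sum" might look like the following. The helper name `sum_by_key` and the `(u32, f32)` item type are assumptions for illustration; this is a plain `HashMap` loop, not the itertools code.

```rust
use std::collections::HashMap;

/// Hypothetical helper: sum the amounts that share a key.
fn sum_by_key(pairs: impl IntoIterator<Item = (u32, f32)>) -> HashMap<u32, f32> {
    let mut totals = HashMap::new();
    for (key, amount) in pairs {
        // Start each key at 0.0 and accumulate its amounts.
        *totals.entry(key).or_insert(0.0) += amount;
    }
    totals
}

fn main() {
    // (customer_id, amount_usd) pairs in arbitrary order.
    let orders = vec![(1, 10.0), (2, 7.5), (1, 5.0)];
    let totals = sum_by_key(orders);

    assert_eq!(totals[&1], 15.0);
    assert_eq!(totals[&2], 7.5);
    println!("{totals:?}");
}
```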

Maybe some more examples would help? But the RHS always needing to be a slice doesn't say itertools to me...

scottmcm, Apr 08 '22 04:04

> it giving a Vec means it's also doing a GROUP BY.

Good point. This was desired behavior for my use case but may not be for others. At a minimum, I'll use this as feedback for improving my crate.

> But the RHS always needing to be a slice doesn't say itertools to me...

True. My intent was that the LHS is an iterator that can join against any RHS slice without consuming it, because each right record might match multiple left records, and the ordering of the RHS isn't necessarily known or required.
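A minimal sketch of that design, assuming a free function rather than the crate's actual trait method: `grouped_inner_join`, `Customer`, and `Order` are illustrative names, and the per-record linear scan of the RHS is only one possible way to do the matching.

```rust
use std::cmp::Ordering;

struct Customer {
    id: u32,
    name: String,
}

struct Order {
    customer_id: u32,
    amount_usd: f32,
}

/// Hypothetical helper: for each left record, borrow every matching right record.
/// The right slice is only borrowed, so it can be rescanned for each left record
/// and does not need to be sorted.
fn grouped_inner_join<'r, L, R>(
    left: impl Iterator<Item = L>,
    right: &'r [R],
    cmp: impl Fn(&L, &R) -> Ordering,
) -> impl Iterator<Item = (L, Vec<&'r R>)> {
    left.filter_map(move |l| {
        let matches: Vec<&'r R> = right
            .iter()
            .filter(|r| cmp(&l, *r) == Ordering::Equal)
            .collect();
        // Inner-join semantics: left records without any match are dropped.
        if matches.is_empty() {
            None
        } else {
            Some((l, matches))
        }
    })
}

fn main() {
    let customers = vec![
        Customer { id: 1, name: "Ann".to_string() },
        Customer { id: 2, name: "Bob".to_string() },
        Customer { id: 3, name: "Cy".to_string() },
    ];
    // Deliberately unordered; customer 3 has no orders.
    let orders = vec![
        Order { customer_id: 2, amount_usd: 7.5 },
        Order { customer_id: 1, amount_usd: 10.0 },
        Order { customer_id: 2, amount_usd: 2.5 },
    ];

    let totals: Vec<(&str, f32)> =
        grouped_inner_join(customers.iter(), &orders[..], |c, o| c.id.cmp(&o.customer_id))
            .map(|(c, ords)| (c.name.as_str(), ords.iter().map(|o| o.amount_usd).sum::<f32>()))
            .collect();

    assert_eq!(totals, vec![("Ann", 10.0), ("Bob", 10.0)]);
}
```

The trade-off in this sketch is that every left record triggers a full pass over the RHS, which is what buys independence from ordering.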

I'm happy to provide some more examples if it helps, but if the behavior (currently, grouped records from RHS; using an iter + slice instead of two iters) doesn't fit here, that's fine too.

aeshirey, Apr 08 '22 13:04