
add docs about methods of GroupKey objects

Open • bguo068 opened this issue 2 years ago • 1 comment

Bug report

This is not a bug report per se. It is mainly a request to add more docs about the GroupKey object.

Expected behavior and actual behavior

Expected behavior: ideally, the key created by groupKey(value, size) would be converted back to the original value once it has been used by the groupTuple operator. Failing that, the documentation should point out that the key emitted by groupTuple is an object of the special class GroupKey, and explain how to recover the original value from it.
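To make the request concrete, here is a minimal Java sketch of the expected behavior (Nextflow runs on the JVM). It is not Nextflow's groupTuple implementation: `groupBySize`, `Item` and `Group` are hypothetical names. Values are buffered per key, and as soon as a key has collected its expected number of values the group is emitted with the original key, so downstream code receives a plain String rather than a wrapper object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnwrapAfterGrouping {

    // One input item: a raw key, the expected group size for that key, and a value.
    record Item(String key, int groupSize, int value) {}

    // One emitted group: the raw key (not a wrapper) and its collected values.
    record Group(String key, List<Integer> values) {}

    // Hypothetical helper: buffers values per key and emits a group as soon as
    // the expected size is reached, handing back the original key unchanged.
    // Keys that never reach their size stay in `pending` and are ignored here.
    static List<Group> groupBySize(List<Item> items) {
        Map<String, List<Integer>> pending = new LinkedHashMap<>();
        List<Group> emitted = new ArrayList<>();
        for (Item item : items) {
            List<Integer> values = pending.computeIfAbsent(item.key(), k -> new ArrayList<>());
            values.add(item.value());
            if (values.size() == item.groupSize()) {
                emitted.add(new Group(item.key(), List.copyOf(values)));
                pending.remove(item.key());
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(new Item("k1", 3, 1), new Item("k1", 3, 2), new Item("k1", 3, 3));
        for (Group g : groupBySize(items)) {
            // The key is a plain String, so it can be matched against other raw keys directly.
            System.out.println(g.key() + " " + g.values() + " " + g.key().getClass().getName());
        }
    }
}
```

Running `main` prints `k1 [1, 2, 3] java.lang.String`, which is the key type the reporter expected to see after groupTuple.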

Actual behavior: the key created by groupKey(value, size) is not converted back to the original value after groupTuple, and nothing in the documentation describes the type of the key that groupTuple emits. It took me a while to realize that the key is an instance of the special class GroupKey: when the grouped channel is combined with another channel using the by parameter, the GroupKey does not match the plain value. I had to read the source code https://github.com/nextflow-io/nextflow/blob/master/modules/nextflow/src/main/groovy/nextflow/extension/GroupKey.groovy to find out how to get the original value back from a GroupKey object (getGroupTarget()).
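The mismatch can be modeled outside Nextflow. The Java sketch below uses a hypothetical WrappedKey class as a simplified stand-in for nextflow.extension.GroupKey (the real class's equality rules may differ), and assumes that combine(by: 0) pairs tuples through ordinary equals/hashCode lookups. In this model a wrapped key prints exactly like the String it holds, yet a lookup keyed by the raw value misses it; unwrapping with getGroupTarget() restores the match, and two wrapped keys with equal targets also match each other.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class GroupKeyDemo {

    // Hypothetical stand-in for a group key: wraps the original value plus the expected group size.
    static final class WrappedKey {
        private final Object target;
        private final int size;

        WrappedKey(Object target, int size) {
            this.target = target;
            this.size = size;
        }

        // Returns the original value that was wrapped.
        Object getGroupTarget() { return target; }

        int getGroupSize() { return size; }

        // Simplified model: equal only to another WrappedKey with an equal target (size is ignored).
        @Override
        public boolean equals(Object other) {
            return other instanceof WrappedKey w && Objects.equals(target, w.target);
        }

        @Override
        public int hashCode() { return Objects.hashCode(target); }

        // Prints like the raw value, which hides the wrapper when the key is viewed.
        @Override
        public String toString() { return String.valueOf(target); }
    }

    // Counts how many left-hand keys find a partner in a right-hand table, as a keyed join would.
    static int matchCount(List<?> leftKeys, Map<Object, String> right) {
        int n = 0;
        for (Object key : leftKeys) {
            if (right.containsKey(key)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<Object, String> rawRight = new HashMap<>();
        rawRight.put("k1", "values from b");

        Map<Object, String> wrappedRight = new HashMap<>();
        wrappedRight.put(new WrappedKey("k1", 3), "values from b");

        WrappedKey wrapped = new WrappedKey("k1", 3);
        System.out.println("printed as: " + wrapped);
        // Wrapped key vs raw String key: different classes, so no match.
        System.out.println("wrapped vs raw -> " + matchCount(List.of(wrapped), rawRight));
        // Unwrapping restores the raw String, so the lookup hits.
        System.out.println("unwrapped vs raw -> " + matchCount(List.of(wrapped.getGroupTarget()), rawRight));
        // Both sides wrapped: equal targets, so the keys match.
        System.out.println("wrapped vs wrapped -> " + matchCount(List.of(wrapped), wrappedRight));
    }
}
```

Running `main` prints `printed as: k1`, then match counts of 0, 1 and 1, mirroring the 0-item versus 2-item results in the program output of the reproduction.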

Steps to reproduce the problem

a = Channel.fromList([
    ["k1", 1],
    ["k1", 2],
    ["k1", 3],
])

// Not calling getGroupTarget() on groupkey
a_grp1 = a.map{key, value -> [groupKey(key, 3), value]}
        .groupTuple(sort: (a, b) -> a <=> b)
        .map{key, value -> [key, value, "additional info with a"]}

// Calling getGroupTarget() on groupkey
a_grp2 = a.map{key, value -> [groupKey(key, 3), value]}
        .groupTuple(sort: (a, b) -> a <=> b)
        .map{key, value -> [key.getGroupTarget(), value, "additional info with a"]}

b = Channel.fromList([
    ["k1","K1","A"],
    ["k1","K1","B"],
    ["k1","K1","C"],
    ["k1","K2","D"],
    ["k1","K2","E"],
    ["k1","K2","F"],
])

b_grp = b.map{key, KEY, value -> [groupKey([key,KEY], 3), value]}
        .groupTuple(sort: (a, b) -> a <=> b)
        .map{key, value -> [key[0], key[1], value, "additional info with b"]}

// does not work, get 0 item(s)
a_grp1.combine(b_grp, by:0).count().view{"Not calling getGroupTarget() on groupkey: get ${it} item(s)."}

// works, get 2 items
a_grp2.combine(b_grp, by:0).count().view{"Calling getGroupTarget() on groupkey:     get ${it} item(s)."}

// print types
a_grp1.take(1).view{"a_grp1, key type is ${it[0].getClass()}"}
a_grp2.take(1).view{"a_grp2, key type is ${it[0].getClass()}"}
b_grp.take(1).view{"b_grp,  key type is ${it[0].getClass()}"}

Program output

N E X T F L O W  ~  version 23.10.0
Launching `main.nf` [tender_mccarthy] DSL2 - revision: 1448229aa2
a_grp1, key type is class nextflow.extension.GroupKey
a_grp2, key type is class java.lang.String
b_grp,  key type is class java.lang.String
Not calling getGroupTarget() on groupkey: get 0 item(s).
Calling getGroupTarget() on groupkey:     get 2 item(s).

Environment

  • Nextflow version: 23.10.0
  • Java version: java 17.0.7 2023-04-18 LTS; Java(TM) SE Runtime Environment (build 17.0.7+8-LTS-224); Java HotSpot(TM) 64-Bit Server VM (build 17.0.7+8-LTS-224, mixed mode, sharing)
  • Operating system: cpe:/o:redhat:enterprise_linux:8.3:ga
  • Bash version: GNU bash, version 4.4.19(1)-release (x86_64-redhat-linux-gnu)

Additional context

When channel b is grouped by a one-element key, the combine works.

bguo068, Jan 07 '24 22:01

FWIW, "unwrapping" the groupKey after use matches my intuition. It would also match the group_by semantics in R's tidyverse, for example.

robsyme, Jan 07 '24 23:01