[Bug]: Documentation typo in rtables-scoring-functions section of pages
What happened?
Hello, the package documentation includes a fairly confusing typo which describes Scoring functions as table 'sorting' functions.
The typo occurs in the following section of Tern documentation: https://insightsengineering.github.io/tern/latest-tag/reference/index.html#rtables-scoring-functions.
I would be happy to resolve the issue myself, however, I do not know how to directly contribute to the code base.
sessionInfo()
No response
Relevant log output
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct.
Contribution Guidelines
- [X] I agree to follow this project's Contribution Guidelines.
Security Policy
- [X] I agree to follow this project's Security Policy.
sorting happens on the base of extracted values (may it be simply the cell values or something more complex) that I guess were termed scores to imply there are some other possible sorting statistics. If you look at a simple one:
> score_occurrences
function(table_row) {
row_counts <- h_row_counts(table_row)
sum(row_counts)
}
this extracts all values from a row (that you can reach with the right path in sort_at_path) and sum over them. This is the score, i.e. the row sum.
In the following section there is more info on how to create your own scoring function and sorting: https://insightsengineering.github.io/rtables/main/articles/sorting_pruning.html#sorting
Note that renaming this set of functions as sorting functions instead of scoring functions would be not precise as the scoring functions only extract values to be sorted. I am open to other suggestions though
Thanks, I had a closer look at the page and example you provided as well as the reference documentation for score_occurrences(). I would like to propose some changes to function naming, documentation, or both for your consideration. I am speaking at a user level rather than a technical level. However, I can appreciate the technical side of the implementation and I think I understand why it is named like this.
The way I found it confusing as a user reading the documentation is that:
- on face value, the term score implies that the function computes some score when given input. So, it might be like
z-scoresor somepsychometric scores; - as I was looking for ways to sort a table, I looked for terms such as
order,sort, orarrangespeaking purely from a user perspective; - if I would like to implement my custom order, I would use words such as rank, order, sort, arrange, reorder, or similar alongside the words such as algorithm, helper or custom, however, the term score is less common in my view;
- The way these functions are documented is that they describe internal behaviour rather than the observable outcome and I think this may confuse the user
Overall, I view these as the functions used as arguments in either fct_relevel() or fct_reorder() to borrow an example from forcats package. In these cases, such functions are themed as helpers for a particular parent function.
Based on the above, I suggest changing the documentation but ideally also the function naming:
-
My suggestion for documentation would be to state something along the lines:
These are helpers forsort_at_pathto customise the order in a TableTree.To make it explicit these are meant for this particular parent function. Likelast2()andfirst2()are helpers forfct_reorder2(). -
In terms of naming convention, which is a larger change. I would expect these to be used in
sort_at_path(), however, as a user I am looking to implement an algorithm for sorting, arranging, or ordering. Scoring is (in my view) a little misleading term as stated above.
So, for example:
sort_at_path(
tt,
path,
~~scorefun~~ > orderfun, sortfun, sort_algorithm, order_algorithm, sorting_stats, stats_sort, .f, .fn, fun, etc.?
decreasing = NA,
na.pos = c("omit", "last", "first"),
.prev_path = character()
)
Then perhaps the functions themselves could be classed in a way that omits the word score which is what I found confusing. There are many ways around this, e.g. using count_ or calculate_ or other more generic verbs like get_.
However, even just improving documentation for reference slightly would help. At the moment the user has to dig a bit too deep to be used inside of sort_at_path.
Hopefully, I understood this well and my comment isn't based on wrong assumptions here. I apologise for using the word typo earlier, that's not the case here. I just saw scoring and thought it was meant to be sorting.
We will update the docs to reflect this. It was already updated a couple of months ago as it was really unclear how to get it to work with dataRows. For the naming, I am wondering if sort function is better as it invokes the right associations. What do you think @shajoezhu @ayogasekaram @edelarua ? another non-mutually-exclusive possibility would be to have a couple of wrapped sorting functions that only sort the last node of the tree for each column or all (e.g. sort_table). Would it help too @martincadek?
@Melkiades, I think that updating the documentation will help.
I am not sure if I understood the second option. I suppose that from a user perspective using the function sort_table for simple sorting operations and sort_at_path (or similar) with a customised sorting algorithm would work well. Assuming I understood the suggestion properly :)