First author sort and unicode leads to unexpected results

Open aaccomazzi opened this issue 6 years ago • 0 comments

Consider the following query: author:"simek, m" year:1985-1987, which yields 16 records, many of them authored by Šimek, M. If we sort by first author, we get an unexpected list: https://ui.adsabs.harvard.edu/search/p_=0&q=author%3A%22simek%2C%20m%22%20year%3A1985-1987&sort=first_author%20desc%2C%20bibcode%20desc

As you can see, the first papers in the list are authored by Šimek, M., followed by Znojil, V., followed by Simek, M. Since these are Unicode strings, the sort appears to follow raw code point order (Š is U+0160, which sorts above every ASCII letter), but from a user perspective it feels unnatural: one would expect Šimek, M. and Simek, M. to be bunched together.

We could group them by switching the sort from first_author to first_author_norm, which is an ASCII transliteration of the first_author field. Should we? And if not, why not?
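
To make the difference concrete, here is a minimal Python sketch of the two orderings. It is only an illustration: `ascii_fold` is a hypothetical helper that strips diacritics via NFKD decomposition and lowercases, and it is an assumption that first_author_norm behaves roughly like this. The three names stand in for the first authors in the result list.

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """Hypothetical stand-in for an ASCII transliteration of an author name."""
    # Decompose accented characters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks, then discard anything still outside ASCII.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii").lower()

authors = ["Simek, M.", "Znojil, V.", "Šimek, M."]

# Descending sort on the raw strings compares code points, so "Š" (U+0160)
# lands above "Z" (U+005A), which lands above "S" (U+0053).
by_code_point = sorted(authors, reverse=True)

# Descending sort on the folded key treats "Šimek" and "Simek" as equal,
# so the two spellings end up next to each other.
by_folded = sorted(authors, key=ascii_fold, reverse=True)

print(by_code_point)  # ['Šimek, M.', 'Znojil, V.', 'Simek, M.']
print(by_folded)      # ['Znojil, V.', 'Simek, M.', 'Šimek, M.']
```

The first list reproduces the order seen in the UI; the second shows the grouping a user would likely expect.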

aaccomazzi, Jul 02 '19 15:07