Where are the missing language pairs?
There seem to be 417 language varieties represented in https://opus.nlpl.eu/JW300.php. This would imply 417C2 = 86,736 undirected language pairs. However, I only count 54,376 of them, and the paper confirms this number. Do you know where the missing 32,360 language pairs are, and would you be willing to provide them?
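For anyone who wants to double-check the arithmetic, here is a minimal sketch. The variable names are mine; the only inputs are the 417 language varieties and the 54,376 pairs mentioned above.

```python
from math import comb

n_languages = 417          # language varieties listed on the OPUS JW300 page
observed_pairs = 54_376    # pairs I counted (also the figure in the paper)

# Number of unordered pairs among n languages: n choose 2.
possible_pairs = comb(n_languages, 2)
missing_pairs = possible_pairs - observed_pairs

print(possible_pairs)  # 86736
print(missing_pairs)   # 32360
```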
I notice that the graph described by the adjacency matrix seems to have a single connected component, so e.g. although ady has no parallel data with en, it has parallel data with "jw_rmv", which has parallel data with en. So it seems likely that ady and en can be aligned by pivoting through jw_rmv. Just to demonstrate that it's conceptually possible, I found these two pairs in the respective corpora:
jw_rmv: Пала со амэ подаса дума андэ авэр статья ? ady: Сыда къыкІэлъыкІорэ статьям щызэхэтфыщтыр ?
jw_rmv: Пала со амэ подаса дума андэ авэр статья ? en: What will we consider in the following article ?
Implication: the following is a sentence pair between English and Adyghe:
ady: Сыда къыкІэлъыкІорэ статьям щызэхэтфыщтыр ? en: What will we consider in the following article ?
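The pivoting step above could be sketched roughly as follows. This is only an illustration under a strong assumption: that the pivot sentence is byte-for-byte identical in both bitexts. The function name `pivot_align` and the in-memory list format are hypothetical, and I don't know how the original alignment pipeline would actually do this (it presumably works from document and sentence IDs rather than string matching).

```python
from collections import defaultdict

def pivot_align(pivot_a_pairs, pivot_b_pairs):
    """Join two bitexts that share a pivot language.

    Each input is a list of (pivot_sentence, other_sentence) tuples.
    Sentences are matched on exact equality of the pivot side, which is
    a naive assumption made for this sketch.
    """
    a_by_pivot = defaultdict(list)
    for pivot_sent, a_sent in pivot_a_pairs:
        a_by_pivot[pivot_sent].append(a_sent)

    aligned = []
    for pivot_sent, b_sent in pivot_b_pairs:
        for a_sent in a_by_pivot.get(pivot_sent, []):
            aligned.append((a_sent, b_sent))
    return aligned

# The two sentence pairs quoted above, with jw_rmv as the pivot.
rmv = "Пала со амэ подаса дума андэ авэр статья ?"
ady = "Сыда къыкІэлъыкІорэ статьям щызэхэтфыщтыр ?"
en = "What will we consider in the following article ?"

aligned = pivot_align([(rmv, ady)], [(rmv, en)])
for ady_sent, en_sent in aligned:
    print("ady:", ady_sent, "en:", en_sent)
```

One caveat with exact string matching: a short, frequently repeated pivot sentence would be joined with every translation it ever appears next to, so a real version would want to key on document and position as well.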
(Interestingly, jw_rmv, which actually seems to be Vlax Romany in Cyrillic script, is the one language that is aligned with the most other languages -- more than English!)
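In case it helps to reproduce the connectivity and "most connected language" observations, here is a rough sketch of how they can be computed from a list of language pairs. The `toy_pairs` list is made up for illustration and is not the real JW300 pair list; the function names are my own.

```python
from collections import defaultdict, deque

def build_graph(pairs):
    """Undirected adjacency sets from a list of (lang_a, lang_b) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def connected_components(graph):
    """Return the connected components as a list of sets (breadth-first search)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components

# Toy edge list (an assumption for this sketch, NOT the real data).
toy_pairs = [("ady", "jw_rmv"), ("jw_rmv", "en"), ("en", "as"), ("jw_rmv", "as")]

graph = build_graph(toy_pairs)
components = connected_components(graph)
most_connected = max(graph, key=lambda lang: len(graph[lang]))

print(len(components))   # number of connected components in the toy graph
print(most_connected)    # language with the most partners in the toy graph
```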
Useful theorem: every language in JW300 is parallel with English, Assamese, or both.
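A check of this kind can be sketched as below: given the pair list and a set of hub languages, list every language that is parallel with none of the hubs. The pair list here is a toy example, "xx" is a made-up language code, and "as" as the code for Assamese is my assumption; on the real pair list the result should be empty if the statement above holds.

```python
def languages_without_hub(pairs, hubs):
    """Return the languages that share no pair with any hub language."""
    hubs = set(hubs)
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    # The hubs themselves are skipped; every other language needs at
    # least one hub among its partners.
    return sorted(
        lang
        for lang, partners in neighbours.items()
        if lang not in hubs and not (partners & hubs)
    )

# Toy pair list (hypothetical): "xx" only has data with jw_rmv.
toy_pairs = [("ady", "jw_rmv"), ("jw_rmv", "en"), ("ady", "as"), ("xx", "jw_rmv")]

uncovered = languages_without_hub(toy_pairs, hubs={"en", "as"})
print(uncovered)  # ['xx']
```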