Cross-Language-Dataset
Meaning of file name in masks
Hello again,
Another question about the mask file: in the following line from the README,
{"_id":{"$oid":"56bdbf0fe405a41c1f8b4569"},"0":0,"1":2,"2":"1462817114-25","3":"727911955-101"}
what does 25 in 1462817114-25 denote (and, similarly, 101 in 727911955-101)? The README says these are file names, but I checked the corpus and found only file names such as 1462817114 and 727911955, without the trailing part. Do 25 and 101 refer to a line index? If so, that number often exceeds the total number of lines in the file.
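To make the question concrete, here is a minimal sketch of how the record above could be split into the part that matches a corpus file name and the trailing number. The helper name `split_mask_id` is hypothetical, and splitting at the last hyphen is only an assumption about the format; what the suffix actually means is exactly what is unclear.

```python
import json

# The example record quoted from the README.
record_line = (
    '{"_id":{"$oid":"56bdbf0fe405a41c1f8b4569"},'
    '"0":0,"1":2,"2":"1462817114-25","3":"727911955-101"}'
)


def split_mask_id(value):
    """Split 'NAME-SUFFIX' at the last hyphen (assumed format, hypothetical helper)."""
    name, _, suffix = value.rpartition("-")
    return name, int(suffix)


record = json.loads(record_line)

# Fields "2" and "3" hold the two identifiers in question.
parts = {key: split_mask_id(record[key]) for key in ("2", "3")}
print(parts)
```

The first element of each pair matches a file name that exists in the corpus; the second is the number whose meaning I am asking about.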
Thanks again for your help!