Duplicated records

Open • Treutler opened this issue 6 years ago • 1 comment

I just stumbled over two records that seem to be duplicates. The metadata and the spectrum are exactly the same.

https://massbank.eu/MassBank/RecordDisplay.jsp?id=TY000228&dsn=Univ_Toyama
https://massbank.eu/MassBank/RecordDisplay.jsp?id=TY000237&dsn=Univ_Toyama

Maybe it is worth searching MassBank globally for such cases. I guess we will have to contact the contributors in any case.

How should we tackle this? I suggest introducing a "DEPRECATED" tag for records that are duplicated (this issue), noisy (e.g. #51), or otherwise erroneous (#9).

Treutler, Mar 25 '19 08:03

Yes to a DEPRECATED tag. I think this will help us keep the record IDs live while communicating, beyond COMMENT, that there is an issue. If we hide this in COMMENT tags, information will get lost, as several records have several COMMENTs.

We should do a global check for duplicates. I found some UF cases that are likely duplicates too: the Butylparaben UF4158** and UF4234** records? I did not do a 1:1 match, but they were flagged by Oberacher and have identical "scores" in his results. You could check by SPLASH?
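A rough sketch of what a SPLASH-based check could look like, in Python. It assumes each record is available as plain text with an `ACCESSION:` line and a `PK$SPLASH:` line; the function names and the sample values are hypothetical, and this is not an existing MassBank tool.

```python
from collections import defaultdict


def parse_fields(record_text):
    """Return (accession, splash) read from one record's text.

    Assumes the record has 'ACCESSION:' and 'PK$SPLASH:' lines;
    a missing field comes back as None.
    """
    accession = splash = None
    for line in record_text.splitlines():
        if line.startswith("ACCESSION:"):
            accession = line.split(":", 1)[1].strip()
        elif line.startswith("PK$SPLASH:"):
            splash = line.split(":", 1)[1].strip()
    return accession, splash


def find_duplicate_candidates(records):
    """Group accessions by SPLASH; keep only groups with more than one record."""
    by_splash = defaultdict(list)
    for text in records:
        accession, splash = parse_fields(text)
        if accession and splash:
            by_splash[splash].append(accession)
    return {s: sorted(ids) for s, ids in by_splash.items() if len(ids) > 1}


if __name__ == "__main__":
    # Hypothetical records with made-up accessions and SPLASH values.
    sample = [
        "ACCESSION: XX000001\nPK$SPLASH: splash10-aaaa\n",
        "ACCESSION: XX000002\nPK$SPLASH: splash10-aaaa\n",
        "ACCESSION: XX000003\nPK$SPLASH: splash10-bbbb\n",
    ]
    for splash, accessions in find_duplicate_candidates(sample).items():
        print(splash, accessions)
```

An identical SPLASH only says the spectra hash to the same value, so the hits are candidates: the metadata would still need a 1:1 comparison before a record is flagged.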

I am going to post some validation suggestions on the Validator issue shortly.

schymane, Mar 25 '19 09:03