psm_utils fix parsing PSMs and complete protein names in XTandem

[edited after adding fix for PSM parsing]

XTandem tends to abbreviate protein names in the protein "label" tag, so this PR reads them from the "note" tag instead.
XTandem saves only the highest-scoring PSMs per spectrum, but there can still be more than one PSM, with different peptidoforms, if the scores are exactly the same. This is not an extremely rare case, especially for nearly identical peptides (think of a single AA flip in the sequence). This fix merges the identifications that share a peptidoform into one new PSM and assigns only the relevant proteins to each PSM. Previously, peptides were sometimes matched to proteins in which they do not occur in the databases used by XTandem.
Also, it seems that the remark that only one protein per peptide/PSM is parsed no longer holds.
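
To illustrate the grouping described above, here is a minimal sketch in plain Python. It is not the code from this PR and does not use the psm_utils API: the function `merge_top_hits`, the tuple layout, and the example sequences and protein names are all hypothetical and only meant to show the idea of collecting equally scored hits per peptidoform.

```python
def merge_top_hits(hits):
    """Group the top-scoring hits of one spectrum by peptidoform.

    `hits` is an iterable of (peptidoform, protein, score) tuples.
    This layout is an assumption for illustration, not the XTandem
    or psm_utils data model.
    """
    merged = {}
    for peptidoform, protein, score in hits:
        # One entry per peptidoform; proteins are collected without duplicates.
        entry = merged.setdefault(peptidoform, {"score": score, "proteins": []})
        if protein not in entry["proteins"]:
            entry["proteins"].append(protein)
    return merged


# Hypothetical spectrum with three equally scored hits: two share a
# peptidoform, the third differs by two swapped residues.
hits = [
    ("PEPTIDEK", "protA", 42.0),
    ("PEPTIDEK", "protB", 42.0),
    ("PEPTDIEK", "protC", 42.0),
]
merged = merge_top_hits(hits)
```

With this input, `merged` holds two entries: "PEPTIDEK" with proteins protA and protB, and "PEPTDIEK" with protC only, so no peptidoform is assigned a protein it was not reported with.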

May 03 '24 16:05 julianu

I updated the comment for the initial PR, as there were some further additions to it.

May 07 '24 14:05 julianu

Codecov Report

Attention: Patch coverage is 15.38462% with 11 lines in your changes missing coverage. Please review.

Project coverage is 63.97%. Comparing base (6e51896) to head (5d01b6f). Report is 2 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| psm_utils/io/xtandem.py | 15.38% | 11 Missing :warning: |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #83      +/-   ##
==========================================
- Coverage   64.12%   63.97%   -0.16%     
==========================================
  Files          26       26              
  Lines        2492     2498       +6     
==========================================
  Hits         1598     1598              
- Misses        894      900       +6

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `63.97% <15.38%> (-0.16%)` | :arrow_down: |

Jul 10 '24 11:07 codecov[bot]