
Exporting a MassBank Spectra object to MGF format fails with an error

Open gmhhope opened this issue 4 years ago • 2 comments

Sorry for bombarding you with questions here! This question is not urgent, but I would like to see if you can tell what is going wrong.

When I try to export massbank spectra object to mgf file, I encounter this error:

Error in .export_mgf(x = x, con = file, mapping = mapping) : 
  Column(s) synonym contain multiple elements per row. Please either drop this column or reduce its elements to a single value per row.

I did find that some of the spectra variables may indeed have multiple elements per row.

Below I extracted the data of one MassBank Spectra object by calling spectraData() and then manually transposed it, so that each variable is shown on its own row.

msLevel	2
rtime	NA
acquisitionNum	NA
scanIndex	NA
dataStorage	<MassBank>
dataOrigin	NA
centroided	NA
smoothed	NA
polarity	0
precScanNum	NA
precursorMz	506
precursorIntensity	408.000305
precursorCharge	NA
collisionEnergy	NA
isolationWindowLowerMz	NA
isolationWindowTargetMz	NA
isolationWindowUpperMz	NA
spectrum_id	MCH00020
spectrum_name	Adenosine 5'-triphosphate; LC-ESI-ITFT; MS2; [M-H]-
date	2016.01.19 (Created 2011.01.06, modified 2011.08.03)
authors	Yoshikuni K, Tajiri M, Wada Y, Osaka Medical Center for Maternal and Child Health
license	CC BY-SA
copyright	NA
publication	NA
splash	splash10-0a4i-0000900000-e9a09b9360491c310280
compound_id	1
adduct	[M-H]-
ionization	NA
ionization_voltage	NA
fragmentation_mode	NA
collision_energy_text	NA
instrument	LTQ Orbitrap XL, Thermo Scientific
instrument_type	LC-ESI-ITFT
formula	C10H16N5O13P3
exactmass	506.99575
smiles	Nc(n3)c(n2)c(nc3)n(c2)[C@]([H])(O1)[C@]([H])(O)[C@]([H])(O)[C@@]([H])(COP(O)(=O)OP(O)(=O)OP(O)(O)=O)1
inchi	InChI=1S/C10H16N5O13P3/c11-8-5-9(13-2-12-8)15(3-14-5)10-7(17)6(16)4(26-10)1-25-30(21,22)28-31(23,24)27-29(18,19)20/h2-4,6-7,10,16-17H,1H2,(H,21,22)(H,23,24)(H2,11,12,13)(H2,18,19,20)/t4-,6-,7-,10-/m1/s1
inchikey	ZKHQWZAMYRWXGA-KQYNXXCUSA-N
cas	56-65-5
pubchem	CID:5957
synonym	c("Adenosine 5'-triphosphate", "ATP")
precursor_mz_text	506
compound_name	Adenosine 5'-triphosphate

However, I do not know how to fix this. Could you shoot me anything that may help? Thanks very much!

Best, Minghao

gmhhope avatar Jun 04 '21 01:06 gmhhope

And BTW, why is the precursorMz an integer (506)? Shouldn't it keep at least four digits after the decimal point?

precursorMz	506
precursor_mz_text 506

gmhhope avatar Jun 04 '21 01:06 gmhhope

You would have to check whether any of the spectraData variables is of type "list" or "List" and then either drop or collapse them. A possible solution could be:

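library(Spectra)

## TRUE for base R lists and for S4Vectors "List" columns
.is_list <- function(x) {
    is.list(x) || inherits(x, "List")
}
SD <- spectraData(sps)
## Find the list-like columns, i.e. those that can hold several elements per row
idx <- which(vapply(SD, .is_list, logical(1)))
for (i in idx) {
    ## Collapse each row's elements into a single ";"-separated string
    SD[, i] <- vapply(SD[, i], paste, FUN.VALUE = character(1), collapse = ";")
}
spectraData(sps) <- SD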

This code will convert all columns in the spectra data that are of type list or List into character columns, with the elements of each row collapsed by ";". For example, if a column "synonym" had multiple values per row, such as c("a", "b", "c"), these will be combined into a single value per row, "a;b;c".
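To see the effect without a MassBank connection, the same collapsing step can be tried on a toy table. This is only a sketch: it uses a plain base R data.frame as a stand-in for the DataFrame returned by spectraData(), and the spectrum IDs and the second synonym are made-up illustration values, not MassBank content.

```r
# Toy stand-in for spectraData(sps): a data.frame with one list column.
# The IDs "A"/"B" and the entry "Caffeine" are hypothetical illustration values.
SD <- data.frame(spectrum_id = c("A", "B"), stringsAsFactors = FALSE)
SD$synonym <- list(c("Adenosine 5'-triphosphate", "ATP"), "Caffeine")

# Same helper as above; inherits() simply returns FALSE for base R objects.
.is_list <- function(x) is.list(x) || inherits(x, "List")

# Locate list columns and collapse each row to a single string.
idx <- which(vapply(SD, .is_list, logical(1)))
for (i in idx) {
    SD[[i]] <- vapply(SD[[i]], paste, FUN.VALUE = character(1), collapse = ";")
}

SD$synonym
# [1] "Adenosine 5'-triphosphate;ATP" "Caffeine"
```

After this step every column holds exactly one value per row, which is the condition the MGF export error message asks for.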

Regarding the precursorMz, it depends on what the actual value is (i.e. what is reported in MassBank or in your mzML files). What is the $precursorMz in the original spectra? Is it already 506 there, or is there another number?
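One thing that can be checked independently of the data source is whether a value printed as 506 is really stored as an integer. The base R sketch below assumes a hypothetical variable x standing in for one precursorMz value. R drops trailing zeros when printing, so a double that happens to be whole-valued looks like an integer on screen.

```r
# Hypothetical stand-in for a single precursorMz value.
x <- 506

is.integer(x)   # FALSE: the value is stored as a double
is.double(x)    # TRUE

# Printing hides trailing zeros; format() can force decimal places.
print(x)
shown <- format(x, nsmall = 4)
shown
# [1] "506.0000"
```

If the stored value itself is exactly 506, then the missing decimals come from the source record rather than from rounding during import or export.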

jorainer avatar Jun 07 '21 06:06 jorainer