AbPyTools

KeyError: 'X'

Open: m-abyzov opened this issue 6 years ago • 1 comment

File "/home/misha/anaconda3/lib/python3.6/site-packages/abpytools/core/chain.py", line 71, in load_from_string
    new_chain.load()
  File "/home/misha/anaconda3/lib/python3.6/site-packages/abpytools/core/chain.py", line 92, in load
    self.load()
  File "/home/misha/anaconda3/lib/python3.6/site-packages/abpytools/core/chain.py", line 101, in load
    self.hydrophobicity_matrix = self.ab_hydrophobicity_matrix()
  File "/home/misha/anaconda3/lib/python3.6/site-packages/abpytools/core/chain.py", line 217, in ab_hydrophobicity_matrix
    sequence=self._sequence)
  File "/home/misha/anaconda3/lib/python3.6/site-packages/abpytools/core/chain.py", line 434, in calculate_hydrophobicity_matrix
    else 0 for x in whole_sequence])
  File "/home/misha/anaconda3/lib/python3.6/site-packages/abpytools/core/chain.py", line 434, in <listcomp>
    else 0 for x in whole_sequence])
KeyError: 'X'
We've got an error while stopping in post-mortem: <class 'KeyboardInterrupt'>

The error is thrown in the code below:

def calculate_hydrophobicity_matrix(whole_sequence, numbering, aa_hydrophobicity_scores, sequence):
    # instantiate numpy array (whole sequence includes all the amino acid positions of the VH/VL, 
    # even the ones
    # that aren't occupied -> these will be filled with zeros
    # hydrophobicity_matrix = np.zeros(len(whole_sequence))
    # #
    # # # iterate through each position
    # for i, position in enumerate(whole_sequence):
    # 
    #     if position in numbering:
    #         position_in_data = numbering.index(position)
    #         hydrophobicity_matrix[i] = aa_hydrophobicity_scores[sequence[position_in_data]]

    #  return hydrophobicity_matrix
    # same thing as above but in a comprehension list
    return np.array([aa_hydrophobicity_scores[sequence[numbering.index(x)]] if x in numbering
                     else 0 for x in whole_sequence])

It seems that your list comprehension approach contains a bug.

m-abyzov, May 23 '19 09:05

The exception is thrown because the amino acid sequence of a Fab can contain unknown amino acids, which can be marked as X or ? (e.g. antibody 4OD1 contains such a region in its sequence: "..WWSDXXDFG.."). Why is it impossible to determine the CDRs in this case? Why is an exception thrown at all?
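To make the failure concrete, here is a minimal sketch of the same lookup in plain Python. It is not the AbPyTools implementation: the score table, the position labels, and the function names are hypothetical and chosen only for illustration, and the result is a plain list rather than a NumPy array. The strict version repeats the direct dictionary lookup from the comprehension above, so any residue missing from the score table (such as X) raises KeyError. The tolerant version swaps in dict.get with a default score, which is one possible way to handle unknown residues, not necessarily the fix the maintainers would choose.

```python
# Hypothetical score table covering only the residues used below;
# the real table in AbPyTools is not shown in this issue.
AA_SCORES = {"W": -0.9, "S": -0.8, "D": -3.5, "F": 2.8, "G": -0.4}

# Illustrative data: eight numbered residues, two of them unknown ("X"),
# plus one position ("H9") that is not occupied in this sequence.
SEQUENCE = "WSDXXDFG"
NUMBERING = ["H1", "H2", "H3", "H4", "H5", "H6", "H7", "H8"]
WHOLE_SEQUENCE = NUMBERING + ["H9"]


def hydrophobicity_strict(whole_sequence, numbering, scores, sequence):
    # Same direct lookup as the reported comprehension:
    # a residue that is not a key of `scores` raises KeyError.
    return [scores[sequence[numbering.index(x)]] if x in numbering else 0
            for x in whole_sequence]


def hydrophobicity_tolerant(whole_sequence, numbering, scores, sequence,
                            default=0.0):
    # Assumption: unknown residues are scored with `default`
    # instead of aborting the whole calculation.
    return [scores.get(sequence[numbering.index(x)], default)
            if x in numbering else 0.0
            for x in whole_sequence]
```

Calling hydrophobicity_strict(WHOLE_SEQUENCE, NUMBERING, AA_SCORES, SEQUENCE) raises KeyError: 'X', as in the traceback. The tolerant variant returns a value for every position, and the result can be wrapped in np.array(...) to match the original return type. Whether 0 is a sensible score for an unknown residue is a modelling decision left open here.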

m-abyzov, May 23 '19 18:05