
TempLAMA generation fails with UnicodeEncodeError

Open eric-mitchell opened this issue 4 years ago • 0 comments

When running the TempLAMA generation code on the official Ubuntu 18.04 docker image, non-ASCII characters cause a UnicodeEncodeError in sling2facts.py. To get it working, I had to change the write_kb method to write UTF-8 encoded bytes rather than text:

def write_kb(self, filename):
  """Write out all triples rel/subject/object, perhaps with qualifiers."""
  with open(filename, 'wb') as fp:  # <--- open file in binary mode
    for f in self.frames(filter_english=FLAGS.skip_nonenglish):
      for t in self.as_triples(SlotCollection(f)):
        fp.write((t + '\n').encode('utf-8'))  # <--- write utf-8 bytes
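
An alternative that might also work, though I have not tried it against sling2facts.py, is to keep the file in text mode and pass an explicit encoding to open(). My guess is that the error appears because the container's default locale makes Python pick an ASCII codec for text files. The sketch below is standalone: write_lines_utf8 and the sample triples are hypothetical names and data for illustration, not part of the TempLAMA code.

```python
import os
import tempfile


def write_lines_utf8(filename, lines):
    """Write each string in `lines` on its own line, encoded as UTF-8."""
    # An explicit encoding makes the output independent of the locale
    # configured in the container; newline='\n' disables newline translation.
    with open(filename, 'w', encoding='utf-8', newline='\n') as fp:
        for line in lines:
            fp.write(line + '\n')


# Hypothetical sample triples containing non-ASCII characters.
sample = ['P19\tQ1\tZürich', 'P27\tQ2\t日本']
path = os.path.join(tempfile.mkdtemp(), 'kb.tsv')
write_lines_utf8(path, sample)
```

Under that assumption, the same change in write_kb would be open(filename, 'w', encoding='utf-8') with the original fp.write(t + '\n') left as it was.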

eric-mitchell, Oct 01 '21 16:10