
an angled bracket in title

Open · piptan opened this issue 9 years ago · 2 comments

Hi,

If I pass the following feed to the library:

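```xml
<rss version="2.0">
<channel>
  <title>W3Schools Home Page</title>
  <link>http://www.w3schools.com</link>
  <description>Free web building tutorials</description>
  <item>
    <title>RSS &lt;&lt;&lt;Tutorial&gt;&gt;&gt;</title>
    <link>http://www.w3schools.com/xml/xml_rss.asp</link>
    <description>New RSS tutorial on W3Schools</description>
  </item>
</channel>
</rss>
```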

The parsed output is:

```javascript
{
  title: 'RSS >>',
  description: 'New RSS tutorial on W3Schools',
  summary: 'New RSS tutorial on W3Schools',
  date: null,
  pubdate: null,
  pubDate: null,
  link: 'http://www.w3schools.com/xml/xml_rss.asp',
  guid: 'http://www.w3schools.com/xml/xml_rss.asp',
  author: null,
  comments: null,
  origlink: null,
  image: {},
  source: {},
  categories: [],
  enclosures: [],
  'rss:@': {},
  'rss:title': { '@': {}, '#': 'RSS <<<Tutorial>>>' },
  'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' },
  'rss:description': { '@': {}, '#': 'New RSS tutorial on W3Schools' },
}
```

Note that `title` contains the wrong text (`'RSS >>'`), while `rss:title` has the correct content (`'RSS <<<Tutorial>>>'`).

piptan, Apr 01 '16 13:04
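One plausible explanation for `'RSS >>'`, offered as an assumption rather than a reading of feedparser's source: the normalized `title` is produced by stripping anything that looks like an HTML tag, and a pattern that matches from a `<` up to the next `>` treats `<<<Tutorial>` as a single tag. The sketch below uses a hypothetical `stripTagsNaive` helper to reproduce exactly the reported output, then reads the untouched text from `rss:title['#']` (item shape copied from the output above).

```javascript
// Hypothetical helper: NOT feedparser's actual code, just a regex of the
// kind that would explain the reported title.
function stripTagsNaive(str) {
  // Delete every run that starts at a '<' and ends at the next '>'.
  return str.replace(/<[^>]*>/g, '');
}

// A real tag is removed as intended...
console.log(stripTagsNaive('<b>RSS</b> Tutorial')); // RSS Tutorial
// ...but '<<<Tutorial>' is also treated as one tag, leaving only '>>'.
console.log(stripTagsNaive('RSS <<<Tutorial>>>')); // RSS >>

// Workaround: read the unnormalized text node from the raw element.
const parsedItem = {
  title: 'RSS >>',
  'rss:title': { '@': {}, '#': 'RSS <<<Tutorial>>>' },
};
const rawTitle = parsedItem['rss:title'] && parsedItem['rss:title']['#'];
console.log(rawTitle); // RSS <<<Tutorial>>>
```

The raw text node has not been sanitized, so escape it before inserting it into HTML.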

@danmactough, is there an option I can pass when calling feedparser to drop the `{ '@': {}, '#': value }` wrapper and get just the value? That is, instead of `'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' }`, I'd like to get `'rss:link': 'http://www.w3schools.com/xml/xml_rss.asp'`.

theasteve, Mar 05 '19 22:03

@theasteve 'rss:link' is a "raw" element, meaning it isn't normalized and retains all the information in the original XML. As a result, we need to retain both the attributes (the @) and the text node (the #).

But generally, the item's link property will have the value you want.

danmactough, Mar 06 '19 00:03
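If you only need the text and can live without the attributes, a small post-processing step can approximate what the question asks for. This is a sketch under stated assumptions, not a feedparser feature: the helper names `rawText` and `flattenRawElements` are hypothetical, and the code assumes raw elements have the `{ '@': ..., '#': ... }` shape shown in this issue and that a repeated element arrives as an array. For `link` specifically, the normalized `item.link` already holds the same value.

```javascript
// Hypothetical post-processing helpers; not a feedparser option.
// Reduce a raw element { '@': attrs, '#': text } to its text node.
// Anything without a '#' (e.g. an attribute-only element) is returned
// unchanged so its attributes are not silently lost.
function rawText(node) {
  if (node !== null && typeof node === 'object' && '#' in node) {
    return node['#'];
  }
  return node;
}

// Shallow copy of an item in which every namespaced key ("rss:link",
// "dc:creator", ...) is reduced with rawText. Assumption: a repeated
// element is an array, so each entry is reduced individually.
function flattenRawElements(item) {
  const out = {};
  for (const [key, value] of Object.entries(item)) {
    if (!key.includes(':')) {
      out[key] = value; // normalized property: keep as-is
    } else if (Array.isArray(value)) {
      out[key] = value.map(rawText);
    } else {
      out[key] = rawText(value);
    }
  }
  return out;
}

const exampleItem = {
  link: 'http://www.w3schools.com/xml/xml_rss.asp',
  'rss:@': {},
  'rss:link': { '@': {}, '#': 'http://www.w3schools.com/xml/xml_rss.asp' },
};
const flat = flattenRawElements(exampleItem);
console.log(flat['rss:link']); // http://www.w3schools.com/xml/xml_rss.asp
console.log(flat['rss:link'] === flat.link); // true
```

Keys without a colon are treated as normalized properties and left alone, so only the raw elements change shape.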