
Improvements in data structure (keywords and image added) and corrections in href logic

Open morissonmaciel opened this issue 7 years ago • 0 comments

Hello @thomastuts, I made some changes to article-extractor to improve the data structure returned by its main extractArticle function.

First, I added two new attributes:

  1. keywords: obtained from meta tags named "keywords" and from the "swiftype > keys" variant, which is commonly used by articles across the web (see Engadget.com and all Vox Media article pages).

  2. image: obtained from two sources: the top-scoring image among all <img> elements in the <body> or <main> section or, failing that, tags matching the "swiftype > image" variant.
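For reviewers, here is a minimal TypeScript sketch of how the two new attributes could be derived. It assumes the meta tags and candidate images have already been pulled out of the parsed page. The MetaTag and ImageCandidate shapes, the function names, and the scoring weights are all hypothetical and are not the code in this change; only the tag names ("keywords", Swiftype "keys" and "image") and the "score images first, then fall back to swiftype > image" order come from the description above.

```typescript
// Hypothetical, simplified page model. A real extractor would read these values
// from a parsed DOM; the field names here are assumptions for illustration.
interface MetaTag {
  name: string;       // the tag's name attribute
  className?: string; // the tag's class attribute, e.g. "swiftype"
  content: string;    // the tag's content attribute
}

interface ImageCandidate {
  src: string;
  width: number;  // width in pixels (0 if unknown)
  height: number; // height in pixels (0 if unknown)
}

// keywords: merge <meta name="keywords"> and <meta class="swiftype" name="keys">,
// splitting comma-separated values, trimming them and dropping duplicates.
function extractKeywords(metas: MetaTag[]): string[] {
  const keywords: string[] = [];
  for (const meta of metas) {
    const isKeywords = meta.name === "keywords";
    const isSwiftypeKeys = meta.className === "swiftype" && meta.name === "keys";
    if (!isKeywords && !isSwiftypeKeys) continue;
    for (const raw of meta.content.split(",")) {
      const keyword = raw.trim();
      if (keyword !== "" && keywords.indexOf(keyword) === -1) keywords.push(keyword);
    }
  }
  return keywords;
}

// Hypothetical scoring heuristic: tiny images (tracking pixels, spacers) score
// zero, larger images score higher, and URLs that look like logos or icons
// are heavily penalised.
function scoreImage(img: ImageCandidate): number {
  if (img.width < 50 || img.height < 50) return 0;
  let score = img.width * img.height;
  if (/logo|icon|avatar|sprite/i.test(img.src)) score /= 10;
  return score;
}

// image: take the best-scoring <img> from <body>/<main>; if none scores above
// zero, fall back to <meta class="swiftype" name="image">.
function extractImage(images: ImageCandidate[], metas: MetaTag[]): string | null {
  let bestSrc: string | null = null;
  let bestScore = 0;
  for (const img of images) {
    const score = scoreImage(img);
    if (score > bestScore) {
      bestScore = score;
      bestSrc = img.src;
    }
  }
  if (bestSrc !== null) return bestSrc;
  for (const meta of metas) {
    if (meta.className === "swiftype" && meta.name === "image" && meta.content.trim() !== "") {
      return meta.content.trim();
    }
  }
  return null;
}
```

Splitting on commas covers both styles seen in the wild: a single tag holding a comma-separated list, and Swiftype's convention of repeating the same tag once per value.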

I also changed how author data is obtained: it now uses tags matching the "swiftype > blogger_name" variant.

Note: the author image can be obtained from blogger_image and could be exposed as a new metadata property in a future improvement.
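A rough TypeScript sketch of the author lookup, again assuming meta tags are already extracted into a simple list. The MetaTag and AuthorInfo shapes and the function names are hypothetical. The fallback to a generic <meta name="author"> tag is an assumption not stated in this issue, and authorImage only illustrates the possible future property mentioned in the note; it is not part of 1.1.0.

```typescript
// Hypothetical, simplified page model (field names are assumptions).
interface MetaTag {
  name: string;       // the tag's name attribute
  className?: string; // the tag's class attribute, e.g. "swiftype"
  content: string;    // the tag's content attribute
}

// Hypothetical result shape. authorImage illustrates the possible future
// property from blogger_image; it is NOT part of article-extractor 1.1.0.
interface AuthorInfo {
  author: string | null;
  authorImage: string | null;
}

// Return the trimmed content of the first non-empty
// <meta class="swiftype" name="..."> tag with the given name, or null.
function findSwiftype(metas: MetaTag[], name: string): string | null {
  for (const meta of metas) {
    if (meta.className === "swiftype" && meta.name === name && meta.content.trim() !== "") {
      return meta.content.trim();
    }
  }
  return null;
}

function extractAuthorInfo(metas: MetaTag[]): AuthorInfo {
  // Prefer swiftype blogger_name; fall back to a generic <meta name="author">
  // (the fallback is an assumption for this sketch).
  let author = findSwiftype(metas, "blogger_name");
  if (author === null) {
    for (const meta of metas) {
      if (meta.name === "author" && meta.content.trim() !== "") {
        author = meta.content.trim();
        break;
      }
    }
  }
  return { author, authorImage: findSwiftype(metas, "blogger_image") };
}
```

Keeping the lookup behind a small helper such as findSwiftype means that exposing blogger_image later would be a one-line addition.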

The documentation was updated to cover these new fields, and the minor version was bumped to 1.1.0.

morissonmaciel, Nov 10 '18 16:11