Bridge request for TranslateFeedBridge

Open yamanq opened this issue 3 years ago • 6 comments

Bridge request

General information

This bridge is a FeedExpander that translates the title and content of a feed from one language to another. Additionally, parameters could determine which translation service is used (if multiple are implemented).

  • Host URI for the bridge (i.e. https://github.com): Depends on the translation service (LibreTranslate, Google Translate, and DeepL Translator are some that I found that can translate multiple languages).

  • Which information would you like to see? Same as input feed, but with translated title and content. Importantly, outer HTML should not be mangled by the translation.

Options

  • [X] Limit number of returned items
    • Default limit: No Limit
  • [X] Balance requests (RSS-Bridge uses cached versions to reduce bandwidth usage)
    • Timeout (default = 5 minutes, max = 24 hours): A cache may have to be implemented to detect whether a feed item has changed before translating it again. This would help keep API usage low for the above translation services. I don't know if FeedExpander does this already.
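
One way the change detection described above could work is to key cached translations on a hash of the source text, so that an unchanged item never reaches the translation service twice. The sketch below is only an illustration in Python, not RSS-Bridge code: the `TranslationCache` class, its in-memory dict, and the `translate` callable are all hypothetical.

```python
import hashlib


class TranslationCache:
    """Hypothetical in-memory cache keyed on a hash of the source text."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(text, source, target):
        # Include the language pair so the same text can be cached per target.
        raw = f"{source}\0{target}\0{text}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def get_or_translate(self, text, source, target, translate):
        # `translate` is an assumed callable: str -> str.
        key = self._key(text, source, target)
        if key not in self._store:
            # Only changed or never-seen text reaches the translation service.
            self._store[key] = translate(text)
        return self._store[key]
```

If an item's text is edited upstream, its hash changes and it is translated again; everything else is served from the cache. A real bridge would need persistent storage and an expiry policy, which this sketch leaves out.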

yamanq avatar Apr 03 '22 17:04 yamanq

That's a pretty cool idea! Are you working on that already? If so, open a draft PR and I might take a swing as well.

Bockiii avatar Apr 03 '22 18:04 Bockiii

Thanks. I haven't started yet but I will put a comment here if I try to implement it.

yamanq avatar Apr 03 '22 18:04 yamanq

General idea:

  • Use LibreTranslate (add all the known instance URLs to spread the load).
  • Fetch the RSS feed. Split the content at the HTML tags into an array, so you have
<html>
<body>
<p>
here be text
</p>

etc.

  • Send everything that is not an HTML tag to the translator, either by preparing one big JSON call (to limit the number of calls) or with one call per line (easier to implement, but probably more taxing on the translation server).
  • Rebuild the content string.
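
The split, translate, and rebuild steps above can be sketched roughly as follows. This is an illustration in Python rather than bridge code; `translate_html` and the `translate` callable (a list of strings in, a list of translated strings out, same order) are hypothetical names, and the regex is deliberately naive: it does not handle comments, scripts, or a `>` inside an attribute value.

```python
import re

# Naive pattern for a single HTML tag. The capturing group makes re.split
# keep the tags in the result list.
TAG_RE = re.compile(r"(<[^>]+>)")


def translate_html(html, translate):
    """Translate the text between tags and leave the tags untouched."""
    parts = TAG_RE.split(html)
    # With one capturing group, tags land on odd indexes and text on even ones.
    text_idx = [i for i, part in enumerate(parts) if i % 2 == 0 and part.strip()]
    # One batched call for all text segments, to limit the number of requests.
    translated = translate([parts[i] for i in text_idx])
    for i, new_text in zip(text_idx, translated):
        parts[i] = new_text
    # Rebuild the content string.
    return "".join(parts)
```

Whitespace-only segments are skipped, so the layout between tags survives as long as the translator returns exactly one string per input.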

Bockiii avatar Apr 03 '22 19:04 Bockiii

A couple more ideas on how to traverse the HTML, though they are untested:

  • simplehtmldom's find('text'), which finds all "Text Blocks" (See "Text & Comments" tab).
  • Add a callback function that translates as it's parsing. We can also do this in 2 rounds: first callback function collects all the strings, then we translate with one big call, then we set the callback function to something else and replace the text.
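
The two-round callback idea could look something like the sketch below. It uses Python's standard `html.parser` purely to illustrate the control flow, not simplehtmldom, and it is untested against real feeds. `TextCollector`, `TextReplacer`, `translate_two_pass`, and the `translate` callable are hypothetical names; comments and doctype declarations are dropped in this simplified version.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Round 1: the data callback only records each non-blank text block."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.texts = []

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data)


class TextReplacer(HTMLParser):
    """Round 2: re-emit the markup, swapping in the translated text blocks."""

    def __init__(self, replacements):
        super().__init__(convert_charrefs=True)
        self.replacements = iter(replacements)
        self.out = []

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, verbatim

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if data.strip():
            # Entities were decoded by the parser, so escape the result again.
            self.out.append(html.escape(next(self.replacements), quote=False))
        else:
            self.out.append(data)


def translate_two_pass(html_text, translate):
    collector = TextCollector()
    collector.feed(html_text)
    collector.close()
    translated = translate(collector.texts)  # the one big call
    replacer = TextReplacer(translated)
    replacer.feed(html_text)
    replacer.close()
    return "".join(replacer.out)
```

Both rounds parse the same input, so the text blocks are visited in the same order and each translated string lines up with the block it replaces.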

Another thing to consider is isolating the actual translation behavior to a protected function so that derivative classes can easily inherit and modify the translation service.
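
As a rough sketch of that structure (in Python for illustration only; the class names, the `_translate` hook, and the dict-shaped feed item are assumptions, not existing RSS-Bridge code), the base class would own the feed handling and a subclass would replace just the translation step:

```python
class TranslateFeedBase:
    """Hypothetical base class: handles feed items, delegates translation."""

    def __init__(self, source_lang, target_lang):
        self.source_lang = source_lang
        self.target_lang = target_lang

    def translate_item(self, item):
        # Work on a copy so the original feed item is left untouched.
        result = dict(item)
        title, content = self._translate([item["title"], item["content"]])
        result["title"] = title
        result["content"] = content
        return result

    def _translate(self, texts):
        # "Protected" hook: subclasses implement the service-specific call.
        raise NotImplementedError


class UppercaseTranslateFeed(TranslateFeedBase):
    """Stand-in service for demonstration; a real subclass would call an API."""

    def _translate(self, texts):
        return [text.upper() for text in texts]
```

Supporting another translation service then means overriding only `_translate`, while the feed handling stays in one place.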

yamanq avatar Apr 03 '22 23:04 yamanq

Instead of hammering public translators, it might make sense to spin up our own Heroku-hosted instance of the translator.

I also like the idea of making it a general function as soon as we have worked it out :)

Bockiii avatar Apr 04 '22 11:04 Bockiii

https://old.reddit.com/r/rss/comments/14lwpos/servicetool_that_offers_rss_feed_translation/

dvikan avatar Jun 30 '23 17:06 dvikan