Bridge request for TranslateFeedBridge
Bridge request
General information
This bridge is a FeedExpander that translates the title and content of a feed from one language to another. Additionally, a parameter could select which translation service to use (if more than one is implemented).
- Host URI for the bridge (i.e. https://github.com): Depends on the translation service (LibreTranslate, Google Translate, and DeepL Translator are some that I found that can translate multiple languages).
- Which information would you like to see? Same as input feed, but with translated title and content. Importantly, outer HTML should not be mangled by the translation.
Options
- [X] Limit number of returned items
- Default limit: No Limit
- [X] Balance requests (RSS-Bridge uses cached versions to reduce bandwidth usage)
- Timeout (default = 5 minutes, max = 24 hours): A cache may have to be implemented to detect whether a feed item has changed before translating it again. This would help keep API usage low for the above translation services. I don't know if FeedExpander does this already.
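One way the change detection could work is to key cached translations by a hash of the source text, so an unchanged item never reaches the translation service again. The following is only a minimal sketch of that idea: `TranslationCache` and `fake_translate` are hypothetical names, the store is a plain in-memory dict, and nothing here reflects what FeedExpander or RSS-Bridge's cache actually does.

```python
import hashlib


class TranslationCache:
    """Hypothetical in-memory cache keyed by a hash of the source text."""

    def __init__(self, translate):
        self._translate = translate  # callable: str -> str (assumed)
        self._store = {}
        self.misses = 0  # how many times the translator was really called

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            # New or edited text: call the translator and remember the result.
            self.misses += 1
            self._store[key] = self._translate(text)
        return self._store[key]


def fake_translate(text):
    # Stand-in for a real translation API call.
    return text.upper()


cache = TranslationCache(fake_translate)
cache.get("hello world")
cache.get("hello world")   # unchanged item: served from the cache
cache.get("hello world!")  # edited item: translated again
```

A real bridge would need a persistent store rather than a dict, since the cache has to survive between requests.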
That's a pretty cool idea! Are you working on that already? If so, open a draft PR and I might take a swing as well.
Thanks. I haven't started yet but I will put a comment here if I try to implement it.
General idea:
- Use LibreTranslate (add all the known URLs to spread the load)
- Pick up the RSS feed. Split the content at the HTML tags into an array, so you have
<html>
<body>
<p>
here be text
</p>
etc.
- Send everything that is not an HTML tag to the translator, either by preparing one big JSON call (to limit calls) or by making one call per line (easier to implement, but probably more taxing on the translation server).
- Rebuild the content string
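The split, translate, and rebuild steps above can be sketched roughly as follows. This is an illustration rather than a proposed implementation: `translate_html` is a hypothetical name, `translate` is assumed to be any callable that maps a string to its translation, and the regular expression is naive (it would mistranslate `<script>` or `<style>` bodies and break on a `>` inside an attribute value). It takes the one-call-per-segment route.

```python
import re

# A capture group makes re.split keep the tags in the result, so the list
# alternates: text, tag, text, tag, ..., text.
TAG_SPLIT = re.compile(r"(<[^>]+>)")


def translate_html(html, translate):
    parts = TAG_SPLIT.split(html)
    out = []
    for index, part in enumerate(parts):
        core = part.strip()
        if index % 2 == 1 or not core:
            # Odd positions are tags; blank text needs no translation.
            out.append(part)
            continue
        # Keep the surrounding whitespace so the layout is unchanged.
        lead = part[: len(part) - len(part.lstrip())]
        trail = part[len(part.rstrip()):]
        out.append(lead + translate(core) + trail)
    return "".join(out)


# str.upper stands in for a real translation call.
result = translate_html('<p>\nhere be text\n</p><a href="x">link</a>', str.upper)
```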
A couple more ideas on how to traverse the HTML, though they are untested:
- simplehtmldom's find('text'), which finds all "Text Blocks" (see the "Text & Comments" tab).
- Add a callback function that translates as it's parsing. We can also do this in 2 rounds: the first callback function collects all the strings, then we translate with one big call, then we set the callback function to something else and replace the text.
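The two-round variant could look something like the sketch below. It does not use simplehtmldom; as a stand-in it uses Python's `re.sub` with a function callback, which is a different mechanism but shows the same collect, batch-translate, replace flow. `translate_html_batched` and `translate_batch` are hypothetical names, and `translate_batch` is assumed to return one translation per input string, in the same order.

```python
import re

# Matches either a whole tag or a run of text between tags.
TOKEN = re.compile(r"<[^>]+>|[^<]+")


def is_text(token):
    return not token.startswith("<") and bool(token.strip())


def translate_html_batched(html, translate_batch):
    # Round 1: the callback only collects text and leaves the HTML unchanged.
    collected = []

    def collect(match):
        token = match.group(0)
        if is_text(token):
            collected.append(token)
        return token

    TOKEN.sub(collect, html)

    # One big call for every string in the document.
    translated = iter(translate_batch(collected))

    # Round 2: a different callback swaps each text run for its translation.
    def replace(match):
        token = match.group(0)
        return next(translated) if is_text(token) else token

    return TOKEN.sub(replace, html)


calls = []


def fake_batch(texts):
    # Stand-in for a single batched API request.
    calls.append(list(texts))
    return [t.upper() for t in texts]


result = translate_html_batched("<div><p>one</p><p>two</p></div>", fake_batch)
```

The trade-off is that the document is scanned twice, in exchange for a single request to the translation service.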
Another thing to consider is isolating the actual translation behavior to a protected function so that derivative classes can easily inherit and modify the translation service.
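As a rough illustration of that isolation, the sketch below keeps all service-specific behaviour in one overridable method. The class and method names are hypothetical and the item is modelled as a plain dict; an actual bridge would be a PHP class extending FeedExpander, which is not shown here.

```python
class TranslateFeed:
    """Hypothetical base class: the feed handling lives here."""

    def translate_item(self, item):
        # Only title and content are translated; other fields pass through.
        translated = dict(item)
        translated["title"] = self._translate(item["title"])
        translated["content"] = self._translate(item["content"])
        return translated

    def _translate(self, text):
        # The single place a derived class has to override.
        raise NotImplementedError


class UppercaseTranslateFeed(TranslateFeed):
    """Toy 'service' used only to show the override."""

    def _translate(self, text):
        return text.upper()


item = {"title": "hello", "content": "<p>world</p>", "uri": "https://example.com/1"}
result = UppercaseTranslateFeed().translate_item(item)
```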
Instead of hammering public translators, it might make sense to spin up our own Heroku-hosted instance of the translator.
I also like the idea of making it a general function once we have worked it out :)
https://old.reddit.com/r/rss/comments/14lwpos/servicetool_that_offers_rss_feed_translation/