Langstrings as Resources or not?

Open joepio opened this issue 5 years ago • 1 comments

Providing a standard for language strings (like RDF does) has some great benefits. For example, it allows for smart clients to show the right translation.

But how should it be modeled? There are at least a couple of options, and each one has some serious benefits and drawbacks:

Adding a language field in all Atoms

This is basically what RDF does: it adds a separate field to every single statement. This solves the issue, but adding a column to the Core model (of Subject, Property, Value) is very costly in many regards. The "triple" suddenly becomes a "quad", and every part of the ecosystem has to deal with that explicitly: serialization formats, libraries... Most importantly, the mental model becomes more complex. In Atomic Data, it also collides with Subject-Property uniqueness: how would you add two translated strings for one S P combination? You'd have to replace S P uniqueness with S P L (Subject, Property, Language) uniqueness, again making everything more complex. It also makes translation-heavy resources very large.
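To make the uniqueness problem concrete, here is a minimal sketch. The `Atom` and `LangAtom` types and the key functions are hypothetical names made up for illustration; they are not part of any Atomic Data library.

```typescript
// Hypothetical types for illustration only, not an Atomic Data API.
type Atom = { subject: string; property: string; value: string };
type LangAtom = Atom & { lang: string };

// With plain Atoms, (subject, property) is the unique key.
function keySP(a: Atom): string {
  return JSON.stringify([a.subject, a.property]);
}

// With a language field, the key has to grow to (subject, property, lang).
function keySPL(a: LangAtom): string {
  return JSON.stringify([a.subject, a.property, a.lang]);
}

const atoms: LangAtom[] = [
  { subject: "harryPotter1", property: "title", lang: "en", value: "Harry Potter and the Philosopher's Stone" },
  { subject: "harryPotter1", property: "title", lang: "nl", value: "Harry Potter en de Steen der Wijzen" },
];

const bySP = new Map<string, string>();
const bySPL = new Map<string, string>();
for (const atom of atoms) {
  bySP.set(keySP(atom), atom.value);   // the second title overwrites the first
  bySPL.set(keySPL(atom), atom.value); // both titles survive
}

console.log(bySP.size, bySPL.size); // prints "1 2"
```

Under S P uniqueness the Dutch title replaces the English one; only the wider S P L key keeps both, which is the extra complexity every store and parser would have to adopt.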

Adding some custom serialization in the Value

This basically means creating a custom datatype with custom parsing. Again, every single library has to deal with this. Even if we choose something simple (e.g. a JSON array of objects with lang and text keys), we still require every Atomic Data parser to include a JSON parser and implement some custom logic.

Another downside is that this doesn't play nice with Atomic Mutations: it would be impossible to add a single translation; you'd have to replace the entire Value.
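A rough sketch of what that looks like in practice, assuming the Value is a JSON array of objects with lang and text keys. The `LangEntry` type and `addTranslation` helper are hypothetical and only illustrate the parse-and-replace cycle.

```typescript
// Hypothetical shape of one entry in the serialized Value.
type LangEntry = { lang: string; text: string };

// The whole Value is one opaque string as far as the Atom is concerned.
const storedValue: string = JSON.stringify([
  { lang: "en", text: "Harry Potter and the Philosopher's Stone" },
]);

// Adding one translation means: parse everything, change it, serialize everything.
function addTranslation(value: string, entry: LangEntry): string {
  const entries = JSON.parse(value) as LangEntry[];
  const others = entries.filter((e) => e.lang !== entry.lang);
  return JSON.stringify([...others, entry]);
}

const updated = addTranslation(storedValue, {
  lang: "nl",
  text: "Harry Potter en de Steen der Wijzen",
});

// The mutation that gets sent is the full new Value, not just the Dutch string.
console.log(updated.length > storedValue.length); // prints "true"
```

Two clients adding different translations at the same time would each send a full replacement, so one of them would silently lose its change unless extra merge logic is added.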

Just ignore it, let someone else (or some other proposed standard) deal with this issue

Tempting, but no. Not offering a default go-to solution in this book will probably lead to a fragmented landscape, incompatible formats and a lot of frustration.

Every single translation is a Resource

This actually makes a lot of sense, and does not require any weird parsing tricks. However, it requires clients to create these resources (and their respective identifiers), which can be a hassle. It also requires a model in between to provide the collection of translation resources (like an array?), and that poses a new challenge: how do we make sure that the client is not required to fetch and parse every single translation if it's only interested in a single one? Which brings us to...

Bundled translation Resources (all translations in 1 resource)

We introduce a class for Translations, and create a property for every single language. Similar to the method above, this does not require weird parsing tricks. It creates far fewer Resources than the method above, which is also nice. The resulting resource could get quite big, though, and clients have to fetch all translations even if they only need one. Combined with the Atomic Data Shortnames, it would offer some cool and clean query options:

harryPotter1.title.en => Harry Potter and the Philosopher's Stone

harryPotter1.title.nl => Harry Potter en de Steen der Wijzen

Could be nested in a Resource, so you don't need new Subjects
Search results will point to the translation, not the actual resource above it
We add a useLocalString hook, which will know to fetch the URL of the linked Translation and render the locale variant
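The shortname queries above could be resolved roughly like this. It is a minimal sketch that assumes the Translation resource is nested inside its parent; the `store` object, the `Resource` type and `resolvePath` are hypothetical names, not an existing Atomic Data API.

```typescript
// Hypothetical in-memory model: a Value is either a string or a nested Resource.
type Value = string | Resource;
interface Resource {
  [shortname: string]: Value;
}

const store: Record<string, Resource> = {
  harryPotter1: {
    // The bundled Translation resource: one property per language.
    title: {
      en: "Harry Potter and the Philosopher's Stone",
      nl: "Harry Potter en de Steen der Wijzen",
    },
  },
};

// Walks a dotted path such as "harryPotter1.title.nl" through nested Resources.
function resolvePath(path: string): Value | undefined {
  const [subject, ...shortnames] = path.split(".");
  let current: Value | undefined = store[subject];
  for (const name of shortnames) {
    if (current === undefined || typeof current === "string") return undefined;
    current = current[name];
  }
  return current;
}

console.log(resolvePath("harryPotter1.title.nl")); // prints "Harry Potter en de Steen der Wijzen"
```

Because the translations are nested, no separate Subject is needed for the Translation resource, but the whole bundle travels with the parent.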

All in all, this final option seems like the best for now, but if I'm missing some options or important insights - let me know below!

Jun 24 '20 21:06 joepio

Maybe allow the languagestring datatype to either be a string, or an object with languages?
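One way to read that idea, as a sketch only: the `LanguageString` type, the `localize` helper and the "en" fallback are assumptions for illustration, not a decided design.

```typescript
// Hypothetical datatype: either a plain string, or an object keyed by language.
type LanguageString = string | { [lang: string]: string };

// Returns the best match for the requested language.
// A plain string is treated as language-neutral and returned as-is.
function localize(
  value: LanguageString,
  lang: string,
  fallback: string = "en",
): string | undefined {
  if (typeof value === "string") return value;
  return value[lang] ?? value[fallback];
}

const title: LanguageString = {
  en: "Harry Potter and the Philosopher's Stone",
  nl: "Harry Potter en de Steen der Wijzen",
};

console.log(localize(title, "nl")); // prints "Harry Potter en de Steen der Wijzen"
console.log(localize("Untranslated title", "nl")); // prints "Untranslated title"
```

Simple data stays a plain string, and only values that actually have translations pay the cost of the object form.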

May 18 '21 13:05 joepio