Internationalization of free-form text descriptions
There are some free-form text descriptions specified in GBFS (for example, the cross_street field in station_information.json). I think it would be great to be able to describe these fields in several languages. So far, I think the only way to do so is to publish/import a separate set of GBFS files for each language, which is heavy and adds a risk of conflict between the sets of files. Did you consider this possibility already? Thanks, Edouard
@EdouardBavoux The format of translations in GBFS surprised me too. As it shows in https://github.com/NABSA/gbfs/blob/master/gbfs.md#gbfsjson, you duplicate all the files to have different languages:
{
"last_updated": 1434054678,
"ttl": 0,
"version": "2.0",
"data": {
"en": {
"feeds": [
{
"name": "system_information",
"url": "https://www.example.com/gbfs/1/en/system_information"
},
{
"name": "station_information",
"url": "https://www.example.com/gbfs/1/en/station_information"
}
]
},
"fr" : {
"feeds": [
{
"name": "system_information",
"url": "https://www.example.com/gbfs/1/fr/system_information"
},
{
"name": "station_information",
"url": "https://www.example.com/gbfs/1/fr/station_information"
}
]
}
}
}
I'm curious if anyone knows the rationale behind this original design decision, and if others feel like having a more lightweight translation mechanism would be helpful.
I don't recall exactly how this was arrived at but at the time there were multiple providers who were already publishing feeds in this way. I think we probably just went along with what was already the established practice.
--
Mitch Vars | en | Senior Data Modeler | MobilityData IO | mobilitydata.org
[email protected] | +1 612 788 7627 | Minneapolis, Minnesota, USA
Just to give an example of an alternate translation design, GTFS-realtime uses a single file for data in all languages (so non-human readable things like booleans and integers aren't duplicated), but for each human-readable field in Alerts it includes two values: the language code and the translation for each language.
The end result looks something like this:
alert {
"active_period": {
"start": 1284457468,
"end": 1284468072
},
...
"description": [
{
"text": "The Elm Street stop is closed.",
"language": "en"
}, {
"text": "L'arrêt de la rue Elm est fermé.",
"language": "fr"
}
]
}
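A consumer of this structure has to choose one entry per translated field. Here is a minimal Python sketch of that lookup, assuming each entry is a dict with the `text` and `language` keys shown above. `pick_translation` is a hypothetical helper written for illustration, not part of any GTFS-realtime library.

```python
from typing import Optional


def pick_translation(translations: list[dict], language: str) -> Optional[str]:
    """Return the text whose language code matches, or None if absent."""
    wanted = language.lower()
    for entry in translations:
        # Compare language codes case-insensitively ("FR" matches "fr").
        if entry.get("language", "").lower() == wanted:
            return entry.get("text")
    return None


# Sample data modeled on the alert description above.
description = [
    {"text": "The Elm Street stop is closed.", "language": "en"},
    {"text": "L'arrêt de la rue Elm est fermé.", "language": "fr"},
]

print(pick_translation(description, "fr"))
```

Returning `None` for a missing language leaves the fallback policy to the caller, which this sketch does not try to decide.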
@barbeau : That's definitely a better approach for translations, indeed.
This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.
This discussion has been closed due to inactivity. Discussions can always be reopened after they have been closed.
Is there any chance that this will be considered for 3.x @mplsmitch ?
@testower we've got this on our roadmap for Q3 this year so if we're able to pass something it would go in 3.x
I'd like to reopen this discussion - it was closed by the StaleBot. No additional work has been done on this issue. Is this something we should be looking at for v3.0?
This is definitely on our wish list
As part of our extension process MobilityData has conducted a needs assessment for this issue by doing a feed census to understand the ways in which various producers choose to internationalize their feeds and how big of a need this is.
The next step will be to create an extension proposal based on the conclusion we arrive at in this discussion. Are there other ways in which you have seen internationalization done in GBFS? How about in other specs (like GTFS-realtime above)?
@josee-sabourin Just like to chime in that on our side (Entur) internationalization is a way for data providers to comply with our requirement for Norwegian language in human-facing texts.
This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.
Any news on this @heidiguenin ?
Hi @testower - I stopped working for MobilityData in February this year, so this one's not in my hands anymore. Maybe @josee-sabourin can help?
Hi @testower,
From our side at MobilityData we are still very much interested in supporting the community to explore potential solutions to improve translations in GBFS (i.e. internationalization).
Currently, work on other parts of the spec is taking up some of our capacity. However, since this is a breaking change, there’s an opportunity to incorporate it in v3.0 if a proposal can be agreed upon and voted on, so we would like to encourage the community to champion this.
If it is within your possibilities (or those of any other member of the community) to develop a proposal for discussion, we would be happy to support the champion along this discussion and the subsequent change process (PR opening and vote calling).
Rest assured we’ll closely follow the discussion around this issue.
@Sergiodero I would be happy to do this, but I may need some guidance wrt where to start. I think @barbeau's suggestion is as good a starting point as any other. The road to a PR is unclear to me. Do we even agree which fields are considered "human readable" and eligible for translation?
Hello @testower - I'm also stuck on where to begin on this. It's a big job. I would say any text that could be expressed in multiple languages, right down to the name of the system should be considered, for example:
"name": [
{
"text": "Blue Bikes",
"language": "en"
}, {
"text": "Blå sykler",
"language": "no"
}
]
We need a way to designate which fields are open to translation. All of the fields that this would apply to are currently typed as String but obviously not all of those strings would need translation. We could make up a new field type, like Language String that would be applied to those that could be expressed in multiple languages. The challenge is that in the example above, name, which currently has a type of string would have a type of array. The new Language String type would need to be applied to the text field within the name array.
The way we've designated the fields that are contained in arrays is already confusing to some people so maybe this is an opportunity to fix that. We could re-write everything using the format of this page so it would look like:
| Field Name | Required | Type | Defines |
|---|---|---|---|
| name | Yes | Array | An array of objects where each object contains a text string and its corresponding language code |
| name[ ].text | Yes | Language String | The public name of the station for display in maps, digital signage, and other text applications. Names SHOULD reflect the station location through the use of a cross street or local landmark. Abbreviations SHOULD NOT be used for names and other text (for example, "St." for "Street") unless a location is called by its abbreviated name (for example, “JFK Airport”). See Text Fields and Naming. |
| name[ ].language | Yes | Language | IETF BCP 47 language code |
This gets clunky in places where there are nested arrays and objects, for example
geofencing_zones.features[].properties.rules[].ride_allowed
For clarity, compare to the current style:
| Field Name | Required | Type | Defines |
|---|---|---|---|
| name | Yes | Array | An array of objects where each object contains a text string and its corresponding language code |
| - text | Yes | String | The public name of the station for display in maps, digital signage, and other text applications. Names SHOULD reflect the station location through the use of a cross street or local landmark. Abbreviations SHOULD NOT be used for names and other text (for example, "St." for "Street") unless a location is called by its abbreviated name (for example, “JFK Airport”). See Text Fields and Naming. |
| - language | Yes | Language | IETF BCP 47 language code |
(Note that I have used String, not Language String, as I don't see the need for a separate type for this, since the content is already restricted by the object it's contained within.)
that works - you're right, we should stick with the current format and focus on the language problem. If it becomes an issue we can always change it later
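Whichever table style is used, the shape described (an array of objects, each with a `text` string and a `language` code) can be checked mechanically. This is a rough Python sketch. `validate_translated_field` is a hypothetical name, the rule that a language may appear only once is my own assumption rather than something agreed in this thread, and the language code is only checked for being a non-empty string, not for BCP 47 syntax.

```python
def validate_translated_field(value: object) -> list[str]:
    """Return a list of problems; an empty list means the value looks valid."""
    if not isinstance(value, list) or not value:
        return ["expected a non-empty array of objects"]

    problems = []
    seen = set()  # lower-cased language codes already encountered
    for index, entry in enumerate(value):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: expected an object")
            continue
        text = entry.get("text")
        language = entry.get("language")
        if not isinstance(text, str) or not text:
            problems.append(f"entry {index}: 'text' must be a non-empty string")
        if not isinstance(language, str) or not language:
            problems.append(f"entry {index}: 'language' must be a non-empty string")
        elif language.lower() in seen:
            # Assumption: at most one translation per language.
            problems.append(f"entry {index}: duplicate language '{language}'")
        else:
            seen.add(language.lower())
    return problems


name = [
    {"text": "Blue Bikes", "language": "en"},
    {"text": "Blå sykler", "language": "no"},
]
print(validate_translated_field(name))
```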
Here's a draft outline of changes based on what's on master (v3.0-Draft)
gbfs.json
The structure here should change so that the data object contains a single field feeds instead of one field per language.
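To make the gbfs.json change concrete, here is a small Python sketch that turns the current per-language layout into one with a single `feeds` field under `data`. The thread does not define a migration, so `flatten_gbfs_feeds` and the choice to keep one language's URLs are assumptions made purely for illustration.

```python
import json


def flatten_gbfs_feeds(gbfs: dict, language: str) -> dict:
    """Return a copy whose "data" holds one "feeds" list taken from `language`."""
    feeds = gbfs["data"][language]["feeds"]
    # Copy every top-level field except "data", which is rebuilt below.
    flattened = {key: value for key, value in gbfs.items() if key != "data"}
    flattened["data"] = {"feeds": [dict(feed) for feed in feeds]}
    return flattened


# Trimmed version of the current per-language layout.
gbfs = {
    "last_updated": 1434054678,
    "ttl": 0,
    "version": "2.0",
    "data": {
        "en": {"feeds": [{"name": "system_information",
                          "url": "https://www.example.com/gbfs/1/en/system_information"}]},
        "fr": {"feeds": [{"name": "system_information",
                          "url": "https://www.example.com/gbfs/1/fr/system_information"}]},
    },
}

print(json.dumps(flatten_gbfs_feeds(gbfs, "en"), indent=2))
```

The resulting URLs still carry the `/en/` path segment. A real producer would presumably publish language-neutral URLs once translations live inside the files.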
system_information.json
- The language field should be changed to list supported languages with an Array of Language.
Fields open for translation:
- name
- short_name
- operator
- attribution_organization_name(?)
vehicle_types.json
- name
station_information.json
- name
- short_name
system_regions.json
- name
system_pricing_plans.json
- name
- description
system_alerts.json
- summary
- description
geofencing_zones.json
- features.properties.name
Other considerations
I think we also need to consider that when linking to content that is human readable, such as privacy policies, terms of use, or alert information, a link for each supported language should be provided.
This looks really good - I don't see any other fields that would need translating.
I think we also need to consider that when linking to content that is human readable, such as privacy policies, terms of use, or alert information, a link for each supported language should be provided.
I agree this is worth doing. It can be handled in the same way as the other translatable fields (with arrays). Fields it would apply to:
system_information.json
- terms_url
- privacy_url
- brand_terms_url
- license_url
Happy to help if I can, let me know
Probably also url in system_alerts.json
The localization section also needs revision (https://github.com/MobilityData/gbfs/blob/master/gbfs.md#localization), or maybe it's no longer needed?
There are a few more fields in vehicle_types.json that should be considered: make, model, and color. I'm guessing that make and model may not be translatable within any given system (but should we make that assumption?), while color probably is.
- terms_url
- privacy_url
- brand_terms_url
- license_url
Out of these, could we possibly rule out brand_terms_url and license_url, as these are not intended for end users but for consumers of the data?
Hi guys, thanks a lot for carrying on this topic. I have changed company and domain, so I am a bit far away from transportation topics now, but I read through the discussions and here are my few inputs:
- Any url could have several translations. For example, maps or images could have text on them, so several translations could be available. (Although I agree that these are not the priority, so if it involves too much work, it could be left aside, the urls you specified being the priority.)
- Indeed, color is a hex code in system_information.json but a human-readable text description in vehicle_types.json, for instance, so it should be translatable wherever it is free text.
- Should we allow the GBFS producer to indicate which language is the default in the format itself (to ease integration)? Is that the new role of the language field in system_information.json? (In this case, maybe just an additional sentence in the "Defines" column, stating that this language is the default one, overridden by any available translations?)
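The default-language idea in the last point can be sketched as a two-step lookup: try the requested language, then fall back to the producer's default. In this Python sketch, `resolve_text` and its `default_language` parameter are hypothetical. Nothing in the thread settles whether the language field in system_information.json would play that role.

```python
from typing import Optional


def resolve_text(translations: list[dict], requested: str,
                 default_language: str) -> Optional[str]:
    """Prefer the requested language, then the default, else return None."""
    by_language = {entry["language"].lower(): entry["text"]
                   for entry in translations}
    for candidate in (requested, default_language):
        text = by_language.get(candidate.lower())
        if text is not None:
            return text
    return None


name = [
    {"text": "Blue Bikes", "language": "en"},
    {"text": "Blå sykler", "language": "no"},
]

# "fr" is not provided, so the assumed default "en" is used instead.
print(resolve_text(name, "fr", "en"))
```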
Closing this to ensure all discussion ends up in #460