
Internationalization of free-form text descriptions

Open EdouardBavoux opened this issue 5 years ago • 13 comments

There are some free-form text descriptions specified in GBFS (for example, the cross_street field in station_information.json). I think it would be great to be able to provide these fields in several languages. So far, I think the only way to do so is to publish/import a separate set of GBFS files for each language, which is heavy and adds a risk of conflict between the sets of files. Have you considered this possibility already? Thanks, Edouard

EdouardBavoux avatar Sep 02 '20 07:09 EdouardBavoux

@EdouardBavoux The format of translations in GBFS surprised me too. As shown in https://github.com/NABSA/gbfs/blob/master/gbfs.md#gbfsjson, you duplicate all the files to support different languages:

{
  "last_updated": 1434054678,
  "ttl": 0,
  "version": "2.0",
  "data": {
    "en": {
      "feeds": [
        {
          "name": "system_information",
          "url": "https://www.example.com/gbfs/1/en/system_information"
        },
        {
          "name": "station_information",
          "url": "https://www.example.com/gbfs/1/en/station_information"
        }
      ]
    },
    "fr" : {
      "feeds": [
        {
          "name": "system_information",
          "url": "https://www.example.com/gbfs/1/fr/system_information"
        },
        {
          "name": "station_information",
          "url": "https://www.example.com/gbfs/1/fr/station_information"
        }
      ]
    }
  }
}

I'm curious if anyone knows the rationale behind this original design decision, and if others feel like having a more lightweight translation mechanism would be helpful.

barbeau avatar Sep 02 '20 13:09 barbeau

I don't recall exactly how this was arrived at but at the time there were multiple providers who were already publishing feeds in this way. I think we probably just went along with what was already the established practice.

mplsmitch avatar Sep 02 '20 14:09 mplsmitch

Just to give an example of an alternate translation design, GTFS-realtime uses a single file for data in all languages (so non-human readable things like booleans and integers aren't duplicated), but for each human-readable field in Alerts it includes two values: the language code and the translation for each language.

The end result looks something like this:

 alert {
    "active_period": {
      "start": 1284457468,
      "end": 1284468072
    },
    ...
     "description": [
        {
          "text": "The Elm Street stop is closed.",
          "language": "en"
        }, {
          "text": "L'arrêt de la rue Elm est fermé.",
          "language": "fr"
        }
      ]
  }
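
A consumer reading this shape could pick the text for a preferred language roughly as follows. This is a minimal Python sketch, not something defined by GTFS-realtime or GBFS: the function name `pick_translation` and its fallback behaviour are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def pick_translation(translations: List[Dict[str, str]], preferred: str,
                     fallback: Optional[str] = None) -> Optional[str]:
    """Return the text for `preferred`, else for `fallback`, else None."""
    # Index the translations by language code for direct lookup.
    by_language = {t["language"]: t["text"] for t in translations}
    if preferred in by_language:
        return by_language[preferred]
    if fallback is not None:
        return by_language.get(fallback)
    return None


# Same shape as the "description" array in the alert above.
description = [
    {"text": "The Elm Street stop is closed.", "language": "en"},
    {"text": "L'arrêt de la rue Elm est fermé.", "language": "fr"},
]

print(pick_translation(description, "fr"))
print(pick_translation(description, "de", fallback="en"))
```

Non-translatable values such as `active_period` are read once, and only the human-readable text needs a per-language lookup.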

barbeau avatar Sep 08 '20 20:09 barbeau

@barbeau : That's definitely a better approach for translations, indeed.

EdouardBavoux avatar Sep 09 '20 13:09 EdouardBavoux

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 07 '21 14:01 stale[bot]

This discussion has been closed due to inactivity. Discussions can always be reopened after they have been closed.

stale[bot] avatar Mar 08 '21 14:03 stale[bot]

Is there any chance that this will be considered for 3.x @mplsmitch ?

testower avatar Apr 20 '21 17:04 testower

@testower we've got this on our roadmap for Q3 this year so if we're able to pass something it would go in 3.x

mplsmitch avatar Apr 20 '21 19:04 mplsmitch

I'd like to reopen this discussion - it was closed by the StaleBot. No additional work has been done on this issue. Is this something we should be looking at for v3.0?

mplsmitch avatar Sep 16 '21 19:09 mplsmitch

This is definitely on our wish list

testower avatar Sep 17 '21 07:09 testower

As part of our extension process MobilityData has conducted a needs assessment for this issue by doing a feed census to understand the ways in which various producers choose to internationalize their feeds and how big of a need this is.

The next step will be to create an extension proposal based on the conclusion we arrive at in this discussion. Are there other ways in which you have seen internationalization done in GBFS? How about in other specs (like GTFS-realtime above)?

josee-sabourin avatar Nov 12 '21 16:11 josee-sabourin

@josee-sabourin Just chiming in that on our side (Entur), internationalization is a way for data providers to comply with our requirement for Norwegian language in human-facing texts.

testower avatar Nov 15 '21 20:11 testower

This discussion has been automatically marked as stale because it has not had recent activity. It will be closed in 60 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 12 '22 22:08 stale[bot]

Any news on this @heidiguenin ?

testower avatar Oct 12 '22 07:10 testower

Hi @testower - I stopped working for MobilityData in February this year, so this one's not in my hands anymore. Maybe @josee-sabourin can help?

heidiguenin avatar Oct 12 '22 17:10 heidiguenin

Hi @testower,

From our side at MobilityData we are still very much interested in supporting the community to explore potential solutions to improve translations in GBFS (i.e. internationalization).

We are currently working on other parts of the spec, which takes up some of our capacity. However, since this is a breaking change, there is an opportunity to incorporate it in v3.0 if a proposal can be agreed upon and voted on, so we would like to encourage the community to champion this.

If it is within your possibilities (or those of any other member of the community) to develop a proposal for discussion, we would be happy to support the champion along this discussion and the subsequent change process (PR opening and vote calling).

Rest assured we’ll closely follow the discussion around this issue.

Sergiodero avatar Oct 14 '22 17:10 Sergiodero

@Sergiodero I would be happy to do this, but I may need some guidance wrt where to start. I think @barbeau's suggestion is as good a starting point as any other. The road to a PR is unclear to me. Do we even agree which fields are considered "human readable" and eligible for translation?

testower avatar Oct 16 '22 20:10 testower

Hello @testower - I'm also stuck on where to begin on this. It's a big job. I would say any text that could be expressed in multiple languages, right down to the name of the system should be considered, for example:

 "name": [
        {
          "text": "Blue Bikes",
          "language": "en"
        }, {
          "text": "Blå sykler",
          "language": "no"
        }
      ]

We need a way to designate which fields are open to translation. All of the fields that this would apply to are currently typed as String but obviously not all of those strings would need translation. We could make up a new field type, like Language String that would be applied to those that could be expressed in multiple languages. The challenge is that in the example above, name, which currently has a type of string would have a type of array. The new Language String type would need to be applied to the text field within the name array.

The way we've designated the fields that are contained in arrays is already confusing to some people, so maybe this is an opportunity to fix that. We could rewrite everything using the format of this page so it would look like:

| Field Name | Required | Type | Defines |
| --- | --- | --- | --- |
| name | Yes | Array | An array of objects where each object contains a text string and its corresponding language code |
| name[ ].text | Yes | Language String | The public name of the station for display in maps, digital signage, and other text applications. Names SHOULD reflect the station location through the use of a cross street or local landmark. Abbreviations SHOULD NOT be used for names and other text (for example, "St." for "Street") unless a location is called by its abbreviated name (for example, “JFK Airport”). See Text Fields and Naming. |
| name[ ].language | Yes | Language | IETF BCP 47 language code |

This gets clunky in places where there are nested arrays and objects, for example geofencing_zones.features[].properties.rules[].ride_allowed

mplsmitch avatar Oct 16 '22 22:10 mplsmitch

For clarity, compare to the current style:

| Field Name | Required | Type | Defines |
| --- | --- | --- | --- |
| name | Yes | Array | An array of objects where each object contains a text string and its corresponding language code |
| - text | Yes | String | The public name of the station for display in maps, digital signage, and other text applications. Names SHOULD reflect the station location through the use of a cross street or local landmark. Abbreviations SHOULD NOT be used for names and other text (for example, "St." for "Street") unless a location is called by its abbreviated name (for example, “JFK Airport”). See Text Fields and Naming. |
| - language | Yes | Language | IETF BCP 47 language code |

(Note that I have used String, not Language String, as I don't see the need for a separate type for this, since the content is already restricted by the object it's contained within.)
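
To make the shape concrete, here is a rough Python sketch of how a producer that currently publishes one station_information.json per language might fold those files into a single list with `name` arrays. The function `merge_station_names` and the sample station data are hypothetical, and the sketch assumes every field other than `name` is identical across languages.

```python
def merge_station_names(stations_by_language):
    """Merge per-language station lists into one list with translated names.

    `stations_by_language` maps a language code to a list of station
    objects, each with a `station_id` and a plain-string `name`.
    """
    merged = {}
    for language, stations in stations_by_language.items():
        for station in stations:
            station_id = station["station_id"]
            if station_id not in merged:
                # Copy the language-independent fields from the first file seen.
                merged[station_id] = {
                    key: value for key, value in station.items() if key != "name"
                }
                merged[station_id]["name"] = []
            merged[station_id]["name"].append(
                {"text": station["name"], "language": language}
            )
    return list(merged.values())


# Hypothetical sample data, one list per language-specific file.
per_language = {
    "en": [{"station_id": "1", "name": "Central Station", "capacity": 12}],
    "fr": [{"station_id": "1", "name": "Gare centrale", "capacity": 12}],
}

stations = merge_station_names(per_language)
print(stations)
```

Each station then carries its numeric and boolean fields once, and only the `name` entries multiply with the number of languages.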

testower avatar Oct 17 '22 07:10 testower

that works - you're right, we should stick with the current format and focus on the language problem. If it becomes an issue we can always change it later

mplsmitch avatar Oct 19 '22 16:10 mplsmitch

Here's a draft outline of changes based on what's on master (v3.0-Draft)

gbfs.json

The structure here should change so that the data object contains a single field feeds instead of one field per language.
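
As a rough illustration of that change, the Python sketch below rewrites the per-language `data` object from the gbfs.json example earlier in this thread into a single `feeds` list. The function name `collapse_feeds` and the choice to keep one language's URLs are assumptions; nothing in this thread decides which URLs a merged feed would use, and the sketch leaves `version` untouched.

```python
def collapse_feeds(gbfs_doc, keep_language):
    """Return a copy of a gbfs.json document whose `data` holds one `feeds` list."""
    feeds = gbfs_doc["data"][keep_language]["feeds"]
    # Keep the top-level fields (last_updated, ttl, version) as they are.
    collapsed = {key: value for key, value in gbfs_doc.items() if key != "data"}
    # Copy each feed entry so the input document is not modified.
    collapsed["data"] = {"feeds": [dict(feed) for feed in feeds]}
    return collapsed


v2_doc = {
    "last_updated": 1434054678,
    "ttl": 0,
    "version": "2.0",
    "data": {
        "en": {"feeds": [
            {"name": "system_information",
             "url": "https://www.example.com/gbfs/1/en/system_information"},
            {"name": "station_information",
             "url": "https://www.example.com/gbfs/1/en/station_information"},
        ]},
        "fr": {"feeds": [
            {"name": "system_information",
             "url": "https://www.example.com/gbfs/1/fr/system_information"},
            {"name": "station_information",
             "url": "https://www.example.com/gbfs/1/fr/station_information"},
        ]},
    },
}

single = collapse_feeds(v2_doc, "en")
print(single["data"])
```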

system_information.json

  • The language field should be changed to list supported languages with an Array of Language.

Fields open for translation:

  • name
  • short_name
  • operator
  • attribution_organization_name (?)

vehicle_types.json

  • name

station_information.json

  • name
  • short_name

system_regions.json

  • name

system_pricing_plans.json

  • name
  • description

system_alerts.json

  • summary
  • description

geofencing_zones.json

  • features.properties.name
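
A consumer or validator could use a list like this to check that each translatable field covers every declared language. The following Python sketch is only an illustration under stated assumptions: the `languages` key, the operator name, and the rule that every declared language must have a translation are hypothetical and have not been agreed in this thread.

```python
# Translatable fields proposed above for system_information.json (subset).
TRANSLATABLE_FIELDS = ("name", "short_name", "operator")


def missing_translations(obj, fields, languages):
    """Return (field, language) pairs where a present field lacks a translation."""
    missing = []
    for field in fields:
        if field not in obj:
            continue  # field not published at all, nothing to check
        present = {entry["language"] for entry in obj[field]}
        for language in languages:
            if language not in present:
                missing.append((field, language))
    return missing


# Hypothetical system_information data using the array-of-translations shape.
system_information = {
    "languages": ["en", "no"],
    "name": [
        {"text": "Blue Bikes", "language": "en"},
        {"text": "Blå sykler", "language": "no"},
    ],
    "operator": [
        {"text": "Example Operator", "language": "en"},
    ],
}

gaps = missing_translations(
    system_information, TRANSLATABLE_FIELDS, system_information["languages"]
)
print(gaps)  # [('operator', 'no')]
```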

Other considerations

I think we also need to consider that when linking to content that is human readable, such as privacy policies, terms of use, or alert information, a link for each supported language should be provided.

testower avatar Oct 26 '22 07:10 testower

This looks really good - I don't see any other fields that would need translating.

I think we also need to consider that when linking to content that is human readable, such as privacy policies, terms of use, or alert information, a link for each supported language should be provided.

I agree this is worth doing. It can be handled in the same way as the other translatable fields (with arrays). Fields it would apply to:

system_information.json

  • terms_url
  • privacy_url
  • brand_terms_url
  • license_url

Happy to help if I can, let me know

mplsmitch avatar Oct 26 '22 17:10 mplsmitch

Probably also url in system_alerts.json

testower avatar Oct 27 '22 09:10 testower

The localization section also needs revision (https://github.com/MobilityData/gbfs/blob/master/gbfs.md#localization), or maybe it's no longer needed?

testower avatar Oct 27 '22 10:10 testower

There are a few more fields in vehicle_types.json that should be considered: make, model, and color. I'm guessing that make and model may not be translatable within any given system (but should we make that assumption?), while color probably is.

testower avatar Oct 28 '22 09:10 testower

  • terms_url
  • privacy_url
  • brand_terms_url
  • license_url

Out of these, could we possibly rule out brand_terms_url and license_url, as these are not intended for end users but for consumers of the data?

testower avatar Oct 28 '22 10:10 testower

Hi guys, thanks a lot for carrying on this topic. I have changed company and domain, so I am a bit far from transportation topics now, but I read through the discussion and here are my few inputs:

  1. Any URL could have several translations. For example, maps or images could have text on them, so several translations could be available. (I agree that these are not the priority, so if it involves too much work they could be left aside, with the URLs you specified being the priority.)
  2. Indeed, color is a hex code in system_information.json but a human-readable text description in vehicle_types.json, for instance, so it should be translatable wherever it is free text.
  3. Should we allow the GBFS producer to indicate the default language in the format itself (to ease integration)? Is that the new role of the language field in system_information.json? (In that case, maybe just add a sentence in the "Defines" column stating that this language is the default one, overridden by any translations.)

EdouardBavoux avatar Jan 04 '23 09:01 EdouardBavoux

Closing this to ensure all discussion ends up in #460

josee-sabourin avatar Jan 19 '23 13:01 josee-sabourin