
Dash (-) in category names causing no results to be returned

Open jamieu opened this issue 8 years ago • 5 comments

https://collection.sciencemuseum.org.uk/search/categories/penn-gaskell-collection


jamieu avatar Sep 13 '17 09:09 jamieu

This may not be fixable, but we need to dedicate some time to reviewing the problem and proposing solutions. We need a time estimate before proceeding.

iteles avatar Mar 11 '19 13:03 iteles

We replace spaces with "-" when building the URL from the query parameter. This avoids having %20 (the encoding of a space) in the URL. It works well until the real name of the parameter contains a "-". On the backend we convert "-" back to spaces before searching with Elasticsearch. Linked to https://github.com/TheScienceMuseum/collectionsonline/pull/951 and https://github.com/TheScienceMuseum/collectionsonline/issues/515#issuecomment-314836894

The issue here is that Elasticsearch tries to match the category penn gaskell collection instead of penn-gaskell collection.

Category didn't match, ensure the exact category title was used

Looking for a way to match the category when the name contains -
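
A minimal sketch of why the round trip loses information. The helper names toUrlSlug and fromUrlSlug are hypothetical, and the lower-casing step is an assumption based on the URL in the issue description; the real code in the repository may differ.

```javascript
// Hypothetical helpers mirroring the behaviour described above.
// Frontend side: spaces become dashes so the URL contains no %20.
function toUrlSlug (name) {
  return name.toLowerCase().split(' ').join('-');
}

// Backend side: every dash is turned back into a space before searching.
function fromUrlSlug (slug) {
  return slug.split('-').join(' ');
}

const realName = 'Penn-Gaskell Collection';
const slug = toUrlSlug(realName);   // 'penn-gaskell-collection'
const searched = fromUrlSlug(slug); // 'penn gaskell collection'

// The original dash cannot be told apart from the dashes that replaced spaces,
// so the searched value no longer equals the real (lower-cased) category name.
console.log(searched === realName.toLowerCase()); // false
```

A name without a dash survives the same round trip unchanged, which is why only the handful of dashed categories are affected.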

SimonLab avatar Mar 13 '19 13:03 SimonLab

To test a category containing a "-" via the website, we first need to display more categories in the filter list (by default Elasticsearch returns only the first 10 aggregation buckets). We can add a size value to the aggregation, for example: terms: {size: 500, field: 'categories.name' } in https://github.com/TheScienceMuseum/collectionsonline/blob/736a0f7c4dbd30100dc7bd71dc841833aaf73c05/lib/facets/aggs-all.js#L20

This will display all the existing categories (there are fewer than 500).

The other categories containing a dash are:

  • X-rays
  • Medical Ceramic-ware
  • Medical Glass-ware
  • Pharmacy-ware
  • Penn-Gaskell Collection
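
A rough sketch of the aggregation change and of how the dashed names could be picked out of the response. The terms body comes from the comment above; the wrapping key `category` and both function names are placeholders, not the names used in aggs-all.js.

```javascript
// Build a terms aggregation with an explicit bucket limit.
// Without `size`, Elasticsearch returns only the top 10 buckets.
function buildCategoryAggregation (size) {
  return {
    category: { // placeholder aggregation name
      terms: { size: size, field: 'categories.name' }
    }
  };
}

// Terms aggregation buckets have the shape { key, doc_count }.
// Return the category names that contain a dash.
function dashedCategories (buckets) {
  return buckets
    .map(bucket => bucket.key)
    .filter(name => name.includes('-'));
}

console.log(JSON.stringify(buildCategoryAggregation(500)));
```

Running dashedCategories over the buckets of the enlarged aggregation is one way to re-check the list above if categories are added later.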

SimonLab avatar Mar 13 '19 13:03 SimonLab

Sounds fine, not sure if this page also helps. http://collection.sciencemuseum.org.uk/categories


jamieu avatar Mar 13 '19 14:03 jamieu

We can first make sure that the category description is displayed. This information is taken from the JSON file: https://github.com/TheScienceMuseum/collectionsonline/blob/master/description-boxes/category.json So we need to make sure the keys (which contain dashes) match the categories. To do that, we can match each key after replacing every "-" with a space. This can be done in https://github.com/TheScienceMuseum/collectionsonline/blob/736a0f7c4dbd30100dc7bd71dc841833aaf73c05/lib/create-description-box.js#L8-L23

by changing the function to:

  function keysToLowerCase (obj) {
    var keys = Object.keys(obj);
    var n = keys.length;
    while (n--) {
      var key = keys[n]; // "cache" it, for fewer lookups to the array
      // lower-case the key and replace every "-" with a space
      var normalised = key.toLowerCase().split("-").join(" ");
      if (key !== normalised) { // might already be in its normalised form
        obj[normalised] = obj[key]; // move the value to the normalised key
        // store the original version as the title if one hasn't been specified
        if (!obj[normalised].title) {
          obj[normalised].title = key;
        }
        delete obj[key]; // delete the old key
      }
    }
    return obj;
  }

Then I think the best way to get the correct result from Elasticsearch, and to filter with the "real" name, is to create a map from the dash-free name to the real one:

function formatCategoryNames(categories) {
  const categoryNames = {
    "x rays": "x-rays",
    "medical ceramic ware": "medical ceramic-ware",
    "medical glass ware": "medical glass-ware",
    "pharmacy ware": "pharmacy-ware",
    "penn gaskell collection": "penn-gaskell collection"
  }
  return categories.map( c => {
      return categoryNames[c.toLowerCase()] || c;
  })
}

Changing the URL to add back the dashes might break the current URL system, and changing the current feature would take some time.
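
As a possible variation on the hard-coded map, the lookup could be derived from the list of real category names, so that a new dashed category does not need a code change. This is only a sketch: buildCategoryLookup and restoreCategoryNames are hypothetical names, and where the list of real names would come from (for example the categories aggregation) is an assumption.

```javascript
// Build a lookup from the dash-free form of each name to its real,
// lower-cased form. Names without a dash need no entry.
function buildCategoryLookup (realNames) {
  const lookup = new Map();
  realNames.forEach(name => {
    const real = name.toLowerCase();
    if (real.includes('-')) {
      lookup.set(real.split('-').join(' '), real);
    }
  });
  return lookup;
}

// Same contract as formatCategoryNames above: swap in the real name
// when one is known, otherwise leave the category untouched.
function restoreCategoryNames (categories, lookup) {
  return categories.map(c => lookup.get(c.toLowerCase()) || c);
}

const lookup = buildCategoryLookup([
  'X-rays',
  'Medical Ceramic-ware',
  'Medical Glass-ware',
  'Pharmacy-ware',
  'Penn-Gaskell Collection'
]);

console.log(restoreCategoryNames(['penn gaskell collection'], lookup));
// [ 'penn-gaskell collection' ]
```

A Map is used instead of a plain object so that a category named like a built-in property (such as "constructor") cannot produce a false match.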

SimonLab avatar Mar 13 '19 15:03 SimonLab