
Porter stem

Open ebraminio opened this issue 1 year ago • 10 comments

@dblock Following #444 let's add a test for a currently available filter.

I wasn't familiar with the testing system, so this was fun!

This one currently fails with the output below, since I haven't figured out how to hook in a filter. As far as I know I should configure an analyzer, but I'm not sure how, and I couldn't find a similar test to learn from. The whole thing looks nice, though!

FAILED  _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
    FAILED  CHAPTERS
        FAILED  Search with a match query field.
            PASSED  PARAMETERS
                PASSED  index
            PASSED  REQUEST BODY
            PASSED  RESPONSE STATUS
            FAILED  RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
            PASSED  RESPONSE PAYLOAD SCHEMA

ebraminio avatar Jul 22 '24 20:07 ebraminio

Interesting that it passes on CI, since it actually fails locally with:

[image]

The test is taken from https://github.com/opensearch-project/OpenSearch/blob/6227dc6ae70d82b7826f8f08bcc57b277c254056/modules/analysis-common/src/test/java/org/opensearch/analysis/common/StemmerTokenFilterFactoryTests.java#L83

ebraminio avatar Jul 22 '24 20:07 ebraminio

@ebraminio Run with -- --verbose. How does it fail?

You may need a refresh for search to consistently return results.
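The refresh mentioned above can be requested on the indexing call itself through the refresh query parameter. A minimal Python sketch of how such a URL could be built; the helper name index_doc_url and the base URL http://localhost:9200 are assumptions for illustration, not part of the test framework:

```python
from urllib.parse import urlencode


def index_doc_url(base_url: str, index: str, refresh: bool = True) -> str:
    """Build the URL for POST /{index}/_doc with an explicit refresh parameter.

    With refresh=true the document is visible to search as soon as the
    indexing call returns, so a search issued right afterwards can find it.
    """
    query = urlencode({"refresh": "true" if refresh else "false"})
    return f"{base_url}/{index}/_doc?{query}"


if __name__ == "__main__":
    # Hypothetical local cluster address.
    print(index_doc_url("http://localhost:9200", "movies"))
```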

dblock avatar Jul 22 '24 22:07 dblock

Changes Analysis

Commit SHA: 3b60a472b153c6d4b59b8ff5a31bc372ba0778a5
Comparing To SHA: bf2772ad020f3bb3b891f0161ddb33aaff935180

API Changes

Summary

NO CHANGES

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/10061450423/artifacts/1731274121

API Coverage

               Before          After           Δ
Covered (%)    490 (47.99 %)   490 (47.99 %)   0 (0 %)
Uncovered (%)  531 (52.01 %)   531 (52.01 %)   0 (0 %)
Unknown        24              24              0

github-actions[bot] avatar Jul 22 '24 22:07 github-actions[bot]

For the spell checker, either use an existing movie or add to .cspell.

See the other lint failures as well. We can skip the changelog for tests.

dblock avatar Jul 22 '24 22:07 dblock

@ebraminio Run with -- --verbose. How does it fail?

The problem is that I expect it to fail for now, since I don't yet know how to configure the index to use porter_stem. It does fail locally, but on your CI it doesn't, which is weird.

FAILED  _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
    PASSED  PROLOGUES
        PASSED  POST /movies/_doc
    FAILED  CHAPTERS
        FAILED  Search with a match query field.
            PASSED  PARAMETERS
                PASSED  index
            PASSED  REQUEST BODY
            PASSED  RESPONSE STATUS
            FAILED  RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
            PASSED  RESPONSE PAYLOAD SCHEMA
    PASSED  EPILOGUES
        PASSED  DELETE /movies

[INFO] => POST /movies/_doc ({
  "refresh": true
}) [application/json] | {
  "director": "Bennett Miller",
  "title": "Moneyball",
  "year": 2011
}
[INFO] <= 201 (application/json; charset=UTF-8) | {
  "_index": "movies",
  "_id": "ViQf3pAB2iv_XREEwTRb",
  "_version": 1,
  "result": "created",
  "forced_refresh": true,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
[INFO] => POST /movies/_search ({
  "seq_no_primary_term": true
}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "movies",
        "_id": "ViQf3pAB2iv_XREEwTRb",
        "_seq_no": 0,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "director": "Bennett Miller",
          "title": "Moneyball",
          "year": 2011
        }
      }
    ]
  }
}
[INFO] {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "movies",
        "_id": "ViQf3pAB2iv_XREEwTRb",
        "_seq_no": 0,
        "_primary_term": 1,
        "_score": 1,
        "_source": {
          "director": "Bennett Miller",
          "title": "Moneyball",
          "year": 2011
        }
      }
    ]
  }
}
[INFO] => DELETE /movies ({}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "acknowledged": true
}

ebraminio avatar Jul 23 '24 05:07 ebraminio

The problem is that I expect it to fail for now, since I don't yet know how to configure the index to use porter_stem. It does fail locally, but on your CI it doesn't, which is weird.

I don't see a mention of porter_stem in your test. What am I missing?

dblock avatar Jul 23 '24 12:07 dblock

I don't see a mention of porter_stem in your test. What am I missing?

Just that I don't yet know how best to reference porter_stem. I even tried setting the language to porter2, with no luck (honestly, I don't even know what the Porter language is). Once I figure this out, I can add a test for this and for the merged patch, and I can even hook another stemmer into OpenSearch, this time with an established example of how best to do it.

ebraminio avatar Jul 23 '24 14:07 ebraminio

I couldn't find an OpenSearch doc, but I think it's the same as in https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-porterstem-tokenfilter.html.

Do open an issue in https://github.com/opensearch-project/documentation-website to document how to use token filters.
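One way to see what porter_stem does to a term, independent of any index settings, is the _analyze API, which accepts an ad hoc tokenizer and filter chain. A minimal Python sketch using only the standard library; the function names and the base URL passed to analyze are assumptions for illustration, and analyze needs a running cluster:

```python
import json
from urllib import request


def porter_stem_analyze_body(text: str) -> dict:
    """Build an _analyze request body that runs text through porter_stem."""
    return {
        "tokenizer": "whitespace",
        "filter": ["lowercase", "porter_stem"],
        "text": text,
    }


def analyze(base_url: str, text: str) -> list:
    """POST the body to /_analyze and return the resulting token strings."""
    req = request.Request(
        f"{base_url}/_analyze",
        data=json.dumps(porter_stem_analyze_body(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return [token["token"] for token in json.load(resp)["tokens"]]


if __name__ == "__main__":
    # Only prints the request body; call analyze("http://localhost:9200", ...)
    # against a hypothetical local cluster to see the actual tokens.
    print(json.dumps(porter_stem_analyze_body("Consolingly"), indent=2))
```

Comparing the returned tokens with the term used in the match query shows whether the stemmer produces the form the test expects.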

dblock avatar Jul 23 '24 14:07 dblock

Thank you so much for the help. I'm getting somewhere, but I'm not there yet. I tried swapping "consolingly" with "consolingli" (indexing the former and searching for the latter), a pair taken from https://github.com/opensearch-project/OpenSearch/blob/6227dc6ae70d82b7826f8f08bcc57b277c254056/modules/analysis-common/src/test/java/org/opensearch/analysis/common/StemmerTokenFilterFactoryTests.java#L83. Maybe I just don't know something about the language. I'll try persian_stem as well. The current output:

[INFO] => PUT /movies ({}) [application/json] | {
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  }
}
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "movies"
}
[INFO] => POST /movies/_doc ({
  "refresh": true
}) [application/json] | {
  "director": "Consolingly",
  "title": "Moneyball",
  "year": 2011
}
[INFO] <= 201 (application/json; charset=UTF-8) | {
  "_index": "movies",
  "_id": "LbYe4JAB6G__aQMm1-87",
  "_version": 1,
  "result": "created",
  "forced_refresh": true,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
[INFO] => POST /movies/_search ({}) [application/json] | {
  "size": 1,
  "query": {
    "match": {
      "director": "Consolingli"
    }
  }
}
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
[INFO] {
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
[INFO] => DELETE /movies ({}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
  "acknowledged": true
}

FAILED  _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
    PASSED  PROLOGUES
        PASSED  PUT /movies
        PASSED  POST /movies/_doc
    FAILED  CHAPTERS
        FAILED  Search with a match query field.
            PASSED  PARAMETERS
                PASSED  index
            PASSED  REQUEST BODY
            PASSED  RESPONSE STATUS
            FAILED  RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
            PASSED  RESPONSE PAYLOAD SCHEMA
    PASSED  EPILOGUES
        PASSED  DELETE /movies

ebraminio avatar Jul 23 '24 15:07 ebraminio

It's getting somewhere! I am not sure why the filter isn't working, but if you can get any filter working, that would be a useful contribution.

We should start organizing things better too, so I would move tests/_core/search/porter_stem_token_filter.yaml to something like tests/_core/search/tokenizers/filters/porter_stem.yaml. That way it will be easy to copy-paste a test for another filter.
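A likely reason the filter has no effect in the request logged above: my_analyzer is defined under settings.analysis, but no field mapping references it, so the dynamically mapped director field is analyzed with the default analyzer rather than the custom one. A minimal Python sketch of a create-index body that also assigns the analyzer to the field; the helper name porter_stem_index_body is an assumption for illustration, and this is not necessarily how the follow-up PR solved it:

```python
import json


def porter_stem_index_body(field: str = "director",
                           analyzer_name: str = "my_analyzer") -> dict:
    """Build a create-index body that defines the analyzer and maps a field to it."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Same analyzer definition as in the logged PUT /movies request.
                    analyzer_name: {
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "porter_stem"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                # Without this mapping the analyzer is defined but never used.
                field: {"type": "text", "analyzer": analyzer_name}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(porter_stem_index_body(), indent=2))
```

With the analyzer set on the field, it applies at index time and, unless a separate search analyzer is configured, to match queries on that field as well, so both the document text and the query text are stemmed.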

dblock avatar Jul 23 '24 16:07 dblock

Done in #592. Thank you so much!

ebraminio avatar Oct 01 '24 05:10 ebraminio