Porter stem
@dblock Following #444 let's add a test for a currently available filter.
Actually I wasn't familiar with the testing system and so and it was fun!
This one currently fails with the following as I haven't figured out how to hook any filter. As far as I know I should configure an analyzer but not sure how and couldn't find something similar so I can learn from but the whole thing looks nice!
FAILED _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
FAILED CHAPTERS
FAILED Search with a match query field.
PASSED PARAMETERS
PASSED index
PASSED REQUEST BODY
PASSED RESPONSE STATUS
FAILED RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
PASSED RESPONSE PAYLOAD SCHEMA
Interesting that passes on CI as it actually fails locally with
The test is brought from https://github.com/opensearch-project/OpenSearch/blob/6227dc6ae70d82b7826f8f08bcc57b277c254056/modules/analysis-common/src/test/java/org/opensearch/analysis/common/StemmerTokenFilterFactoryTests.java#L83
@ebraminio Run with -- --verbose how does it fail?
You may need a refresh for search to consistently return results.
Changes Analysis
Commit SHA: 3b60a472b153c6d4b59b8ff5a31bc372ba0778a5 Comparing To SHA: bf2772ad020f3bb3b891f0161ddb33aaff935180
API Changes
Summary
NO CHANGES
Report
The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/10061450423/artifacts/1731274121
API Coverage
| Before | After | Δ | |
|---|---|---|---|
| Covered (%) | 490 (47.99 %) | 490 (47.99 %) | 0 (0 %) |
| Uncovered (%) | 531 (52.01 %) | 531 (52.01 %) | 0 (0 %) |
| Unknown | 24 | 24 | 0 |
For the spell checker, either use an existing movie or add to .cspell.
See other lint failures & stuff, we can skip changelog for tests.
@ebraminio Run with -- --verbose how does it fail?
The problem is I expect it to current fail as I don't how to configure the index yet to use porter_stem yet and it fails but on your CI it doesn't fail which is weird.
FAILED _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
PASSED PROLOGUES
PASSED POST /movies/_doc
FAILED CHAPTERS
FAILED Search with a match query field.
PASSED PARAMETERS
PASSED index
PASSED REQUEST BODY
PASSED RESPONSE STATUS
FAILED RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
PASSED RESPONSE PAYLOAD SCHEMA
PASSED EPILOGUES
PASSED DELETE /movies
[INFO] => POST /movies/_doc ({
"refresh": true
}) [application/json] | {
"director": "Bennett Miller",
"title": "Moneyball",
"year": 2011
}
[INFO] <= 201 (application/json; charset=UTF-8) | {
"_index": "movies",
"_id": "ViQf3pAB2iv_XREEwTRb",
"_version": 1,
"result": "created",
"forced_refresh": true,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
[INFO] => POST /movies/_search ({
"seq_no_primary_term": true
}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "movies",
"_id": "ViQf3pAB2iv_XREEwTRb",
"_seq_no": 0,
"_primary_term": 1,
"_score": 1,
"_source": {
"director": "Bennett Miller",
"title": "Moneyball",
"year": 2011
}
}
]
}
}
[INFO] {
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "movies",
"_id": "ViQf3pAB2iv_XREEwTRb",
"_seq_no": 0,
"_primary_term": 1,
"_score": 1,
"_source": {
"director": "Bennett Miller",
"title": "Moneyball",
"year": 2011
}
}
]
}
}
[INFO] => DELETE /movies ({}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
"acknowledged": true
}
The problem is I expect it to current fail as I don't how to configure the index yet to use porter_stem yet and it fails but on your CI it doesn't fail which is weird.
I don't see a mention of the porter-stem in your test, what am I missing?
I don't see a mention of the porter-stem in your test, what am I missing?
Just that I don't know how best I should reference porter_stem yet, I was even trying to set the language to porter2 with no luck (honestly I don't even know what Porter language is) but if I figure this out I can add test for this and for the merged patch and can even hook another stemmer in OpenSearch but this time there is an established example on how best it can be done.
I couldn't find an OpenSearch doc, but I think it's the same as in https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-porterstem-tokenfilter.html.
Do open an issue in https://github.com/opensearch-project/documentation-website to document how to use token filters.
Thank you so much for the help, apparently I'm getting somewhere but not there yet, I tried swapping consolingly with consolingli which is brought from https://github.com/opensearch-project/OpenSearch/blob/6227dc6ae70d82b7826f8f08bcc57b277c254056/modules/analysis-common/src/test/java/org/opensearch/analysis/common/StemmerTokenFilterFactoryTests.java#L83 maybe I just don't know something about the language, will try persian_stem also,
[INFO] => PUT /movies ({}) [application/json] | {
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "whitespace",
"filter": [
"lowercase",
"porter_stem"
]
}
}
}
}
}
[INFO] <= 200 (application/json; charset=UTF-8) | {
"acknowledged": true,
"shards_acknowledged": true,
"index": "movies"
}
[INFO] => POST /movies/_doc ({
"refresh": true
}) [application/json] | {
"director": "Consolingly",
"title": "Moneyball",
"year": 2011
}
[INFO] <= 201 (application/json; charset=UTF-8) | {
"_index": "movies",
"_id": "LbYe4JAB6G__aQMm1-87",
"_version": 1,
"result": "created",
"forced_refresh": true,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
[INFO] => POST /movies/_search ({}) [application/json] | {
"size": 1,
"query": {
"match": {
"director": "Consolingli"
}
}
}
[INFO] <= 200 (application/json; charset=UTF-8) | {
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
[INFO] {
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
[INFO] => DELETE /movies ({}) [application/json] | undefined
[INFO] <= 200 (application/json; charset=UTF-8) | {
"acknowledged": true
}
FAILED _core/search/porter_stem_token_filter.yaml (/Users/ebrahim/opensearch-api-specification/tests/_core/search/porter_stem_token_filter.yaml)
PASSED PROLOGUES
PASSED PUT /movies
PASSED POST /movies/_doc
FAILED CHAPTERS
FAILED Search with a match query field.
PASSED PARAMETERS
PASSED index
PASSED REQUEST BODY
PASSED RESPONSE STATUS
FAILED RESPONSE PAYLOAD BODY (expected hits.total.value='1', got '0', missing hits.hits[0]='[object Object]')
PASSED RESPONSE PAYLOAD SCHEMA
PASSED EPILOGUES
PASSED DELETE /movies
It's getting somewhere! I am not sure why the filter isn't working, but i you can get any filter working that would be a useful contribution.
We should start organizing things better too, so I would move tests/_core/search/porter_stem_token_filter.yaml to something like tests/_core/search/tokenizers/filters/porter_stem.yaml, this way it will be easy to copy-paste a test for another one.
Done in #592. Thank you so much