Refactor & test deployment configuration (e.g. RulesTxtDeploymentService) for Elasticsearch support
Deployment possibilities for SMUI have grown rapidly. The configuration is hard to understand and the corresponding code is hard to maintain. This especially includes:
- `RulesTxtDeploymentService`
- `application.conf`
Approach:
- Step #1: document all deployment possibilities that should be supported by SMUI (already taking future Elasticsearch support, #43, into account).
- Step #2: derive a config schema (for `application.conf`).
- Step #3: refactor the code (breaking change).
The major goals of this story are to:
- Refactor the deployment part of SMUI, so that the complexity of supporting a 2nd search engine (Elasticsearch) can be handled & maintained in the future.
- Include an ES deployment procedure as a proof of concept (and thereby prepare #43).
Constraints:
- As the deployment configuration needs to evolve, a breaking change to `v4` is suggested. Prior versions of SMUI need to be adjusted to that new configuration specification.
- Within the scope of this issue, the deployment options within SMUI should be reworked to realise a deployment to Elasticsearch with status-quo `rules.txt` files coming from SMUI. Adjusting & validating Solr vs. Elasticsearch rules should be part of #43.
SMUI's deployment options (plan):
- `local` deployment: for DEV setup, no need for a Querqy-enabled search engine.
- `solr-local` deployment: evolution of `conf/smui2solr.sh`.
- `git-repository` deployment: evolution of `conf/smui2git.sh`.
- `elasticsearch` deployment: new deployment procedure to support Elasticsearch.
- Furthermore, custom deployment procedures should be supported. The deployment procedure from Chorus is a good example. It should be adopted as a proof of concept for a custom deployment procedure.
The following refactoring steps are suggested in order to sustain maintainability for SMUI with respect to the deployment options:
- All shell scripts realising the different deployment procedures should be bundled in `conf/deployment`. Renaming should be done accordingly (e.g. `conf/deployment/solr-local.sh`, see above).
- All deployment shell scripts should be operable in an "ECHO mode" (echoing an identification and all their command line params) in order to be testable without the target system setup (e.g. Solr) being present (see e.g. `test/services/RulesTxtDeploymentServiceConfigVariantsSpec.scala`).
- The concept of a "solr index" should evolve into a "rules collection". This includes:
  - the database schema (SQL),
  - the Scala model (e.g. `app/models/SolrIndex.scala` and all of its references),
  - the REST API (e.g. `/api/v1/solr-index`, see `conf/routes`),
  - the documentation (setup instructions and Chorus),
  - the frontend code.
- Resolve all deployment configurations done in `app/models/FeatureToggleModel.scala` into an explicit deployment model under `app/models/config`.
- Refactor `app/services/RulesTxtDeploymentService.scala` accordingly.
- Temp files should be an implementation detail of SMUI (if necessary). There should exist an explicit `export` folder for `rules.txt` files (as a default for all the deployment scripts above).
- `local` should be the default deployment (especially for a "Quickstart", see documentation).
Explicit deployment configuration:
- Configuration options are unstructured and "all over the place" in `v3`. Those should be resolved (see above) and made explicit (using custom parameters specific to the deployment procedure), e.g.:
```
smui.deployment.PRELIVE = {
  'procedure': 'conf/deployment/git-repository.sh',
  'params': {
    'repo': 'https+ssh://my-repo-on.domain.tld'
    ...
  }
}
```
- The interface to the deployment script could look like this:

```
{SMUI_DEPLOYMENT_PROCEDURE}.sh {DEPLOYMENT_INSTANCE} {RULES_COLLECTION_NAME} {EXPORT_PATH} {RULES_TXT_FILE(S) as ordered comma separated list} {PROCEDURE_SPECIFIC_PARAMS as --key=value}
```

e.g.:

```
git-repository.sh PRELIVE ecommerce /export common-rules.txt,decompound-rules.txt,spelling-rules.txt --repo=https+ssh://my-repo-on.domain.tld ...
```
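To illustrate how such a script could combine the interface above with the proposed "ECHO mode", here is a minimal, hypothetical sketch (not part of the issue). The `SMUI_ECHO_MODE` environment variable and the function name are my own assumptions, not an agreed mechanism:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a deployment script following the interface above.
# SMUI_ECHO_MODE is an assumed switch for enabling ECHO mode.

deploy_git_repository() {
  local deployment_instance="$1"    # e.g. PRELIVE
  local rules_collection_name="$2"  # e.g. ecommerce
  local export_path="$3"            # e.g. /export
  local rules_txt_files="$4"        # ordered, comma separated list
  shift 4                           # remaining args: --key=value pairs

  if [[ "${SMUI_ECHO_MODE:-false}" == "true" ]]; then
    # Echo an identification plus all params, so tests can assert on the call
    # without the target system (e.g. a git remote or Solr) being present.
    echo "git-repository.sh $deployment_instance $rules_collection_name $export_path $rules_txt_files $*"
    return 0
  fi

  # ... the actual deployment (e.g. committing & pushing the rules.txt files
  # from $export_path to the repo given via --repo=...) would go here
}
```

With this shape, a spec like `RulesTxtDeploymentServiceConfigVariantsSpec` could run every script variant in ECHO mode and assert on the echoed command line only.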
- There could also be the possibility to address a native Scala deployment procedure, e.g.:
```
smui.deployment.PRELIVE = {
  'procedure': 'services.deployment.ElasticsearchDeployment',
  'params': {
    'url': 'https://my-elasticsearch-instance-on.domain.tld'
    ...
  }
}
```
- Configuration should be done in SMUI through an explicit `smui.conf` file (like Chorus does it). There should be no env var option as the configuration is too complex (`local` deployment will remain the default).
- Explicit validation on SMUI startup (presenting an error on misconfiguration) is needed.
- As the change for SMUI (with this planned `v4`) is breaking, it should be considered to split configuration into "setup" & "customisation" in general, where only "setup" configurations can be controlled via env vars, and all "customisation" configurations should be done via a `smui.conf` (see above; this could account for e.g. `toggle.activate-spelling`). This should include the tag configuration, currently done via an explicit, extra JSON file.
- The documentation on querqy-docs for SMUI must be adapted accordingly.
Note: At the time of planning this major change, SMUI refactorings (splitting frontend & backend implementation) are taking place. The following branches are relevant:
- https://github.com/querqy/smui/tree/refactoring_2020
- https://github.com/querqy/querqy-docs/tree/smui_v3_12/docs/source/smui
I'm planning on removing the jackhanna script in favour of the single upload capability for ConfigSets, which should probably be how the zk-solr-cloud.sh interacts with Solr! Maybe rename it to solr-cloud.sh? See https://github.com/querqy/chorus/issues/22.
@epugh @pbartusch Please keep in mind that https://github.com/querqy/querqy/issues/76 will be a breaking change: the rules.txt will no longer be deployed as such, but the rules will be embedded into a JSON HTTP request (very similar to Querqy for ES). Also, the direct interaction with ZK or any direct interaction with the configset will be removed (and the collection reload as well).
It is very likely that we can test a release candidate in production as soon as January. I think we need this kind of 'beta version' this time, given the scope of the change.
Long story short: please do not invest any time into making the current deployment of rules.txt to Solr better - it will be replaced very soon.
@renekrie , thanks for the hint.
> Long story short: please do not invest any time into making the current deployment of rules.txt to Solr better - it will be replaced very soon.
that is not the plan. the focus of the concept described above lies on different deployment options in general.
@pbartusch I was a bit worried because earlier you said:
> Chorus should be adjusted to the newly adopted zk-solr-cloud deployment procedure as a first proof of concept.
I assume that zk-solr-cloud deployment will become outdated very soon.
Ah, got it. OK, it wasn't meant to be the focus, but I understand the concern. Thanks, @renekrie.
Then it seems better to make the smui2solrcloud.sh a proof of concept for a custom deployment procedure. I will adjust https://github.com/querqy/smui/issues/56#issuecomment-745107324 accordingly.
@epugh , now I got your point as well. Regarding:
> [...] in favour of the single upload capability for ConfigSets, which should probably be how the zk-solr-cloud.sh interacts with Solr
I suggest adding this deployment procedure (once it's available in Solr/Querqy) to SMUI instead of Chorus, as the solr-cloud.sh you suggested.
I will not make this part of this issue/story (obviously ;-)), but we should develop it within the scope of SMUI and adjust Chorus accordingly.
@renekrie, will the solr-local deployment procedure (meaning: cp the rules.txt and then perform a core reload) remain possible in Solr?
Or will that be deprecated as well?
This will be the same HTTP call as for SolrCloud.
Just a heads-up: I've just merged a PR for https://github.com/querqy/querqy/issues/116 to querqy-core.
This would give you the option to manage ES/Solr specifics via templates in the rules file. For example, a down boost on a field could look like this:
```
notebook =>
  UP(10): asus
  << field_down: factor=20 || fieldname=category || value=accessories >>
```
At the beginning of the file, you would have to prepend the search-engine-specific template:
```
# either Solr:
def field_down(factor, fieldname, value):
  DOWN($factor): * $fieldname:($value)

# or Elasticsearch:
def field_down(factor, fieldname, value):
  DOWN($factor): * "match": { "$fieldname": { "query": "$value" }}
```
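If I read the template substitution correctly (an assumption on my side, not verified against querqy-core), the rule above combined with the Solr template would expand to something like:

```
notebook =>
  UP(10): asus
  DOWN(20): * category:(accessories)
```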
If it helps, we could probably add docstring documentation to the templates à la:
```
def field_down(factor, fieldname, value):
  """Use this to penalise documents that contain a certain value in the specified field.

  :param factor: the penalisation factor
  :param fieldname: the field name
  :param value: the field value
  :type factor: float
  :type fieldname: string
  :type value: string
  """
  DOWN($factor): * $fieldname:($value)
```
This would probably enable SMUI to generate a form input in the UI from the template. At the most advanced end, we could let users create and manage their own templates in SMUI, including for more complex function queries.
Do you think it might be useful to have the ability to define a raw query to a rule as well (i.e. everything after the '*')? E.g. as a specific option in the UI instead of choosing from suggested fields and putting a field for a value. The advantage would be to enable basically all use cases for rules through SMUI. It could enable Elastic Rules completely as a first step and circumvent the templates discussion and similar approaches. Tradeoff being the higher risk of human error when writing raw query syntax unless there is validation added to these inputs.
Update: It seems to be already possible through `toggle.ui-concept.all-rules.with-solr-fields=false`, which renders the Term as-is and does not throw any validation errors. So import, UI edit and export all seem to be working with Elastic rules.
@pbartusch Is there some activity planned on this issue? While refactoring, could the concept of a SOLR_BASE_URL (e.g. `http://localhost:8983/solr`) replace SOLR_HOST (from which the base URL currently gets hardcoded/built)? The advantage of SOLR_BASE_URL would be that it enables the customer to use http as well as https (and possibly a different application root replacing "/solr").
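To make the suggestion concrete, here is a minimal shell sketch of that fallback logic (the helper name and the default host are my own illustration, not SMUI code):

```shell
# Hypothetical sketch: prefer an explicitly configured SOLR_BASE_URL and fall
# back to composing it from SOLR_HOST (as effectively happens today, with a
# hardcoded "http://" scheme and "/solr" application root).
compose_solr_base_url() {
  local host="${SOLR_HOST:-localhost:8983}"
  # An explicit base URL enables https and a different application root.
  echo "${SOLR_BASE_URL:-http://${host}/solr}"
}
```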
See #82 which is specific to @pbartusch comment back in December 2020!