optimade-python-tools Allow to use the server code as a library in the context of another application

The server part of OPT is designed as a standalone application, which makes it hard to use as a library, e.g. to include it in another application. In particular, I want to use the OPT server as part of our own FastAPI app to offer OPTIMADE alongside our other APIs. Several aspects are hard to integrate:

config: I have some config parameters that I want to set from our own config system. This is hard when a config file is the only way to provide a config. The way the config is written also makes it especially hard to monkey patch.
logging: We have our own hierarchy of loggers/handlers and need a way to integrate the OPT logging.
entry_collections: As a non-Mongo implementation, we need to set our own entry_collection class.
customized error handling: This is related to logging, as we also want to log certain API HTTP errors.

To give you an idea, this is how we need to monkey patch the current OPT to make it work for us:

import os
import sys
import importlib

# patch optimade python tools config (patched module must be outside this module to force import before optimade)
os.environ['OPTIMADE_CONFIG_FILE'] = os.path.join(os.path.dirname(__file__), 'optimade_config.json')

# patch optimade logger (patched module must be outside this module to force import before optimade)
sys.modules['optimade.server.logger'] = importlib.import_module('nomad.app_fastapi.optimade_logger')

# patch optimade base path
from nomad import config, utils  # nopep8
from optimade.server.config import CONFIG  # nopep8
CONFIG.root_path = "%s/optimade" % config.services.api_base_path

from optimade.server import main as optimade  # nopep8
from optimade.server.main import app as optimade_app  # nopep8
from optimade.server.routers import structures  # nopep8

# remove all the test data
from optimade.server.routers import ENTRY_COLLECTIONS  # nopep8

for name, collection in ENTRY_COLLECTIONS.items():
    if name == 'links':
        collection.collection.drop()
        collection.collection.insert_one({
            "id": "index",
            "type": "links",
            "name": "Index meta-database",
            "description": "Index for NOMAD databases",
            "base_url": "http://providers.optimade.org/index-metadbs/nmd",
            "homepage": "https://nomad-lab.eu",
            "link_type": "root"
        })
    else:
        collection.collection.drop()

# patch the structure collection with our elasticsearch implementation
from .elasticsearch import ElasticsearchStructureCollection  # nopep8
from .filterparser import parse_filter  # nopep8

structures.structures_coll = ElasticsearchStructureCollection()
optimade.add_major_version_base_url(optimade.app)

# patch exception handlers
logger = utils.get_logger(__name__)
exception_handlers = sys.modules['optimade.server.exception_handlers']
original_handler = getattr(exception_handlers, 'general_exception')


def general_exception(request, exc, status_code=500, **kwargs):
    if getattr(exc, 'status_code', status_code) >= 500:
        logger.error(
            'unexpected exception in optimade implementation',
            status_code=status_code, exc_info=exc, url=request.url)

    return original_handler(request, exc, status_code, **kwargs)


setattr(exception_handlers, 'general_exception', general_exception)

This is obviously not ideal and depends on a lot of OPT internals that are prone to change.

I do not have any specific suggestions on how to fix this yet. I feel there should first be a discussion about whether you want to move OPT in the library direction.

Jan 05 '21 12:01 markus1978

Hi @markus1978, thanks for the thorough explanation. Any suggestions to make the app more reusable are very welcome, as at the moment every implementation using OPT has just used the app code as a template to make their own. I would hope that the example app code is now fairly stable, and that additions will all be in the implementation details of the underlying classes, so hopefully the app code itself can still be used as a reasonable template, if you would consider that route.

It looks like finishing off #339 and making the collection backend more configurable inside OPT will only remove a couple of lines of monkey-patching for you, though of course it should remove some of the development burden from your end regarding some OPT features (e.g. property aliasing).

Just to go through your points:

config: I have some config parameters that I want to set from our own config system. This is hard when a config file is the only way to provide a config. The way the config is written also makes it especially hard to monkey patch.

This is one place we could definitely improve. Would a set_config(...) method that is happy to take any pydantic BaseSettings work? That way you should be able to mix in the default ServerConfig class with your own quite easily.

logging: We have our own hierarchy of loggers/handlers and need a way to integrate the OPT logging.

Not too sure what we can do here, so would be happy to hear suggestions...

entry_collections: As a non-Mongo implementation, we need to set our own entry_collection class.

Hopefully this is covered by my second paragraph: if we can add Elasticsearch as a new backend, things should become much more general here.

customized error handling: This is related to logging, as we also want to log certain API HTTP errors.

Similar to the logging issue, I guess this would need a more general solution. The existing OPTIMADE_EXCEPTIONS tuple is very picky about ordering, but we may be able to get away with a dictionary that should allow access via e.g. OPTIMADE_EXCEPTIONS[Exception] = nomad_exception_handler(...).
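To illustrate the dictionary idea, here is a minimal sketch in plain Python. It is not OPT's actual code: `HANDLERS`, `find_handler`, and the handler functions are hypothetical names. The lookup walks the exception's class hierarchy, so the order in which handlers are registered no longer matters.

```python
from typing import Callable, Dict, Type

# Hypothetical registry: exception class -> handler. Not the actual OPT API.
Handler = Callable[[Exception], str]
HANDLERS: Dict[Type[BaseException], Handler] = {}


def default_handler(exc: Exception) -> str:
    return f"500: {type(exc).__name__}"


def value_error_handler(exc: Exception) -> str:
    return f"400: {exc}"


def find_handler(exc: Exception) -> Handler:
    """Return the handler registered for the most specific class of ``exc``."""
    # type(exc).__mro__ lists the class itself first, then its bases,
    # so the first match is the most specific registered handler.
    for cls in type(exc).__mro__:
        if cls in HANDLERS:
            return HANDLERS[cls]
    raise LookupError(f"no handler registered for {type(exc).__name__}")


HANDLERS[Exception] = default_handler
HANDLERS[ValueError] = value_error_handler

# An embedding application could then override a single entry:
HANDLERS[Exception] = lambda exc: f"logged and handled: {type(exc).__name__}"
```

With a mapping like this, a host application replaces one entry without having to know where it sits in an ordered tuple.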

Jan 05 '21 13:01 ml-evs

If you're attending the OPTIMADE meeting tomorrow perhaps we could chat briefly afterwards (with @shyamd and @CasperWA too) as to what the best plan of action is?

Jan 05 '21 13:01 ml-evs

The server part of OPT is designed as a standalone application, which makes it hard to use as a library, e.g. to include it in another application. In particular, I want to use the OPT server as part of our own FastAPI app to offer OPTIMADE alongside our other APIs. Several aspects are hard to integrate:

config: I have some config parameters that I want to set from our own config system. This is hard when a config file is the only way to provide a config. The way the config is written also makes it especially hard to monkey patch.

logging: We have our own hierarchy of loggers/handlers and need a way to integrate the OPT logging.

entry_collections: As a non-Mongo implementation, we need to set our own entry_collection class.

customized error handling: This is related to logging, as we also want to log certain API HTTP errors.

So most of these things are as intended, as far as I know.

Config

This can be set in a multitude of ways, see the documentation here. The major ways include:

A JSON config file.
Setting environment variables starting with OPTIMADE_ (I think this is what you want).
Changing the configuration after importing optimade.server.config.CONFIG.

The latter is intertwined a bit with logging due to some technicalities, but it shouldn't be a problem?
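As a rough illustration of the environment-variable route, the sketch below shows what collecting OPTIMADE_-prefixed variables amounts to. It is not OPT's implementation, which leaves this to its pydantic-based config class; `read_prefixed_settings` is a hypothetical helper, and the variable name OPTIMADE_ROOT_PATH is an assumption derived from the CONFIG.root_path attribute used earlier in this thread.

```python
import os
from typing import Dict, Mapping


def read_prefixed_settings(
    environ: Mapping[str, str], prefix: str = "OPTIMADE_"
) -> Dict[str, str]:
    """Collect variables starting with ``prefix``.

    Keys are returned lower-cased and without the prefix.
    """
    return {
        name[len(prefix):].lower(): value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# The host application sets the variable before the server config is
# first imported. The value here is purely illustrative.
os.environ["OPTIMADE_ROOT_PATH"] = "/api/optimade"
settings = read_prefixed_settings(os.environ)
```

The important point for an embedding application is ordering: the variables have to be in place before the server's config module is imported for the first time.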

Logging

As the logging uses the standard Python logging package, it should be straightforward to alter and shape it as you wish. The logger is named "optimade" here.
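A minimal sketch of that approach, using only the standard library: the logger name "optimade" comes from the comment above, while the handler, format string, and log message are illustrative assumptions. In a real application this would run after the OPT server package has been imported, so that the handlers it attached are the ones being replaced.

```python
import io
import logging

# "optimade" is the logger name mentioned above; everything else is illustrative.
optimade_logger = logging.getLogger("optimade")

# Drop whatever handlers were attached when the package was imported.
for handler in list(optimade_logger.handlers):
    optimade_logger.removeHandler(handler)

# Attach the host application's own handler (a StringIO stream stands in here).
stream = io.StringIO()
app_handler = logging.StreamHandler(stream)
app_handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
optimade_logger.addHandler(app_handler)
optimade_logger.setLevel(logging.INFO)
optimade_logger.propagate = False  # avoid duplicate records via the root logger

optimade_logger.info("structures endpoint queried")
```

Note that this only swaps handlers after the fact; it does not prevent any files or directories from being created at import time.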

Entry collections

The point here is that one needs to create an entry collection class for each specific backend, as this is where all the meat of the operation is. Here one sets up and handles the query parameters, retrieves the entry listing models from the specific backend and compiles the response. As query parameters should be handled differently from backend to backend, this is a necessary step when using these tools for a "new" backend. See, e.g., my work with AiiDA here for an example of a non-Mongo entry collection. I don't sub-class the OPTIMADE Python Tools' base entry collection, but that is a work in progress and should save me a lot of lines of code when that work is done.

Error handlers

Any additional, specific handling of errors or warnings should always be set up for the specific service. Hence, error handlers should be easily added to the application based on newly created Exception classes.

If there are general new error handlers and exceptions (as well as warnings) it would be great to have them contributed upstream to this repository!

For custom Warnings, one can subclass OptimadeWarning and they should automatically appear in the response under meta -> warnings.

This is obviously not ideal and depends on a lot of OPT internals that are prone to change.

I do not have any specific suggestions on how to fix this yet. I feel there should first be a discussion about whether you want to move OPT in the library direction.

I personally already use optimade as a package/library for both the Materials Cloud server as well as the OPTIMADE client and other things. I don't see an argument for this not already being usable as a general package.

Jan 05 '21 13:01 CasperWA

Thanks for your answers and suggestions. A few more thoughts:

Config: I guess @CasperWA's last suggestion is fine (and I actually did this when setting the root path). But you have to be careful to import other server modules afterwards. I remember being a little frustrated at not being able to alter the file location without setting an env var.

Logging: With the current OPT logging, I need to replace the handler that you created (OK, not too bad), but you also create a directory and log files on import, which I don't want. It would be good if all handler setup were optional.

Entry collections: I remember that you were already working on this. I'll wait.

Error handling: It is not so much about adding handling for new exceptions as about altering the behaviour of the existing OPT handlers. In particular, I wanted to create a log entry on requests that fail with a status code >= 500. Currently this is only done with CONFIG.debug. Tied into the logging issue: I also want to log errors differently, with additional parameters for url, parameters, status_code, etc. We use structlog on top of Python logging, which allows us to create structured log entries. Each log entry is basically a dict that is put into Elasticsearch and can be searched/analyzed with Kibana; see ELK. Being able to hook a callback into general_exception would be nice. But I agree, this is a very special need.
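One way such a callback hook could look is sketched below. All names (`with_error_hook`, `log_server_error`, the stand-in `original_handler`) are hypothetical and do not reflect an existing OPT API; the handler signature mirrors the monkey patch at the top of this issue, and a plain list stands in for a structured logger.

```python
from typing import Any, Callable, List

# Hypothetical names throughout; this does not reflect an existing OPT API.
ErrorHook = Callable[[Any, Exception, int], None]


def with_error_hook(handler: Callable[..., Any], hook: ErrorHook) -> Callable[..., Any]:
    """Wrap ``handler`` so that ``hook`` runs first for server errors (>= 500)."""

    def wrapped(request: Any, exc: Exception, status_code: int = 500, **kwargs: Any) -> Any:
        # Prefer the status code carried by the exception, if it has one.
        effective_status = getattr(exc, "status_code", status_code)
        if effective_status >= 500:
            hook(request, exc, effective_status)
        return handler(request, exc, status_code, **kwargs)

    return wrapped


# Stand-in for the library's handler.
def original_handler(request: Any, exc: Exception, status_code: int = 500, **kwargs: Any) -> dict:
    return {"status": getattr(exc, "status_code", status_code), "detail": str(exc)}


# Stand-in for a structured logger: each entry is a plain dict.
log_entries: List[dict] = []


def log_server_error(request: Any, exc: Exception, status_code: int) -> None:
    log_entries.append({"url": request, "status_code": status_code, "error": repr(exc)})


general_exception = with_error_hook(original_handler, log_server_error)
response = general_exception("/optimade/structures", RuntimeError("boom"))
```

The wrapper leaves the response untouched and only adds the side effect, so client errors (below 500) pass through without producing a log entry.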

Jan 05 '21 15:01 markus1978

If you're attending the OPTIMADE meeting tomorrow perhaps we could chat briefly afterwards (with @shyamd and @CasperWA too) as to what the best plan of action is?

Sure, I would love to discuss face to face.

Jan 05 '21 15:01 markus1978

Cheers @markus1978, I think I have a much better grasp of your issues and use case now. And indeed, a meeting would be great. I should have time after tomorrow's OPTIMADE meeting for sure 👍

Jan 05 '21 16:01 CasperWA