
Database semantic conventions may violate namespacing guidelines

Open tigrannajaryan opened this issue 3 years ago • 5 comments

Problem Description

We have database conventions like this: db.name, db.statement, etc. Here db is the namespace, and it holds the attributes that are common to all databases. We have a large number of these.

We also have conventions like this: db.mssql.instance_name, where db.mssql is the namespace. The implied idea is that database-specific attributes are placed in the db.<database-name> namespace, although this is not explicitly called out anywhere.

This is a problem. The set of database names is unbounded and may contain any value in the future. We may eventually need to add database-specific conventions for a database whose name matches one of the numerous attributes under the db namespace.

However, that will be impossible, because it would violate the namespacing guidelines, which say:

Names SHOULD NOT coincide with namespaces. For example if service.instance.id is an attribute name then it is no longer valid to have an attribute named service.instance because service.instance is already a namespace. Because of this rule be careful when choosing names: every existing name prohibits existence of an equally named namespace in the future, and vice versa: any existing namespace prohibits existence of an equally named attribute key in the future.

We have a situation where future evolution of semantic conventions may be impossible because of the current design.
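To make the rule quoted above concrete, here is a minimal sketch of a collision check. It is only an illustration, not part of the spec or of any OpenTelemetry tooling: the function names are hypothetical, and db.system.some_setting is an invented attribute standing in for a database-specific convention of a future database named "system".

```python
def namespaces_of(name):
    """Return every proper dotted prefix of an attribute name.

    For "db.mssql.instance_name" this is {"db", "db.mssql"}.
    """
    parts = name.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts))}


def find_collisions(attribute_names):
    """Return the names that are also used as a namespace by another name."""
    names = set(attribute_names)
    all_namespaces = set()
    for name in names:
        all_namespaces |= namespaces_of(name)
    return sorted(names & all_namespaces)


# Attribute names taken from the issue text.
existing = ["db.name", "db.statement", "db.system", "db.mssql.instance_name"]

# Hypothetical attribute for a future database called "system".
with_future_db = existing + ["db.system.some_setting"]

no_clash = find_collisions(existing)
clash = find_collisions(with_future_db)
```

Under these assumptions the existing names produce no collision, while adding the hypothetical attribute makes db.system both an attribute name and a namespace, which is exactly the case the guideline forbids.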

Possible Solutions

I list a couple of solutions below. If you can think of another way, please comment so that we can discuss that too.

Solution 1

Move all database-specific conventions to a properly isolated namespace: instead of using db.<database-name> as the namespace, use db.special.<database-name>, or some other namespace that is guaranteed not to clash with any other attributes in other namespaces.

The downside is that we need to change existing conventions and also that database-specific conventions will use somewhat longer attribute names (db.special.cassandra.page_size is longer and less readable than db.cassandra.page_size that we use currently).
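As a rough sketch of what the Solution 1 rename would look like, the helper below moves a database-specific attribute under the isolated namespace. The function name, the infix parameter, and the set of known database names are assumptions made for illustration; only db.special, db.cassandra.page_size and db.mssql.instance_name come from the text above.

```python
def to_isolated(name, database_names, infix="special"):
    """Rewrite db.<database-name>.<rest> as db.<infix>.<database-name>.<rest>.

    Names that are not database-specific (for example db.name) are
    returned unchanged.
    """
    parts = name.split(".")
    is_db_specific = (
        len(parts) >= 3 and parts[0] == "db" and parts[1] in database_names
    )
    if is_db_specific:
        return ".".join(["db", infix] + parts[1:])
    return name


# Assumed list of databases that currently have their own namespace.
known_databases = {"cassandra", "mssql"}

renamed = to_isolated("db.cassandra.page_size", known_databases)
unchanged = to_isolated("db.name", known_databases)
```

With these inputs, renamed is db.special.cassandra.page_size and unchanged stays db.name, which shows both the isolation and the longer names that this solution costs.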

Solution 2

Explicitly call out that certain database names are disallowed. This list will contain everything that is already an attribute under the db namespace. We can probably also reserve some names for use either as attributes under the db namespace (and thus disallow them as database names) or as database names (and thus disallow them as attribute names).

Any future database whose name clashes with an existing attribute under the db namespace will need to have its name transformed so that it no longer conflicts with that attribute name.

For example, if a hypothetical future database called "system" needs some specific attributes in the conventions, then we can place those attributes under the db.systemdb namespace to make sure they do not conflict with the generic db.system attribute.

The benefit of this solution is that we don't need to change existing conventions.
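The transformation described for Solution 2 could be sketched as follows. This is only an illustration under stated assumptions: the reserved set lists just the generic attributes named in this issue rather than the full set under db, the "db" suffix follows the db.systemdb example above, and the function name is hypothetical.

```python
# Generic attributes under the db namespace that are mentioned in this issue.
# The real list would contain every attribute directly under db.
RESERVED_NAMES = {"name", "statement", "system"}


def database_namespace(database_name, reserved=RESERVED_NAMES, suffix="db"):
    """Return the namespace for database-specific attributes.

    If the database name clashes with a reserved generic attribute,
    append the suffix so the namespace no longer coincides with it.
    """
    key = database_name.lower()
    if key in reserved:
        key += suffix
    return "db." + key


clashing = database_namespace("system")
regular = database_namespace("cassandra")
```

Here clashing becomes db.systemdb, while regular keeps the namespace db.cassandra that is used today, so existing conventions do not change.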

tigrannajaryan avatar Sep 28 '22 14:09 tigrannajaryan

I think this is a problem that might stay theoretical, so I am in favor of solution 2. We could also suggest a common transformation for such cases. Appending db seems fine; we could also prepend tech_ or similar. We could then specify, for example, that "top-level" names must not end in db.

Oberon00 avatar Sep 28 '22 15:09 Oberon00

@open-telemetry/specs-approvers any thoughts on this?

tigrannajaryan avatar Sep 29 '22 14:09 tigrannajaryan

+1 with what @Oberon00 said https://github.com/open-telemetry/opentelemetry-specification/issues/2847#issuecomment-1261083741

reyang avatar Sep 29 '22 15:09 reyang

I'd also be in favor of #2, but I'd prefer any standard to the current undefined way of handling such a collision. I mostly agree with #2 because it's less arduous to implement. I do like the explicitness of the "db.special" namespacing, but the pragmatist in me feels it would be too much disruption to solve a problem that is admittedly unlikely to occur.

If we do #2, I propose we explicitly reserve the "*db" suffix for the sake of formality.

hughesjj avatar Oct 03 '22 19:10 hughesjj

OK, I think we are all in agreement so far that #2 is the way to go. We need a PR that makes this clarification in the spec, both in database.md specifically and in attribute-naming.md generally.

tigrannajaryan avatar Oct 04 '22 15:10 tigrannajaryan