loguru icon indicating copy to clipboard operation
loguru copied to clipboard

StrRecord Design

Open t-mart opened this issue 6 years ago • 4 comments

I'm concerned about the design of StrRecord and its record attribute. It feels wrong in a few ways:

  • From a object design perspective, when StrRecord is used as a str, it is representation of the items in its record attribute that have been formatted. But it's string value is immutable and the items in record are not. Therefore, it's possible for these 2 realities to be out-of-sync by modifying the items in record.
  • AFAIK, the items in record are fixed. For example, you can expect a level key and a time key, etc. But a dict type isn't really for well-defined objects. It's for mapping, adding, popping, iteration, etc. With a dict, anything goes. Therefore, when working with records, you have to check for each key every time to be truly safe and have no type guarantees. For more reading on why dicts are not ideal in this case, check out this section in the attrs project documentation.
  • From an API perspective, StrRecord is passed to sinks, but the contextual information dict (the record attribute) is passed to filters. This makes working with log records feel inconsistent.

It'd be more consistent to have a class, say Record, with the items in the contextual dictionary as attributes instead. To produce formatted message, a method should be offered, such as format(), or even just override __str__(). A method would better signal that this value is to be generated from the attributes of the instance.

And with a class, the attributes become well-defined, so you'd be providing a fixed API to users about what a record is. To be clear, above, I'm not necessarily advocating for attrs here, but I definitely think a class is the right idea.

(With a Record class, you'd still have to check for keys in the extra attribute because those are user-provided.)

Finally, you could then pass Record instances to both sinks and filters and make working with them more consistent.

I do appreciate how loguru chose to (mostly) use standard library types to represent these concepts, but I don't think it's the right choice here.

I'm bringing this up for discussion with love and for constructive criticism. Not trying to hate. (I really like this project.)

t-mart avatar Nov 21 '19 04:11 t-mart

I'm bringing this up for discussion with love and for constructive criticism. Not trying to hate. (I really like this project.)

Don't worry, I really appreciate you taking the time to write all this, sharing your concerns and suggesting improvements! That's really great, thanks. :wink:

So, as I understand, you are raising two issues:

  • The StrRecord should be replaced with a Record associated to a formatting method
  • The record should not be a dict as it does not reflect well the static nature of the structure

About the first point, I guess it's a question of practicality versus purity. Sure, it's probably more elegant if the syncing between the record and its representation is enforced by having the user calling himself the formatting method. However, in my opinion, it's much more enjoyable to receive a already formatted string, because that's what the handler is configured for. Loguru gives you the guarantee that the record and its string are not out-of-sync. Only reason for them to be out-of-sync is because the user purposefully decided to modify the record dict, so it should not surprise him. Actually, if the record does not inherit from str, this makes calling sys.stderr.write(record) impossible. So at that point, I think it's a dead end. :/

About the inconsistency between the sink and filter: if the log record is filtered out, there is not point to format it. That's why the filter function receives a simple dict and not the StrRecord. At least, it's consistent with the format function. :smile:

Finally, about the second point. I do agree that a dict is not the proper data structure. If I had to do it again, I would probably use some kind dataclass. As you have guessed, I first choose a dict because it is the most well-known Python data type. It's everywhere, everybody knows what to expect from a dict, so this makes less things to explain. The record does not need any other method, so instead of creating another new type, I just decided to use a dict. At some point, I considered making this immutable, but I no longer care much. If the user decides to add or remove a key, let it be, he probably has a good reason to do so. As it stands, I don't see myself implementing such breaking change, as it would not bring much but rather restrict the user slightly more. Again, I would not deny it was an inaccurate design choice, but I can live with that, especially considering such dict has actually its own type: TypedDict.

Delgan avatar Nov 22 '19 01:11 Delgan

@Delgan

Only reason for them to be out-of-sync is because the user purposefully decided to modify the record dict, so it should not surprise him.

I agree. You'd have to shoot yourself in the foot to get out of sync.

practicality versus purity

I agree here too.

breaking change

Yeah, this would require a major version upgrade and probably a lot of work for a theoretical concern. (Actually, I'm not sure what versioning scheme you're following, but if it's the semantic one, and we're pre-v1, this could hypothetically be added at any time, but nevermind. Nonetheless, what are your goals to get to v1?)

t-mart avatar Nov 22 '19 06:11 t-mart

Do you see any practical benefits of replacing the dict with a dataclass, apart from correctness? I think the use of TypedDict partly solves the static structure problem.

I try to follow semantic versioning, but Loguru has been in 0.x.y since one year. Although there is no guarantee per semver specs, the API looks quite stable at that point.

I would like to publish a version implementing C extension for improved logging performance before releasing v1.

Delgan avatar Nov 22 '19 10:11 Delgan

Your mention of TypedDict is the first I'd heard about it (seems cool!), but reading that PEP, yes, it seems like a step in the right direction:

  • Value types can be checked
  • Unspecified new keys will cause a type checker error to be raised
  • If total=True, the default, all keys must exist or else type checker error is raised
  • (I wonder if a type checker would complain about deleteing a specified key? Not in front of an editor at the moment to check)

One thing that caught my eye in the PEP is:

Dataclasses are a more recent alternative to solve this use case, but there is still a lot of existing code that was written before dataclasses became available, especially in large existing codebases where type hinting and checking has proven to be helpful.

It kind of seems like they're saying code written since dataclasses have been released should be using dataclasses for this use case. Hehe, sorry. I'm not completely happy with TypedDict, but am definitely happiER.

t-mart avatar Nov 22 '19 11:11 t-mart