azure-sdk-for-java icon indicating copy to clipboard operation
azure-sdk-for-java copied to clipboard

Lower levels of two log statements

Open Blackbaud-JasonBodnar opened this issue 1 year ago • 3 comments

Description

Lower the levels of two logging statements. According to the Log4J docs WARN should be used for "an event that might possible lead to an error". Starting up a client is not an event that could lead to an error.

All SDK Contribution checklist:

  • [ X ] The pull request does not introduce [breaking changes]
  • [ X ] CHANGELOG is updated for new features, bug fixes or other significant changes.
  • [ X ] I have read the contribution guidelines.

General Guidelines and Best Practices

  • [ X ] Title of the pull request is clear and informative.
  • [ X ] There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • [ NA ] Pull request includes test coverage for the included changes.

Blackbaud-JasonBodnar avatar Jun 26 '24 15:06 Blackbaud-JasonBodnar

Thank you for your contribution @Blackbaud-JasonBodnar! We will review the pull request and get back to you soon.

github-actions[bot] avatar Jun 26 '24 15:06 github-actions[bot]

Thanks @Blackbaud-JasonBodnar for this PR, while it makes sense to treat this log as info, however, I think the reason this was put as warn is because of the importance of client creation event. We have seen customers switching off the info logs in their application and ends up creating multiple cosmos clients without noticing it. That's why this was changed from info to warn here - https://github.com/Azure/azure-sdk-for-java/pull/39169

@FabianMeiswinkel please provide more information on this change if the above doesn't make sense!

kushagraThapar avatar Jun 26 '24 16:06 kushagraThapar

Thanks @Blackbaud-JasonBodnar for this PR, while it makes sense to treat this log as info, however, I think the reason this was put as warn is because of the importance of client creation event. We have seen customers switching off the info logs in their application and ends up creating multiple cosmos clients without noticing it. That's why this was changed from info to warn here - #39169

@FabianMeiswinkel please provide more information on this change if the above doesn't make sense!

But you are breaking standards. WARN should only be used for potential problems. If there is an issue with multiple clients being created then that should be logged as WARN.

Organizations take WARN and ERROR messages very seriously often alerting employees when too many are seen during a time window. My organization reviews our WARN and ERROR messages every day and these messages just obscure real messages that need to looked into.

Looking at the commit you refer to I see in one file:

  private def logInfoOrWarning(msg: => String): Unit = {
    if (this.verboseLoggingAfterReEnqueueingRetriesEnabled.get()) {
      log.logWarning(msg)
    } else {
      log.logInfo(msg)
    }
  }

Why wasn't something similar used for the CosmosClientBuilder? Or better yet, log it at the appropriate INFO or DEBUG level and when there is an issue tell customers to set their logging appropriately so they can see these messages. That's why we have different log levels and it is so easy to change the level of what is logged for each instance.

Blackbaud-JasonBodnar avatar Jun 26 '24 16:06 Blackbaud-JasonBodnar

The other issue with logging these as WARN is that costs us money. We retain INFO, WARN and ERROR logs for 90 days. Other levels are only retained for a week. We're deployed in 4 zones in Azure with at least two nodes in each zone. If we release 4 times a day that's a minimum of 64 messages of useless information a day that we're now keeping for 90 days vs 7 days.

Blackbaud-JasonBodnar avatar Jul 18 '24 21:07 Blackbaud-JasonBodnar

@Blackbaud-JasonBodnar - we understand the concern, however, this specific log level is very important for our debugging purposes, because customers unintentionally create multiple clients, and sometimes not log info messages.

That being said, for your use case, we can change these log levels behind an environment flag, which will allow our customers to opt-out of these log levels as default warn and log them as info. Would you be interested to make that follow up change in this PR? You can check our Configs.java file which contains all the environment variables and how they are mapped.

kushagraThapar avatar Jul 19 '24 16:07 kushagraThapar

@kushagraThapar : What percentage of your customers are unintentionally creating multiple clients? And what percentage of them are not seeing INFO level logs? And what percentage of them are then contacting you? And what do you tell them? Check your logs? Could you not tell them "Make sure you are logging INFO level messages and then check your logs?"

My point is it seems like you're forcing the burden of a small number of customers onto the rest of your customers.

Sure, I can change the PR so this is configurable that's like asking people to opt out of spam when you really should be asking people to opt into spam if they really want or need it.

Blackbaud-JasonBodnar avatar Jul 19 '24 16:07 Blackbaud-JasonBodnar

@Blackbaud-JasonBodnar this is not just limited to the number of clients (which more often than not - I would say majority of customers create it :)) - but this is also about the information that is logged as part of this log - which contains many important aspects like tcp info, preferred regions, and many other important things that customers should consider while building their applications because this changes the behavior of their applications.

It is not about just seeing this log or ignoring it, rather a proactive warning showing what all configurations you are using to build the client which many customers seem to ignore.

If it helps, you can turn off the warning logging for these 2 classes in your application log4j properties as well but logging them as warning is important for the SDK use case and business.

kushagraThapar avatar Jul 19 '24 18:07 kushagraThapar

@kushagraThapar : What you just described is why they should be logged as INFO.

INFO Designates informational messages that highlight the progress of the application at coarse-grained level. WARN Designates potentially harmful situations.

If it helps, you can turn off the warning logging for these 2 classes in your application log4j properties as well but logging them as warning is important for the SDK use case and business.

Yes, but what happens when these classes start logging real warnings that really need attention? I don't see them. That's why you have different logging levels and why WARN is used for warnings and INFO is used for providing information about connections and such.

Why don't you log a warning (or even error) when a second client is created? You could even include text informing them to enable INFO logs to see all client creation.

Blackbaud-JasonBodnar avatar Jul 19 '24 18:07 Blackbaud-JasonBodnar

Hi @Blackbaud-JasonBodnar. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

github-actions[bot] avatar Sep 20 '24 05:09 github-actions[bot]

Hi @Blackbaud-JasonBodnar. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

Still active. Waiting for it to be merged.

Blackbaud-JasonBodnar avatar Sep 20 '24 13:09 Blackbaud-JasonBodnar

I completely agree with @Blackbaud-JasonBodnar. We closely monitor all warnings and errors in our production environment and those logs generates false positives which is very annoying. Is there a reason why it is sill not merged?

dvanackere-lpg avatar Nov 06 '24 15:11 dvanackere-lpg

@Blackbaud-JasonBodnar @dvanackere-lpg I understand your use case and concerns about logging these as WARN messages. However, unfortunately we can't make this change as it is, reason being whenever a customer reaches out to us as a SEV-A ticket with any issues they are seeing, we really need to have access to these couple of logs as they show how the client was initialized with all the config and other important settings. We cannot ask the customer to enable this log (if it was INFO) and ask them to come back again if the issue happens again. We have to honor our supportability and customer trust.

That being said, I will make a change where we can expose a JVM property which can log this in INFO if you would like, however, a general note, this log is of utmost importance and should not be ignored. Will keep you posted with my change, and it should unblock your use case.

kushagraThapar avatar Nov 06 '24 17:11 kushagraThapar

Is it not possible to log these only when there's an error -- IE multiple clients are created in the same process? For those of us doing things correctly why should we have to see this log?

jcbodnar avatar Nov 06 '24 17:11 jcbodnar

@jcbodnar to provide more clarity on this, creating multiple clients is not an error, rather a practice against recommendation. It is not entirely an issue, if your compute can handle multiple Cosmos Clients in the same JVM, however, we recommend against that.

Another thing, this particular log prints out details that are required even if there is only a single cosmos client, for example machineId, Correlation Id, and other internal details like nativeTransport, fastClientOpen, etc. which are very critical for debugging if an issue arises in any condition like network issues, intermittent availability hits, etc.

Imagine your production system is down, and I ask you to re-enable this log and reach out to me next time when your production is down again, how frustrating that is for any customer :)

That being said, adding a JVM flag and hiding this log warn behind that flag should solve your issue, or do you have any other suggestions? I am open to feedback.

kushagraThapar avatar Nov 06 '24 18:11 kushagraThapar

Hi @Blackbaud-JasonBodnar. Thank you for your interest in helping to improve the Azure SDK experience and for your contribution. We've noticed that there hasn't been recent engagement on this pull request. If this is still an active work stream, please let us know by pushing some changes or leaving a comment. Otherwise, we'll close this out in 7 days.

github-actions[bot] avatar Jan 10 '25 05:01 github-actions[bot]

Hi @Blackbaud-JasonBodnar. Thank you for your contribution. Since there hasn't been recent engagement, we're going to close this out. Feel free to respond with a comment containing /reopen if you'd like to continue working on these changes. Please be sure to use the command to reopen or remove the no-recent-activity label; otherwise, this is likely to be closed again with the next cleanup pass.

github-actions[bot] avatar Jan 17 '25 08:01 github-actions[bot]