couchdb icon indicating copy to clipboard operation
couchdb copied to clipboard

Add units to all stats

Open wohali opened this issue 7 years ago • 6 comments

We currently don't expose the units for each stat that we publish under /_node/<node>/_stats. We should.

I bet this is a pretty straightforward update to all of the stats_descriptions.cfg files for someone who knows what all of the stats are.

wohali avatar Apr 09 '18 16:04 wohali

Hi @wohali.

I was looking at this issue, and the referenced file. It seems like there are units referenced in the description:

{[couchdb, auth_cache_hits], [
    {type, counter},
    {desc, <<"number of authentication cache hits">>}
]}.

Or were you thinking something like

{[couchdb, auth_cache_hits], [
    {type, counter},
    {unit, Integer},
    {desc, <<"number of authentication cache hits">>}
]}.

tabeth avatar Feb 11 '20 18:02 tabeth

Hey @tabeth , thanks for looking at this.

There's a lot of files:

./src/couch/priv/stats_descriptions.cfg
./src/dreyfus/priv/stats_descriptions.cfg
./src/couch_log/priv/stats_descriptions.cfg
./src/mem3/priv/stats_descriptions.cfg
./src/couch_replicator/priv/stats_descriptions.cfg
./src/ddoc_cache/priv/stats_descriptions.cfg
./src/chttpd/priv/stats_descriptions.cfg
./src/rexi/priv/stats_descriptions.cfg
./src/couch_mrview/priv/stats_descriptions.cfg
./src/global_changes/priv/stats_descriptions.cfg
./src/fabric/priv/stats_descriptions.cfg

It's a concern primarily for anything that isn't a counter - for instance, looking at the histograms for length of time, are those in milliseconds? microseconds?

wohali avatar Feb 11 '20 19:02 wohali

Ah, I see @wohali

So this is more investigative, where we want (for all of the stats_descriptions.cfg files)

This:

{[dreyfus, rpc, search], [
    {type, histogram},
    {desc, <<"length of a search RPC worker">>}
]}.

To be something more like this:

{[dreyfus, rpc, search], [
    {type, histogram},
    {desc, <<"length (milliseconds) of a search RPC worker ">>}
]}.

So to successfully complete this we need to traverse each file and then figure out what the true values are and update the descriptions. Is that about right?

tabeth avatar Feb 11 '20 19:02 tabeth

You got it! Anything that isn't a counter should be reviewed and ensure the description contains the correct unit.

wohali avatar Feb 11 '20 19:02 wohali

@wohali OK. I'll take a look at the files and start trying to update the descriptions.

So for example (to make sure I'm doing this right): We have this:

{[dreyfus, index, search], [
    {type, histogram},
    {desc, <<"length of an dreyfus_index search request">>}
]}.

Which seems to be implemented here. dreyfus_util:time/2 seems to call timer:now_diff/2 here, which returns microseconds. However it's divided by 1000, so the answer to the original question posed in the beginning should be milliseconds.

So our original example can be

{[dreyfus, index, search], [
    {type, histogram},
    {desc, <<"length (in milliseconds) of an dreyfus_index search request">>}
]}.

Is that right?

tabeth avatar Feb 11 '20 20:02 tabeth

@tabeth Careful. On Windows, erlang:system_time/0 will return time only in ms, not us. You'll need to check the code to be sure this does the right thing on that platform. We may need a source code change here if it does the wrong thing, similar to https://github.com/apache/couchdb/commit/24e2c8bf61a4e82504ea6e6ba2caa779980b9597#diff-7470c79716cb3e7f1b3d572732277d94R349

/cc @rnewson

But yes, that's the right approach.

wohali avatar Feb 11 '20 22:02 wohali