
[Epic] Expose Storage Utilization Data

Open · nmeagan11 opened this issue on Jun 02 '22 · 15 comments

Initiative and Theme

Materialize is Dependable; Materialize makes observability easy

Problem

There is currently very little visibility into storage utilization, which affects the following use cases:

  • It's a poor user experience if we charge customers for what we write to S3 without any explanation. We can probably get these costs in bulk from AWS, but it's possible we may not be able to break them down in any way.
  • For customer-facing observability, users want to see how much storage they are using for their own diagnosis.

Success Criteria

There is a design document detailing the changes required to storage to support metering storage usage at the level required by the cloud team, and we implement these changes.

Time Horizon

3 weeks

Blockers

None

Blocks

https://github.com/MaterializeInc/cloud/issues/3200
https://github.com/MaterializeInc/cloud/issues/3259

nmeagan11 avatar Jun 02 '22 23:06 nmeagan11

> Create a new persist object which will write, tag, and manage the full lifecycle of all files, which will then expose this information.

I don't have my head around this! What level of granularity are we targeting? S3 usage by source? Or something even more granular?

benesch avatar Jun 03 '22 04:06 benesch

@benesch I purposefully left it vague because I want to see what we need from the cloud billing epic.

nmeagan11 avatar Jun 03 '22 14:06 nmeagan11

Oh, hah, I was going to say that it seemed quite specific in a direction that I did not expect!

If the overarching goal here is to monitor per-source storage usage, there may be ways to do that that don't involve any changes to persist. For example, maybe S3 Storage Lens can provide us the visibility that we need: https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage_lens_basics_metrics_recommendations.html#storage_lens_basics_metrics_types

benesch avatar Jun 03 '22 14:06 benesch

That would be nifty!

nmeagan11 avatar Jun 03 '22 16:06 nmeagan11

Ok, sorry, so then I'm confused about the phrasing of this epic! Is the idea that it is a placeholder work item for designing whatever changes may be required in storage to support the level of billing granularity settled on in #3200? If so, may I propose removing the references to persist? Perhaps something like:

[Epic] Integrate storage with cloud billing metering

Success criteria: a design document detailing the changes required to storage to support metering storage usage at the level required by the cloud billing system.

benesch avatar Jun 05 '22 16:06 benesch

I think you meant to tag https://github.com/MaterializeInc/cloud/issues/3200? And that sounds good, I'll make the updates now.

nmeagan11 avatar Jun 06 '22 17:06 nmeagan11

I think you meant to tag MaterializeInc/cloud#3200? And that sounds good, I'll make the updates now.

Oops, I did, thanks!

benesch avatar Jun 07 '22 00:06 benesch

Snowflake is instructive here. They have an ACCOUNT_USAGE schema which contains several different tables containing usage information for the account: https://docs.snowflake.com/en/sql-reference/account-usage.html. They also have a TABLE_STORAGE_USAGE view which is documented like so:

> This view displays table-level storage utilization information, which is used to calculate the storage billing for each table in the account, including tables that have been dropped, but are still incurring storage costs.

The TABLE_STORAGE_USAGE view is documented to update every hour or two, which gives us some insight into how their internal systems work.

This blog post from Snowflake about Storage Profiling was helpful in surfacing some of these views.


Ultimately I think we should build towards something similar: an mz_storage_usage table that breaks down bytes used per storage collection. But I think we can aim for something simpler to solve Materialize Cloud's billing needs in the short term.

benesch avatar Jun 12 '22 21:06 benesch

Put another way, I think this blocks https://github.com/MaterializeInc/cloud/issues/3259 but not https://github.com/MaterializeInc/cloud/issues/3200!

benesch avatar Jun 12 '22 21:06 benesch

Thinking through what we need for cloud for billing (P1) and observability (P3). @nmeagan11, how does this sound?

  • P1: customer account-level storage usage snapshot (in bytes), which we can use to bill customers. Every 24 hours is probably workable, but something closer to hourly would be great. An alternative would be to capture this information more often (say, hourly) but only store a daily average.
  • P1: working with the cloud team to surface this information to Orb for billing.
  • P3: storage object-level usage snapshot (in bytes; see below). For every object which incurs storage (source, sink, table), we record the object name, object type (source, sink, table), timestamp, and bytes used.
  • P3: working with @jpepin and other teams to figure out how to put this information into system tables.
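
The "capture hourly but store a daily average" alternative from the first bullet could be sketched as a simple aggregation pass. This is only an illustration; the sample rows, row shape, and function name are all hypothetical, not Materialize's actual schema:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly samples: (account_id, iso_timestamp, bytes_used).
samples = [
    ("acct-1", "2022-06-13T00:00:00", 1_000),
    ("acct-1", "2022-06-13T01:00:00", 3_000),
    ("acct-2", "2022-06-13T00:00:00", 500),
]

def daily_average(samples):
    """Collapse hourly snapshots into one (account, date) -> avg-bytes entry."""
    totals = defaultdict(lambda: [0, 0])  # (account, date) -> [byte_sum, count]
    for account, ts, nbytes in samples:
        date = datetime.fromisoformat(ts).date().isoformat()
        bucket = totals[(account, date)]
        bucket[0] += nbytes
        bucket[1] += 1
    return {key: byte_sum / count for key, (byte_sum, count) in totals.items()}

# daily_average(samples)[("acct-1", "2022-06-13")] -> 2000.0
```

Storing only the daily average keeps the billing table small while still reflecting intra-day growth, at the cost of losing the hourly detail.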

I think we should shoot for providing information in a table mz_storage_usage along these lines:

| storage_object_name | timestamp         | storage_owned |
| ------------------- | ----------------- | ------------- |
| table_sales_calls   | 12/21/20 05:04:34 | 349873000     |
| my-kafka-source-1   | 12/21/20 05:04:34 | 8769873000    |

Plus maybe some columns which could be used to join the information with mz_sources, mz_tables, etc.

Ideally we can store information for 60 days (that way, the user can always see information for an open bill on their account). Happy to discuss if that is onerous/expensive though.
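
The 60-day window above amounts to a periodic retention pass over the usage rows. A minimal sketch, assuming rows shaped like the table above ((object_name, timestamp, bytes) tuples with datetime timestamps; this is illustrative, not the real schema):

```python
from datetime import datetime, timedelta

# Proposed retention window: long enough to cover an open, unpaid bill.
RETENTION = timedelta(days=60)

def prune(rows, now):
    """Drop usage rows that have aged out of the retention window."""
    cutoff = now - RETENTION
    return [row for row in rows if row[1] >= cutoff]

rows = [
    ("table_sales_calls", datetime(2022, 7, 1), 349_873_000),
    ("my-kafka-source-1", datetime(2022, 5, 1), 8_769_873_000),
]
# prune(rows, datetime(2022, 8, 1)) keeps only the July row.
```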

Questions for Eng:

  • Do we store information about customers' databases (the database name, schemas, views, roles) in tables beyond the system tables?
  • Can we separate out storage the customer intended (the sources/tables/sinks they ingested/set up) from the information we collect in system tables? I want to know if we can avoid charging customers for that.

hlburak avatar Jun 13 '22 19:06 hlburak

> surface this information to Orb

Is there a specific format in which we need to provide this data to Orb?

> Ideally we can store information for 60 days

Is 60 days arbitrary, or based on some compliance requirement or industry norm?

> avoid charging customers for that

Do other products charge for metadata? I did a quick search but didn't find anything definitive.

nmeagan11 avatar Jun 13 '22 21:06 nmeagan11

For Orb we essentially create an "event" with the billable metric. I think your team can put the information in any number of places including a system table, but I'll check in with Eng on this.
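
Such an "event" might look something like the following. The field names here are purely illustrative and do not reflect Orb's actual API; they only show the kind of information the storage team would need to hand over:

```python
# Hypothetical billing-event payload; field names are assumptions,
# not Orb's real schema.
def make_usage_event(account_id, snapshot_time, bytes_used):
    """Package one account-level storage snapshot as a billable-metric event."""
    return {
        "event_name": "storage_usage",
        "external_customer_id": account_id,
        "timestamp": snapshot_time,
        "properties": {"bytes_used": bytes_used},
    }

event = make_usage_event("acct-1", "2022-06-13T00:00:00Z", 349_873_000)
```

Whatever the real schema turns out to be, the essential inputs are the same: which account, when the snapshot was taken, and how many bytes were owned at that moment.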

60 days allows users to look at the numbers which went into their most recent (open, unpaid, not yet due) bill. We don't need to store every metric this way, but I think allowing folks to correlate their usage and top-level bill line items is essential.

WRT metadata, it's unclear to me too. Snowflake has system tables, but it's not clear from the breakdowns whether customers get charged for storing them. I'll try to get access and verify.

hlburak avatar Jun 13 '22 21:06 hlburak

We are punting this out of M2 because we believe the cloud team requires no additional work from us to satisfy the M2 billing epic (cc @benesch @hlburak).

nmeagan11 avatar Jun 14 '22 19:06 nmeagan11

The work to collect and store S3 storage usage is being tracked in MaterializeInc/cloud#1467

jpepin avatar Jul 07 '22 23:07 jpepin

Reopening because this needs tests and QA signoff! @jpepin, it probably makes sense for you to get some time with @philip-stoev and talk about how best to test this.

benesch avatar Aug 10 '22 07:08 benesch

Issues required for completion of this epic:

  • https://github.com/MaterializeInc/cloud/issues/3739
  • https://github.com/MaterializeInc/cloud/issues/3737

Related but not necessarily required for billing:

  • https://github.com/MaterializeInc/cloud/issues/3726
  • https://github.com/MaterializeInc/cloud/issues/3716

jpepin avatar Aug 11 '22 18:08 jpepin

Closing, as the spirit of the epic is complete. There is an open question about what to do with the mz_storage_usage view, but https://github.com/MaterializeInc/materialize/issues/17180 can track that independently of this epic.

benesch avatar Jul 04 '23 03:07 benesch