Add from_pretrained telemetry
This telemetry will mostly just allow us to gauge the proportion of local vs hub pipeline loading.
No potentially sensitive info about the (custom) pipeline class or the model path is logged. If there are any network errors, the function exits gracefully without interrupting the user.
The ES logs collector runs nightly, so I won't merge this until tomorrow to make sure that it works
Please make sure you advertise this prominently in the docs, because it's the kind of thing people get real cranky about when they find out by surprise.
They'll also want an opt-out toggle. Oh, I see there is a DISABLE_TELEMETRY environment variable there, but if I'm reading this right, it doesn't prevent send_telemetry from making the request?
It should respect HF_HUB_OFFLINE as well.
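To make the opt-out concrete, here is a minimal sketch of an env-var guard that honors both flags before any request is made. The helper names and the exact set of truthy strings are illustrative assumptions, not diffusers' actual implementation:

```python
import os


def _str_to_bool(value: str) -> bool:
    # Treat common truthy spellings as "enabled"; anything else as "disabled".
    return value.lower() in {"1", "true", "yes", "on"}


def telemetry_disabled() -> bool:
    """Return True if telemetry should be skipped entirely.

    Respects both DISABLE_TELEMETRY and HF_HUB_OFFLINE. A sketch only --
    the helper name and semantics here are placeholders, not library API.
    """
    return _str_to_bool(os.environ.get("DISABLE_TELEMETRY", "")) or _str_to_bool(
        os.environ.get("HF_HUB_OFFLINE", "")
    )
```

The key point is that `send_telemetry` should call a check like this and return early, rather than only gating whether the caller invokes it.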
Also consider making that request asynchronous. Some network failures take a while to decide the host can't be resolved, or it's unreachable, or they can't TLS handshake with it or whatever. And if you add something to from_pretrained that mysteriously results in someone's models taking an extra thirty seconds to load with no indication of why because that send method fails silently and eats the exception, it is going to take them all day to debug that shit. Again, very cranky.
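One way to get that fire-and-forget behavior is a daemon thread with a short socket timeout, so the caller returns immediately and a slow or unreachable host can never stall `from_pretrained`. This is a sketch under assumptions: the function name and URL are placeholders, and note that `urlopen`'s timeout bounds the socket operations but not DNS resolution, which is exactly why the background thread matters:

```python
import threading
import urllib.request


def send_telemetry_async(url: str, timeout: float = 1.0) -> threading.Thread:
    """Send a telemetry request without blocking or raising in the caller.

    Illustrative only -- the name and endpoint are placeholders, not
    diffusers' actual telemetry code.
    """

    def _send() -> None:
        try:
            # The timeout bounds connect/read on the socket; slow DNS can
            # still hang here, but only in this background thread.
            urllib.request.urlopen(url, timeout=timeout)
        except Exception:
            # Swallow all errors: telemetry must never break model loading.
            pass

    # daemon=True so a hung request can't keep the interpreter alive at exit.
    thread = threading.Thread(target=_send, daemon=True)
    thread.start()
    return thread
```

With this shape, even a total network failure costs the user essentially nothing, instead of a mysterious multi-second pause with no error message.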
Pretty annoyed by this - not the change itself, but the fact that @keturn's feedback wasn't really listened to. The note in the installation file is good, but there is no mention of adding telemetry anywhere in the release notes at https://github.com/huggingface/diffusers/releases - it should have been flagged as a major change, but it's not even in the changelog.
I assume this was a mistake, but it looks like it's a deliberate attempt to hide a potentially unpopular change - or at least that's how the people that will be upset by this change will see it.
Thanks for the comment @hafriedlander - we'll roll the feature back as stated in the issue above & try to be more careful with such "sensitive" PRs in the future.
One note, though: I think we did listen to the feedback above, no? No one mentioned the release notes, & in our opinion it's not a PR worth featuring next to a model addition like Stable Diffusion 2 (which was maybe the mistake here).
RE: "deliberate attempt to hide a potentially unpopular change" -> I see why it might look this way, but it'd be nice to assume a bit more good faith here. a) Hiding is impossible in open source, which is the beauty of open source :-) b) We are a small team of core maintainers and move very fast, which is why bugs get introduced in PRs, docs become outdated quickly, etc.
Very sorry for any inconvenience caused by this change! The local-vs-hub loading metrics were requested by the team to get more visibility into model usage, but we didn't consider all of the downsides of doing this. Proposing a rollback of all local telemetry here: https://github.com/huggingface/diffusers/pull/1702, and we'll remove it from the package with the next release.
Just to confirm, from my personal perspective, what @patrickvonplaten already explained: we move as fast as we can and try our best to bring a lot of features, models, pipelines, optimizations, etc. to diffusers. It's a huge effort, but it's totally worthwhile because the field is moving forward at an incredible pace and we are thrilled to be part of it. We know we make mistakes and we are sorry, but we also try to correct them quickly. Thanks for your patience and for voicing your concerns!
Hi,
Just to expand on my previous comment, @patrickvonplaten (mostly because I think I left out too much context):
My comment came because I could see history avoidably repeating. I've seen the storm around other projects adding telemetry (including personally trying to add it to stuff I've done before), and I've noticed two things:
- People tend to fall into two groups WRT telemetry: A) people who don't really think it's a big deal and see the value the data could bring, and B) people who really, really don't like it, on a fundamental, philosophical level.
- The people in group A tend to underestimate the depth and strength of feeling in group B. They try to argue their side (that it's not that much data, and it's all pretty anonymous, and it's really useful) and that never sways group B. Faulty analogy warning: you can't get a vegetarian to eat meat this one time by arguing it makes the dish taste better.
So my comment was more about the meta-warning from @keturn in the first sentence: you need to be super careful with telemetry, go slow, and not make mistakes, because group B is predisposed to see any mistake as malicious.
(Honestly, I'm not sure it's possible to avoid this becoming a contentious issue. So maybe it wouldn't have been possible to avoid backlash.)