
[BUG] Generated buf.validate messages are not part of the pypi package

Open lukasbindreiter opened this issue 6 months ago • 18 comments

Description

The current version of the protovalidate python package on pypi is not importable due to a missing buf.validate module.

>>> import protovalidate
Traceback (most recent call last):
  File "<python-input-0>", line 1, in <module>
    import protovalidate
  File "/Users/lukasbindreiter/protovalidate-test/.venv/lib/python3.13/site-packages/protovalidate/__init__.py", line 15, in <module>
    from protovalidate import config, validator
  File "/Users/lukasbindreiter/protovalidate-test/.venv/lib/python3.13/site-packages/protovalidate/validator.py", line 19, in <module>
    from buf.validate import validate_pb2  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'buf'

Steps to Reproduce

It's reproducible in a fresh python venv:

  1. uv init
  2. uv add protovalidate
  3. uv run python
  4. >>> import protovalidate
  5. raises ModuleNotFoundError: No module named 'buf'
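
A quick way to confirm the root cause in a given environment (just a sketch, not part of the original report) is to ask Python's import machinery whether any installed package provides a buf module at all:

import importlib.util

# If nothing on sys.path provides a `buf` package, `import protovalidate`
# will fail with ModuleNotFoundError: No module named 'buf'.
spec = importlib.util.find_spec("buf")
if spec is None:
    print("`buf` is not importable - protovalidate will fail to import")
else:
    print("`buf` found at:", list(spec.submodule_search_locations or []))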

Expected Behavior

When installing protovalidate with pip/uv, it should also be importable and usable without additional workarounds such as installing other dependencies externally.

Actual Behavior

The import fails because buf.validate cannot be resolved.

Environment

  • Operating System: macOS
  • Protovalidate Version: 0.13.0

Possible Solution

I think the best solution would be to just bundle the contents of the gen folder in the published python package (by specifying it as a source in pyproject.toml), to make sure that pip install protovalidate also creates the buf/validate namespace package in lib/python3.X/site-packages correctly.

Alternatively, protovalidate could depend on another python package that just provides buf.validate as a module. However, it is important that this module is published on the official pypi registry, and not on https://buf.build/gen/python (see reasoning below).

Additional Context

We could work around this by using the generated SDK feature from buf, e.g.

python3 -m pip install bufbuild-protovalidate-grpc-python==1.73.1.1.20250717185734+6c6e0d3c608e --extra-index-url https://buf.build/gen/python

However, we are developing a library, not an application, so we cannot do that: we cannot add bufbuild-protovalidate-grpc-python as a dependency of our library, otherwise it would no longer be pip installable.

Alternatively, we could set include_imports=True when generating our stubs.

This will generate the required files in a package that looks like our_namespace.buf.validate.

With that everything does work - with one huge caveat however: We are now incompatible with every other library that also uses protovalidate and works around this issue in the same way.

Because as soon as some code attempts to import the generated stub from multiple python modules, e.g.

import our_namespace.buf.validate
import other_library.buf.validate
import buf.validate  # or even an application using the generated SDK feature

this will error during import, because of a name conflict in the global protobuf descriptor pool.
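
As a minimal sketch of that failure mode (an illustration using the public descriptor pool API, not the actual generated code): registering two different file definitions under the same proto file name in the default pool is rejected, which is exactly what happens when two independently generated copies of the protovalidate gencode are imported:

from google.protobuf import descriptor_pb2, descriptor_pool

# Generated *_pb2 modules register their file into this same global pool.
pool = descriptor_pool.Default()

first = descriptor_pb2.FileDescriptorProto(name="buf/validate/validate.proto")
pool.Add(first)  # first registration is fine

second = descriptor_pb2.FileDescriptorProto(
    name="buf/validate/validate.proto",
    package="some.other.content",  # contents differ from the first copy
)
pool.Add(second)  # fails with a "duplicate file name" error (exact exception type depends on the runtime)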

Therefore I think it is best to just bundle buf.validate directly in the protovalidate pypi package.

lukasbindreiter avatar Jul 22 '25 09:07 lukasbindreiter

I just saw that in the protovalidate python quickstart the workaround also is to let users specify --extra-index-url https://buf.build/gen/python

However, for us (developing a library that should be pip installable by others) that is not a real solution; we do not want to forward python packaging complexity to our users.

lukasbindreiter avatar Jul 22 '25 09:07 lukasbindreiter

Hi @lukasbindreiter,

I'd love to get this figured out as it's been a perennial issue for protovalidate-python. As you mentioned, the biggest concern is figuring out how to avoid polluting the global protobuf descriptor pool with multiple protovalidate instances, so until now we've been suggesting that people generate their own protovalidate protos for usage (unfortunately removed from our README). Does that work for you?

otherwise it would no longer be pip installable.

I'm curious about this and actually haven't tried myself: are you not able to specify an alternative package registry for a given dependency? I'm assuming based on https://pip.pypa.io/en/stable/reference/requirement-specifiers/#requirement-specifiers that the answer is no, and you don't want your library's users to have to specify an extra-index-url when installing?

stefanvanburen avatar Jul 22 '25 14:07 stefanvanburen

As you mentioned, the biggest concern is figuring out how to avoid polluting the global protobuf descriptor pool with multiple protovalidate instances, so until now we've been suggesting that people generate their own protovalidate protos for usage (unfortunately removed from our README). Does that work for you?

We did try this approach (with --include-imports), but there are some downsides to this:

A typical python library (called some_library here) has one top-level module - so all its code (including generated code) should be scoped under it.

So ideally we want our generated protobuf stubs to live in a some_library.gen submodule, which can be achieved with the following buf.gen.yaml using the include_imports approach:

version: v2
plugins:
- remote: buf.build/protocolbuffers/python
  out: some_library/gen
  include_imports: True

Some Library with include_imports=True and everything scoped under a single namespace

which generates the following (which does make a lot of sense and is what we want from a python packaging perspective)

some_library
├── __init__.py
└── gen
    ├── buf
    │   └── validate
    │       └── validate_pb2.py
    └── some_library
        └── v1
            └── some_library_pb2.py

There is immediately one issue with this though: some_library_pb2.py contains the following import:

from buf.validate import validate_pb2 as buf_dot_validate_dot_validate__pb2

this doesn't work - because in this case, it should be:

from some_library.gen.buf.validate import validate_pb2 as buf_dot_validate_dot_validate__pb2

With that change, everything works.
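
The change itself can be automated with a small post-generation fix-up script along these lines (just a sketch using the example names from above, not something protovalidate provides):

import pathlib
import re

# Rewrite absolute gencode imports so they resolve inside some_library.gen,
# e.g. "from buf.validate import ..." -> "from some_library.gen.buf.validate import ...".
GEN_ROOT = pathlib.Path("some_library/gen")
PREFIX = "some_library.gen"

for path in GEN_ROOT.rglob("*_pb2*.py"):
    text = path.read_text()
    fixed = re.sub(r"^from (buf|some_library)\.", rf"from {PREFIX}.\1.", text, flags=re.MULTILINE)
    path.write_text(fixed)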

Conflict if another library does the same thing

However, as soon as another library other_library does the same thing, they suddenly become incompatible with each other:

some_library
├── __init__.py
└── gen
    ├── buf
    │   └── validate
    │       └── validate_pb2.py
    └── some_library
        └── v1
            └── some_library_pb2.py

other_library
├── __init__.py
└── gen
    ├── buf
    │   └── validate
    │       └── validate_pb2.py
    └── other_library
        └── v1
            └── other_library_pb2.py

If we now try to use both, we get the following error:

import some_library  # works
import other_library  # raises TypeError: Couldn't build proto file into descriptor pool: duplicate file name buf/validate/validate.proto

Note: _descriptor_pool.Default().AddSerializedFile(...) seems to have some built-in caching going on - so if both libraries bundle the exact same serialized protovalidate file it works, but that's not something to rely on (the bundled protovalidate versions will differ sooner or later)

Obviously we don't want to rely on our users not using any other library that also uses protovalidate - so this is not a valid option.

Workaround: Provide multiple top-level modules in one python package

Another option would be the following: Change our package structure to expose two top level modules: some_library and buf

Contents of the some-library package

buf
└── validate
    └── validate_pb2.py

some_library
├── __init__.py
└── gen
    └── some_library
        └── v1
            └── some_library_pb2.py

With this setup, we still encounter the same TypeError: Couldn't build proto file into descriptor pool: duplicate file name buf/validate/validate.proto error - unless other_library also uses the exact same setup.

In which case both packages provide a buf.validate python module. This will result in import shadowing, with python's import system picking one or the other (pretty much randomly) - but at least it will generally work.
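
To see which copy actually won in a given environment, a quick check (assuming buf.validate is importable at all) is:

from buf.validate import validate_pb2

# shows which site-packages copy of the module Python actually picked up
print(validate_pb2.__file__)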

If the protovalidate package includes the buf module

I think a good solution to all of those would be if the protovalidate python package included the buf/validate/validate_pb2.py module itself. Because then it's quite clear how to fix this import issue (the missing buf.validate module) - by just adding protovalidate as a dependency. And it is much less likely that other libraries get this wrong and cause issues.

Separate python package just containing buf/validate

Another option, obviously, is to depend on the bufbuild-protovalidate-grpc-python python package to provide this. This might even be the most flexible option, since then the versions of protovalidate and bufbuild-protovalidate-grpc-python are not coupled together.

However, that requires bufbuild-protovalidate-grpc-python to be available on pypi.

otherwise it would no longer be pip installable

because of this exactly.

I'm curious about this and actually haven't tried myself: are you not able to specify an alternative package registry for a given dependency?

No - unfortunately not, this is a huge pain point of python packaging, which is why every tool specifies it in its own way:

  • pip's --extra-index-url
  • uv's index configuration
  • poetry's package sources

just to name a few.

Users of our library generally don't care about our internal dependency setup - so moving that extra packaging complexity to them (they would always need to specify an extra-index-url, or an uv index, or a poetry package source when installing our library) is not something we are willing to do.

Suggestions

Based on this, in my opinion the best solution would be to:

  • publish bufbuild-protovalidate-grpc-python also to pypi
  • include buf.validate module in the protovalidate package

The reason why such issues typically never come up with google.protobuf.Timestamp is exactly this - it is included in the protobuf package - everyone imports it as google.protobuf.timestamp_pb2 and all works well.

Potentially also worth exploring would be some option to not use a global descriptor pool (especially with include_imports) but somehow scope them to a package name, that could theoretically help avoid such issues entirely. But that is way out of scope for this issue.

lukasbindreiter avatar Jul 23 '25 08:07 lukasbindreiter

hey @lukasbindreiter, really do appreciate the thoroughness of the comments!

  • publish bufbuild-protovalidate-grpc-python also to pypi

The issue with doing this is that we're always contending with the fact that there's two versions we care about: the protovalidate version (e.g. 0.14.0) and the python plugin version (e.g. 31.1).

We could say that when we publish a release, we always publish with the latest existing version of the protocolbuffers/python plugin. But we don't want to have to release a new version of this package every time either protovalidate updates its protos or protocolbuffers/python updates. So my general feeling is that a separate package is probably not the way to go.

include buf.validate module in the protovalidate package

We could do this, but similar to above, it makes me nervous that we'd be having users asking us to tag a new protovalidate-python version just to bump the version of protobuf + generate the new gencode with the updated plugin.

Similarly, users who need to hang back on their protobuf version might be SOL, because if their protobuf version is behind the generated code we include in our package, they'd potentially run into the New Gencode + Old Runtime issue.

I think if we did include the gencode in our package, we'd need to more precisely pin to a minimum protobuf version that matched the gencode version (rather than pinning to the more general major version), in order to avoid that issue.

Lastly, I think if we did include the gencode in our package, and a user of protovalidate also pulled in a generated SDK that depended on bufbuild/protovalidate, we'd run into either the descriptor pool error ... or import shadowing? Either way, seems Not Great.


Is there something I'm missing with the picture above? Just trying to make sure I'm considering all the options.

stefanvanburen avatar Jul 23 '25 15:07 stefanvanburen

Unfortunately I don't think there is a single best option for every case - otherwise this wouldn't be such an issue for so many tools/libraries.

Restricting protobuf runtime versions may make a lot of sense in theory, but it's also a bit tricky - e.g. opentelemetry did that, which caused a lot of issues preventing people (including us) from upgrading their own tooling, see: https://github.com/open-telemetry/opentelemetry-python/issues/4639

One neat trick that came up there for the New Gencode + Old Runtime issue was this one from wandb: https://github.com/wandb/wandb/blob/7c8ead46482d3beafaeb8ba9f579e056b3812a68/wandb/proto/wandb_telemetry_pb2.py

import google.protobuf

# first character of the version string, i.e. the protobuf major version ("3", "4" or "5")
protobuf_version = google.protobuf.__version__[0]

if protobuf_version == "3":
    from wandb.proto.v3.wandb_telemetry_pb2 import *
elif protobuf_version == "4":
    from wandb.proto.v4.wandb_telemetry_pb2 import *
elif protobuf_version == "5":
    from wandb.proto.v5.wandb_telemetry_pb2 import *

Simply generating the protobuf files for multiple major runtime versions and including all of them in the package could potentially solve a lot of issues there.
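
Applied to this issue, the buf/validate/validate_pb2.py shipped by protovalidate could hypothetically be a small dispatch shim like this (the v5/v6 subpackages are an assumption for illustration, not an existing layout):

import google.protobuf

# pick the bundled gencode matching the installed protobuf major version
_major = google.protobuf.__version__.split(".")[0]

if _major == "5":
    from buf.validate.v5.validate_pb2 import *  # noqa: F401,F403
elif _major == "6":
    from buf.validate.v6.validate_pb2 import *  # noqa: F401,F403
else:
    raise ImportError(f"no bundled buf.validate gencode for protobuf {_major}.x")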

The issue with doing this is that we're always contending with the fact that there's two versions we care about: the protovalidate version (e.g. 0.14.0) and the python plugin version (e.g. 31.1)

I'm not too familiar with the extent of the differences between the generated code of different python plugin versions - it is related to the protobuf runtime version I think, right? I also hope that with protobuf editions the whole situation will get better in that regard. But just going by the package name, I'd assume the version there would match the protovalidate version.

We could do this, but similar to above, it makes me nervous that we'd be having users asking us to tag a new protovalidate-python version just to bump the version of protobuf + generate the new gencode with the updated plugin.

Yeah, that is probably bound to happen for sure - but depending on the release cadence you have planned for protovalidate-python, it may not be an issue anyway if there are fairly recent releases (at least 1-2 times a year or so I assume), in which the generated code would also be updated.

Lastly, I think if we did include the gencode in our package, and a user of protovalidate also pulled in a generated SDK that depended on bufbuild/protovalidate, we'd run into either the descriptor pool error ... or import shadowing?

The main issue with it not being included right now is that this incentivizes other libraries that use protovalidate to do exactly that - which will lead to import shadowing or descriptor pool conflicts further down the road. Therefore I still think a canonical package that provides the protovalidate gencode is the best way forward; that way third-party SDKs can just depend on it from the get-go.

But anyway - thanks already for taking so much time to discuss this issue - it seems all possibilities have a whole lot of caveats to be aware of, so picking one to go with seems like a tough decision here.

lukasbindreiter avatar Jul 24 '25 13:07 lukasbindreiter

hey @lukasbindreiter, sorry for the slow response. Again, appreciate the thoroughness of your replies. I've been thinking about this quite a lot to see what we can do to improve the situation.

Looking at https://github.com/open-telemetry/opentelemetry-python/issues/4639 (and its solution, https://github.com/open-telemetry/opentelemetry-python/pull/4620), and thinking more about https://protobuf.dev/support/cross-version-runtime-guarantee/#backwards, I do think an approach we could take is including the generated module in our package, generated by the earliest plugin version that supports the major version of protobuf that we're locked to, and we could relax the protobuf pin to effectively > 5, < 7.

We'd only regenerate the gencode when ratcheting up to the next protobuf version, e.g. > 6, < 8. (At that point, we could consider that W&B approach of multiple exports to support multiple protobuf versions, although since we use protobuf internally we'd probably have to perform some shenanigans to make that continue to work.)


The only remaining issue with doing that is people using generated SDKs with deps on bufbuild/protovalidate will have multiple packages that both install a buf.validate module into their projects, which feels ... not great :(. I wish there was a standard way in Python to say "prefer the module xyz from package abc instead of package def"; as I understand it right now this is basically a race condition in package installers for which package was installed last. Is that your understanding as well?

I'm mostly nervous about a user installing a version of protovalidate alongside a generated SDK; they think they're getting the old gencode from protovalidate with a newer protobuf version in their app, but the gen SDK with newer gencode is newer than the version of protobuf they're using, and the poison pills cause a bad interaction in their app.

stefanvanburen avatar Jul 29 '25 14:07 stefanvanburen

hey @lukasbindreiter, sorry for the slow response. Again, appreciate the thoroughness of your replies. I've been thinking about this quite a lot to see what we can do to improve the situation.

Thank you, much appreciated! We've encountered all sorts of protobuf versioning related issues in the past already - that's why I'm really interested to find a nice solution here which we can apply ourselves as well and hopefully save us a lot of headaches down the road.

I do think an approach we could take is including the generated module in our package, generated by the earliest plugin version that supports the major version of protobuf that we're locked to, and we could relax the protobuf pin to effectively > 5, < 7

That's exactly what we ended up doing for a long time: our buf.gen.yaml pinned the plugin version to make sure we generate protobuf v5 files, to support as many runtime versions as possible.

plugins:
  - remote: buf.build/protocolbuffers/python:v29.3 # v30.0 onwards requires protobuf > 6

However, recently that has started to cause issues, because protobuf decided to emit really annoying warnings when the runtime version is too far ahead of the gencode version:

[screenshot: protobuf warning that the runtime version is newer than the gencode version]

That's why we have now updated our library to require protobuf >= 6 instead: https://github.com/tilebox/tilebox-python/commit/f54c238cc6de1cd15442aeae31fd64325bc3ee39#diff-eec7dd8a8dbbc784c2b5dffed7844c0998f8cb70ac697aa6f20012ff86a9bd8bR9

We also considered just wrapping the imports in a warnings-suppression context, because it feels pretty weird to require a dependency pin just because of warnings, but fortunately most tools finally seem to support protobuf v6 (opentelemetry was the slowest to update in our dependency chain), so that seemed fine to do.
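
For reference, that idea would have looked roughly like this (a sketch only, using the example module layout from above; it assumes the version-skew warning is emitted as a regular Python warning):

import warnings

# suppress protobuf's gencode/runtime version-skew warning around the import
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    from some_library.gen.buf.validate import validate_pb2  # noqa: F401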

At that point, we could consider that W&B approach of multiple exports to support multiple protobuf versions

I was also considering that as a potential solution - we might adopt that for future protobuf major version upgrades after v6, or even do it for v5 as well if some use-case requires backwards compatibility with it. Also I wanted to wait a bit and see how this discussion here progresses - in case there is an elegant solution we just haven't found yet 😄

The only remaining issue with doing that is people using generated SDKs with deps on bufbuild/protovalidate will have multiple packages that both install a buf.validate module into their projects, which feels ... not great

That's true - and that's why I would very much caution against anyone using --include-imports when generating their gencode. While it may seem like a quick workaround, it is way too easy to shoot yourself in the foot with that approach further down the road, especially if libraries start doing it as well.

For the meantime - until we can find a nice solution for this issue - we actually ended up generating the protovalidate gencode and bundling it into our own package, but we at least added a package prefix to avoid the shadowing issue, e.g. it is exposed as tilebox.datasets.buf.validate. However, this is still dangerous and something I would like to get rid of, since it still runs into the global descriptor pool name conflict.

as I understand it right now this is basically a race condition in package installers for which package was installed last. Is that your understanding as well?

Yes, that's my understanding as well. Whatever happens to be in .venv/lib/python3.XX/site-packages/buf/validate - typically from whichever package was installed last - is what gets imported by import buf.validate

Often times when tackling very python specific issues I like to take a step back and see why/how it's solved in other languages. We actually have a Go SDK as well - and I was curious just now why the same thing was never an issue there.

It seems both our library, and protovalidate-go just depend on the same package providing the gencode: buf.build/gen/go/bufbuild/protovalidate/protocolbuffers/go

see:

  • here: https://github.com/bufbuild/protovalidate-go/blob/main/go.mod#L6
  • and here: https://github.com/tilebox/tilebox-go/blob/main/go.mod#L31

That way there is always exactly one version of the gencode available, and the go import system / package management always guarantees that.

The equivalent solution in python would be to have both protovalidate-python and our library depend on the bufbuild-protovalidate-grpc-python package. However, because python packaging is so messy this requires the pip --extra-index-url buf.build hack, which is why it's not an option for us.

Potentially you could publish a bufbuild-protovalidate-grpc-python package to the public Pypi.org, which could utilize the wandb approach and just bundle different gencode versions / plugin versions, e.g. the minimum plugin version for every protobuf major version.

Then protovalidate-python could depend on it, and always be quite lenient on the version constraints. We would then also depend on it in our library, and also be quite lenient on the version constraint - that would hopefully leave it to the pip/uv/poetry version resolver to find a valid version that works - and otherwise the end users of the packages could still pin an exact version themselves if required.

That said, if you already foresee that pretty much every version update of the protovalidate proto files will also require an update of protovalidate-python, then it could potentially make sense to just bundle it directly as part of the same package rather than creating its own package for it.

I'm mostly nervous about a user installing a version of protovalidate alongside a generated SDK; they think they're getting the old gencode from protovalidate with a newer protobuf version in their app, but the gen SDK with newer gencode is newer than the version of protobuf they're using, and the poison pills cause a bad interaction in their app.

Yes I absolutely share that sentiment - which is why I think its best to strongly caution against anyone actually generating and bundling the protovalidate gencode into their own packages - that was the initial trigger why I created this github issue.

Because I think it's fine if just their own SDK uses newer gencode, but the protovalidate gencode is a bit older, e.g. the following setup:

  • protobuf runtime version in python: 6.31
  • buf/validate gencode: 5.29
  • people's own proto gencode: 6.XX (for all XX <= 31)

should always work, right (assuming the version of the actual .proto files is compatible)?

lukasbindreiter avatar Jul 30 '25 13:07 lukasbindreiter

Potentially you could publish a bufbuild-protovalidate-grpc-python package to the public Pypi.org, which could utilize the wandb approach and just bundle different gencode versions / plugin versions, e.g. the minimum plugin version for every protobuf major version.

Then protovalidate-python could depend on it, and always be quite lenient on the version constraints. We would then also depend on it in our library, and also be quite lenient on the version constraint - that would hopefully leave it to the pip/uv/poetry version resolver to find a valid version that works - and otherwise the end users of the packages could still pin an exact version themselves if required.

I like this idea (s/bufbuild-protovalidate-grpc-python/bufbuild-protovalidate-protocolbuffers-python+bufbuild-protovalidate-protocolbuffers-pyi, probably), but don't know how best to go about mirroring those packages from buf.build/gen/python -> pypi.org. Doing some quick searching around didn't lead to anything — have you seen that done before?

stefanvanburen avatar Jul 30 '25 15:07 stefanvanburen

I like this idea (s/bufbuild-protovalidate-grpc-python/bufbuild-protovalidate-protocolbuffers-python+bufbuild-protovalidate-protocolbuffers-pyi, probably), but don't know how best to go about mirroring those packages from buf.build/gen/python -> pypi.org

I don't know of any tools and haven't seen anyone mirror their own package registry to pypi so far; typically it's only the other way around, mirroring from pypi.

But I think setting up a small script fetching wheels from buf.build with curl and pushing them to pypi using something like twine should be pretty straightforward.
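
Roughly along these lines (purely illustrative - the wheel filename is a placeholder, and twine needs to be installed and configured with pypi credentials):

import subprocess
import urllib.request

PACKAGE = "bufbuild-protovalidate-protocolbuffers-python"
WHEEL = "bufbuild_protovalidate_protocolbuffers_python-<version>-py3-none-any.whl"

# download one wheel from the buf.build registry and re-upload it to pypi
urllib.request.urlretrieve(f"https://buf.build/gen/python/{PACKAGE}/{WHEEL}", WHEEL)
subprocess.run(["twine", "upload", WHEEL], check=True)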

However I'm not sure if it makes sense to have three packages on pypi for it instead of just one. I just checked how the three packages on your own registry are set up, and I noticed there is a dependency from bufbuild-protovalidate-protocolbuffers-python -> bufbuild-protovalidate-protocolbuffers-pyi, right?

I guess this multi-package setup is because you map one generator plugin = one python package?

I think in general this should be completely fine and I don't think it causes any issues to have two packages write to the same site-packages/buf/validate folder if they are guaranteed to have a distinct set of file names in there.

But I'm not sure if it's ever really wanted to just install bufbuild-protovalidate-protocolbuffers-python without the type stubs - so potentially it makes sense to just simplify it into a single pypi package, this also helps avoid any potential cases where there is a version mismatch between the two. Also I don't think protovalidate uses any services or rpcs, so the grpc stuff is not needed for it I guess?

lukasbindreiter avatar Jul 31 '25 08:07 lukasbindreiter

But I'm not sure if it's ever really wanted to just install bufbuild-protovalidate-protocolbuffers-python without the type stubs - so potentially it makes sense to just simplify it into a single pypi package, this also helps avoid any potential cases where there is a version mismatch between the two. Also I don't think protovalidate uses any services or rpcs, so the grpc stuff is not needed for it I guess?

Yep, no gRPC stuff so no need for that plugin. I could definitely see combining the stubs & impl; I don't think there's a good reason for separating the two other than the historical bits.

But I think setting up a small script fetching wheels from buf.build with curl and pushing them to pypi using something like twine should be pretty straightforward.

Yeah, shouldn't be too hard to do. Going to leave this issue open for now to revisit when I get a chance; will see if there's appetite internally for getting this work done. (Would also be happy if someone else contributed it, but we'd probably end up wanting to dep on a package w/ a repo that's controlled by our org.)

stefanvanburen avatar Jul 31 '25 13:07 stefanvanburen

If the package on pypi should contain both the protobuf gencode and the .pyi stubs in the same package, then just downloading the wheels from buf.build and uploading them to pypi is not the way to go anyway.

I think setting it up as a really simple standalone package, with the source code generated by a simple buf generate command would make a lot of sense - if you want I am happy to contribute that 😄

I think it makes sense for the whole thing to be inside this repository as a subfolder, I'm only unsure about how to best trigger the build?

How about setting up a https://github.com/peter-evans/repository-dispatch, from the https://github.com/bufbuild/protovalidate to a workflow in this repository updating the pypi package whenever the main protovalidate repo changes / has a release?

lukasbindreiter avatar Aug 04 '25 08:08 lukasbindreiter

I think it makes sense for the whole thing to be inside this repository as a subfolder, I'm only unsure about how to best trigger the build?

How about setting up a https://github.com/peter-evans/repository-dispatch, from the https://github.com/bufbuild/protovalidate to a workflow in this repository updating the pypi package whenever the main protovalidate repo changes / has a release?

I'd prefer this to be self-contained to this repo if possible — would a workflow with a cron job running daily (+ a workflow dispatch stanza for ad-hoc runs) be reasonable? I think a max 24 hour delay is reasonable (and could tune the delay over time if we wanted it publishing faster).

The only other thing we'd want to be looking for is whenever a new protocolbuffers/python plugin version is published to the BSR, which would trigger generating/publishing for the new plugin versions.

Lastly, it'd be great to set up publishing to test.pypi.org just to confirm the workflow works... (I know we don't currently do this for protovalidate, but for a net-new package I'd love to have a way to test it without needing to recreate the whole thing / yank a bunch of versions).

Seem reasonable?

stefanvanburen avatar Aug 04 '25 14:08 stefanvanburen

I'd prefer this to be self-contained to this repo if possible — would a workflow with a cron job running daily

A package release workflow on a cron schedule does seem a bit strange, I have to say, but I guess it's just one way of implementing polling for new changes, so I don't see why not. Though this means the workflow has to have some sort of state of what the last tagged version of protovalidate was that it published, to figure out if it has to publish a new one now. Any idea how to implement that with github actions?

The only other thing we'd want to be looking for is whenever a new protocolbuffers/python plugin version is published to the BSR, which would trigger generating/publishing for the new plugin versions.

I think for now the plugin version should probably just be pinned to v30.0 - if it's always the latest one, people always have to upgrade their runtime immediately, rather than all protobuf v6 runtimes being supported if it stays at v30. Or even better, have the gencode of multiple plugin versions in there, à la the wandb approach.

Lastly, it'd be great to set up publishing to test.pypi.org just to confirm the workflow works

Yeah that makes a lot of sense.

I'm hoping to find some spare time the next few weeks, in which case I'm happy to contribute some of that setup 😄

lukasbindreiter avatar Aug 12 '25 14:08 lukasbindreiter

Though this means the workflow has to have some sort of state of what the last tagged version of protovalidate was that it published, to figure out if it has to publish a new one now. Any idea how to implement that with github actions?

I would probably just have GitHub actions call a (Bash? Python?) script that pulls down the list of versions on the buf.build/gen/python index and compares them to the list of versions of the package on the PyPI registry, and then mark any missing ones from PyPI as "need to publish" 😄. The script could then shell out to twine or whatever's best for uploading downloaded packages.
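
Something like this, roughly (untested sketch in Python; assumes the package already exists on PyPI and guesses at the format of the buf.build index page):

import json
import re
import urllib.request

PKG = "bufbuild-protovalidate-protocolbuffers-python"

def pypi_versions(package: str) -> set[str]:
    # PyPI's JSON API lists every published release of a package
    with urllib.request.urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        return set(json.load(resp)["releases"])

def buf_registry_versions(package: str) -> set[str]:
    # the buf.build python registry exposes an HTML index, so pull the
    # version strings out of the listed wheel filenames
    with urllib.request.urlopen(f"https://buf.build/gen/python/{package}") as resp:
        html = resp.read().decode()
    prefix = package.replace("-", "_")
    return set(re.findall(rf"{prefix}-([^-]+)-py3-none-any\.whl", html))

missing = buf_registry_versions(PKG) - pypi_versions(PKG)
print("versions still to publish:", sorted(missing))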

We have some automation similar to this in our plugins repo (e.g. fetching plugin versions), which may or may not be helpful as it's all written in Go :).

stefanvanburen avatar Aug 13 '25 13:08 stefanvanburen

Ah so you prefer to have an exact mirror of the packages on buf.build on pypi, e.g. right now just pushing bufbuild-protovalidate-protocolbuffers-pyi and bufbuild-protovalidate-protocolbuffers-python straight to pypi.org?

My idea based on our discussion here was to instead create a bufbuild-protovalidate-protocolbuffers package here in this repo, have it generate the code of both plugins in one package and push that, using version v30.0 of the plugin and a version number of the published package that reflects the protovalidate commit: https://buf.build/bufbuild/protovalidate/commits

The disadvantage of just straight-up mirroring is that if you let your tool always pick the most recent version (which in general I think you should do, unless there are specific conflicts you're trying to avoid), this always forces a very recent protobuf version on you, e.g. running the following in an empty new project:

uv add 'bufbuild-protovalidate-protocolbuffers-python' --index https://buf.build/gen/python

installs this wheel, which has a hard dependency on the exact latest protobuf minor version (protobuf~=6.32.0): https://buf.build/gen/python/bufbuild-protovalidate-protocolbuffers-python/bufbuild_protovalidate_protocolbuffers_python-32.0.0.1.20250717185734+6c6e0d3c608e-py3-none-any.whl

# download the wheel
curl -O -L https://buf.build/gen/python/bufbuild-protovalidate-protocolbuffers-python/bufbuild_protovalidate_protocolbuffers_python-32.0.0.1.20250717185734+6c6e0d3c608e-py3-none-any.whl

# and inspect it
uv tool install wheel-doctor
wheel-doctor show-dependencies bufbuild_protovalidate_protocolbuffers_python-32.0.0.1.20250717185734+6c6e0d3c608e-py3-none-any.whl
bufbuild_protovalidate_protocolbuffers_python-32.0.0.1.20250717185734+6c6e0d3c608e-py3-none-any.whl
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Dependency                                                          ┃ Version ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ bufbuild-protovalidate-protocolbuffers-pyi>=32.0.0.1.20250717185734 │ <none>  │
│ protobuf~=6.32.0                                                    │ <none>  │
└─────────────────────────────────────────────────────────────────────┴─────────┘

For us, for now, either option A (mirroring the exact buf.build packages to pypi) or option B (building a new package with a more relaxed plugin version and the potential to even include gencode for multiple protobuf major versions à la wandb) would work. My gut feeling is that option B would lead to fewer version-related issues down the road, so that would be my slight preference; however, I'm happy to contribute one or the other if you have a clear preference.

One follow up question for option A: Pypi has a json API as well which you can use to easily query all versions of a package, e.g. https://pypi.org/pypi/protobuf/json

is that by any chance also available in your private pypi, https://buf.build/gen/python? I tried to guess the URL, e.g https://buf.build/gen/python/pypi/bufbuild-protovalidate-protocolbuffers-python/json but that doesn't work.

Otherwise I'd resort to parsing the HTML output of https://buf.build/gen/python/bufbuild-protovalidate-protocolbuffers-python for that option, which I think should also work.

lukasbindreiter avatar Aug 19 '25 12:08 lukasbindreiter

@lukasbindreiter ah, yep, sorry for the rabbit hole on option A — lost context in the thread. I think option B is a reasonable way to go.

stefanvanburen avatar Aug 19 '25 13:08 stefanvanburen

@stefanvanburen I'm just about to get started on this now.

I just saw that now there is also https://buf.build/google/cel-spec/ in gen.

As far as I can tell, that's just needed for tests, and it's not required for users of the protovalidate-python lib, right? Also, do you know what its relation is to https://pypi.org/project/cel-python, and why protovalidate-python pins this to version 0.2.* specifically?

But anyway I think that's not blocking me now to get started, just wanted to clarify this too.

lukasbindreiter avatar Aug 28 '25 10:08 lukasbindreiter

As far as I can tell, that's just needed for tests, and it's not required for users of the protovalidate-python lib, right?

yep!

Also, do you know what its relation is to https://pypi.org/project/cel-python, and why protovalidate-python pins this to version 0.2.* specifically?

cel-python is the underlying library for evaluation of protovalidate rules (which are written in CEL); cel-spec is the spec that all cel implementations are supposed to adhere to. We have to override a couple functions in protovalidate-python to pass protovalidate's own conformance tests, & wrote our own version of string formatting that we check against the cel-spec.

We're pinned to 0.2 for now because both new versions of cel-python unfortunately introduced new bugs on our end; we're trying to get them fixed upstream first and then hoping to bump to the latest release :)

stefanvanburen avatar Aug 28 '25 13:08 stefanvanburen