Regression and/or documentation gap: cannot parse version from rubygems index
Here's a simple program for parsing the rubygems index, which I've used successfully with rubymarshal 1.0.3:
#!/usr/bin/env python3
import gzip
import requests
import rubymarshal.reader
data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)
for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)
Its output with 1.0.3:
_ UsrMarshal:Gem::Version(['1.4']) ruby
- UsrMarshal:Gem::Version(['1']) b'ruby'
0mq UsrMarshal:Gem::Version(['0.5.3']) b'ruby'
0xdm5 UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
0xffffff UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
10to1-crack UsrMarshal:Gem::Version(['0.1.3']) b'ruby'
1234567890_ UsrMarshal:Gem::Version(['1.2']) b'ruby'
12_hour_time UsrMarshal:Gem::Version(['0.0.4']) b'ruby'
16watts-fluently UsrMarshal:Gem::Version(['0.3.1']) b'ruby'
189seg UsrMarshal:Gem::Version(['0.0.1']) b'ruby'
With 1.2.6 it looks like this:
_ UsrMarshal({}) ruby
- UsrMarshal({}) ruby
0mq UsrMarshal({}) ruby
0xdm5 UsrMarshal({}) ruby
0xffffff UsrMarshal({}) ruby
10to1-crack UsrMarshal({}) ruby
1234567890_ UsrMarshal({}) ruby
12_hour_time UsrMarshal({}) ruby
16watts-fluently UsrMarshal({}) ruby
189seg UsrMarshal({}) ruby
The nice thing is that the unicode problem is gone, but the bad thing is that the custom object is no longer parsed.
At the very least, this requires a major version bump.
Next, the documentation is unclear or wrong about how this can be parsed now. Changing the program the way the documentation example suggests:
#!/usr/bin/env python3
import gzip
import requests
import rubymarshal.reader
from rubymarshal.classes import RubyObject, registry
data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)
class GemVersion(RubyObject):
    ruby_class_name = "Gem::Version"
registry.register(GemVersion)
for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)
doesn't change a thing.
In fact, this cannot work (at least with this data file): ClassRegistry stores class names as strs, but Reader.read reads the class name as Symbol("Gem::Version"), which hashes differently from the str "Gem::Version". As a result, self.registry.get(class_name, UsrMarshal) always falls back to UsrMarshal.
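The lookup miss described above can be reproduced without rubymarshal. The following is only a minimal sketch: Symbol, UsrMarshal, GemVersion and the dict-based registry are hypothetical stand-ins written for this illustration, not the library's own classes, and the real Symbol may implement hashing differently. The point is only that a key which is not equal to the registered str never matches.

```python
class Symbol:
    """Hypothetical stand-in for a parsed Ruby symbol (not rubymarshal's class)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"Symbol({self.name!r})"


class UsrMarshal:
    """Stand-in for the generic fallback class."""


class GemVersion:
    """Stand-in for the custom class we hoped to get back."""


# A registry keyed by plain str, as the issue describes.
registry = {"Gem::Version": GemVersion}

class_name = Symbol("Gem::Version")

# The Symbol stand-in uses default identity-based hashing and equality,
# so it never matches the str key and the fallback is returned.
missed = registry.get(class_name, UsrMarshal)

# Looking up by the underlying str finds the registered class.
found = registry.get(class_name.name, UsrMarshal)

print(missed.__name__)  # UsrMarshal
print(found.__name__)   # GemVersion
```

Under these assumptions, converting the symbol to its name before the lookup (or registering under a key equal to the symbol) would be enough to make the registry hit.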
I've worked around this by using ver.marshal_dump() instead, but I don't think it's the correct solution.
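For reference, the marshal_dump() workaround might look roughly like this. It is a sketch under assumptions: gem_version_string is a hypothetical helper name, and the payload is assumed to be a one-element sequence such as ['1.4'] (as the 1.0.3 output suggests), possibly holding bytes. FakeVersion exists only so the helper can be exercised without downloading the index.

```python
def gem_version_string(ver):
    """Return the version string carried by a parsed Gem::Version object."""
    dumped = ver.marshal_dump()
    # Assumption: the payload is a one-element sequence such as ['1.4'].
    value = dumped[0] if isinstance(dumped, (list, tuple)) else dumped
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


class FakeVersion:
    """Hypothetical stand-in used only to exercise the helper offline."""

    def __init__(self, data):
        self._data = data

    def marshal_dump(self):
        return self._data


print(gem_version_string(FakeVersion(["1.4"])))     # 1.4
print(gem_version_string(FakeVersion([b"0.5.3"])))  # 0.5.3
```

In the loop from the program above, this would be called as print(name, gem_version_string(ver), gemplat).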
The only other code I can find which uses this for Gem Version is https://github.com/d9pouces/Moneta/blob/master/moneta/repositories/ruby.py and https://github.com/ATIX-AG/pulp_gem/blob/master/pulp_gem/specs.py written by @mdellweg, which may also be affected or might hold the answer on how to work around this.
You're right. I'll update the doc and create two versions:
- 1.3 -> back to the previous behavior
- 2.0 -> the new behavior, which is closer to the Ruby implementation.
Still, what's the correct way to get Gem::Version value now?
For anyone who only wants a solution for Ruby versions and only needs py35+ support, https://github.com/dephell/dephell_specifier includes a fairly good Ruby version parser. I haven't run a full scan of rubygems, so I expect there are some oddballs, and I'll be happy to help fix any issues raised.
@d9pouces do you need some help to fix this? (and as an aside, thank you ++ for this library :bow: )
@jayvdb re:
> The only other code I can find which uses this for Gem Version ...
Actually I have a tool that's about to be released at last that uses this for Gem Version too.
FWIW I ran a quick git bisect, and the commit that introduced the problem is 7197a3d10c02b6616e35b5dc5c918ead942e6747, but I cannot fathom why it makes things fail. @d9pouces would you have some idea of where to poke?
@AMDmi3 I see you have something working now at https://github.com/repology/repology-updater/blob/dbc9445f8e156caf38cca0a67108684c9960355c/repology/parsers/parsers/rubygem.py ... using https://github.com/repology/repology-updater/blob/dbc9445f8e156caf38cca0a67108684c9960355c/requirements.txt#L11 rubymarshal>=1.2.6 so I guess there is a way?
> so I guess there is a way?
I've mentioned it right in the issue.
I got it to work with this class (it needs to inherit from UsrMarshal, because Gem::Version uses marshal_dump/marshal_load on the Ruby side):
import rubymarshal.classes
from rubymarshal.classes import UsrMarshal

class GemVersion(UsrMarshal):
    ruby_class_name = "Gem::Version"

    @property
    def version(self):
        # The marshalled payload is a one-element list holding the version string.
        return self._private_data[0]

    def __repr__(self):
        return f"{self.ruby_class_name}('{self.version}')"

    def __str__(self):
        return f"{self.ruby_class_name}('{self.version}')"

    def __eq__(self, other):
        # Compare against the other object's data (the original compared self with itself).
        return isinstance(other, self.__class__) and self._private_data == other._private_data

rubymarshal.classes.registry.register(GemVersion)