Expose raw descriptor string slice of `ClassName`, `FieldDescriptor` and `MethodDescriptor` for bindgens
I'm porting the java-spaghetti-gen to use this library, because the previously used jreflection crate is currently unmaintained.
It looks strange to me that ClassFile<'_>::this_class is Cow<'a, str>, but the object type class name contained in the FieldDescriptor is ClassName in which the original string slice cannot be read.
These desired functions are available in jreflection crate:
https://docs.rs/jreflection/latest/jreflection/field/struct.Field.html#method.descriptor_str
https://docs.rs/jreflection/latest/jreflection/method/struct.Method.html#method.descriptor_str
https://docs.rs/jreflection/latest/jreflection/field/enum.BasicType.html (Note: Id<'a> is a wrapper of &'a str)
Crate noak exposes these original string slices too, but it doesn't parse these field/method descriptors.
PS: It seems like ClassName's parse function splits the class binary name by /, but it doesn't care about $.
the object type class name contained in the
FieldDescriptorisClassNamein which the original string slice cannot be read.
The unqualified segments can be glued back together to make the string, right? I'd take a PR that exposes that as a function on ClassName.
PS: It seems like
ClassName's parse function splits the class binary name by/, but it doesn't care about$.
Sorry, is there a bug here? If so please provide a test case to demonstrate it, thanks!
The unqualified segments can be glued back together to make the string, right?
I've already tried to do it, but it's not an ideal solution: it constructs a new String which involves allocation. I'm changing it to use std::fmt::Formatter (write immediately).
Sorry, is there a bug here?
Here it is (based on your test_owned_cow):
#[test]
fn test_owned_cow() {
let chars = Cow::from("(Ljava/lang/Object$Obj;)V".to_string());
let descriptor = parse_method_descriptor(&chars, 0).unwrap();
let mut parameters = descriptor.parameters.into_iter();
let return_type = descriptor.return_type;
assert_eq!(
parameters.next().unwrap(),
FieldDescriptor {
dimensions: 0,
field_type: FieldType::Object(ClassName {
segments: vec![
UnqualifiedSegment {
name: Cow::Borrowed("java")
},
UnqualifiedSegment {
name: Cow::Borrowed("lang")
},
UnqualifiedSegment {
name: Cow::Borrowed("Object$Obj")
},
],
}),
},
);
assert!(parameters.next().is_none());
assert_eq!(return_type, ReturnDescriptor::Void);
}
Sorry, I think it's not easy to make the change: parsed fields available in ClassFile do use the borrowed class data, but the original class data isn't available as a whole.
I've built a self-referencing wrapper for the parsed ClassFile (possibly off-topic):
pub use unsafe_class::Class;
mod unsafe_class {
use std::pin::Pin;
use std::marker::PhantomPinned;
pub struct Class {
raw_bytes: Pin<Box<(Vec<u8>, PhantomPinned)>>,
inner: cafebabe::ClassFile<'static>,
}
impl Class {
pub fn read(raw_bytes: Vec<u8>) -> Result<Self, cafebabe::ParseError> {
let pinned = Box::pin((raw_bytes, PhantomPinned));
// SAFETY: `get<'a>(&'a self)` restricts the lifetime
let fake_static = unsafe {
std::slice::from_raw_parts(pinned.0.as_ptr(), pinned.0.len())
};
let inner = cafebabe::parse_class(fake_static)?;
Ok(Self { raw_bytes: pinned, inner })
}
pub fn get<'a>(&'a self) -> &'a cafebabe::ClassFile<'a> {
// SAFETY: casts `self.inner` into `cafebabe::ClassFile<'a>` forcefully.
// Why is `ClassFile` invariant over `'a`?
unsafe { &*(&raw const (self.inner)).cast() }
}
}
}
Do you think it's actually safe?
Sorry, I think it's not easy to make the change: parsed fields available in
ClassFiledo use the borrowed class data, but the original class data isn't available as a whole.
Yes, unfortunately there's the pesky case of Java UTF-8 being different from regular UTF-8, so we can't unconditionally use the original class data when representing things as rust strings.
I've built a self-referencing wrapper for the parsed
ClassFile(possibly off-topic):
neat!
Do you think it's actually safe?
I don't know, your grasp of advanced rust is better than mine, I haven't kept up with the language much recently.
PS: It seems like ClassName's parse function splits the class binary name by /, but it doesn't care about $.
I looked again the JVM spec and per https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-4.html#jvms-4.2.2 unqualified names are allowed to contain the $ character so unless there's an actual bug here I don't think there's anything to change. The test case you provided passes as-written, but even if you change the expectation there it's not clear to me why you would expect it to behave differently.
I'm about to finish porting java-spaghetti to cafebabe with some workarounds, but I can't make previous unit tests like method_mangling_style_mangle_test working, because functions like parse_method_descriptor aren't exposed by cafebabe. I'll switch to the noak crate if cafebabe cannot be changed (I made this choice because you are the author of some popular crates, it's a wrong idea of mine...). Anyway, thank you for the reply.
Yes, unfortunately there's the pesky case of Java UTF-8 being different from regular UTF-8, so we can't unconditionally use the original class data when representing things as rust strings.
How about holding a &'a [u8] reference of the original data in these descriptors to make getting Cow<'a, str> possible? On the other hand, how could you get the parsed ClassName via this_class: Cow<'a, str> field of ClassFile?
Note about nested classes:
"The binary name of a member class or interface consists of the binary name of its immediately enclosing class or interface, followed by $, followed by the simple name of the member." https://docs.oracle.com/javase/specs/jls/se17/html/jls-13.html#jls-13.1.
Have a A.java source file: public class A { public class B {} }, execute javac A.java, you can see the name A$B in both A.class and A$B.class. However, A$B is probably one unqualified name, I'm not sure...
jreflection manages to parse nested class names from the string: https://docs.rs/jreflection/latest/src/jreflection/class.rs.html#177-181.
Well, a workaround in method_mangling_style.rs is possible, I may construct MethodDescriptor manually for the test case, because its fields are all public. I may keep using cafebabe with less effort.
How about holding a
&'a [u8]reference of the original data in these descriptors to make gettingCow<'a, str>possible?
While technically this is possible, I think it would be a pretty invasive change and require a massive overhaul of how data is plumbed through the parser. If I'm wrong and there's an easy way to do this I'll consider it.
On the other hand, how could you get the parsed
ClassNameviathis_class: Cow<'a, str>field ofClassFile?
Not sure what you mean by this, can you elaborate?
Note about nested classes:
"The binary name of a member class or interface consists of the binary name of its immediately enclosing class or interface, followed by $, followed by the simple name of the member." https://docs.oracle.com/javase/specs/jls/se17/html/jls-13.html#jls-13.1.
This is in the JLS which is specific to Java. cafebabe is a class file parser which means it has to work with all JVM languages not just Java. I'd like to avoid adding Java-specific things into cafebabe directly.
It is an issue about API designing. Why should the field type (or method argument/return type) class binary name available in FieldType be ClassName, while the name of the class itself is exposed as Cow<'a, str>? (the API doesn't provide methods for convertions in between ClassName and Cow<'a, str>)
Ah I see what you mean. That's a fair point. When I wrote it I was mostly following the spec and in the one case it's a descriptor string and in the other it's not, so I used different types. But I can see your point that from the consumer's point of view it's inconsistent.
Current workarounds in https://github.com/Dirbaio/java-spaghetti/pull/5:
-
FieldSigWriter: https://github.com/Dirbaio/java-spaghetti/pull/5/files#diff-1edca7d762ef03b71662752b0f6f6dea97fba85119842e32f89042055fa11dd3 -
MethodSigWriter: https://github.com/Dirbaio/java-spaghetti/pull/5/files#diff-66f29dc681387235b8d76ba5cd9bc10d3da01fe1b03bad18b02272f702859e4d -
ClassNamenewtype: https://github.com/Dirbaio/java-spaghetti/pull/5/files#diff-4d40bc7c0590ea77cb67d01b35c599d152df076ce53495720ba33ee6e6400e94
When I wrote it I was mostly following the spec and in the one case it's a descriptor string and in the other it's not, so I used different types.
Description of this_class in https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-4.html#jvms-4.1: The value of the this_class item must be a valid index into the constant_pool table. The constant_pool entry at that index must be a CONSTANT_Class_info structure (§4.4.1) representing the class or interface defined by this class file.
https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-4.html#jvms-4.4.1: The value of the name_index item must be a valid index into the constant_pool table. The constant_pool entry at that index must be a CONSTANT_Utf8_info structure (§4.4.7) representing a valid binary class or interface name encoded in internal form (§4.2.1).
Then I took a glance at https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-4.html#jvms-4.5, https://docs.oracle.com/javase/specs/jvms/se21/html/jvms-4.html#jvms-4.6, still I can't understand your words "in the one case it's a descriptor string and in the other it's not".
Ok after reading over the spec again I agree with you, the way I have it is inconsistent and can be improved. What I'm thinking is that
- The ClassName structure should have a Cow for the whole string, and instead of exposing the Vec of unqualified names as a member, could have an impl method that computes and returns the Vec
- The
this_class(andsuper_classetc.) members of the ClassFile structure should also be ClassName structures that wrap the Cow values currently present.
Does that seem reasonable?
Maybe. However, I had worried about the lifetime of ClassName previously: if it should be a wrapper of Cow by itself, the lifetime of unqualified names got from that ClassName might be limited by ClassName, but not ClassFile. If ClassName remembers the original &'a [u8], then it's possible to have an impl method that returns a Cow<'a, str>, and another method that returns a sequence (or iterator) of unqualified names having 'a lifetime parameter (instead of being limited by the lifetime of ClassName).
Well, I'm confused by my previous thoughts about the workaround for the current version of this library. If changes were to be made inside this library, the lifetime of ClassName and other descriptors should be still consistent with the ClassFile which contains them.
发件人: Kartikaya Gupta (kats) @.>
发送时间: 2025年2月16日 4:50
收件人: staktrace/cafebabe @.>
抄送: wu bobo @.>; Author @.>
主题: Re: [staktrace/cafebabe] Expose raw descriptor string slice of ClassName, FieldDescriptor and MethodDescriptor for bindgens (Issue #52)
Ok after reading over the spec again I agree with you, the way I have it is inconsistent and can be improved. What I'm thinking is that
- The ClassName structure should have a Cow for the whole string, and instead of existing the Vec of unqualified names, could have an impl method that computes and returns the Vec
- The this_class (and super_class etc.) members of the ClassFile structure should also be ClassName structures that wrap the Cow values currently present.
Does that seem reasonable?
― Reply to this email directly, view it on GitHubhttps://github.com/staktrace/cafebabe/issues/52#issuecomment-2661096208, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AVFLUNU3A7VWYLOL3L3NKD32P6SAHAVCNFSM6AAAAABWSEOY4WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRRGA4TMMRQHA. You are receiving this because you authored the thread.Message ID: @.***>
[staktrace]staktrace left a comment (staktrace/cafebabe#52)https://github.com/staktrace/cafebabe/issues/52#issuecomment-2661096208
Ok after reading over the spec again I agree with you, the way I have it is inconsistent and can be improved. What I'm thinking is that
- The ClassName structure should have a Cow for the whole string, and instead of existing the Vec of unqualified names, could have an impl method that computes and returns the Vec
- The this_class (and super_class etc.) members of the ClassFile structure should also be ClassName structures that wrap the Cow values currently present.
Does that seem reasonable?
― Reply to this email directly, view it on GitHubhttps://github.com/staktrace/cafebabe/issues/52#issuecomment-2661096208, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AVFLUNU3A7VWYLOL3L3NKD32P6SAHAVCNFSM6AAAAABWSEOY4WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRRGA4TMMRQHA. You are receiving this because you authored the thread.Message ID: @.***>
I think in this case the unqualified names would be returned as Strings with copying. I don't mind doing that since it would be an explicit function call from the user instead of happening as part of parsing.
1. The ClassName structure should have a Cow for the whole string, and instead of exposing the Vec of unqualified names as a member, could have an impl method that computes and returns the Vec 2. The `this_class` (and `super_class` etc.) members of the ClassFile structure should also be ClassName structures that wrap the Cow values currently present.
just want to chime in and show support / encouragement for this idea; trying to upgrade a decompiler from cafebabe 0.7 to 0.8 i ran into exactly this issue.
ps: i don't really see much added value by the imagined "ClassName#to_segments" method to return a vec; a classNameCow.split('/') is more flexible and would be available "for free."
there probably would be more candidates to be presented through a ClassName in the attributes module. What do you think?
ps: the more I'm looking into it, the less and less i see a value in the new type (at least in the cafebabe's public api) and would eventually vote for a plain Cow<'a, str>.
Hi, thanks for the PR. I haven't had a chance to look into it yet but will get to it in the next couple of days. Sorry for the delay.
Thank you again @xitep for the PR. It's merged now, please try it out and if it satisfies your uses I can publish a new version
many thanks! i already tried it, and it works fine for me. a published version would be great!
I've published 0.9.0 with the changes. I'll look at the performance separately and maybe publish a new patch release with perf improvements. For now I'll close this issue. Thanks!
thank you!