goblin icon indicating copy to clipboard operation
goblin copied to clipboard

Goblin ergonomics

Open m4b opened this issue 8 years ago • 18 comments

This deeply annoys me:

    let peek = goblin::peek(&mut fd)?;
    if let Hint::Unknown(magic) = peek {
        println!("unknown magic: {:#x}", magic)
    } else {
        let bytes = { let mut v = Vec::new(); fd.read_to_end(&mut v)?; v };
        match peek {
            Hint::Elf(_) => {

I think there's an architectural problem and an ergonomics problem here.

  1. We don't want to read the entire binary/file if it doesn't even have proper magic; this is what peek is for
  2. If we pass the peek test, we want to read the file in full, and pass these bytes into goblin, and receive the enum variant
  3. We don't want to even think about the Unknown variant - we already peeked to make sure the magic is good! - everything else is just a parse error for that respective file format

So, what I want is my cake and eat it too:

I want the peek to ensure the magic is correct, and route to the correct binary parser, and return this result, without the Unknown variant, without temporarily allocations (or the full fd read is passed through).

@philipc @endeav0r you seem to be using goblin::Object as clients, does this bother you?

Anyone who happens to be watching/reading this, I'm open to proposals how to fix this make it nicer.

Afaics, being flexible w.r.t. the bytes + reading is going to be tricky; first thing that comes to my mind is some kind of closure style or an inout, like:

// has no Unknown variant, and is also totally me just randomly typing stuff
let object: Option<Result<Object>> = Object::parse_and_fill(fd, &mut bytes);

This would be a breaking change, but I think its important to get right sooner rather than later

m4b avatar Nov 09 '17 05:11 m4b

I haven't used goblin::Object (or if I have I've forgotten where).

So far I've only used goblin with memmap input, and I think the existing Object::parse would work for that.

The proposed parse_and_fill makes sense to me. Why would it be a breaking change to add it? I tend to prefer ordering the return value as Result<Option<Object>> though.

philipc avatar Nov 09 '17 05:11 philipc

It could be a breaking change because i want to remove the Unknown variant since the peek ostensibly makes it redundant, or forces an unreachable clause (which is a sign of bad design to me)

m4b avatar Nov 09 '17 06:11 m4b

What if peek returned an Option<Hint>?

sunfishcode avatar Nov 09 '17 12:11 sunfishcode

  1. I am 100% ok with breaking changes.
  2. This is confusing: https://github.com/m4b/goblin/blob/master/src/lib.rs#L200 . Is is_lsb little endian?
  3. How about something like:
let goblin_loader = goblin::load(&mut fd);
let endianness = goblin_loader.peek().endian(); // Option<goblin::Endian>
let architecture = goblin_loader.peek().architecture(); // Option<goblin::Architecture>
let elf = if goblin_loader.is_elf() {
    goblin_loader.elf() // elf = Option<goblin::elf::Elf>
}
else {
    panic!("Not an elf")
}
let elf = match goblin_loader.executable() {
    match goblin::loader::Elf(elf) => elf, // goblin::elf::Elf
    _ => panic!("Not an elf")
};

Not exactly, but you kind of get the idea.

endeav0r avatar Nov 09 '17 13:11 endeav0r

So to be clear the pattern I don't like is this:

    if let Hint::Unknown(magic) = peek {
        println!("unknown magic: {:#x}", magic)
    } else {
        let bytes = { let mut v = Vec::new(); fd.read_to_end(&mut v)?; v };
        let object = Object::parse(&bytes)?;
        match object {
         Object::Elf(elf) => {
             println!("elf: {:#?}", &elf);
         },
         Object::PE(pe) => {
             println!("pe: {:#?}", &pe);
         },
         Object::Mach(mach) => {
             println!("mach: {:#?}", &mach);
         },
         Object::Archive(archive) => {
             println!("archive: {:#?}", &archive);
         },
         Object::Unknown(magic) => { unreachable!() }
        }

That last unreachable in Object::Unknown is, to use an unpalatable term, "has code smellies" to me ;)

I actually don't think there is way to fix this, short of adding dependent types to rust :thinking:

So there's a couple of problems:

  1. object needs bytes (it has a lifetime), so we can't expose an api like fn parse<R: Read>(r: R) -> Object (it actually used to be like this prior to move to zero-copy)
  2. we don't want to read a whole file just to discard it because the magic is bad
  3. if we peek, the if condition guarantees us that the Unknown variant cannot occur, hence its unreachable. But this is runtime information (hence the semi flippant dependent types remark)

To be precisely clear, I wanted to remove the Object::Unknown variant, but I don't think this is possible if we have an api like fn parse(bytes: &[u8]) -> Object since the bytes could have a bad magic.

Alternatively, we could return a Error::BadMagic variant when the magic doesn't match our list of acceptable magicks. But this would then require the client to examine the error for the specific BadMagic variant, instead of the Unknown variant, which I don't like, as it seems to "overload" the error type in some sense.

Of course this could be solved by also returning a new Object variant without the unknown case, and only provided by a function like fn peek_and_parse which returns the special variant iff the peek does not fail, and a None otherwise, e.g. something like:

pub enum DefinitelyABinary {
  Mach(mach),
  Elf(elf),
  PE(pe),
}

pub fn peek_and_parse<R: Read>(fd: R, bytes: &mut Vec<u8>) -> Result<Option<DefinitelyABinary>> {
  match peek(fd) {
    Unknown(_) => None // we lose magic...
    Peek::Elf(_) => fd.read_to_end(bytes); DefinitelyABinary(elf::Elf::parse(&bytes)
    Peek::Mach(_) => // same but parse mach
    Peek::PE(_) => // same but parse pe
  }
}

but this isn't very satisfying to me either. Maybe it should be. I dunno?

I'm not sure there's a perfect solution here :/

EDIT

To preserve the magic information, we probably want a signature like:

type UnknownMagic = usize;
Result<Result<DefinitelyABinary, UnknownMagic>, goblin::Error>

But the double result will very likely just get ignored via ? Maybe that's ok?

m4b avatar Nov 10 '17 23:11 m4b

let object = Object::peek_and_parse(fd, &mut bytes)??;

It looks we're angrily confused about the goblin library and this function call :rofl:

m4b avatar Nov 10 '17 23:11 m4b

How about this?

    let peek = goblin::peek(&mut fd)?;
    match peek {
        None => println!("unknown magic: {:#x}", magic),
        Some(magic) => {
            let bytes = { let mut v = Vec::new(); fd.read_to_end(&mut v)?; v };
            let object = Object::parse(magic, &bytes)?;
            match object {
                Object::Elf(elf) => {
                    println!("elf: {:#?}", &elf);
                },
                Object::PE(pe) => {
                    println!("pe: {:#?}", &pe);
                },
                Object::Mach(mach) => {
                    println!("mach: {:#?}", &mach);
                },
                Object::Archive(archive) => {
                    println!("archive: {:#?}", &archive);
                },
            }
        }
    }

goblin::parse would determine the filetype from the magic parameter.

sunfishcode avatar Nov 10 '17 23:11 sunfishcode

What if you pass an invalid magic? It will have either error with bad magic (same problem I mentioned above) or return variant of Unknown, or still have signature like Result<Option<Object>>; which at that point may as well push all the fd + peek logic into a single function

m4b avatar Nov 10 '17 23:11 m4b

I'm suggesting peek return an Option<Hint>. With invalid magic, peek would return None, and in that case you wouldn't call parse at all.

sunfishcode avatar Nov 11 '17 00:11 sunfishcode

     fn peek(fd: &mut File) -> Result<Option<Hint>>;
     fn parse(hint: Hint, bytes: &[u8]) -> Result<Object>;

sunfishcode avatar Nov 11 '17 00:11 sunfishcode

Right, but what would the internal logic of parse(magic: usize, bytes: &[u8]) look like?

It has to still check the magic is valid - either it will return Option, Result with an Unknown variant, or panic, no?

m4b avatar Nov 11 '17 00:11 m4b

Oh, I see what you're saying, I thought the peek was a usize:

     fn parse(hint: Hint, bytes: &[u8]) -> Result<Object> {
       match hint {
         Elf => // parse elf,
         Mach => // parse mach,
         PE => // parse pe,
         Archive => // parse pe,
         // exhausted
       }
}

m4b avatar Nov 11 '17 00:11 m4b

Yea I like that idea; it makes using goblin from raw bytes slightly more complicated though.

Using your proposal I think I like this api:

     fn peek(fd: &mut File) -> Result<Option<Hint>>;
     fn parse_with(hint: Hint, bytes: &[u8]) -> Result<Object>;
     fn parse(fd: &mut File, bytes: &mut [u8]) -> Result<Option<Object>>;

For clients who already have the bytes but no fd it sort of makes goblin a pita to use though. I'm not sure how common this would be though. probably could be very common.

m4b avatar Nov 11 '17 00:11 m4b

One thought in reply to this:

if we peek, the if condition guarantees us that the Unknown variant cannot occur, hence its unreachable. But this is runtime information (hence the semi flippant dependent types remark)

There is no such guarantee. Files can change between reads, and mmap'd files can even change mid-read. Or, plain old user error: the user can pass one buffer into peek() and another into parse().

Even if you peek() successfully, parse() has to handle the invalid magic case.

willglynn avatar Nov 11 '17 00:11 willglynn

@willglynn yea that's a good point.

However @sunfishcode 's option proposal I think somewhat sidesteps this issue.

It forces the user to pass a peek. They either got the peek from somewhere or fabricated it.

In either case, the parse will error; it'll interpret the bad bytes as an elf binary and will still return a scroll BadMagic error when attempting to parse the elf header.

The problem of the file changing, mmap bytes getting messed up is still a problem for current lib - it will just return BadMagic for elf if the header doesn't match and you try to parse it.

The advantage is that making peek an option is that it removes the unreachable case, or at least, moves it into the separate binary parsing backend errors.

m4b avatar Nov 11 '17 00:11 m4b

@m4b Sure. I just wanted to tamp down expectations. A peek() that finds a valid magic makes it highly likely that parse() will see the same thing, it just doesn't guarantee anything. parse()'s body and its API needs to accommodate that.

willglynn avatar Nov 11 '17 00:11 willglynn

@willglynn yea i probably overstated things when I said it's guaranteed :laughing:

As a bonus I think I can also have my angry double ?? version using the option peek as well:

// or perhaps vexing_parse for our c++ brethren
pub fn angry_parse(bytes: &[u8]) -> Result<Result<Object, goblin::Error>, UnknownMagic> {
  match Hint::peek(bytes) {
    Some(hint) => // parse accordingly
    None => Err(UnknownMagic::from(bytes))
  }
}

m4b avatar Nov 11 '17 00:11 m4b

I think my last suggestion is actually a good idea, or dan's; I think pre-release 1.0 one of them should be chosen.

m4b avatar Sep 29 '18 18:09 m4b