bolt.Open crashes when opening a partial file; an integrity check function is needed

Open xxr3376 opened this issue 8 years ago • 6 comments

I am writing a storage agent based on BoltDB. If the agent is killed during network recovery, it can never restart, because on startup it tries to open a partial file and check its version.

I migrate a BoltDB database between two hosts with the code below. If the migration fails partway through, it may leave a partial db file on disk, and BoltDB crashes when it tries to open that file.

You can reproduce this by truncating a BoltDB file in the middle and then opening it. The file will have a valid magic header but invalid content.

Various errors can occur depending on where the file is truncated, so pasting any single error here would not be useful. Is there any way to check the integrity of a file?

Sender:

func (b *boltDB) ExportTo(acceptType []string, meta MetaWriter, writer io.Writer) error {
    export := false
    for _, t := range acceptType {
        if t == boltType {
            export = true
        }
    }
    if !export {
        return errUnknownFormat
    }
    return b.db.View(func(tx *bolt.Tx) error {
        if meta != nil {
            if err := meta(boltType, tx.Size()); err != nil {
                return err
            }
        }
        _, err := tx.WriteTo(writer)
        return err
    })
}

Receiver:

func ImportBoltDB(filename string, contentType string, reader io.Reader) (KV, error) {
    if contentType != boltType {
        return nil, errUnknownFormat
    }
    file, err := os.OpenFile(filename, os.O_RDWR|os.O_CREATE, 0600)
    if err != nil {
        return nil, err
    }
    _, err = io.Copy(file, reader)
    file.Close()
    if err != nil {
        // XXX An incomplete file should be deleted to prevent a BoltDB crash.
        // This does its best to remove the partial file, but it can't do anything on SIGKILL.
        os.Remove(filename)
        return nil, err
    }
    return NewBoltDB(filename)
}
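
One way to keep a partial file from ever appearing under the final name is to copy into a temporary file in the same directory and rename it only after the whole stream has arrived. The following is a minimal sketch of that idea, not code from this thread: importAtomically, demoImport and the expectedSize parameter are hypothetical names, and it assumes the receiver is told the size that the sender already reports through tx.Size().

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// importAtomically copies r into a temporary file next to filename and renames
// it into place only after expectedSize bytes have been written and synced.
// Hypothetical helper: expectedSize is assumed to be the size announced by the sender.
func importAtomically(filename string, r io.Reader, expectedSize int64) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(filename), filepath.Base(filename)+".partial-*")
	if err != nil {
		return err
	}
	// Remove the temporary file on every failure path. The second Close on an
	// already closed file only returns an error, which is ignored here.
	defer func() {
		if err != nil {
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()

	n, err := io.Copy(tmp, r)
	if err != nil {
		return err
	}
	if n != expectedSize {
		return fmt.Errorf("short import: got %d bytes, want %d", n, expectedSize)
	}
	if err = tmp.Sync(); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filename)
}

// demoImport runs one import into a fresh temporary directory and reports
// whether the destination file exists afterwards.
func demoImport(payload string, announced int64) (exists bool, err error) {
	dir, mkErr := os.MkdirTemp("", "bolt-import")
	if mkErr != nil {
		return false, mkErr
	}
	defer os.RemoveAll(dir)
	dst := filepath.Join(dir, "copy.db")
	err = importAtomically(dst, strings.NewReader(payload), announced)
	_, statErr := os.Stat(dst)
	return statErr == nil, err
}

func main() {
	exists, err := demoImport("hello", 10) // stream shorter than announced
	fmt.Println(exists, err)
	exists, err = demoImport("hello", 5) // complete stream
	fmt.Println(exists, err)
}
```

A rename within one directory is atomic on POSIX filesystems, so anything opening filename sees either no file or the complete one. A process killed mid-copy can still leave a stale .partial-* file behind, which would need separate cleanup at startup.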

xxr3376 avatar Jun 16 '17 03:06 xxr3376

BLAKE2b features tree-based (updatable/incremental cryptographic) hashes that were designed for checksumming entire filesystems, so you could use it here to develop a solution. See

https://blake2.net/

and Go libs are available:

https://github.com/glycerine/blake2b-simd

https://github.com/dchest/blake2b

(update: specifically, see section 2.10 of https://blake2.net/blake2_20130129.pdf)

glycerine avatar Jul 15 '17 17:07 glycerine

I will compare checksums to verify integrity during transmission.
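
A checksum can be computed while the bytes are being copied, so the file does not have to be read a second time. The sketch below is only an illustration under assumptions: it swaps in SHA-256 from the Go standard library instead of BLAKE2b to stay dependency-free, and copyWithChecksum and transfer are hypothetical names. In a real migration the sender would ship its checksum alongside the data and the receiver would compare before opening the file.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// copyWithChecksum copies src to dst and returns the hex-encoded SHA-256
// of exactly the bytes that were copied.
func copyWithChecksum(dst io.Writer, src io.Reader) (string, error) {
	h := sha256.New()
	// TeeReader feeds every byte read from src into the hash as a side effect.
	if _, err := io.Copy(dst, io.TeeReader(src, h)); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// transfer simulates both ends in one process: sent is what the sender
// hashed, received is what actually arrived at the receiver.
func transfer(sent, received string) error {
	want, err := copyWithChecksum(io.Discard, strings.NewReader(sent))
	if err != nil {
		return err
	}
	got, err := copyWithChecksum(io.Discard, strings.NewReader(received))
	if err != nil {
		return err
	}
	if got != want {
		return fmt.Errorf("checksum mismatch: got %s, want %s", got, want)
	}
	return nil
}

func main() {
	fmt.Println(transfer("abcdef", "abcdef")) // intact stream
	fmt.Println(transfer("abcdef", "abc"))    // truncated stream
}
```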

I'd still like to know: is there any way to avoid a SEGFAULT when opening a partial file?

xxr3376 avatar Jul 17 '17 02:07 xxr3376

Use defer and recover.

glycerine avatar Jul 17 '17 03:07 glycerine

No, you can't recover from a SEGFAULT, no matter which language you use.

xxr3376 avatar Jul 17 '17 03:07 xxr3376

Don't be ridiculous. Only SIGKILL and SIGSTOP cannot be caught. recover works fine for segfaults:

package main

import "fmt"

type s struct {
    a int
}

func main() {

    var p *s

    defer func() {
        if caught := recover(); caught != nil {
            fmt.Printf("recovered from segfault")
        }
    }()

    p.a = 10
}

glycerine avatar Jul 17 '17 04:07 glycerine

Sorry for saying a SEGFAULT can't be handled in any language; we can definitely recover by handling the signal.

It's hard to recover from the SEGFAULT in the following code; give it a try.

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"math/rand"
	"os"

	"github.com/boltdb/bolt"
)

func main() {
	// Remove previous data
	os.Remove("/tmp/test1.db")
	os.Remove("/tmp/test2.db")

	b, err := bolt.Open("/tmp/test1.db", 0600, nil)
	if err != nil {
		log.Panic("can't open source db")
	}
	log.Println("Writing data.")
	err = b.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("haha"))
		if err != nil {
			return err
		}
		d := make([]byte, 128)

		for i := 0; i < 10000; i += 1 {
			n, err := rand.Read(d)
			if n != 128 {
				panic("bad len")
			}
			if err != nil {
				return err
			}
			err = b.Put(d, d)
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic("Inserting.")
	}
	err = b.Close()
	if err != nil {
		panic("can't close file")
	}

	log.Println("Testing")
	data, err := ioutil.ReadFile("/tmp/test1.db")
	if err != nil {
		log.Println(err)
		panic("can't read source db")
	}
	err = ioutil.WriteFile("/tmp/test2.db", data[:len(data)/2], 0600)
	if err != nil {
		panic("can't write truncated db")
	}
	testDB("/tmp/test2.db")
}

func testDB(fn string) {
	defer func() {
		if r := recover(); r != nil {
			log.Println("Recovered in testDB", r)
		}
	}()
	b, err := bolt.Open(fn, 0600, nil)
	if err != nil {
		return
	}
	_ = b
	return
}

(update): I don't want to handle low-level signals in my main function, and it's really hard to recover in place. Your code works because the Go runtime identifies the nil pointer dereference for you. If the fault comes from the Linux kernel (e.g. from mmap'd memory), I believe we can't simply recover by calling the recover function.
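
For faults at non-nil addresses, which is what accesses to memory-mapped files can produce, the Go standard library offers runtime/debug.SetPanicOnFault: it asks the runtime to raise a panic in the current goroutine instead of crashing the program. The sketch below shows the pattern only. guard and nilDeref are hypothetical names, and whether the crash from the truncated file in this thread is a fault of that kind is an assumption that has not been verified here. The demo uses a nil dereference so that it runs without any database file.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// guard runs fn and converts any panic raised while it runs into an error.
func guard(fn func() error) (err error) {
	// Request a panic, rather than a crash, on faults at unexpected non-nil
	// addresses in this goroutine, and restore the previous setting on return.
	old := debug.SetPanicOnFault(true)
	defer debug.SetPanicOnFault(old)
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return fn()
}

// nilDeref triggers a runtime fault, standing in for a call such as
// bolt.Open on a damaged file (hypothetical placement, not tested with bolt).
func nilDeref() error {
	var p *struct{ a int }
	p.a = 10
	return nil
}

func main() {
	fmt.Println(guard(nilDeref))
	fmt.Println(guard(func() error { return nil }))
}
```

The setting applies only to the goroutine that calls it, so guard has to wrap the call on the same goroutine that performs the open.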

xxr3376 avatar Jul 17 '17 05:07 xxr3376