
Iterator readEntry crashes

Open exobin opened this issue 5 years ago • 10 comments

What is the issue you are having? Sometimes when using EntryInfoIterator to read all entries from the cache, the readEntry() call crashes with the following stack trace:

runtime error: makeslice: len out of range
  panic(0x1fabc00, 0x27ff350)
	/usr/local/go/src/runtime/panic.go:975 +0x3e3
  github.com/allegro/bigcache.readEntry(...)
  	/usr/local/Code/src/github.com/allegro/bigcache/encoding.go:36
  github.com/allegro/bigcache.(*EntryInfoIterator).Value(0xc10fd94a50, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
  	/usr/local/Code/src/github.com/allegro/bigcache/iterator.go:120 +0x305

We only recently updated from an old version of bigcache. This error seems to indicate the object being read is somehow larger than the maximum slice length, which should not be possible given we marshal the structs to bytes using protobuf's Marshal() method:

func (m *CacheEntry) Marshal() (dAtA []byte, err error) {
	size := m.Size()
	dAtA = make([]byte, size)
	n, err := m.MarshalTo(dAtA)
	if err != nil {
		return nil, err
	}
	return dAtA[:n], nil
}

We are running on a 64bit system and we limit the size of all entries to ~1MB anyhow.

What is BigCache doing that it shouldn't?

It is not clear from reading the code, but it seems related to how readEntry() determines the entry length; perhaps some kind of overflow issue.

Environment:

  • Version (git sha or release): v2.2.1
  • OS (e.g. from /etc/os-release or winver.exe): ubuntu 16.04
  • go version: 1.14.2

exobin avatar May 28 '20 03:05 exobin

Well that's exciting! Can you provide a quick reproduction of the bug so it's a bit easier to understand the full context? I think I understand where it's breaking, but a code example will help debug it a bit more.

siennathesane avatar May 28 '20 17:05 siennathesane

Hey @mxplusb - It's tricky to reproduce exactly, since it doesn't occur consistently; it seems some unusual condition is only met occasionally. Here is a mockup that represents roughly what we are doing:

package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
	"time"

	"github.com/allegro/bigcache"
)

var cache *bigcache.BigCache

type Entry struct {
	id uint32
	v  []byte
}

func uint32Bytes(v uint32) []byte {
	dAtA := make([]byte, 4)
	binary.LittleEndian.PutUint32(dAtA, v)
	return dAtA
}

func (e Entry) Marshal() []byte {
	dAtA := uint32Bytes(e.id)
	dAtA = append(dAtA, e.v...)
	return dAtA
}

func assignRand(p []byte) {
	for i := range p {
		p[i] = byte(rand.Int63() & 0xff)
	}
}

func seed() {
	for i := 0; i < 100000; i++ {
		v := make([]byte, rand.Intn(1024))
		assignRand(v)
		e := Entry{
			id: uint32(i),
			v:  v,
		}
		cache.Set(string(uint32Bytes(e.id)), e.Marshal())
	}
}

func init() {
	conf := bigcache.DefaultConfig(60 * 24 * time.Hour)
	conf.HardMaxCacheSize = 100 // actually dynamically calculated
	conf.MaxEntrySize = 500
	var err error
	cache, err = bigcache.NewBigCache(conf)
	if err != nil {
		panic(err)
	}
	seed()
}

func main() {
	rand.Seed(1)
	it := cache.Iterator()
	var current bigcache.EntryInfo
	var err error
	for it.SetNext() {
		current, err = it.Value()
		if err != nil {
			fmt.Println("iterator value error:", err)
			continue
		}
		fmt.Println("len-value:", len(current.Value()))
	}
}

exobin avatar May 29 '20 04:05 exobin

@mxplusb We have had another crash incident from a different method call:

Caught potentially fatal dispatcher error: runtime error: index out of range [7] with length 4

panic(0x21b9e60, 0xc14db05820)
        /usr/local/go/src/runtime/panic.go:969 +0x166
encoding/binary.littleEndian.Uint64(...)
        /usr/local/go/src/encoding/binary/binary.go:77
github.com/allegro/bigcache.readHashFromEntry(...)
        /usr/local/Code/src/github.com/allegro/bigcache/encoding.go:57
github.com/allegro/bigcache.(*cacheShard).removeOldestEntry(0xc05ceff0e0, 0xc100000002, 0x268, 0x3b3)
        /usr/local/Code/src/github.com/allegro/bigcache/shard.go:291 +0x170
github.com/allegro/bigcache.(*cacheShard).set(0xc05ceff0e0, 0xc16b7685a8, 0x8, 0x5beafd6767c17689, 0xc14db53680, 0x24e, 0x24e, 0xffffffffffffffff, 0xc0674e54e0)
        /usr/local/Code/src/github.com/allegro/bigcache/shard.go:142 +0x1df
github.com/allegro/bigcache.(*BigCache).Set(0xc000982750, 0xc16b7685a8, 0x8, 0xc14db53680, 0x24e, 0x24e, 0x24e, 0xc14db53680)
        /usr/local/Code/src/github.com/allegro/bigcache/bigcache.go:133 +0xac
tamber/distcache/table.(*CacheTable).Set(0xc0002b62a0, 0xc16b7685a8, 0x8, 0xc14db53680, 0x24e, 0x24e, 0x0)

Any ideas?

exobin avatar Jun 01 '20 19:06 exobin

I think this may be related to #148

exobin avatar Jun 02 '20 03:06 exobin

This program will panic immediately with error makeslice: len out of range

package main

import (
    "bytes"
    "github.com/allegro/bigcache/v2"
    "log"
    "math/rand"
    "runtime"
    "strconv"
    "time"
)

func main() {
    cache, _ := bigcache.NewBigCache(bigcache.Config{
        Shards:             1,
        LifeWindow:         time.Second,
        MaxEntriesInWindow: 100,
        MaxEntrySize:       256,
        HardMaxCacheSize:   1,
        Verbose:            true,
    })

    go func() {
        defer func() {
            if err := recover(); err != nil {
                buf := make([]byte, 16*1024*1024)
                buf = buf[:runtime.Stack(buf, false)]
                log.Fatalf("err: %v\n\ncrash: %s\n", err, string(buf))
            }

        }()

        for {
            cache.Set(strconv.Itoa(rand.Intn(100)), blob('a', 1024*100))
        }
    }()

    go func() {
        defer func() {
            if err := recover(); err != nil {
                buf := make([]byte, 16*1024*1024)
                buf = buf[:runtime.Stack(buf, false)]
                log.Fatalf("err: %v\n\ncrash: %s\n", err, string(buf))
            }

        }()

        for {
            iter := cache.Iterator()
            for iter.SetNext() {
                value, err := iter.Value()
                log.Printf("key: %v, err: %v", value.Key(), err)
            }
        }
    }()

    select {}
}

func blob(char byte, len int) []byte {
    return bytes.Repeat([]byte{char}, len)
}

2020/06/03 22:49:22 err: runtime error: 

crash: goroutine 19 [running]:
main.main.func2.1()
        /Users/walli/Projects/awesomeProject/12.go:42 +0x9e
panic(0x10c04e0, 0x10fac30)
        /usr/local/Cellar/go/1.14.3/libexec/src/runtime/panic.go:969 +0x166
github.com/allegro/bigcache/v2.readEntry(...)
        /Users/walli/go/pkg/mod/github.com/widelee/bigcache/[email protected]/encoding.go:36
github.com/allegro/bigcache/v2.(*EntryInfoIterator).Value(0xc0000b80a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /Users/walli/go/pkg/mod/github.com/widelee/bigcache/[email protected]/iterator.go:120 +0x305
main.main.func2(0xc000092d00)
        /Users/walli/Projects/awesomeProject/12.go:51 +0xf1
created by main.main
        /Users/walli/Projects/awesomeProject/12.go:38 +0x101

The iterator copies the entry's index in SetNext(), but the data that index points to may change when the BytesQueue reuses memory and pushes new objects at the head of the queue.

Why not just copy the keys' hash values while iterating, and look up the newest BytesQueue index in Value()? For example:

// shards.go

func (s *cacheShard) copyKeys() (keys []uint64, next int) {
	s.lock.RLock()
	keys = make([]uint64, len(s.hashmap))

	for key := range s.hashmap {
		keys[next] = key
		next++
	}

	s.lock.RUnlock()
	return keys, next
}
// iterator.go

// Value returns current value from the iterator
func (it *EntryInfoIterator) Value() (EntryInfo, error) {
	it.mutex.Lock()

	if !it.valid {
		it.mutex.Unlock()
		return emptyEntryInfo, ErrInvalidIteratorState
	}

	wrappedEntry, err := it.cache.shards[it.currentShard].getWrappedEntryWithLock(it.elementKeys[it.currentIndex])
	if err != nil {
		it.mutex.Unlock()
		return emptyEntryInfo, err
	}

	it.mutex.Unlock()
	return EntryInfo{
		timestamp: readTimestampFromEntry(wrappedEntry),
		hash:      readHashFromEntry(wrappedEntry),
		key:       readKeyFromEntry(wrappedEntry),
		value:     readEntry(wrappedEntry),
	}, nil
}

WideLee avatar Jun 03 '20 14:06 WideLee

@mxplusb any hope for a fix here? See a new issue #226 with a similar problem.

exobin avatar Jun 08 '20 22:06 exobin

I don't actively maintain this project anymore (I was Allegro's first outside collaborator for many years), so I personally won't be providing a fix. I just review community contributions and help make it easier for others to provide one if they choose to. Sorry!

siennathesane avatar Jun 22 '20 19:06 siennathesane

Oh cool, we've been happy users for years, but there do seem to be quite a few open issues related to this now. If there is any way to get more attention on it, or a better place to get updates from the contributor community, that would certainly be appreciated.

exobin avatar Jun 22 '20 22:06 exobin

@WideLee would you like to contribute a patch?

janisz avatar Jun 23 '20 10:06 janisz

@janisz Sure ^_^ I want to change copyKeys() to copy the keys' hashes instead of their BytesQueue indexes; would that cause any problem?

WideLee avatar Jun 23 '20 14:06 WideLee