Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #5001 #5078

Closed
wants to merge 9 commits into from
34 changes: 33 additions & 1 deletion core/chaincode/handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -474,7 +474,7 @@ func (h *Handler) notifyRegistry(err error) {
h.Registry.Ready(h.chaincodeID)
}

// HandleRegister is invoked when chaincode tries to register.
// handleRegister is invoked when chaincode tries to register.
func (h *Handler) HandleRegister(msg *pb.ChaincodeMessage) {
h.stateLock.RLock()
state := h.state
Expand Down Expand Up @@ -733,6 +733,17 @@ func (h *Handler) HandleGetStateMetadata(msg *pb.ChaincodeMessage, txContext *Tr

// Handles query to ledger to rage query state
func (h *Handler) HandleGetStateByRange(msg *pb.ChaincodeMessage, txContext *TransactionContext) (*pb.ChaincodeMessage, error) {
// Defer panic recovery
defer func() {
if r := recover(); r != nil {
chaincodeLogger.Errorf("Panic in HandleGetStateByRange: %v", r)
// Clean up any query context that may have been created
if txContext != nil {
txContext.CleanupQueryContext("")
}
}
}()

getStateByRange := &pb.GetStateByRange{}
err := proto.Unmarshal(msg.Payload, getStateByRange)
if err != nil {
Expand All @@ -750,6 +761,12 @@ func (h *Handler) HandleGetStateByRange(msg *pb.ChaincodeMessage, txContext *Tra
isPaginated := false
namespaceID := txContext.NamespaceID
collection := getStateByRange.Collection

// Validate start and end keys
if getStateByRange.StartKey == "" && getStateByRange.EndKey == "" {
return nil, errors.New("start key and end key must not both be empty")
}

if isCollectionSet(collection) {
if txContext.IsInitTransaction {
return nil, errors.New("private data APIs are not allowed in chaincode Init()")
Expand All @@ -767,14 +784,29 @@ func (h *Handler) HandleGetStateByRange(msg *pb.ChaincodeMessage, txContext *Tra
startKey = metadata.Bookmark
}
}

// Validate page size
if metadata.PageSize < 1 {
return nil, errors.New("page size must be greater than zero")
}

rangeIter, err = txContext.TXSimulator.GetStateRangeScanIteratorWithPagination(namespaceID,
startKey, getStateByRange.EndKey, metadata.PageSize)
} else {
rangeIter, err = txContext.TXSimulator.GetStateRangeScanIterator(namespaceID, getStateByRange.StartKey, getStateByRange.EndKey)
}

if err != nil {
// Clean up any resources if error occurs
txContext.CleanupQueryContext(iterID)
return nil, errors.WithStack(err)
}

if rangeIter == nil {
txContext.CleanupQueryContext(iterID)
return nil, errors.New("range query iterator is nil")
}

txContext.InitializeQueryContext(iterID, rangeIter)

payload, err := h.QueryResponseBuilder.BuildQueryResponse(txContext, rangeIter, iterID, isPaginated, totalReturnLimit)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ package stateleveldb

import (
"bytes"
"fmt"

"github.com/hyperledger/fabric-lib-go/common/flogging"
"github.com/hyperledger/fabric/common/ledger/dataformat"
Expand Down Expand Up @@ -164,10 +165,28 @@ func (vdb *versionedDB) GetStateRangeScanIteratorWithPagination(namespace string
if endKey == "" {
dataEndKey[len(dataEndKey)-1] = lastKeyIndicator
}
dbItr, err := vdb.db.GetIterator(dataStartKey, dataEndKey)

// Wrap the iterator creation in a recover block
var dbItr iterator.Iterator
var err error

func() {
defer func() {
if r := recover(); r != nil {
err = fmt.Errorf("panic while creating iterator: %v", r)
}
}()
dbItr, err = vdb.db.GetIterator(dataStartKey, dataEndKey)
}()

if err != nil {
return nil, err
return nil, errors.WithMessage(err, "error getting iterator from db")
}

if dbItr == nil {
return nil, errors.New("nil iterator returned from db")
}

return newKVScanner(namespace, dbItr, pageSize), nil
}

Expand Down
15 changes: 11 additions & 4 deletions gossip/gossip/algo/pull.go
Original file line number Diff line number Diff line change
Expand Up @@ -319,11 +319,18 @@ func (engine *PullEngine) OnRes(items []string, nonce uint64) {
}

func (engine *PullEngine) newNONCE() uint64 {
n := uint64(0)
for {
n = util.RandomUInt64()
if !engine.outgoingNONCES.Exists(n) {
engine.lock.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need to lock at all. We only create nonces from here and we already lock there.

Just loop until engine.outgoingNONCES.Exists(n) is false, no need for attempts or anything.

defer engine.lock.Unlock()

maxAttempts := 100
for i := 0; i < maxAttempts; i++ {
n := util.RandomUInt64()
if n != 0 && !engine.outgoingNONCES.Exists(n) {
return n
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't try 100 attempts, but more attempts, like 1000,000.

and if we still fail, then we should loop until we find a nonce that is not in use using the crypto rand.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second part it can result in slow nonces if this happens right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after looking at the code myself, I think we can just use util.RandomUInt64 until it no longer exists in the outgoing nonces.

// If we couldn't generate a unique NONCE after max attempts,
// use time-based fallback
return uint64(time.Now().UnixNano())
}
16 changes: 9 additions & 7 deletions gossip/util/misc.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,8 @@ SPDX-License-Identifier: Apache-2.0
package util

import (
crand "crypto/rand"
"fmt"
"math/rand/v2"
"math/rand"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change the library from v2 to the previous library?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, didn't mean to, changed back

"reflect"
"runtime"
"sync"
Expand All @@ -18,12 +17,13 @@ import (
"github.com/spf13/viper"
)

var r *rand.Rand
var (
r *rand.Rand
rMutex sync.Mutex
)

func init() { // do we really need this?
var seed [32]byte
_, _ = crand.Read(seed[:])
r = rand.New(rand.NewChaCha8(seed))
r = rand.New(rand.NewSource(time.Now().UnixNano()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we seed this with a deterministic and non-random seed?
what was before was better, in my opinion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, rolling back

}

// Equals returns whether a and b are the same
Expand Down Expand Up @@ -179,14 +179,16 @@ func SetVal(key string, val interface{}) {
// RandomInt returns, as an int, a non-negative pseudo-random integer in [0,n)
// It panics if n <= 0
func RandomInt(n int) int {
return r.IntN(n)
return r.Intn(n)
}

// RandomUInt64 returns a random uint64
//
// If we want a rand that's non-global and specific to gossip, we can
// establish one. Otherwise this uses the process-global locking RNG.
func RandomUInt64() uint64 {
rMutex.Lock()
defer rMutex.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the lock/unlock call only here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the panic that happens when there's a lot of concurrency:

panic: runtime error: index out of range [-1]

goroutine 3305 [running]:
math/rand.(*rngSource).Uint64(0xc014386101?)
/usr/local/go/src/math/rand/rng.go:249 +0x7b
math/rand.(*Rand).Uint64(0xc0180cf980?)
/usr/local/go/src/math/rand/rand.go:104 +0x22
github.com/hyperledger/fabric/gossip/util.RandomUInt64(...)
/gossip/util/misc.go:187
github.com/hyperledger/fabric/gossip/gossip/algo.(*PullEngine).newNONCE(0xc006953000)
/gossip/gossip/algo/pull.go:324 +0x25
github.com/hyperledger/fabric/gossip/gossip/algo.(*PullEngine).initiatePull(0xc006953000)
/gossip/gossip/algo/pull.go:179 +0x11b
github.com/hyperledger/fabric/gossip/gossip/algo.NewPullEngineWithFilter.func1()
/gossip/gossip/algo/pull.go:126 +0x27
created by github.com/hyperledger/fabric/gossip/gossip/algo.NewPullEngineWithFilter in goroutine 1
/gossip/gossip/algo/pull.go:120 +0x2d5

I suppose the problem is because we are accessing this function a lot of times, so adding a lock ensures that there are no concurrency issues

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Above r is used in 2 more places without lock. Why is lock not used there?

  2. yes, this is a bug in the math/rand package. But after the release of v3.0.0 the package used 9b28d51 was changed, this change is not in the release yet. And your stacktrace is not using v2 code.

  3. Need proof that you actually fixed the bug suddenly it was fixed by changing the package to version v2. You need a test to prove that your fixes are correct

/usr/local/go/src/math/rand/rng.go:249 +0x7b
math/rand.(*Rand).Uint64(0xc0180cf980?)
/usr/local/go/src/math/rand/rand.go:104 +0x22

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Above r is used in 2 more places without lock. Why is lock not used there?

  2. yes, this is a bug in the math/rand package. But after the release of v3.0.0 the package used 9b28d51 was changed, this change is not in the release yet. And your stacktrace is not using v2 code.

  3. Need proof that you actually fixed the bug suddenly it was fixed by changing the package to version v2. You need a test to prove that your fixes are correct

/usr/local/go/src/math/rand/rng.go:249 +0x7b
math/rand.(*Rand).Uint64(0xc0180cf980?)
/usr/local/go/src/math/rand/rand.go:104 +0x22

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the panic that happens when there's a lot of concurrency:

panic: runtime error: index out of range [-1]

goroutine 3305 [running]: math/rand.(*rngSource).Uint64(0xc014386101?) /usr/local/go/src/math/rand/rng.go:249 +0x7b math/rand.(*Rand).Uint64(0xc0180cf980?) /usr/local/go/src/math/rand/rand.go:104 +0x22 github.com/hyperledger/fabric/gossip/util.RandomUInt64(...) /gossip/util/misc.go:187 github.com/hyperledger/fabric/gossip/gossip/algo.(*PullEngine).newNONCE(0xc006953000) /gossip/gossip/algo/pull.go:324 +0x25 github.com/hyperledger/fabric/gossip/gossip/algo.(*PullEngine).initiatePull(0xc006953000) /gossip/gossip/algo/pull.go:179 +0x11b github.com/hyperledger/fabric/gossip/gossip/algo.NewPullEngineWithFilter.func1() /gossip/gossip/algo/pull.go:126 +0x27 created by github.com/hyperledger/fabric/gossip/gossip/algo.NewPullEngineWithFilter in goroutine 1 /gossip/gossip/algo/pull.go:120 +0x2d5

I suppose the problem is because we are accessing this function a lot of times, so adding a lock ensures that there are no concurrency issues

If there is a panic stemming from gossip then I fail to see how is the chaincode related to the problem.

We shouldn't fix two different problems in the same PR.

Can you move all the non gossip related code changes to a new PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but you see the problem, right? gossip is invoking randomUint64, and it panicking,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, these are two different problems for two different panic errors

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic calls a different code than the one you're correcting.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @pfi79 compiling the code in main should fix the issue right about gossip right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So @pfi79 compiling the code in main should fix the issue right about gossip right?

It's possible. After updating the package to v2 I stopped catching the panic error in tests.

return r.Uint64()
}

Expand Down
Loading