
client loses connection to gateway peer and cannot recover without a restart when endorsement policy is changed #332

Closed
davidkel opened this issue Nov 26, 2021 · 5 comments

@davidkel
Contributor

davidkel commented Nov 26, 2021

First I will note that the client could probably recover, but it would have to be coded to detect the scenario and create a new client and gateway to recover (a rough sketch of such a recovery wrapper follows the log excerpt below). The problem I see reports two errors:

{"component":"CLIENT","timestamp":"2021-11-25T17:05:58.440Z","txnId":"f4bcbe05d1c9257ebfbaa7492d3a853935ab69789c2d2feb661b413662d10a76","stage":"Endorsing","message":"addUpdateAssets({\"arguments\":[\"1\",\"2000\"]})"}
{"component":"CLIENT","timestamp":"2021-11-25T17:06:28.454Z","txnId":"f4bcbe05d1c9257ebfbaa7492d3a853935ab69789c2d2feb661b413662d10a76","stage":"Failed","message":"4 DEADLINE_EXCEEDED: Deadline exceeded"}
{"component":"CLIENT","timestamp":"2021-11-25T17:05:58.443Z","txnId":"bcd6cc53c262a76f23015a2003af3e72785079f44be01ba28840c6adaa31bc6a","stage":"Endorsing","message":"updateChaosAsset({\"arguments\":[\"cd1\",\"99\"]})"}
{"component":"CLIENT","timestamp":"2021-11-25T17:06:28.454Z","txnId":"bcd6cc53c262a76f23015a2003af3e72785079f44be01ba28840c6adaa31bc6a","stage":"Failed","message":"14 UNAVAILABLE: Stream refused by server"}

But either way, no further submitted transaction succeeds until the client is restarted.
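For illustration, here is a rough sketch of the kind of recovery wrapper this would need, written against the Go gateway client (the test client above is Node, but the shape would be similar). This is a sketch under assumptions, not part of the gateway API: submitWithRecovery is a hypothetical helper, and the rebuild callback is assumed to re-dial the gRPC connection and reconnect the gateway.

```go
// Minimal sketch only, not tested: retry a submit once after rebuilding the
// client and gateway when the underlying gRPC channel appears to be broken.
// The rebuild callback (hypothetical) is assumed to re-dial the gRPC
// connection and call client.Connect again; it is not defined here.
package recovery

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"

	"github.com/hyperledger/fabric-gateway/pkg/client"
)

func submitWithRecovery(
	gw *client.Gateway,
	rebuild func() (*client.Gateway, error),
	channelName, chaincodeName, txnName string,
	args ...string,
) (*client.Gateway, []byte, error) {
	contract := gw.GetNetwork(channelName).GetContract(chaincodeName)
	result, err := contract.SubmitTransaction(txnName, args...)
	if err == nil {
		return gw, result, nil
	}

	// status.Code returns codes.Unknown if the error does not carry a gRPC
	// status, in which case the error is returned unchanged.
	switch status.Code(err) {
	case codes.DeadlineExceeded, codes.Unavailable:
		gw.Close()
		freshGw, rebuildErr := rebuild()
		if rebuildErr != nil {
			return gw, nil, rebuildErr
		}
		contract = freshGw.GetNetwork(channelName).GetContract(chaincodeName)
		result, err = contract.SubmitTransaction(txnName, args...)
		return freshGw, result, err
	default:
		return gw, nil, err
	}
}
```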

Unfortunately, client-side gRPC logging doesn't tell me much beyond what the error message already says. The peer logs don't provide anything helpful, and if I turn on too much logging I struggle to recreate the problem.

I recreated the scenario without changing the endorsement policy by having a long-running transaction create MVCC read conflicts, so it looks related to multiple long-running transactions with large write sets failing; in my case each transaction writes to 2000 keys, but with small amounts of data.

A recent change to Fabric, hyperledger/fabric#3012, changed the output produced when proposal responses don't match by including a base64 version of the protobuf bytes in the error, which subsequently ends up in the logs. With this change reverted, I have not been able to hit the problem. It looks like the change was intended for when the peer is used as a CLI, but it also takes effect when the peer is running as a server.

If I don't use long-running transactions with a large number of keys then the problem doesn't occur, so it is specific to the contrived application being run to test the scenario and is unlikely to be a common problem. I will leave it up for discussion whether the recent change to output base64 when proposals don't match is of value, and whether it should be output when the peer is running in server mode.

@denyeart
Contributor

@andrew-coleman what is your opinion?

@andrew-coleman
Member

@denyeart
@davidkel and I looked at this today. It's not at all obvious why this recent change would cause the gRPC connection to go bad, other than that sending potentially very large error messages back to the client might be triggering the problem (this was only tested with a Node client - perhaps an issue with node-grpc?).

Since this is a highly contrived error scenario test, we don't believe it should block the 2.4 release.

For the next release, we propose the following enhancements:

  • Revert the error message so that it no longer contains the base64-encoded response when invoked from the gateway.
  • Instead, when the responses don't match, traverse the offending read/write sets to find the states that differ, and report those back to the client in a structured message that clearly shows the keys/values/versions that differ (a rough sketch of such a comparison follows below).
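For illustration, a minimal sketch of what that comparison might look like, assuming the read/write sets have already been extracted from the proposal responses and unmarshalled into kvrwset.KVRWSet messages. The WriteDiff type and diffWrites function are illustrative only, not existing Fabric APIs, and reads and deletes are ignored for brevity.

```go
// Illustrative only: one way to report which keys differ between two
// endorsers' write sets, rather than dumping base64-encoded payloads.
package rwsetdiff

import (
	"bytes"

	"github.com/hyperledger/fabric-protos-go/ledger/rwset/kvrwset"
)

// WriteDiff (hypothetical) describes a single key whose written value
// differs between two proposal responses.
type WriteDiff struct {
	Key    string
	ValueA []byte
	ValueB []byte
}

// diffWrites returns the keys whose written values differ, plus keys
// written by only one of the two endorsers.
func diffWrites(a, b *kvrwset.KVRWSet) []WriteDiff {
	writesB := make(map[string][]byte, len(b.Writes))
	for _, w := range b.Writes {
		writesB[w.Key] = w.Value
	}

	var diffs []WriteDiff
	seen := make(map[string]bool, len(a.Writes))
	for _, w := range a.Writes {
		seen[w.Key] = true
		valB, ok := writesB[w.Key]
		if !ok {
			diffs = append(diffs, WriteDiff{Key: w.Key, ValueA: w.Value})
			continue
		}
		if !bytes.Equal(w.Value, valB) {
			diffs = append(diffs, WriteDiff{Key: w.Key, ValueA: w.Value, ValueB: valB})
		}
	}
	for _, w := range b.Writes {
		if !seen[w.Key] {
			diffs = append(diffs, WriteDiff{Key: w.Key, ValueB: w.Value})
		}
	}
	return diffs
}
```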

@bestbeforetoday
Member

Mismatched proposal responses are a server-side issue related to smart contracts, so I think the key thing is to ensure that mismatched responses are logged at the server end, where they can be investigated by whoever has operational responsibility for the Fabric nodes. There's certainly no harm in sending some specific information about the reason for the endorsement failure to the client so the issue can be diagnosed correctly at that end, but I don't think it's strictly necessary to send proposal response content back to the client. In some respects this might even be considered bad security practice, since it gives an attacker additional information about the internal behaviour of the network.

@gqqnbig

gqqnbig commented Dec 9, 2021

My change hyperledger/fabric#3012 is for the CLI, and I wasn't aware of the gateway. I agree with the two enhancements proposed by @andrew-coleman.

andrew-coleman added a commit to andrew-coleman/fabric that referenced this issue Dec 10, 2021
The gateway has been reusing the protoutil.CreateTx function to build the transaction envelope from the set of endorsements.  This function changed recently to add extra information into the error message if the proposal responses don’t all match.  This change was identified as causing a problem for the gateway (hyperledger/fabric-gateway#332).
This commit reverts that change, and instead writes the base64 response payloads to the log.  It also refactors the functions slightly to remove duplicate unmarshalling of protos when invoked from the gateway.

Signed-off-by: andrew-coleman <[email protected]>
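For illustration only, here is a minimal sketch of the general shape of the change described in the commit above: keep the error returned to the caller short and write the base64-encoded payloads to the peer's log instead. It uses the standard library rather than Fabric's flogging package, and checkPayloadsMatch is a hypothetical stand-in, not the actual function touched by the commit.

```go
// Sketch only: on a payload mismatch, log the offending payloads
// server-side (base64-encoded) and return a concise error to the caller.
package endorsecheck

import (
	"bytes"
	"encoding/base64"
	"errors"
	"log"
)

// checkPayloadsMatch (hypothetical) compares the ProposalResponsePayload
// bytes from each endorsement.
func checkPayloadsMatch(payloads [][]byte) error {
	for i := 1; i < len(payloads); i++ {
		if !bytes.Equal(payloads[0], payloads[i]) {
			log.Printf("ProposalResponsePayloads do not match (base64): %s vs %s",
				base64.StdEncoding.EncodeToString(payloads[0]),
				base64.StdEncoding.EncodeToString(payloads[i]))
			return errors.New("ProposalResponsePayloads do not match")
		}
	}
	return nil
}
```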
@davidkel
Contributor Author

With the latest images:

 Version: 2.4.0
 Commit SHA: aaaec4f
 Go version: go1.16.7
 OS/Arch: linux/amd64
 Chaincode:
  Base Docker Label: org.hyperledger.fabric
  Docker Namespace: hyperledger

I can't recreate this problem, and I still see the mismatches being reported and logged.
Closing.
