
Operation to stop does not match the current operation. #64

Closed
Tadimsky opened this issue May 4, 2018 · 15 comments

Comments


Tadimsky commented May 4, 2018

I'm getting an error using the Service Remoting telemetry.
It appears that the operation that is being stopped is invalid:

"message": "AI (Internal): Operation to stop does not match the current operation. Details: Telemetry Id '|1cafaa09-478a33c4492e47c3.a2a37cb4_' does not match current Activity '|1cafaa09-478a33c4492e47c3.a2a37cb4_1.'",

It looks like Application Insights is trying to stop a2a37cb4_ but it's currently running activity a2a37cb4_1. Notice the 1 at the end.

Any ideas why this would be happening?
I'm using 2.1.1-beta1

@SergeyKanzhelev

CC: @lmolkova


lmolkova commented May 4, 2018

This seems like a known issue: microsoft/ApplicationInsights-dotnet#693, which is likely a side effect of microsoft/ApplicationInsights-dotnet-server#723 and reproduces when debugging the app in Visual Studio.

If it is this issue, it should be fixed in Microsoft.ApplicationInsights.DependencyCollector 2.6.0-beta3 and Microsoft.ApplicationInsights.AspNetCore 2.3.0-beta1.

@Tadimsky could you please check the versions of DependencyCollector and AspNetCore (and update them)?
Also, do you know if the issue reproduces locally under debug, or in production?

If it reproduces with the latest versions, could you please check whether dependency telemetry was reported on the same machine immediately after this message? (Unless you have a very high-load service and this does not make sense.)

Could you please also share some of your telemetry for this operation?

Please use the following Analytics queries:

  1. requests | union dependencies | where id == "|1cafaa09-478a33c4492e47c3.a2a37cb4_"
  2. requests | union dependencies | where id == "|1cafaa09-478a33c4492e47c3.a2a37cb4_1."
  3. requests | union dependencies | union traces | where operation_Id == "1cafaa09-478a33c4492e47c3.a2a37cb4"

Please remove any sensitive information like iKey (or anything else). You can also email this info to me.


Tadimsky commented May 4, 2018

Hey @lmolkova, this is happening for me in our production services:

AI (Internal): Operation to stop does not match the current operation. Details: Telemetry Id '|fac5592a-4fd80273c9a61f11.80081fb1_' does not match current Activity '|fac5592a-4fd80273c9a61f11.80081fb1_1.'  

I'll send you an email with the traces.


lmolkova commented May 4, 2018

@Tadimsky thanks for the info!

The error happens when the telemetry we are about to track in StopOperation does not match the current Activity. We used to track such telemetry anyway (by mistake) and started to enforce this check in the latest Application Insights Base SDK, because otherwise it leads to broken correlation.
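A minimal sketch of how such a mismatch can arise (this is hypothetical illustration, not the actual SDK or SF remoting code; the operation name is made up):

```csharp
using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

public static class StopOperationMismatchSketch
{
    public static void Run()
    {
        var client = new TelemetryClient();

        // StartOperation ties the request telemetry to the Activity that is
        // current at this moment.
        var operation = client.StartOperation<RequestTelemetry>("ProcessRemotingRequest");

        // Hypothetical culprit: something (e.g. a duplicate event source event)
        // starts another Activity for the same request and never stops it.
        var stray = new Activity("ProcessRemotingRequest").Start();

        // Activity.Current is now 'stray', not the Activity captured by
        // 'operation', so the SDK emits the "AI (Internal): Operation to stop
        // does not match the current operation" self-diagnostics message.
        client.StopOperation(operation);
    }
}
```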

So, what I see from the logs is:

The Whiteboards API service makes a remoting call to SharesService (I'm omitting the uninteresting pieces):

Id                                      Type        OperationId                ParentId
|fac5592a-4fd80273c9a61f11.3.1.         dependency  fac5592a-4fd80273c9a61f11  |fac5592a-4fd80273c9a61f11.3.
|fac5592a-4fd80273c9a61f11.80081fb1_1.  request     fac5592a-4fd80273c9a61f11  |fac5592a-4fd80273c9a61f11.3.

Eventually, when something with Id |fac5592a-4fd80273c9a61f11.80081fb1_ is reported, it cannot be, because the Activity with Id |fac5592a-4fd80273c9a61f11.80081fb1_1 has not been stopped.

Some strange things:

  1. The dependency has ParentId |fac5592a-4fd80273c9a61f11.3., which has never been reported.
  2. The request's parent is |fac5592a-4fd80273c9a61f11.3. too, while the dependency id is |fac5592a-4fd80273c9a61f11.3.1.
  3. Why is the request Id fac5592a-4fd80273c9a61f11.80081fb1_1? It should be fac5592a-4fd80273c9a61f11.80081fb1_.

So my assumption is:
Activities are created multiple times: e.g. when an incoming request starts, we receive several event source events about it, so we attempt to start several nested activities.
The same happens for outgoing requests. This could explain all of the above discrepancies.
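To illustrate that assumption, a hypothetical sketch using System.Diagnostics.Activity directly (the ids are made up and the hex suffix is random per activity):

```csharp
using System;
using System.Diagnostics;

public static class NestedActivitySketch
{
    public static void Run()
    {
        // The incoming remoting request is represented by an Activity whose
        // parent id comes from the caller (values are made up).
        var request = new Activity("ProcessRemotingRequest");
        request.SetParentId("|fac5592a-4fd80273c9a61f11.");
        request.Start();
        Console.WriteLine(request.Id);
        // => |fac5592a-4fd80273c9a61f11.<hex>_   (e.g. ...80081fb1_)

        // If a duplicate event about the same request triggers another Start,
        // the new Activity becomes an in-process child of the first one.
        var duplicate = new Activity("ProcessRemotingRequest").Start();
        Console.WriteLine(duplicate.Id);
        // => |fac5592a-4fd80273c9a61f11.<hex>_1.

        // Telemetry tied to 'request' carries the ..._ id, while
        // Activity.Current is 'duplicate' with the ..._1. id: the exact
        // mismatch shown in the error message above.
    }
}
```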

According to @Tadimsky, each service is packaged into its own package and each instance runs on its own machine, i.e. it does not look like a SharedProcess SF issue.

/cc @brahmnes @yantang-msft


an4286sc commented May 14, 2018

@lmolkova I'm getting this error message too in my current project, but only in production, not while debugging locally. I'm using these telemetry client methods:
TrackAction
TrackTrace
TrackException

No StartOperation or StopOperation.

The Application Insights message is: "AI (Internal): Operation to stop does not match the current operation." No further information.

@yantang-msft
Contributor

@an4286sc What's the Service Fabric runtime and AI Service Fabric version you are on?
Are you using Reliable Services or Actors? Remoting V1 or V2? Did you by any chance use the Shared Process hosting model in Service Fabric?
StartOperation and StopOperation are called automatically when a request comes into your service or when you make an outgoing request.

Would you be able to share the project that can reproduce the issue? That would be the easiest way for us to troubleshoot.

Other than that, when you see "AI (Internal): Operation to stop does not match the current operation.", it should give you the Telemetry Id, e.g. '|fac5592a-4fd80273c9a61f11.80081fb1_'. Could you run an Analytics query like the one below? It will give us all the telemetry that happened during the request.

requests | union dependencies | union traces | where operation_Id == "fac5592a-4fd80273c9a61f11"

@an4286sc

@yantang-msft
Thanks for your reply. The error occurs in a stateless service with Remoting V2 and shared process hosting. We are using AI Service Fabric 2.1.1-beta1.

See the query results in the attached file.
ai_stop_operation_failure.zip

@yantang-msft
Contributor

@an4286sc I do see discrepancies in the operation ids. What's the version of Microsoft.ServiceFabric.Services.Remoting? Can you try the exclusive process model and see if that solves the problem?
I can try the shared process model later this week and see if I can reproduce the issue. If so, we will keep investigating how to make it work with the shared process model.

@yantang-msft
Contributor

@Tadimsky @an4286sc I have reproduced this issue and identified the root cause; it started happening for me after I upgraded to Microsoft.ApplicationInsights.AspNetCore 2.3.0-beta2. Hopefully you're seeing this error for the same reason.

We are working on the fix, but for process reasons it will take some time before we can release it. A temporary workaround would be to downgrade Microsoft.ApplicationInsights.AspNetCore to 2.2.1.

@Tadimsky
Author

@yantang-msft I am on 2.3.0-beta1; do you think this version is affected as well?

@yantang-msft
Contributor

@Tadimsky Yes, I confirm 2.3.0-beta1 is also affected.

@Tadimsky
Author

@yantang-msft is this expected to be fixed in 2.1.1?
I'm still seeing the issue:

[screenshots showing the error still occurring]

@yantang-msft
Contributor

@Tadimsky Yes, the bug on the AI SF side is fixed. Did the huge decrease after 6/25 happen after you upgraded the AI SF package?
It could be something else unrelated to AI SF that is still causing this issue. If you can provide a repro, I can definitely take a look and see what's wrong.

@Tadimsky
Author

The huge decrease was due to some increased load we had to process for a few days.

This is total requests into the service:
[screenshot]

We did deploy the fix on 6/27 though - it does seem to be lower, but it's still occurring:
[screenshot]

The project is the same one I shared with you earlier.
Is there something I can try to do to debug this?

@yantang-msft
Contributor

@Tadimsky can you run the query as you did last time and share the result?
You can try removing the AI SF NuGet package and see if this still happens. If it does, it could be some other AI package, or it could be that you manually called client.StartOperation but didn't call client.StopOperation properly.
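For reference, a minimal sketch of the manual pattern (the operation name and helper are hypothetical); disposing the operation holder stops the operation for you, so nothing is left dangling:

```csharp
using System;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;

public static class ManualOperationSketch
{
    public static void CallDownstream(TelemetryClient client)
    {
        // Pair StartOperation/StopOperation by wrapping the holder in a
        // using block: Dispose stops the operation even if the body throws,
        // so no Activity is left running for a later operation to trip on.
        using (var operation = client.StartOperation<DependencyTelemetry>("CallSharesService"))
        {
            try
            {
                // ... make the outgoing call here ...
            }
            catch (Exception ex)
            {
                operation.Telemetry.Success = false;
                client.TrackException(ex);
                throw;
            }
        }
    }
}
```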
