-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adaptive longOp for du operation #1766
Conversation
@sjenning Sometimes, I have noticed that du operation is taking more than 30 seconds, which means this line will be printed 300 times after which it should stop logging, which I believe should work fine. Any specific reason for not decreasing log level from 2 to higher value? |
container/common/fsHandler.go
Outdated
@@ -60,6 +59,8 @@ const DefaultPeriod = time.Minute | |||
|
|||
var _ FsHandler = &realFsHandler{} | |||
|
|||
var longOp = time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this variable to trackUsage fn?
@ravisantoshgudimetla at first I did just decrease the log level. But then someone running at the higher logging level would have the same complaint; that being continually seeing a message warning me about a persistent condition that I can't do anything about because someone decided what "too long" was. |
694214f
to
1cfd639
Compare
Makes sense. |
container/common/fsHandler.go
Outdated
@@ -129,6 +129,10 @@ func (fh *realFsHandler) trackUsage() { | |||
duration := time.Since(start) | |||
if duration > longOp { | |||
glog.V(2).Infof("du and find on following dirs took %v: %v", duration, []string{fh.rootfs, fh.extraDir}) | |||
// adapt longOp time so that message doesn't continue to print |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log something like the following:
"du and find on following dirs took %v: %v; will log in future if exceeds %v"
1cfd639
to
79f037d
Compare
79f037d
to
fd9c6d2
Compare
LGTM |
/retest |
Automatic merge from submit-queue. UPSTREAM: google/cadvisor: 1766: adaptive longOp for du operation google/cadvisor#1766 xref https://bugzilla.redhat.com/show_bug.cgi?id=1498632 @derekwaynecarr @ravisantoshgudimetla hold pending #16615
When running pod density tests on the order of 500 pods, the du operation in the fsHandler can take longer than a second, even if everything about the filesystem is cached. This is a persistent condition that causes the message to continually print.
This PR makes
longOp
adaptive so if the long du duration is a persistent condition, the message won't continue in the logs forever.https://bugzilla.redhat.com/show_bug.cgi?id=1498632
@dashpole @vishh @derekwaynecarr @ravisantoshgudimetla