adaptive longOp for du operation #1766

sjenning · 2017-10-05T03:17:05Z

When running pod density tests on the order of 500 pods, the du operation in the fsHandler can take longer than a second, even if everything about the filesystem is cached. Because this condition is persistent, the "du and find on following dirs took ..." message is printed continually.

This PR makes longOp adaptive, so that when a long du duration is a persistent condition, the message does not keep appearing in the logs forever.

https://bugzilla.redhat.com/show_bug.cgi?id=1498632

@dashpole @vishh @derekwaynecarr @ravisantoshgudimetla

ravisantoshgudimetla · 2017-10-05T03:45:47Z

@sjenning I have sometimes noticed the du operation taking more than 30 seconds, which means this line will be printed about 300 times, after which it should stop logging. I believe that should work fine. Is there a specific reason for not changing the log level from 2 to a higher value instead?

ravisantoshgudimetla · 2017-10-05T03:59:10Z

container/common/fsHandler.go

@@ -60,6 +59,8 @@ const DefaultPeriod = time.Minute

 var _ FsHandler = &realFsHandler{}

+var longOp = time.Second


Can we move this variable to trackUsage fn?

sjenning · 2017-10-05T03:59:41Z

@ravisantoshgudimetla At first I did just decrease the log level. But then someone running at the higher logging level would have the same complaint: continually seeing a message that warns about a persistent condition they can't do anything about, because someone else decided what "too long" was.

ravisantoshgudimetla · 2017-10-05T04:02:42Z

> at first I did just decrease the log level. But then someone running at the higher logging level would have the same complaint

Makes sense.

derekwaynecarr · 2017-10-05T14:11:28Z

container/common/fsHandler.go

@@ -129,6 +129,10 @@ func (fh *realFsHandler) trackUsage() {
 			duration := time.Since(start)
 			if duration > longOp {
 				glog.V(2).Infof("du and find on following dirs took %v: %v", duration, []string{fh.rootfs, fh.extraDir})
+				// adapt longOp time so that message doesn't continue to print


log something like the following:

"du and find on following dirs took %v: %v; will log in future if exceeds %v"

derekwaynecarr · 2017-10-05T14:27:40Z

LGTM

derekwaynecarr · 2017-10-05T14:31:10Z

/retest

@derekwaynecarr

Automatic merge from submit-queue.

UPSTREAM: google/cadvisor: 1766: adaptive longOp for du operation

google/cadvisor#1766
xref https://bugzilla.redhat.com/show_bug.cgi?id=1498632
@derekwaynecarr @ravisantoshgudimetla
hold pending #16615

ravisantoshgudimetla reviewed Oct 5, 2017

sjenning force-pushed the adapt-long-du branch from 694214f to 1cfd639 Compare October 5, 2017 04:01

ravisantoshgudimetla approved these changes Oct 5, 2017

derekwaynecarr reviewed Oct 5, 2017

sjenning force-pushed the adapt-long-du branch from 1cfd639 to 79f037d Compare October 5, 2017 14:21

adaptive longOp for du operation

fd9c6d2

sjenning force-pushed the adapt-long-du branch from 79f037d to fd9c6d2 Compare October 5, 2017 14:23

derekwaynecarr approved these changes Oct 5, 2017

derekwaynecarr merged commit 3e659ec into google:master Oct 5, 2017

sjenning mentioned this pull request Oct 5, 2017

UPSTREAM: google/cadvisor: 1766: adaptive longOp for du operation openshift/origin#16697

Merged

sjenning mentioned this pull request Oct 30, 2017

fix long du duration message #1785

Merged

uniuuu mentioned this pull request Feb 15, 2023

du and find on following dirs took long time #2212

Open
