balancer/rls: allow maxAge to exceed 5m if staleAge is set #8137

Open
@shivaspeaks shivaspeaks commented Mar 1, 2025

Internal feature request: b/371591767

RELEASE NOTES:

  • balancer/rls: allow maxAge to exceed 5m if staleAge is set for the rls cache

codecov bot commented Mar 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.23%. Comparing base (8ae4b7d) to head (bebde4c).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8137      +/-   ##
==========================================
- Coverage   82.34%   82.23%   -0.12%     
==========================================
  Files         389      392       +3     
  Lines       39103    39153      +50     
==========================================
- Hits        32200    32197       -3     
- Misses       5579     5628      +49     
- Partials     1324     1328       +4     
Files with missing lines Coverage Δ
balancer/rls/config.go 82.95% <100.00%> (+1.35%) ⬆️

... and 31 files with indirect coverage changes


logger.Infof("rls: max_age in route lookup config is %v, using %v", maxAge, maxMaxAge)
maxAge = maxMaxAge
}
if staleAge > maxAge {
logger.Infof("rls: stale_age %v is not less than max_age %v, ignoring it", staleAge, maxAge)
staleAge = maxAge
Member Author

I assume that "ignoring" staleAge lets us set it to maxAge, which should be fine.

@@ -219,7 +219,7 @@ func parseRLSProto(rlsProto *rlspb.RouteLookupConfig) (*lbConfig, error) {
}

// Validations performed here:
- // - if `max_age` > 5m, it should be set to 5 minutes
+ // - if `max_age` > 5m, it should be set to 5 minutes only if stale age is not set
Contributor

nit: can you make sure to wrap the comments within 80 cols?

Member Author

I don't see any checkstyle failures.

Contributor

There is no automated checker for this, but for Go, 80 columns is the limit for comments.

Contributor

go/go-style/decisions#comment-line-length

logger.Infof("rls: max_age in route lookup config is %v, using %v", maxAge, maxMaxAge)
maxAge = maxMaxAge
}
if staleAge > maxAge {
logger.Infof("rls: stale_age %v is not less than max_age %v, ignoring it", staleAge, maxAge)
staleAge = maxAge
Contributor

Shouldn't staleAge be set to 0, since we are ignoring it? That's how it was done before.

Member Author

#Comment23 talks about clamping these values, and I think clamping is an improvement. rls_config.proto has also been changed to describe clamping stale_age: grpc/grpc-proto@0462d4b

Member Author

stale_age was set to 0 here (#3379). But now, as per rls_config, it should be clamped. That's what Java and C-core are doing.

CC: @easwars

Contributor

I see the following logic in C-core

  • Internal representation of stale_age and max_age both start off with a default value of 5m, which is the max allowed value
  • In the JSON configuration, check if stale_age and max_age are set in the configuration:
    • If stale_age is set, but max_age is not set, then throw an error
  • Clamp stale_age to the max allowed value
  • If stale_age is not set, clamp max_age to the max allowed value
  • If stale_age is greater than or equal to max_age, set stale_age to be the same as max_age

In Go, we unmarshal the JSON configuration into a proto message, then perform all validation and build our internal representation from that proto message. Because the proto API in Go does not support checking if a value is set or not, I suggest going with the following approach.

  • Read staleAge from proto using the convertDuration helper function. If the read staleAge is non-zero, then set staleAgeSet to be true. If the read staleAge is zero, set staleAge to maxMaxAge which is 5m.
	staleAgeSet := false
	staleAge, err := convertDuration(rlsProto.GetStaleAge())
	if err != nil {
		return nil, fmt.Errorf("rls: failed to parse staleAge in route lookup config %+v: %v", rlsProto, err)
	}
	if staleAge == 0 {
		staleAge = maxMaxAge
	} else {
		staleAgeSet = true
	}
  • Do the same thing for maxAge
  • Then check whether both are set or not
	if staleAgeSet && !maxAgeSet {
		return nil, fmt.Errorf("rls: stale_age is set, but max_age is not in route lookup config %+v", rlsProto)
	}
  • Clamp stale_age to the max allowed value
	if staleAge > maxMaxAge {
		staleAge = maxMaxAge
	}
  • If stale_age is not set, clamp max_age to the max allowed value
	if !staleAgeSet && maxAge > maxMaxAge {
		maxAge = maxMaxAge
	}
  • If stale_age is greater than or equal to max_age, set stale_age to be the same as max_age
	if staleAge > maxAge {
		staleAge = maxAge
	}

Let me know what you think.

Member Author

Yes, this is the same logic-wise. The only difference is that we always assign some value to staleAge, which should be fine.
Using staleAgeSet and maxAgeSet looks good, though; it makes the logic much easier for the reader to follow.

PS: In Java we don't use these extra variables; we work with staleAge and maxAge only.

}
- if maxAge == 0 || maxAge > maxMaxAge {
+ if staleAge == 0 && maxAge > maxMaxAge {
Contributor

I think we only need to update this condition at line 239 and leave the rest as is, so that we make no unintended changes and only relax the validation to allow maxAge > 5m when staleAge is set. Maybe something like this would suffice?

	if maxAge == 0 {
		logger.Infof("rls: max_age in route lookup config is %v, using %v", maxAge, maxMaxAge)
		maxAge = maxMaxAge
	} else if maxAge > maxMaxAge && staleAge == 0 {
		logger.Infof("rls: max_age in route lookup config is %v and stale_age is not set, using %v", maxAge, maxMaxAge)
		maxAge = maxMaxAge
	}

We update maxAge to maxMaxAge only if it is not set, or if it exceeds maxMaxAge while staleAge is not set. Otherwise we keep the value assigned to it.

@purnesh42H purnesh42H added Area: Resolvers/Balancers Includes LB policy & NR APIs, resolver/balancer/picker wrappers, LB policy impls and utilities. Type: Feature New features or improvements in behavior labels Mar 4, 2025
@purnesh42H purnesh42H changed the title rls: allow maxAge to exceed 5m if staleAge is set balancer/rls: allow maxAge to exceed 5m if staleAge is set Mar 4, 2025
},
},
{
desc: "with transformations 1",
Contributor

The value for this field needs to be different in each of the sub tests.

Member Author

I'm curious why it didn't fail even though this field had the same value in more than one sub-test. Surprisingly, it also made it through the GitHub checks.
Now it is caught and fails in the editor itself when I run the tests.

{
desc: "with transformations 1",
input: []byte(`{
"top-level-unknown-field": "unknown-value",
Contributor

This is not required for this sub-test.

input: []byte(`{
"top-level-unknown-field": "unknown-value",
"routeLookupConfig": {
"unknown-field": "unknown-value",
Contributor

Same with this field.

Comment on lines +91 to +92
maxAge: 500 * time.Second, // Max age is not clamped when stale age is set.
staleAge: 300 * time.Second, // StaleAge is clamped because it was higher than maxMaxAge.
Contributor

The comment at the top of this sub-test also needs to change.

Comment on lines 120 to 121
{"cds_experimental": {"Cluster": "my-fav-cluster"}},
{"unknown-policy": {"unknown-field": "unknown-value"}},
Contributor

I think we can get rid of these two lines because they are not pertinent to this sub-test.

"maxAge" : "500s",
"staleAge": "200s",
"cacheSizeBytes": 100000000,
"defaultTarget": "passthrough:///default"
Contributor

Get rid of the defaultTarget as well. Not pertinent to this sub-test.

"lookupService": ":///target",
"maxAge" : "500s",
"cacheSizeBytes": 100000000,
"defaultTarget": "passthrough:///default"
Contributor

Get rid of the defaultTarget as well. Not pertinent to this sub-test.

Comment on lines 157 to 158
{"cds_experimental": {"Cluster": "my-fav-cluster"}},
{"unknown-policy": {"unknown-field": "unknown-value"}},
Contributor

I think we can get rid of these two lines because they are not pertinent to this sub-test.

@easwars easwars removed their assignment Mar 5, 2025
@easwars easwars added this to the 1.72 Release milestone Mar 5, 2025