Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Corrections to CLM data collection code #558

Merged
merged 8 commits into from
Sep 7, 2022
Merged

Corrections to CLM data collection code #558

merged 8 commits into from
Sep 7, 2022

Conversation

nr-swilloughby
Copy link
Contributor

@nr-swilloughby nr-swilloughby commented Aug 24, 2022

Updates and corrects CLM collection code.

Fixes issue 557 by correcting where we were unconditionally calling functions to resolve the code location even when code level metrics were disabled globally, as well as doing redundant work in some cases if the user passed their own code locations in. There was also a race condition and thread incompatibility for wrapped function handlers that has been corrected.

Now the agent will never try to resolve code locations unless explicitly asked to (e.g., by the user directly calling ThisCodeLocation or similar function), until it knows for sure it needs to collect CLM for that transaction. It will also look to see if code level metrics are locally suppressed (WithoutCodeLevelMetrics()) or disabled globally and avoid going through the overhead of collecting anything in those cases. It will also avoid adding its own calls to ThisCodeLocation or WithFunctionLocation (etc) if the user supplied their own explicit function location in the option list. (Previously, it would call one or both of those and discard the results if overridden downstream by user-set options.)

Additionally, this PR includes some trivial enhancements that were prompted by the above corrections, and/or we expect will be needed once more users start using the CLM feature so it's better to head that off and get them implemented now:

Like we did with IgnoredPrefix, we expanded PathPrefix from a single string value to a new []string field PathPrefixes (which deprecates the old PathPrefix field). Congruent with that, ConfigCodeLevelMetricsPathPrefix now takes any number of string values and NEW_RELIC_CODE_LEVEL_METRICS_PATH_PREFIX environment variable interprets its value as a comma-separated list of path prefix strings.

A transaction-specific WithPathPrefix option allows the specification of a different set of source path prefixes to be given for individual transactions instead of the globally configured list.

A WithCodeLevelMetrics option has been added for transactions as the complement to WithoutCodeLevelMetrics. This allows for individual transactions to report CLM even if they wouldn't have otherwise (due, for example, to a narrower scope being globally configured which would have excluded this trace). Note that this will never enable CLM for any transaction which also carries a WithoutCodeLevelMetrics option, or if CLM are disabled globally for the application.

A WithDefaultFunctionLocation option has been added which is identical to WithFunctionLocation except that it will only evaluate the location of the given function value if no other option has set the code location to be reported before WithDefaultFunctionLocation was encountered in the option list.

@nr-swilloughby
Copy link
Contributor Author

nr-swilloughby commented Aug 24, 2022

This is still very much a work in progress but I'm drafting the PR now to make collaboration and review easier along the way, since it is important that we get this released ASAP.

It is still experimental, untested, and subject to changes. I'll be adding testcases and doing more full testing next.

One thing that I considered adding but have not yet is the ability to set the global scope to exclude wrapped functions from CLM as a class unto themselves. This would mean you could collect CLM for anything you called StartTransaction for (or some integration code did on your behalf) but wrapped handler functions would not automatically get CLM as they do now.

I'm thinking that with the improvements here that wouldn't be a necessary step, but if it does prove to be a useful configuration option I can add that as well by expanding the list of scopes from the current set of

  • AllCLM
  • TransactionCLM

to

  • AllCLM
  • TransactionCLM (i.e. the combination of ExplicitTransactionCLM and WrappedTransactionCLM)
  • ExplicitTransactionCLM (i.e., normal StartTransaction calls)
  • WrappedTransactionCLM (functions in WrapHandle and WrapHandleFunc)

That would necessitate the changing of the raw numeric values behind those constants in order to preserve TransactionCLM's semantic meaning for backward compatibility. I have added a note in the comments for those constants warning against relying on those underlying values anyway, and advising to just use the defined names like TransactionCLM instead of using 1 for that value.

@nr-swilloughby
Copy link
Contributor Author

Moved from the automatically-set "awaiting user input" state to the actually-accurate "in progress" one.

@nr-swilloughby nr-swilloughby marked this pull request as ready for review September 2, 2022 22:28
type CachedCodeLocation struct {
Location *CodeLocation
Err error
once sync.Once
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A sync.Once cannot be copied after first use. Unless you use a pointer here, then if someone copies a CachedCodeLocation around it will cause issues.

Also, I think Location and Err need to be private. There is no way for package clients to use them safely, nor is there a reason for them to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the once value a pointer shouldn't be a problem. I also agree with the suggestion to make the members private here. Really the whole CachecCodeLocation value should be opaque to the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm experimenting now with a new CachedCodeLocation type which uses a pointer to the once field so copying the struct should just give you another struct that tracks to the same once value instead of copying it, and has the fields all unexported. That does mean you need to call NewCachedCodeLocation() to get one with the pointer set up for you instead of just making a zero-value CachedCodeLocation so that the pointer is set up before you start doing anything concurrent in there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updated code just pushed implements this approach.

}

c.once.Do(func() {
// add 4 skip levels to compensate for the internal calls used to get here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is contingent upon the implementation of the sync package, which could be a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What issues do you see with the sync package? I thought that was a fairly well-defined part of the standard library at this point. But if there are issues we could explicitly document expectations around that or do something to mitigate the issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that there are 4 levels is an implementation detail. If the Go authors refactor the way sync.Once.Do works internally it could change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that. Yes, that's on my mind as well. I'm afraid there may need to be some more logic in there to do something more intelligent than blindly skipping a number of levels, like skipping over things that belong to the sync package.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code just pushed includes a new implementation that more intentionally steps around the internal agent code and calls out to the sync library and back rather than just skipping a number of calls, which should mitigate this issue.

// UnmarshalJSON allows for a CodeLevelMetricsScope value to be read from a JSON
// string whose value is a comma-separated list of scope labels.
//
func (s *CodeLevelMetricsScope) UnmarshalJSON(b []byte) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please implement UmarshalText instead as it is more generic. (It will be used by JSON, XML, and others.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was trivial enough that I just pushed the change for it. I'll look at the other suggestions next.

if tOptions != nil && !tOptions.SuppressCLM && (tOptions.DemandCLM || app.app.run.Config.CodeLevelMetrics.Scope == 0 || (app.app.run.Config.CodeLevelMetrics.Scope&TransactionCLM) != 0) {
// we are for sure collecting CLM here, so go to the trouble of collecting this code location if nothing else has yet.
if tOptions.LocationOverride == nil {
if loc, err := cache.FunctionLocation(handler); err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify away the need for two caches if you instead put the fallback logic into the callback that the single cache will invoke within the sync.Once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored that to only need one cache value.

t.Errorf("skipA shows as %v %v", l.LineNo, l.Function)
}
}

Copy link
Contributor

@iamemilio iamemilio Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to see a test of the FunctionLocation() function in a threaded context just to verify the functionality, but also to verify that the memory is not being duplicated, just the pointer. Not something worth holding the release up over, just a nice thing to have

//
type CachedCodeLocation struct {
location *CodeLocation
once *sync.Once
Copy link
Contributor

@rittneje rittneje Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using atomic.LoadPointer, etc. or atomic.Value instead of sync.Once. While it won't be air-tight (two goroutines might do the same lookup in parallel) it should be good enough for what you need, and avoids the extra logic you had to add to omit sync from the stack frame lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, and I considered that earlier on, but I think it would be better to go to the effort to use the more robust sync.Once function to avoid possible parallel lookups and such, and I think the way the code now steps around the calls to whatever's in the middle (e.g. sync.Once) will be flexible enough to support future needs as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would rather be defensive here, but appreciate the suggestion

Copy link
Contributor

@iamemilio iamemilio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We will do a little more end to end testing just to verify before merging to master.

@nr-swilloughby nr-swilloughby merged commit 6d194e9 into newrelic:develop Sep 7, 2022
@nr-swilloughby nr-swilloughby deleted the clm_updates_3_18_2 branch September 7, 2022 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants