-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Corrections to CLM data collection code #558
Corrections to CLM data collection code #558
Conversation
This is still very much a work in progress but I'm drafting the PR now to make collaboration and review easier along the way, since it is important that we get this released ASAP. It is still experimental, untested, and subject to changes. I'll be adding testcases and doing more full testing next. One thing that I considered adding but have not yet is the ability to set the global scope to exclude wrapped functions from CLM as a class unto themselves. This would mean you could collect CLM for anything you called I'm thinking that with the improvements here that wouldn't be a necessary step, but if it does prove to be a useful configuration option I can add that as well by expanding the list of scopes from the current set of
to
That would necessitate the changing of the raw numeric values behind those constants in order to preserve |
Moved from the automatically-set "awaiting user input" state to the actually-accurate "in progress" one. |
v3/newrelic/code_level_metrics.go
Outdated
type CachedCodeLocation struct { | ||
Location *CodeLocation | ||
Err error | ||
once sync.Once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A sync.Once
cannot be copied after first use. Unless you use a pointer here, then if someone copies a CachedCodeLocation
around it will cause issues.
Also, I think Location
and Err
need to be private. There is no way for package clients to use them safely, nor is there a reason for them to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making the once
value a pointer shouldn't be a problem. I also agree with the suggestion to make the members private here. Really the whole CachecCodeLocation
value should be opaque to the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm experimenting now with a new CachedCodeLocation
type which uses a pointer to the once
field so copying the struct should just give you another struct that tracks to the same once
value instead of copying it, and has the fields all unexported. That does mean you need to call NewCachedCodeLocation()
to get one with the pointer set up for you instead of just making a zero-value CachedCodeLocation
so that the pointer is set up before you start doing anything concurrent in there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated code just pushed implements this approach.
v3/newrelic/code_level_metrics.go
Outdated
} | ||
|
||
c.once.Do(func() { | ||
// add 4 skip levels to compensate for the internal calls used to get here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is contingent upon the implementation of the sync package, which could be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What issues do you see with the sync package? I thought that was a fairly well-defined part of the standard library at this point. But if there are issues we could explicitly document expectations around that or do something to mitigate the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that there are 4 levels is an implementation detail. If the Go authors refactor the way sync.Once.Do
works internally it could change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that. Yes, that's on my mind as well. I'm afraid there may need to be some more logic in there to do something more intelligent than blindly skipping a number of levels, like skipping over things that belong to the sync package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code just pushed includes a new implementation that more intentionally steps around the internal agent code and calls out to the sync library and back rather than just skipping a number of calls, which should mitigate this issue.
v3/newrelic/config.go
Outdated
// UnmarshalJSON allows for a CodeLevelMetricsScope value to be read from a JSON | ||
// string whose value is a comma-separated list of scope labels. | ||
// | ||
func (s *CodeLevelMetricsScope) UnmarshalJSON(b []byte) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please implement UmarshalText
instead as it is more generic. (It will be used by JSON, XML, and others.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was trivial enough that I just pushed the change for it. I'll look at the other suggestions next.
v3/newrelic/instrumentation.go
Outdated
if tOptions != nil && !tOptions.SuppressCLM && (tOptions.DemandCLM || app.app.run.Config.CodeLevelMetrics.Scope == 0 || (app.app.run.Config.CodeLevelMetrics.Scope&TransactionCLM) != 0) { | ||
// we are for sure collecting CLM here, so go to the trouble of collecting this code location if nothing else has yet. | ||
if tOptions.LocationOverride == nil { | ||
if loc, err := cache.FunctionLocation(handler); err == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simplify away the need for two caches if you instead put the fallback logic into the callback that the single cache will invoke within the sync.Once
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored that to only need one cache value.
t.Errorf("skipA shows as %v %v", l.LineNo, l.Function) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to see a test of the FunctionLocation() function in a threaded context just to verify the functionality, but also to verify that the memory is not being duplicated, just the pointer. Not something worth holding the release up over, just a nice thing to have
// | ||
type CachedCodeLocation struct { | ||
location *CodeLocation | ||
once *sync.Once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using atomic.LoadPointer
, etc. or atomic.Value
instead of sync.Once
. While it won't be air-tight (two goroutines might do the same lookup in parallel) it should be good enough for what you need, and avoids the extra logic you had to add to omit sync from the stack frame lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, and I considered that earlier on, but I think it would be better to go to the effort to use the more robust sync.Once
function to avoid possible parallel lookups and such, and I think the way the code now steps around the calls to whatever's in the middle (e.g. sync.Once
) will be flexible enough to support future needs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would rather be defensive here, but appreciate the suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. We will do a little more end to end testing just to verify before merging to master.
Updates and corrects CLM collection code.
Fixes issue 557 by correcting where we were unconditionally calling functions to resolve the code location even when code level metrics were disabled globally, as well as doing redundant work in some cases if the user passed their own code locations in. There was also a race condition and thread incompatibility for wrapped function handlers that has been corrected.
Now the agent will never try to resolve code locations unless explicitly asked to (e.g., by the user directly calling
ThisCodeLocation
or similar function), until it knows for sure it needs to collect CLM for that transaction. It will also look to see if code level metrics are locally suppressed (WithoutCodeLevelMetrics()
) or disabled globally and avoid going through the overhead of collecting anything in those cases. It will also avoid adding its own calls toThisCodeLocation
orWithFunctionLocation
(etc) if the user supplied their own explicit function location in the option list. (Previously, it would call one or both of those and discard the results if overridden downstream by user-set options.)Additionally, this PR includes some trivial enhancements that were prompted by the above corrections, and/or we expect will be needed once more users start using the CLM feature so it's better to head that off and get them implemented now:
Like we did with
IgnoredPrefix
, we expandedPathPrefix
from a singlestring
value to a new[]string
fieldPathPrefixes
(which deprecates the oldPathPrefix
field). Congruent with that,ConfigCodeLevelMetricsPathPrefix
now takes any number ofstring
values andNEW_RELIC_CODE_LEVEL_METRICS_PATH_PREFIX
environment variable interprets its value as a comma-separated list of path prefix strings.A transaction-specific
WithPathPrefix
option allows the specification of a different set of source path prefixes to be given for individual transactions instead of the globally configured list.A
WithCodeLevelMetrics
option has been added for transactions as the complement toWithoutCodeLevelMetrics
. This allows for individual transactions to report CLM even if they wouldn't have otherwise (due, for example, to a narrower scope being globally configured which would have excluded this trace). Note that this will never enable CLM for any transaction which also carries aWithoutCodeLevelMetrics
option, or if CLM are disabled globally for the application.A
WithDefaultFunctionLocation
option has been added which is identical toWithFunctionLocation
except that it will only evaluate the location of the given function value if no other option has set the code location to be reported beforeWithDefaultFunctionLocation
was encountered in the option list.