Corrections to CLM data collection code #558

nr-swilloughby · 2022-08-24T21:48:27Z

Updates and corrects CLM collection code.

Fixes issue 557 by correcting where we were unconditionally calling functions to resolve the code location even when code level metrics were disabled globally, as well as doing redundant work in some cases if the user passed their own code locations in. There was also a race condition and thread incompatibility for wrapped function handlers that has been corrected.

Now the agent will never try to resolve code locations unless explicitly asked to (e.g., by the user directly calling ThisCodeLocation or similar function), until it knows for sure it needs to collect CLM for that transaction. It will also look to see if code level metrics are locally suppressed (WithoutCodeLevelMetrics()) or disabled globally and avoid going through the overhead of collecting anything in those cases. It will also avoid adding its own calls to ThisCodeLocation or WithFunctionLocation (etc) if the user supplied their own explicit function location in the option list. (Previously, it would call one or both of those and discard the results if overridden downstream by user-set options.)

Additionally, this PR includes some trivial enhancements that were prompted by the above corrections, and/or we expect will be needed once more users start using the CLM feature so it's better to head that off and get them implemented now:

Like we did with IgnoredPrefix, we expanded PathPrefix from a single string value to a new []string field PathPrefixes (which deprecates the old PathPrefix field). Congruent with that, ConfigCodeLevelMetricsPathPrefix now takes any number of string values and NEW_RELIC_CODE_LEVEL_METRICS_PATH_PREFIX environment variable interprets its value as a comma-separated list of path prefix strings.

A transaction-specific WithPathPrefix option allows the specification of a different set of source path prefixes to be given for individual transactions instead of the globally configured list.

A WithCodeLevelMetrics option has been added for transactions as the complement to WithoutCodeLevelMetrics. This allows for individual transactions to report CLM even if they wouldn't have otherwise (due, for example, to a narrower scope being globally configured which would have excluded this trace). Note that this will never enable CLM for any transaction which also carries a WithoutCodeLevelMetrics option, or if CLM are disabled globally for the application.

A WithDefaultFunctionLocation option has been added which is identical to WithFunctionLocation except that it will only evaluate the location of the given function value if no other option has set the code location to be reported before WithDefaultFunctionLocation was encountered in the option list.

nr-swilloughby · 2022-08-24T21:57:16Z

This is still very much a work in progress but I'm drafting the PR now to make collaboration and review easier along the way, since it is important that we get this released ASAP.

It is still experimental, untested, and subject to changes. I'll be adding testcases and doing more full testing next.

One thing that I considered adding but have not yet is the ability to set the global scope to exclude wrapped functions from CLM as a class unto themselves. This would mean you could collect CLM for anything you called StartTransaction for (or some integration code did on your behalf) but wrapped handler functions would not automatically get CLM as they do now.

I'm thinking that with the improvements here that wouldn't be a necessary step, but if it does prove to be a useful configuration option I can add that as well by expanding the list of scopes from the current set of

AllCLM
TransactionCLM

to

AllCLM
TransactionCLM (i.e. the combination of ExplicitTransactionCLM and WrappedTransactionCLM)
ExplicitTransactionCLM (i.e., normal StartTransaction calls)
WrappedTransactionCLM (functions in WrapHandle and WrapHandleFunc)

That would necessitate the changing of the raw numeric values behind those constants in order to preserve TransactionCLM's semantic meaning for backward compatibility. I have added a note in the comments for those constants warning against relying on those underlying values anyway, and advising to just use the defined names like TransactionCLM instead of using 1 for that value.

v3/newrelic/instrumentation.go

v3/newrelic/code_level_metrics.go

nr-swilloughby · 2022-08-24T23:49:03Z

Moved from the automatically-set "awaiting user input" state to the actually-accurate "in progress" one.

v3/newrelic/config.go

v3/newrelic/instrumentation.go

rittneje · 2022-09-02T22:33:58Z

v3/newrelic/code_level_metrics.go

+type CachedCodeLocation struct {
+	Location *CodeLocation
+	Err      error
+	once     sync.Once


A sync.Once cannot be copied after first use. Unless you use a pointer here, then if someone copies a CachedCodeLocation around it will cause issues.

Also, I think Location and Err need to be private. There is no way for package clients to use them safely, nor is there a reason for them to do so.

Making the once value a pointer shouldn't be a problem. I also agree with the suggestion to make the members private here. Really the whole CachecCodeLocation value should be opaque to the caller.

Ok, I'm experimenting now with a new CachedCodeLocation type which uses a pointer to the once field so copying the struct should just give you another struct that tracks to the same once value instead of copying it, and has the fields all unexported. That does mean you need to call NewCachedCodeLocation() to get one with the pointer set up for you instead of just making a zero-value CachedCodeLocation so that the pointer is set up before you start doing anything concurrent in there.

The updated code just pushed implements this approach.

rittneje · 2022-09-02T22:36:40Z

v3/newrelic/code_level_metrics.go

+	}
+
+	c.once.Do(func() {
+		// add 4 skip levels to compensate for the internal calls used to get here.


This is contingent upon the implementation of the sync package, which could be a problem.

What issues do you see with the sync package? I thought that was a fairly well-defined part of the standard library at this point. But if there are issues we could explicitly document expectations around that or do something to mitigate the issues.

The fact that there are 4 levels is an implementation detail. If the Go authors refactor the way sync.Once.Do works internally it could change.

Oh, that. Yes, that's on my mind as well. I'm afraid there may need to be some more logic in there to do something more intelligent than blindly skipping a number of levels, like skipping over things that belong to the sync package.

The code just pushed includes a new implementation that more intentionally steps around the internal agent code and calls out to the sync library and back rather than just skipping a number of calls, which should mitigate this issue.

rittneje · 2022-09-02T22:38:26Z

v3/newrelic/config.go

+// UnmarshalJSON allows for a CodeLevelMetricsScope value to be read from a JSON
+// string whose value is a comma-separated list of scope labels.
+//
+func (s *CodeLevelMetricsScope) UnmarshalJSON(b []byte) error {


Please implement UmarshalText instead as it is more generic. (It will be used by JSON, XML, and others.)

That was trivial enough that I just pushed the change for it. I'll look at the other suggestions next.

rittneje · 2022-09-02T22:45:40Z

v3/newrelic/instrumentation.go

+			if tOptions != nil && !tOptions.SuppressCLM && (tOptions.DemandCLM || app.app.run.Config.CodeLevelMetrics.Scope == 0 || (app.app.run.Config.CodeLevelMetrics.Scope&TransactionCLM) != 0) {
+				// we are for sure collecting CLM here, so go to the trouble of collecting this code location if nothing else has yet.
+				if tOptions.LocationOverride == nil {
+					if loc, err := cache.FunctionLocation(handler); err == nil {


You can simplify away the need for two caches if you instead put the fallback logic into the callback that the single cache will invoke within the sync.Once.

I refactored that to only need one cache value.

iamemilio · 2022-09-07T16:38:15Z

v3/newrelic/code_level_metrics_test.go

+		t.Errorf("skipA shows as %v %v", l.LineNo, l.Function)
+	}
+}
+


I think it would be nice to see a test of the FunctionLocation() function in a threaded context just to verify the functionality, but also to verify that the memory is not being duplicated, just the pointer. Not something worth holding the release up over, just a nice thing to have

rittneje · 2022-09-07T16:59:59Z

v3/newrelic/code_level_metrics.go

+//
+type CachedCodeLocation struct {
+	location *CodeLocation
+	once     *sync.Once


Consider using atomic.LoadPointer, etc. or atomic.Value instead of sync.Once. While it won't be air-tight (two goroutines might do the same lookup in parallel) it should be good enough for what you need, and avoids the extra logic you had to add to omit sync from the stack frame lookup.

Good suggestion, and I considered that earlier on, but I think it would be better to go to the effort to use the more robust sync.Once function to avoid possible parallel lookups and such, and I think the way the code now steps around the calls to whatever's in the middle (e.g. sync.Once) will be flexible enough to support future needs as well.

I think we would rather be defensive here, but appreciate the suggestion

iamemilio

Looks good to me. We will do a little more end to end testing just to verify before merging to master.

corrections to CLM data collection code

7b85a24

nr-swilloughby added bug enhancement labels Aug 24, 2022

nr-swilloughby requested a review from iamemilio August 24, 2022 21:48

nr-swilloughby self-assigned this Aug 24, 2022

rittneje reviewed Aug 24, 2022

View reviewed changes

v3/newrelic/instrumentation.go Show resolved Hide resolved

rittneje reviewed Aug 24, 2022

View reviewed changes

v3/newrelic/instrumentation.go Show resolved Hide resolved

rittneje reviewed Aug 24, 2022

View reviewed changes

v3/newrelic/code_level_metrics.go Outdated Show resolved Hide resolved

rittneje reviewed Aug 25, 2022

View reviewed changes

v3/newrelic/config.go Show resolved Hide resolved

rittneje reviewed Aug 25, 2022

View reviewed changes

v3/newrelic/instrumentation.go Outdated Show resolved Hide resolved

nr-swilloughby added 2 commits September 2, 2022 15:20

debugged code, added caching functionality

bc90987

corrected typo on comments

90426a7

nr-swilloughby marked this pull request as ready for review September 2, 2022 22:28

rittneje reviewed Sep 2, 2022

View reviewed changes

nr-swilloughby added 4 commits September 2, 2022 15:57

changed from JSON marshalling to Text marshalling for CLM scope

417e758

first cut at optimizing cache locations and making them safe to copy

070c59b

updates to CLM bug fixes, corrects compensation for sync.once.Do calls.

d19621a

added comment for NewCachedCodeLocation

435d03d

iamemilio reviewed Sep 7, 2022

View reviewed changes

corrected call to CallersFrames

c2cb8ea

rittneje reviewed Sep 7, 2022

View reviewed changes

iamemilio approved these changes Sep 7, 2022

View reviewed changes

nr-swilloughby merged commit 6d194e9 into newrelic:develop Sep 7, 2022

nr-swilloughby deleted the clm_updates_3_18_2 branch September 7, 2022 17:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Corrections to CLM data collection code #558

Corrections to CLM data collection code #558

nr-swilloughby commented Aug 24, 2022 •

edited

Loading

nr-swilloughby commented Aug 24, 2022 •

edited

Loading

nr-swilloughby commented Aug 24, 2022

rittneje Sep 2, 2022

nr-swilloughby Sep 2, 2022

nr-swilloughby Sep 3, 2022

nr-swilloughby Sep 6, 2022

rittneje Sep 2, 2022

nr-swilloughby Sep 2, 2022

rittneje Sep 2, 2022

nr-swilloughby Sep 2, 2022

nr-swilloughby Sep 6, 2022

rittneje Sep 2, 2022

nr-swilloughby Sep 2, 2022

rittneje Sep 2, 2022

nr-swilloughby Sep 3, 2022

iamemilio Sep 7, 2022 •

edited

Loading

rittneje Sep 7, 2022 •

edited

Loading

nr-swilloughby Sep 7, 2022

iamemilio Sep 7, 2022

iamemilio left a comment

Corrections to CLM data collection code #558

Corrections to CLM data collection code #558

Conversation

nr-swilloughby commented Aug 24, 2022 • edited Loading

nr-swilloughby commented Aug 24, 2022 • edited Loading

nr-swilloughby commented Aug 24, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iamemilio Sep 7, 2022 • edited Loading

Choose a reason for hiding this comment

rittneje Sep 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

iamemilio left a comment

Choose a reason for hiding this comment

nr-swilloughby commented Aug 24, 2022 •

edited

Loading

nr-swilloughby commented Aug 24, 2022 •

edited

Loading

iamemilio Sep 7, 2022 •

edited

Loading

rittneje Sep 7, 2022 •

edited

Loading