GODRIVER-3255 Await heartbeat checks up to freq when polling #1720

Merged
merged 12 commits into from
Aug 1, 2024


@prestonvasquez (Collaborator) commented Jul 30, 2024

Summary

Align the heartbeat check with driver specifications:

if (streamingEnabled and (serverSupportsStreaming or connectionIsStreaming)) or transitionedWithNetworkError:
    continue

Background & Motivation

When serverMonitoringMode=poll, the SDAM monitor does not await the full heartbeatFrequency before checking (handshaking with) the server again.

@mongodb-drivers-pr-bot (bot) added the priority-3-low (Low Priority PR for Review) label Jul 30, 2024
Contributor

API Change Report

No changes found!

@prestonvasquez marked this pull request as ready for review July 30, 2024 02:15
@prestonvasquez added priority-2-medium (Medium Priority PR for Review) and removed priority-3-low (Low Priority PR for Review) labels Jul 30, 2024
@prestonvasquez added priority-1-high (High Priority PR for Review) and removed priority-2-medium (Medium Priority PR for Review) labels Jul 30, 2024
@matthewdale previously approved these changes Jul 31, 2024

@matthewdale (Collaborator) left a comment


Looks good with one suggestion. 👍

Comment on lines 262 to 265
ticker := time.NewTicker(heartbeatInterval / 4)
t.Cleanup(ticker.Stop)

timer := time.NewTimer(heartbeatInterval - 1)
Collaborator

Optional: This seems to rely on precise sleep synchronization and may intermittently fail if the scheduler responds to the timer goroutine slightly after the ticker goroutine (on the 4th "tick"). Consider using a more reliable assertion, like asserting a maximum number of heartbeats during an elapsed time.

E.g.

start := time.Now()
time.Sleep(heartbeatInterval * 4)
got := heartbeatStartedCount.Load()

elapsed := time.Since(start)
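// elapsed / heartbeatInterval is a time.Duration; convert it so both
// operands have the same type (got is an int64).
wantMax := int64(elapsed / heartbeatInterval)

assert.LessOrEqual(mt,
	got,
	wantMax,
	"expected at most %d heartbeats in %v",
	wantMax,
	elapsed)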

It may also be helpful to further reduce the heartbeat interval to speed up the test.

@prestonvasquez (Collaborator, Author) Jul 31, 2024

Good point! A more robust solution would be to assert that the number of handshakes aligns with the number of discovered servers within minHeartbeatFrequencyMS.

Collaborator

Ah, right, you'd have to know how many servers connected 🤔. I believe you could derive that from the heartbeat events by counting distinct addresses.
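One way to derive that count, sketched under the assumption that each heartbeat event exposes the address it was sent to. Here that address is modeled as a plain string, and `distinctAddresses` is a hypothetical helper, not driver API.

```go
package main

import "fmt"

// distinctAddresses counts how many different server addresses appear in a
// list of heartbeat events. Each element is assumed to be the address a
// heartbeat was sent to (a simplification of the driver's event type).
func distinctAddresses(addrs []string) int {
	seen := make(map[string]struct{}, len(addrs))
	for _, a := range addrs {
		seen[a] = struct{}{}
	}
	return len(seen)
}

func main() {
	events := []string{"localhost:27017", "localhost:27018", "localhost:27017"}
	fmt.Println(distinctAddresses(events)) // 2
}
```

The test could then compare the heartbeat-started count against this value within minHeartbeatFrequencyMS. If the addresses are collected from event callbacks, the collection itself needs synchronization.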

@qingyang-hu previously approved these changes Jul 31, 2024
// servers.
time.Sleep(500 * time.Millisecond)

assert.LessOrEqual(mt, heartbeatStartedCount.Load(), int64(len(servers)))
@prestonvasquez (Collaborator, Author)

We could also make this a strict equality.

Collaborator

I think that could cause failures in some cases. Go timers (including tickers) guarantee that a goroutine will continue no earlier than the deadline set. However, there are no strong guarantees for how soon after the deadline the goroutine will continue. As a result, it's possible (though unlikely) that the polling loop will not continue as many times as expected during the test duration.


mt.Run("polling must await frequency", func(mt *mtest.T) {
	var heartbeatStartedCount atomic.Int64
	servers := map[string]bool{}
@prestonvasquez (Collaborator, Author)

I'm fairly certain there are no concurrency concerns here since this is a set (i.e. nothing is aggregated).

Collaborator

Go maps are not safe for concurrent use unless every access is a read, regardless of which keys or values are written. Here's the race detector output:

WARNING: DATA RACE
Read at 0x00c000511470 by goroutine 122:
  go.mongodb.org/mongo-driver/mongo/integration.TestServerHeartbeatStartedEvent.func2()
      /mongo-go-driver/mongo/integration/sdam_prose_test.go:266 +0x370
  go.mongodb.org/mongo-driver/mongo/integration/mtest.(*T).Run.(*T).RunOpts.func1()
      /mongo-go-driver/mongo/integration/mtest/mongotest.go:267 +0x2ac
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1689 +0x180
  testing.(*T).Run.gowrap1()
      /usr/local/go/src/testing/testing.go:1742 +0x40

Previous write at 0x00c000511470 by goroutine 195:
  runtime.mapaccess2_faststr()
      /usr/local/go/src/runtime/map_faststr.go:108 +0x42c
  go.mongodb.org/mongo-driver/mongo/integration.TestServerHeartbeatStartedEvent.func2.2()
      /mongo-go-driver/mongo/integration/sdam_prose_test.go:249 +0xb8
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).publishTopologyDescriptionChangedEvent()
      /mongo-go-driver/x/mongo/driver/topology/topology.go:1067 +0x224
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Topology).apply()
      /mongo-go-driver/x/mongo/driver/topology/topology.go:973 +0xe98
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.New.func1()
      /mongo-go-driver/x/mongo/driver/topology/topology.go:163 +0x78
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).updateDescription()
      /mongo-go-driver/x/mongo/driver/topology/server.go:707 +0x140
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update.func3()
      /mongo-go-driver/x/mongo/driver/topology/server.go:624 +0xd8
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update()
      /mongo-go-driver/x/mongo/driver/topology/server.go:654 +0x4f0
  go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connect.gowrap1()
      /mongo-go-driver/x/mongo/driver/topology/server.go:252 +0x34

@prestonvasquez (Collaborator, Author) Jul 31, 2024

For the sake of this test it seems like it shouldn't matter. We're just checking that the number of heartbeats doesn't get ahead of discovery; AFAIK the size of the set doesn't have to be precise at the time we make the assertion. Anyway, I've updated it to use a mutex.

@prestonvasquez (Collaborator, Author) Jul 31, 2024

Even though there is a race concern, I don't think it can actually break what is being tested. I wasn't sure the race detector would care in this case 🤷

@qingyang-hu (Collaborator) left a comment

LGTM!

@prestonvasquez merged commit 7fcbd5a into mongodb:v1 Aug 1, 2024
28 of 33 checks passed
@prestonvasquez deleted the GODRIVER-3255 branch August 1, 2024 19:21
blink1073 pushed a commit to blink1073/mongo-go-driver that referenced this pull request Aug 1, 2024
blink1073 pushed a commit to blink1073/mongo-go-driver that referenced this pull request Aug 7, 2024
Labels: priority-1-high (High Priority PR for Review)

3 participants