Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRIVERS-2868 Adjust getMore maxTimeMS Calculation for tailable awaitData Cursors #1749

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

prestonvasquez
Copy link
Contributor

@prestonvasquez prestonvasquez commented Jan 29, 2025

DRIVERS-2868

Please complete the following before merging:

@prestonvasquez prestonvasquez requested a review from a team as a code owner January 29, 2025 22:50
@prestonvasquez prestonvasquez changed the title DRIVERS-2869 Adjust getMore maxTimeMS Calculation for tailable awaitData Cursors DRIVERS-2868 Adjust getMore maxTimeMS Calculation for tailable awaitData Cursors Jan 29, 2025
"name": "iterateOnce",
"object": "tailableCursor",
"arguments": {
"timeoutMS": 50
Copy link
Member

@vbabanin vbabanin Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In timeoutMS is refreshed for getMore if maxAwaitTimeMS is set test case, timeoutMS is set when the cursor is created. I suggest we follow the same approach here and the other test since not all drivers allow setting a timeout for each next() call as an argument - it gets refreshed under the hood and i believe is supposed to behave the same in the drivers which additionally allow to override timeouMS per next() call.

For example, we could set timeoutMS to 150ms and maxAwaitTimeMS to 100ms like this:

{
          "name": "createFindCursor",
          "object": "collection",
          "arguments": {
            "filter": {},
            "cursorType": "tailableAwait",
            "batchSize": 1,
            "timeoutMS": 150
            "maxAwaitTimeMS": 100
          },
          "saveResultAsEntity": "tailableCursor"
        },

Also, we configure a failpoint to block the find command for 60ms and call iterateOnce(). At this point, maxTimeMS on the command should be ≤ 90ms, which is ≤ maxAwaitTimeMS."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbabanin It's not idiomatic for the Go Driver to use the timeoutMS applied to constructing a cursor to each next() call. This would violate user expectations of context.Context. For example,

findCtx, findCancel := context.WithTimeout(context.Background(), 1*time.Second)
defer findCancel()

cur, _ := coll.Find(findCtx, bson.D{}) // User is applying a 1s timeout

// User does not apply a timeoutMS to the context passed to Next, internally 
// applying a timeout here would violate expectations.
for cursor.Next(context.Background()) {
	// ...
}

Tests such as timeoutMS is refreshed for getMore if maxAwaitTimeMS is set are skipped in the Go Driver specifically because timeoutMS is not being applied at either the client or operation level. More than likely we will create a DRIVERS ticket to address this, see GODRIVER-3480 for more info. IIUC, we are likely to suggest putting the same timeout on both constructor and next. Would this work for Java?

    operations:
      - name: createFindCursor
        object: *collection
        arguments:
          filter: {}
          cursorType: tailableAwait
          batchSize: 1
          maxAwaitTimeMS: 50
          timeoutMS: 100
        saveResultAsEntity: &tailableCursor tailableCursor
      # Iterate twice to force a getMore.
      - name: iterateOnce
        object: *tailableCursor
        arguments: 
          timeoutMS: 100
      - name: iterateOnce
        object: *tailableCursor
        arguments: 
          timeoutMS: 100

Copy link
Member

@vbabanin vbabanin Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

summary of discussion with @prestonvasquez:

Currently, the spec mandates that drivers reuse the same timeoutMS specified during cursor creation:

Change stream: Drivers MUST also apply the original timeoutMS value to each next call on the change stream."
Cursors: After the operation has executed, the original timeoutMS value MUST also be applied to each next call on the created cursor."

However, since Python and Go allow timeoutMS overrides per call, we should update the spec to clarify that overriding timeoutMS per next call is permitted, as it is idiomatic to their language. We should create a separate DRIVERS ticket to address that and update other unified tests for consistency to cover a use-case with override per getMore.

Some drivers might not currently expect timeoutMS: 100 argument in iterateOnce, that could lead to failed tests - this is the case for Java.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbabanin For now, I think we should update the unified spec tests to pass per the specifications (i.e. not applying a timeoutMS iteratively). Can you verify that the updated tests pass for Java?

Copy link
Member

@vbabanin vbabanin Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I ran the tests and one of them applying the remaining timeoutMS if it's less than maxAwaitTimeMS fails with state should be: maxAwaitTimeMS must be less than timeoutMS during cursor creation as expected by the spec. I’ve posted a suggestion for handling this in a separate comment: #1749 (comment).

Comment on lines 432 to 433
"maxAwaitTimeMS": 100,
"timeoutMS": 50
Copy link
Member

@vbabanin vbabanin Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By spec, maxAwaitTimeMS must be less than timeoutMS when passed to createFindCursor.

Suggested change
"maxAwaitTimeMS": 100,
"timeoutMS": 50
"maxAwaitTimeMS": 50,
"timeoutMS": 51

If we use this config, timeoutMS will likely be less than maxAwaitTimeMS by the time getMore executes— since there is just a 1ms difference.

Alternatively, we could configure a failpoint to delay the find command execution to lower exhaust timeout so that it becomes less then maxAwaitTimeMS, making the test more robust. This could prevent potential edge cases where fast-executing drivers issue a getMore at 50ms.

Example with the failpoint:
    {
      "description": "apply remaining timeoutMS if less than maxAwaitTimeMS",
      "operations": [
        {
          "name": "failPoint",
          "object": "testRunner",
          "arguments": {
            "client": "failPointClient",
            "failPoint": {
              "configureFailPoint": "failCommand",
              "mode": {
                "times": 1
              },
              "data": {
                "failCommands": [
                  "find"
                ],
                "blockConnection": true,
                "blockTimeMS": 60
              }
            }
          }
        },
        {
          "name": "createFindCursor",
          "object": "collection",
          "arguments": {
            "filter": {},
            "cursorType": "tailableAwait",
            "batchSize": 1,
            "maxAwaitTimeMS": 50,
            "timeoutMS": 100
          },
          "saveResultAsEntity": "tailableCursor"
        },
        {
          "name": "iterateOnce",
          "object": "tailableCursor"
        },
        {
          "name": "iterateOnce",
          "object": "tailableCursor"
        }
      ],
      "expectEvents": [
        {
          "client": "client",
          "events": [
            {
              "commandStartedEvent": {
                "commandName": "find",
                "databaseName": "test"
              }
            },
            {
              "commandStartedEvent": {
                "commandName": "getMore",
                "databaseName": "test",
                "command": {
                  "maxTimeMS": {
                    "$$lte": 50
                  }
                }
              }
            }
          ]
        }
      ]
    }

Copy link
Contributor Author

@prestonvasquez prestonvasquez Feb 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbabanin It's unclear to me how blocking find will reduce the remaining timeout on a call to Next.

# Each next call starts a fresh timeout_ms.
def next(cursor):
	timeout_ms = cursor['timeout_ms']
	# ...

    	while get_more(remaining_timeout_ms):
        	pass # ...

Here is a gist with the updated JSON: https://gist.github.com/prestonvasquez/498ccc00c59a72b367de88272f0a777b

Does this pass in the Java driver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbabanin I've updated the unified spec tests with the changes discussed offline.

@prestonvasquez prestonvasquez requested a review from vbabanin March 7, 2025 02:31
Copy link
Member

@vbabanin vbabanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ShaneHarvey
Copy link
Member

@vbabanin can you confirm this test passes in .net on both linux and windows?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants