Key Value Watch does not automatically recover after connection lost #1777

Open
brenth-monad opened this issue Jan 15, 2025 · 3 comments
Labels
defect (Suspected defect such as a bug or regression)

Comments

@brenth-monad

Observed behavior

If the NATS connection is lost while watching a key value, the updates channel is closed and is not automatically recovered when the connection is restored.

All other producers and consumers within the client automatically reconnect as expected.

There is a log message that suggests it should reconnect: nats: JetStream system temporarily unavailable: recreating ordered consumer on connection [524] for subscription on "$KV.d6d185fe-1998-4814-8f9c-4386ff8482ec.6ba655f7-423a-47bd-9e3d-8386ccad2767". This message was introduced in PR #1097 for a similar ticket, #1094.

I've currently worked around this on my side, but it would be nice if this behavior were consistent with the other methods in this client that handle reconnects gracefully.

Expected behavior

The channel should not be closed when the connection is lost (https://github.com/nats-io/nats.go/blob/main/kv.go#L1098); instead, the watcher should recreate the underlying subscription and continue sending messages to the channel.
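
For illustration, here is a minimal sketch of that expected recovery behavior implemented at the application level, since the library does not do this internally today. The rewatch and handle names are hypothetical, and it assumes the same context, time, and jetstream imports as the reproduction below.

// rewatch is an illustrative sketch, not part of nats.go: it recreates the
// watcher whenever the Updates channel is closed, which is roughly what the
// expected behavior above would look like if done inside the client.
func rewatch(ctx context.Context, kv jetstream.KeyValue, key string, handle func(jetstream.KeyValueEntry)) error {
	for {
		w, err := kv.Watch(ctx, key)
		if err != nil {
			return err
		}
		for v := range w.Updates() {
			// A nil entry marks the end of the initial values.
			if v != nil {
				handle(v)
			}
		}
		// Updates() was closed (e.g. the connection was lost): stop the old
		// watcher and recreate it after a short backoff.
		_ = w.Stop()
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}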

Server and client version

server version: 2.10.24
client version: 1.38.0

Host environment

No response

Steps to reproduce

  1. Start the test below.
  2. Interrupt the NATS connection.
    • In Kubernetes you can scale down the NATS StatefulSet.
    • This will start to spam the "channel closed" output in the example below, because the loop intentionally does not exit.
  3. Observe the reconnection log message: nats: JetStream system temporarily unavailable: recreating ordered consumer on connection [524] for subscription on "$KV.d6d185fe-1998-4814-8f9c-4386ff8482ec.6ba655f7-423a-47bd-9e3d-8386ccad2767"
  4. Reconnect NATS.
    • Scale the NATS StatefulSet back up.
  5. Observe that the watcher never recovers, even though the connection has been restored.
import (
	"context"
	"fmt"
	"time"

	"github.com/nats-io/nats.go"
	"github.com/nats-io/nats.go/jetstream"
)

// test reproduces the issue; errors are ignored for brevity.
func test() {
	conn, _ := nats.Connect("nats://localhost:4222")
	js, _ := jetstream.New(conn)
	kv, _ := js.CreateKeyValue(context.Background(), jetstream.KeyValueConfig{Bucket: "test-bucket"})
	w, _ := kv.Watch(context.Background(), "test-key")

	// Produce a change on the key every second.
	go func() {
		for range time.Tick(time.Second) {
			if _, err := kv.Put(context.Background(), "test-key", []byte(time.Now().String())); err != nil {
				fmt.Printf("failed to put: %s\n", err)
			} else {
				fmt.Println("put successful")
			}
		}
	}()

	// Consume updates. The loop intentionally does not exit when the channel
	// is closed, so the "channel closed" output keeps printing after a
	// disconnect and it is easy to see that updates never resume.
	for {
		v, ok := <-w.Updates()
		if !ok {
			fmt.Println("channel closed")
			continue
		}

		if v != nil {
			fmt.Printf("got value: %s\n", string(v.Value()))
		}
	}
}
@brenth-monad added the defect label on Jan 15, 2025
@brenth-monad changed the title from "Watch does not automatically recover after connection lost" to "Key Value Watch does not automatically recover after connection lost" on Jan 15, 2025
@piotrpio
Collaborator

Hello @brenth-monad, thanks for creating the issue. It looks to me like it doesn't happen on every reconnect (simply running this example against a single server and restarting the server does not cause any problems), but I have an idea of when and why it may be happening. I'll let you know when I have something concrete.

@acormier-maia

@brenth-monad Any chance you can share the workaround?

@brenth-monad
Author

@acormier-maia All I did was add a retry loop that sets a short one-second timeout and retries if the call fails with errors.Is(err, context.DeadlineExceeded). After 3 attempts it gives up and returns the DeadlineExceeded error; any other error type is returned immediately.

It's not a great workaround, but my use case can handle the retry delay when it occurs.
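
For reference, a rough sketch of that kind of retry wrapper is shown below. The retryOnTimeout name and the wrapped operation are illustrative, since the original code was not shared; it needs the context, errors, and time imports.

// retryOnTimeout is an illustrative sketch of the workaround described above:
// each attempt gets a short one-second deadline, and only a timed-out attempt
// is retried.
func retryOnTimeout(ctx context.Context, op func(context.Context) error) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		attemptCtx, cancel := context.WithTimeout(ctx, time.Second)
		err = op(attemptCtx)
		cancel()
		if err == nil {
			return nil
		}
		// Any error other than a deadline is returned immediately.
		if !errors.Is(err, context.DeadlineExceeded) {
			return err
		}
	}
	// Still DeadlineExceeded after three attempts: give up.
	return err
}

It could wrap, for example, a kv.Get call: err := retryOnTimeout(ctx, func(c context.Context) error { _, e := kv.Get(c, "test-key"); return e }).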
