Upgrade gRPC's keepalive mechanism #8130

Open
towerWei opened this issue Feb 27, 2025 · 5 comments
Assignees
Labels
Area: Transport (Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features.)
Status: Requires Reporter Clarification
Type: Feature (New features or improvements in behavior)

Comments

@towerWei

Please see the FAQ in our main README.md before submitting your issue.

Use case(s) - what problem will this feature solve?

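While using gRPC, I found that the client keepalive configuration does not take effect as I expected: pings are not sent at the interval I configured. Reading the source, I found that the keepalive implementation checks the time of the last normally sent message before pinging, and skips the ping if less than the configured interval has elapsed since then. We want to use pings to evict high-latency connections, because currently a high-latency connection is closed and re-established only after a ping times out.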

Proposed Solution

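Could a switch be added so that the application can decide, based on its own needs, whether the ping should be compared against the time of the last normally sent message?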

Alternatives Considered

Additional Context

towerWei added the Type: Feature label Feb 27, 2025
@dfawley
Member

dfawley commented Feb 27, 2025

Hi @towerWei,

gRPC sets TCP_USER_TIMEOUT based on the keepalive timeout setting on all clients and servers (1, 2). This means all writes to the network effectively act like the keepalive ping. So, I think this kind of an option should not be required.

(1): client:

if kp.Time != infinity {
	if err = isyscall.SetTCPUserTimeout(conn, kp.Timeout); err != nil {
		return nil, connectionErrorf(false, err, "transport: failed to set TCP_USER_TIMEOUT: %v", err)
	}
	keepaliveEnabled = true
}

(2): server:

if kp.Time != infinity {
	if err = syscall.SetTCPUserTimeout(rawConn, kp.Timeout); err != nil {
		return nil, connectionErrorf(false, err, "transport: failed to set TCP_USER_TIMEOUT: %v", err)
	}
}

Please let us know if you have any other questions about this. Thanks!

@towerWei
Author

towerWei commented Feb 28, 2025

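That would indeed work, but it conflates slow processing by the service with network latency. In some cases certain services really are slow, and that is acceptable to the application. I only want to detect network latency, and my understanding is that keepalive reflects it best because it does not involve application logic. @dfawley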

@towerWei towerWei reopened this Feb 28, 2025
@dfawley
Member

dfawley commented Feb 28, 2025

@towerWei

TCP_USER_TIMEOUT is based on the TCP ACK and would not be affected by the server's speed. So I believe this should be suitable for your needs.

@towerWei
Author

towerWei commented Mar 4, 2025

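Sorry, I have been busy with other work for the past couple of days and only just found the time. At the TCP level, though, this option applies to all message traffic rather than only to pings, doesn't it? Wouldn't it then still affect normal requests that take a long time to process? @dfawley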

@dfawley
Member

dfawley commented Mar 4, 2025

If anything, the TCP_USER_TIMEOUT should have even less dependence on the application than a ping. A ping requires our library to run and process the ping and manually send an ack back. The kernel would be responsible for handling TCP_USER_TIMEOUT. I believe this should be enough for your needs.

Are you seeing any real problems at this time?

dfawley assigned towerWei and unassigned dfawley Mar 4, 2025
purnesh42H added the Area: Transport label Mar 10, 2025
4 participants