-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
panic: assignment to entry in nil map in channelz.(*SubChannel).addChild #8110
Comments
log info. 2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/balancer_wrapper.go:300](http://google.golang.org/[email protected]/balancer_wrapper.go:300) +0x56
2025-02-18 23:33:10.287
created by [google.golang.org/grpc.(*acBalancerWrapper).Connect](http://google.golang.org/grpc.(*acBalancerWrapper).Connect) in goroutine 480
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/clientconn.go:912](http://google.golang.org/[email protected]/clientconn.go:912) +0x145
2025-02-18 23:33:10.287
[google.golang.org/grpc.(*addrConn).connect(0xc000f24008)](http://google.golang.org/grpc.(*addrConn).connect(0xc000f24008))
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/clientconn.go:1253](http://google.golang.org/[email protected]/clientconn.go:1253) +0x1cc
2025-02-18 23:33:10.287
[google.golang.org/grpc.(*addrConn).resetTransportAndUnlock(0xc000f24008)](http://google.golang.org/grpc.(*addrConn).resetTransportAndUnlock(0xc000f24008))
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/clientconn.go:1319](http://google.golang.org/[email protected]/clientconn.go:1319) +0x408
2025-02-18 23:33:10.287
[google.golang.org/grpc.(*addrConn).tryAllAddrs](http://google.golang.org/grpc.(*addrConn).tryAllAddrs)(0xc000f24008, {0x3d61500, 0xc000d2ab40}, {0xc0012bf440, 0x1, 0xc0452c5e00?}, {0xc0db1e7310?, 0xc0f5eeb381?, 0x5aabe20?})
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/clientconn.go:1372](http://google.golang.org/[email protected]/clientconn.go:1372) +0x2e5
2025-02-18 23:33:10.287
[google.golang.org/grpc.(*addrConn).createTransport](http://google.golang.org/grpc.(*addrConn).createTransport)(_, {_, _}, {{0xc0007a7c0f, 0x22}, {0xc0007a7c0f, 0x22}, 0x0, 0x0, {0x0, ...}}, ...)
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/internal/transport/transport.go:722](http://google.golang.org/[email protected]/internal/transport/transport.go:722)
2025-02-18 23:33:10.287
[google.golang.org/grpc/internal/transport.NewClientTransport(...)](http://google.golang.org/grpc/internal/transport.NewClientTransport(...))
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/internal/transport/http2_client.go:359](http://google.golang.org/[email protected]/internal/transport/http2_client.go:359) +0x1771
2025-02-18 23:33:10.287
[google.golang.org/grpc/internal/transport.newHTTP2Client](http://google.golang.org/grpc/internal/transport.newHTTP2Client)({_, _}, {_, _}, {{0xc0007a7c0f, 0x22}, {0xc0007a7c0f, 0x22}, 0x0, 0x0, ...}, ...)
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/internal/channelz/funcs.go:193](http://google.golang.org/[email protected]/internal/channelz/funcs.go:193) +0x48
2025-02-18 23:33:10.287
[google.golang.org/grpc/internal/channelz.RegisterSocket(0xc102e1fd10)](http://google.golang.org/grpc/internal/channelz.RegisterSocket(0xc102e1fd10))
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/internal/channelz/channelmap.go:116](http://google.golang.org/[email protected]/internal/channelz/channelmap.go:116) +0x1d9
2025-02-18 23:33:10.287
[google.golang.org/grpc/internal/channelz.(*channelMap).addSocket](http://google.golang.org/grpc/internal/channelz.(*channelMap).addSocket)(0x5aab080, 0xc102e1fd10)
2025-02-18 23:33:10.287
/go/pkg/mod/[google.golang.org/[email protected]/internal/channelz/subchannel.go:64](http://google.golang.org/[email protected]/internal/channelz/subchannel.go:64) +0x9f
2025-02-18 23:33:10.287
[google.golang.org/grpc/internal/channelz.(*SubChannel).addChild](http://google.golang.org/grpc/internal/channelz.(*SubChannel).addChild)(0x2ed4280?, 0x8fc, {0x3d743e0, 0xc102e1fd10?})
2025-02-18 23:33:10.287
goroutine 2220205 [running]:
2025-02-18 23:33:10.287
2025-02-18 23:33:10.287
panic: assignment to entry in nil map |
The grpc-go/internal/channelz/funcs.go Lines 154 to 158 in 3f95b38
This means that channelz was turned off when the subchannel was created. The panic happens when registering a socket for the SubChannel. This code path should get triggered only when channelz is turned on. grpc-go/internal/channelz/funcs.go Lines 192 to 194 in 3f95b38
Theoretically, this bug should happen when channelz was toggled b/w subchannel creation and transport creation. Trying to repro this by inserting delays in test code. |
I have a repro for this here: https://github.com/arjan-bal/grpc-go/pull/13/files Steps to repro:
Instead of step 2, it should also be possible to have the same Subchannel re-connect after channelz is turned on causing the panic. This would happen more commonly than step 2. |
From what I can tell, channelz can only be enabled at init time by importing the channelz service grpc-go/channelz/service/service.go Lines 33 to 35 in 3f95b38
So this situation should not be possible in non-test code. |
@Cluas I wanted to check if you've found a way (e.g., forking the code, using linkname, etc.) to toggle Channelz after initialization? |
yes. import (
channelz "google.golang.org/grpc/channelz/grpc_channelz_v1"
_ "google.golang.org/grpc/channelz/service"
)
//go:embed templates
var templates embed.FS
var (
headerTemplate = parseTemplate("header.tmpl")
channelsTemplate = parseTemplate("channels.tmpl")
subChannelTemplate = parseTemplate("sub_channels.tmpl")
channelTemplate = parseTemplate("channel.tmpl")
serversTemplate = parseTemplate("servers.tmpl")
serverTemplate = parseTemplate("server.tmpl")
socketTemplate = parseTemplate("socket.tmpl")
footerTemplate = parseTemplate("footer.tmpl")
)
//go:linkname newCZServer google.golang.org/grpc/channelz/service.newCZServer
func newCZServer() channelz.ChannelzServer
func init() {
handler := channelzHandler{
server: newCZServer(),
}
http.HandleFunc("/runtime/channelz/", handler.Index)
http.HandleFunc("/runtime/channelz/channels/", handler.Channels)
http.HandleFunc("/runtime/channelz/channel/", handler.Channel)
http.HandleFunc("/runtime/channelz/subchannel/", handler.Subchannel)
http.HandleFunc("/runtime/channelz/server/", handler.Server)
http.HandleFunc("/runtime/channelz/socket/", handler.Socket)
} |
The snippet has imported the service If not, are you creating gRPC clients in init functions before channelz is enabled? The fix here would be to import |
We didn't get an error at startup. Instead, an error occurred suddenly during the service operation. |
Moreover, the triggering condition is that the upstream is a gRPC stream. When the upstream service restarts and the connection is disconnected, when the service here attempts to reconnect, a panic is reported. I wonder if we can provide protection on the program side. At least, don't panic. |
This issue can happen when a subchannel created with channelz // Channelz off.
cc.Connect()
testutils.AwaitState(ctx, t, cc, connectivity.Ready)
// SubChannel fails
backends.backends[0].lis.Stop()
testutils.AwaitState(ctx, t, cc, connectivity.Idle)
channelz.TurnOn()
// Attempt to reconnect
backends.backends[0].lis.Restart()
cc.Connect() // This will trigger the panic
time.Sleep(time.Second) |
This is not a condition that gRPC clients expect to be in. The reason for enabling ChannelZ in an init function is to allow gRPC to assume it will not change. If the code ends up in such a situation where the enabling status changed, the program state may become inconsistent, making it unclear how to safely recover. If the code tries to suppress the issue, it may lead to undefined behaviour later on. |
The simplest fix would be to import |
There is a sudden panic during runtime. According to what you said above, is it only possible that this occurs during startup? As long as the service starts normally, it must have been imported. |
The situation is as follows: package outer
import (
"inner1"
"inner2"
)
func init(){} package inner1
import (
"google.golang.org/grpc" // grpc doesn't import channelz automatically
)
func init() {
cc1 := grpc.NewClient()
cc1.Connect()
} package inner2
import (
_ "google.golang.org/grpc/channelz/service" // enables channelz
)
func init() {} Dependency graph
The order of init function execution is as follows:
When innner1 creates a client, cc1 runs for a couple of hours. Now the connection for cc1 breaks, maybe due to server restart. cc1 will attempt to connect again. This time |
There are a couple of ways to prevent this issue:
|
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed. |
Closing assuming the query has been answered. Feel free to re-open if this isn't the case. |
We have misidentified the direction of the problem. Closing channelz can indeed solve the problem, but the current problem does not lie in the initialization order of channlez. Even if we initialize channelz in main, we will still encounter this problem. The root of the problem is that we directly call func (ccb *ccBalancerWrapper) NewSubConn without having func (ccb *ccBalancerWrapper) NewSubConn. The former initializes the map that causes panic. I think this may be an implementation problem of the loadbalancer. |
#7663 This PR may solve this problem. Let's upgrade the version and test it. |
Hi @Cluas, #7663 deals with a race condition that happens when the message about |
This is unclear to me. What do you mean by calling
After reading through the code, I can't see how this panic can happen without channelz being initially off and then being turned on later. Can you provide a way for me to reproduce the issue locally? Without a repro, I don't know where to look next. |
Our case is that channelz was initially turned on and running normally. This panic suddenly occurred during the service operation. Moreover, this case cannot be reproduced locally, occasional but it is highly related to the restart of upstream services. The panic log also shows that channelz is in an open state. What I can ensure is that channelz was enabled from the very beginning. func (g *grpcServer) initGrpcServer() {
maxMsgSize := g.getMaxMsgSize()
gopts := []grpc.ServerOption{
grpc.MaxRecvMsgSize(maxMsgSize),
grpc.MaxSendMsgSize(maxMsgSize),
grpc.UnknownServiceHandler(g.handler),
}
if creds := g.getCredentials(); creds != nil {
gopts = append(gopts, grpc.Creds(creds))
}
if opts := g.getGrpcOptions(); opts != nil {
gopts = append(gopts, opts...)
}
g.srv = grpc.NewServer(gopts...)
healthgrpc.RegisterHealthServer(g.srv, health.NewServer())
// register reflection service
g.reflectionServer = RegisterReflection(g.srv, g.handlers)
// register channelz service
channelzservice.RegisterChannelzServiceToServer(g.srv)
} Our grpc server can handle requests normally. Then the incident occurred when we had a grpc client that connected to a configuration distribution service through a stream. When the panic occurred, it was precisely when the configuration distribution service was restarted, resulting in an EOF for the connection, and when the client initiated a reconnection. // Create connection manager
connectFunc := func(ctx context.Context) (grpc.ClientStream, error) {
connCtx, connSpan := tracer.Start(ctx, "Invoke Watch Stream")
defer connSpan.End()
stream, err := wc.cli.invokeWatch(connCtx, wc.req, grpc.WaitForReady(true))
if err != nil {
connSpan.RecordError(err)
return nil, fmt.Errorf("failed to invoke watch: %w", err)
}
// Also update old fields to ensure compatibility
wc.rwMu.Lock()
wc.stream = stream
cancel := func() {
select {
case <-ctx.Done():
default:
// If context is not closed, we do nothing
// Actual cancellation will be handled by connectionManager
}
}
wc.cancel = cancel
wc.rwMu.Unlock()
return stream, nil
} |
I read the golang docs about the order of execution of package init function: https://go.dev/ref/spec#Program_initialization
This means that if a package with a lexicographically smaller name than "google.golang.org/grpc/channelz/service" is present in the import list, it will be initialized first. I have two questions to check if this could be happenning:
cc := func() grpc.ClientConn {
return grpc.NewClient("some-target")
}()
|
I discussed this issue with other maintainers who mentioned that creating gRPC channels during or before package initialization is incorrect. gRPC and libraries that use gRPC perform actions such as registering LB polices and resolvers at init time. These libraries or gRPC may fail in unexpected ways if gRPC channels are created before they initialize. |
First, our service is running normally, indicating that |
Can you check the answer to the following question also?
|
we don't have this situation. |
I see where the confusion might be. As I mentioned in this comment, the panic occurs specifically when a subchannel reconnects, not at startup. This aligns exactly with the situation you’re describing. From my analysis, the code should never panic if Channelz is enabled at the time of ClientConn creation. If this were a common issue, we would have likely seen similar reports over the past couple of years, given that the code has been stable for a long time. The key question we need to address is how a ClientConn could be created before Channelz's init function runs. Understanding this will help us identify a fix. |
In #8110 (comment), there is a function named |
When we actually encounter an error, it is not this code. I just illustrated a similar situation. We must figure out one point. The error occurs during reconnection, not during connection. If it is a connection, I can understand that it is related to the initialization order. But this is a reconnection. |
The panic occurs at re-connection time because the ClientConn creation was done assuming channelz is off. Quoting from the previous comment.
|
I understand what you mean. I will ask relevant colleagues for confirmation. |
I have confirmed with them that the import of channelz is before the initialization of the client. |
NOTE: if you are reporting is a potential security vulnerability or a crash,
please follow our CVE process at
https://github.com/grpc/proposal/blob/master/P4-grpc-cve-process.md instead of
filing an issue here.
Please see the FAQ in our main README.md, then answer the questions below
before submitting your issue.
What version of gRPC are you using?
v1.67.1
What version of Go are you using (
go version
)?go 1.22
What operating system (Linux, Windows, …) and version?
linux amd64
What did you do?
If possible, provide a recipe for reproducing the error.
This is an occasional bug that occurs when an upstream service connected using stream restarts.
What did you expect to see?
no panic. Because this is a separate goroutine, it directly led to a large number of restarts of prod services(So what is the cause of the connection abnormality?).
What did you see instead?
painc
The text was updated successfully, but these errors were encountered: