Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Job timeout applied after middleware #752

Open
bgentry opened this issue Feb 10, 2025 · 3 comments
Open

Job timeout applied after middleware #752

bgentry opened this issue Feb 10, 2025 · 3 comments

Comments

@bgentry
Copy link
Contributor

bgentry commented Feb 10, 2025

While reading through the executor for another change, I realized that as of #632 / #584 we're applying the job timeout after stepping through all the middleware:

river/job_executor.go

Lines 209 to 227 in fc667eb

doInner := func(ctx context.Context) error {
jobTimeout := e.WorkUnit.Timeout()
if jobTimeout == 0 {
jobTimeout = e.ClientJobTimeout
}
// No timeout if a -1 was specified.
if jobTimeout > 0 {
var cancel context.CancelFunc
ctx, cancel = context.WithTimeout(ctx, jobTimeout)
defer cancel()
}
if err := e.WorkUnit.Work(ctx); err != nil {
return err
}
return nil
}

I'm not sure if this was an intentional decision and I couldn't find any conversations where it was mentioned, so I thought we should at least talk about it. Do you think this is the right place for it, or should the timeout happen before the middleware stack gets called? I was thinking of a worst case scenario where a middleware got stuck or took awhile on a blocking operation and the job ended up running far longer than either the job's timeout or the client's default timeout.

@brandur
Copy link
Contributor

brandur commented Feb 12, 2025

Hmm, so I didn't consider it one way or the other super deeply admittedly, but including the middleware in the timeout doesn't seem super obviously more or less correct to me.

An argument could be made that that when setting work timeouts, users are trying to set a timeout specifically on their code as opposed to River's internal code, which to their eye would be an implementation detail. So similarly how the timeout doesn't include the time it takes to complete a job, it doesn't include middleware, which is largely made up of other internal stuff.

@bgentry
Copy link
Contributor Author

bgentry commented Feb 12, 2025

The risk I see is that as it currently is, the middleware have no timeout whatsoever applied to them. Beyond that I don't think it matters a lot either way, but I do worry about the risk of a middleware getting stuck and the job just running forever despite whatever timeouts are configured.

@brandur
Copy link
Contributor

brandur commented Feb 14, 2025

Yeah I suppose that could be a danger depending on what you're trying to do in the middleware. Probably a separate timeout like you'd find in http.Transport (each phase of a request can be configured separately) would be as defensible as mixing in the same timeout being used for the job's work phase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants