Routing changes and Traffic Server upgrades #410
Merged
Proxy to API backends directly from Traffic Server rather than hopping back to nginx. Also migrate the Traffic Server rewrite logic to Lua, and experiment with/debug Traffic Server v7's connection collapsing behavior.
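As a rough illustration of the direction (not the actual plugin code from this branch; the header name below is a placeholder), rewrite logic moved into Traffic Server's ts_lua remap plugin looks something like this:

```lua
-- Hedged sketch of a ts_lua remap script: inspect the incoming request and
-- point it at the chosen backend before Traffic Server opens the connection.
function do_remap()
  -- Read a routing hint set by the nginx layer in front of Traffic Server.
  -- "X-Api-Umbrella-Backend-Host" is a placeholder name, not a real header.
  local backend_host = ts.client_request.header['X-Api-Umbrella-Backend-Host']
  if backend_host then
    ts.client_request.set_url_host(backend_host)
  end

  return TS_LUA_REMAP_DID_REMAP
end
```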
While the behavior might be a bit different from what we had in Traffic Server v5 with the collapsed_connection plugin, that plugin is deprecated (and seems to cause errors if used), and the options around all of this have changed a bit. These new tests also cover the connection collapsing behavior in more detail, testing various situations we didn't previously test.
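For reference, the Traffic Server 6+/7 options that replace the old collapsed_connection plugin are the read-while-writer and open-write settings in records.config; the values below are illustrative, not necessarily what this branch settles on:

```
# records.config (illustrative values)
# Let concurrent requests for the same object read the response as it is
# being written into the cache, instead of all going to the origin.
CONFIG proxy.config.cache.enable_read_while_writer INT 1
# Retry the cache when another transaction already holds the write lock,
# rather than immediately proxying a duplicate request to the origin.
CONFIG proxy.config.http.cache.max_open_write_retries INT 5
# What to do if the write lock still can't be obtained after the retries.
CONFIG proxy.config.http.cache.open_write_fail_action INT 1
```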
Since Traffic Server has DNS resolution built in, using Traffic Server as the proxy layer that connects to API backends becomes simpler in some ways (we can rely on its built-in functionality for DNS resolution, caching, etc.).
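The DNS side lives in Traffic Server's HostDB, where a couple of records.config settings (values illustrative) cover what previously required custom nginx Lua code:

```
# records.config (illustrative)
# Honor the DNS TTLs of backend hostnames instead of a fixed cache time.
CONFIG proxy.config.hostdb.ttl_mode INT 0
# Serve a stale DNS answer briefly while a fresh lookup is in flight.
CONFIG proxy.config.hostdb.serve_stale_for INT 30
```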
Since we're now routing to APIs directly with Traffic Server, switch to a text log instead of a binary log for easier searching, since this log file may be more useful for debugging now.
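In Traffic Server 7 the log definitions themselves are a Lua file (logging.config), so the switch is roughly a matter of declaring an ASCII log instead of a binary one; the format string below is just an example, not necessarily the fields API Umbrella actually logs:

```lua
-- logging.config (illustrative): an ASCII access log that can be grepped
-- directly, instead of needing traffic_logcat to decode a binary log.
access_format = format {
  Format = '%<cqtq> %<chi> %<cqhm> %<cquc> %<pssc> %<ttms> %<crc>'
}

log.ascii {
  Format = access_format,
  Filename = 'access'
}
```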
- Treat the web-app as an API backend, to simplify the routing so that everything goes through the same proxy flow, rather than having special routing just for the web-app (particularly since most of the web-app, aside from some login pieces, is just APIs). This is mostly taken from the "postgres-lapis" branch, where we had already done this refactoring to simplify the routing to that new version of the web-app.
- Also route website backends through Traffic Server, since Traffic Server handles dynamic DNS lookups with keepalive connections a bit more easily out of the box (and again, this normalizes the flow through the proxy stack). Previously, website backends didn't have keepalive connections enabled (since we were relying on nginx's own proxy_pass workaround to deal with dynamic DNS, which doesn't enable keepalive), so this should also improve website backend performance a bit (and allow normal HTTP caching too).
- Clean up some of the nginx config files and remove unused pieces.
- Remove some of the temporary HTTP headers used before sending requests to API backends.
- Tweak the Traffic Server config to match our existing retry behavior (don't retry API backend requests if connections time out).
- Shift the X-Cache header setup into Traffic Server's Lua hooks, since this allows for easier and more efficient generation than doing it with regexes at the nginx layer (see the sketch after this list).
- Set Traffic Server's proxy_name setting to eliminate the need to rewrite Via headers at the nginx layer.
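As a rough sketch of the X-Cache idea (not necessarily the code that ends up on this branch, especially given the revert described below), a ts_lua hook can translate Traffic Server's own cache lookup status into the header; the hook and constant names assume the ts_lua cache-lookup API:

```lua
-- Hedged sketch: set X-Cache from Traffic Server's cache lookup status
-- instead of regex-parsing the Via header at the nginx layer.
function send_response()
  local status = ts.http.get_cache_lookup_status()
  if status == TS_LUA_CACHE_LOOKUP_HIT_FRESH then
    ts.client_response.header['X-Cache'] = 'HIT'
  else
    ts.client_response.header['X-Cache'] = 'MISS'
  end
end

function do_global_read_request()
  -- Register the header rewrite for when the response is sent to the client.
  ts.hook(TS_LUA_HOOK_SEND_RESPONSE_HDR, send_response)
end
```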
The Traffic Server approach to setting X-Cache won't work in some edge cases, so revert to parsing the Via header for this (but tweak the regex implementation a bit for better optimization).
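A minimal sketch of the Via-based approach at the nginx layer (the regex and the set of hit codes here are assumptions about Traffic Server's compact Via tags, not the final implementation):

```lua
-- header_filter_by_lua sketch: derive X-Cache from the Via header that
-- Traffic Server appends, e.g. "... (ApacheTrafficServer/7.1.1 [cHs f ])".
-- "H" (fresh hit) and "R" (RAM cache hit) are treated as hits here; the
-- real code may recognize a different set of codes.
local via = ngx.var.upstream_http_via
local cached = false
if via then
  -- The "o" option compiles and caches the regex for reuse across requests.
  local m = ngx.re.match(via, "\\[c([A-Z])", "o")
  if m and (m[1] == "H" or m[1] == "R") then
    cached = true
  end
end
ngx.header["X-Cache"] = cached and "HIT" or "MISS"
```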
With the shift to using Traffic Server for DNS resolution and upstream connections, we no longer need all of this custom code at our nginx layer. These were some of the thornier things to deal with (DNS resolution, caching, and configuration of upstreams via the ngx_dyups plugin), so being able to rely on Traffic Server for this functionality allows for some nice cleanup.
- Fix redirects performed by the "redirect_https" setting so they're not subject to redirect rewriting.
- Remove the unused web_app_matcher middleware.
- Rework the https configuration tests to better account for the changes in how the web-app is now routed as an API backend with some configuration options.
Skipping redirect rewriting whenever the redirect_https setting was merely present wasn't correct, since that skipped rewriting for all responses when this setting was present, even if they came from the backend. Instead, tweak things so the redirect rewriting is only skipped when redirect_https actually triggers a redirect. Also shift how redirect_https triggers the redirect, treating it as an error, which will short-circuit the rest of the middleware rather than letting it execute (since in the case of these redirects, we don't need to check for rate limits, etc.).
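The short-circuit pattern looks roughly like this (a hypothetical sketch; the function signature and field names are placeholders, not API Umbrella's actual middleware API):

```lua
-- Hypothetical middleware sketch: when redirect_https applies, return an
-- "error"-style result so the remaining middleware (rate limiting, etc.)
-- never runs, and so redirect rewriting knows this redirect came from us
-- rather than from the API backend.
local function redirect_https(settings, host, request_uri)
  if settings["redirect_https"] and ngx.var.scheme == "http" then
    return "redirect_https", {
      status_code = 301,
      location = "https://" .. host .. request_uri,
    }
  end

  -- No error: fall through to the rest of the middleware chain.
  return nil
end
```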
Now that we're using Traffic Server for establishing backend connections, tune its configuration to better match our existing keepalive settings (see the sketch after this list).
- This removes the ability to set "keepalive_connections" on a per-server basis, but that ability was never really exposed through the admin interface, so I don't think this is a huge loss.
- There's an idle timeout for Traffic Server's keepalive settings, which, while different, actually seems preferable.
- Improve the "reload" signal to Traffic Server so it works better (while a SIGHUP should have been equivalent, it didn't seem to pick up changes, so using traffic_ctl seems more reliable).
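The relevant keepalive knobs in records.config look something like this (values illustrative, not this branch's exact settings):

```
# records.config (illustrative values)
# Idle timeout for keepalive connections to API backends.
CONFIG proxy.config.http.keep_alive_no_activity_timeout_out INT 10
# Pool and share origin connections across transactions so keepalive
# connections actually get reused.
CONFIG proxy.config.http.server_session_sharing.match STRING both
CONFIG proxy.config.http.server_session_sharing.pool STRING thread
```

Reloading then becomes a `traffic_ctl config reload` call instead of a SIGHUP.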
require_https is normally on by default, but it had been disabled in the test environment. Now that it's at least re-enabled for the web-app backend, update the tests to account for this (which better matches production behavior anyway).
With Traffic Server now proxying directly to the API backend, there have been some subtle shifts in gzip behavior, since the nginx layer behind Traffic Server no longer potentially gzips things. This actually seems more predictable, but adjust our tests accordingly, particularly the ones covering gzip behavior between the first and second request.
This was due to the changes in the caching test helper that makes duplicate requests so it can be called multiple times within a single test. This fixes things so it can still be called multiple times, but with predictable URLs.
We've had a few different branches in development for a while. In an attempt to clean things up and start merging some of the changes more piecemeal, this pull request integrates some of those changes from the in-progress branches.
This upgrades our bundled version of Traffic Server (the HTTP caching layer) from 5.3 to 7.1. As part of this change, we've made a semi-significant change to how we route API requests internally (although this internal change shouldn't really be noticeable to API Umbrella users or administrators).
Previously, API requests were routed like this:
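In simplified terms (the hop labels here are approximations):

```
incoming request => nginx => Traffic Server (caching) => nginx (API backend routing) => API backend
```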
With these changes, they are now routed like this:
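Again in simplified terms:

```
incoming request => nginx => Traffic Server (caching + API backend routing) => API backend
```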
Basically, we've eliminated the second hop back through nginx to handle the API proxying for requests we're passing along. Instead, we proxy directly to the API backend with Traffic Server.
In addition to simplifying things by removing an extra hop, this change also allows us to simplify our codebase by removing some complicated pieces related to DNS resolution and managing nginx upstreams, where we can better leverage features built into Traffic Server as a replacement. In order to properly support API backends with keepalive connections and with dynamic DNS entries, we previously jumped through several hoops to hook this up inside nginx (it involved ngx_dyups, lua-resty-dns-cache, and custom Lua code to resolve DNS entries and keep nginx upstreams in sync). Traffic Server has DNS resolution/caching built in, and also handles keepalive connections for all upstreams automatically, so by leveraging Traffic Server for the upstream proxying, all that code can go away in favor of some much simpler Traffic Server config.
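As an illustration of the kind of simpler Traffic Server config this enables (the hostnames below are placeholders, not API Umbrella's actual generated config):

```
# remap.config (illustrative): Traffic Server resolves the backend hostname
# itself, caches the DNS answer per its TTL, and pools keepalive connections
# to the origin -- no ngx_dyups upstreams or custom Lua DNS code required.
map http://api.example.com/ http://example-api-backend.internal:8080/
```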
Summary of changes: