Brownout on Update Center (updates.jenkins.io): 24 and 25 October 2024
Summary (TL;DR)
Note: this is a follow up of the 26 and 27 September 24 hours brownout.
The service https://updates.jenkins.io will switch its implementation to a new system during 1 day:
-
From Thursday 24 October 2024 from 10:00am UTC until Friday 25 October 2024 10:00am UTC
All Jenkins users are impacted but should not see any functional change.
⚠️ Please, check that your organization respects the advertised DNS TTL or you might be stuck in the brownout longer than expected.
Under the hood, any HTTP request made to this service will be redirected to a mirror close to user locations during these brownouts.
What is the "Update Center"?
Jenkins Update Center is a web server at the core of the Jenkins public infrastructure which distributes the plugins, tool installers, and versions index to all Jenkins servers across the world.
From the installation wizard to regular plugin updates, if you run Jenkins, then you use this service under the hood.
Today, it serves around 50 Tb of data (outbound bandwidth) each month from a single virtual machine on AWS, which costs around $6,000 per month.
In order to sustain this service and improve it, the Jenkins infrastructure team has worked relentlessly during the past years to have a new sustainable implementation for this service.
The new Update Center implementation features a highly available system that redirects user requests to a download mirror close to their location. Additional information is available in the GitHub issue.
Why this Fifth "Brownout"?
Our functional tests and performance tests are meeting our expectations after the initial work and the four brownouts we ran.
However the previous (fourth) brownout raised 2 issues: - We’ve seen (and were reported) HTTP/404 errors due to broken links in some HTML pages (not used a lot) and missing endpoints used for healthchecks in https://github.com/jenkins-infra/helpdesk/issues/4311 - We’ve also seen HTTP/404 errors after 8-9 hours or production usage due to network file share (SMB/CIFS) issues with Apache. The service is self-healing but we are trying to fine tune this: if it fails, then we’ll have to update the Apache architecture to stop using a network file share. See https://github.com/jenkins-infra/helpdesk/issues/4312 for details.
As such, a fifth brownout serves to verify the above (minor) problems are solved under a production workload.
How
Both current and new Update Centers are updated at the same time and serve the same index.
Starting 12 weeks ago, the Jenkins infrastructure has been using the new Update Center (azure.updates.jenkins.io) with a client-side DNS override (updates.jenkins.io
hostname points to this new service).
During this brownout, we’ll simply switch the DNS entry updates.jenkins.io
to this new service and watch for the logs and error rate.
At the end, we’ll switch DNS back to the normal service and then analyze metrics and logs to see how the system behaved.
We are confident the new system will perform as expected.
Please refer to the helpdesk ticket for more information.