Status
The outage is currently remediated but not resolved. Work on a full resolution is in progress.
Summary
All customers were impacted: they could not develop or deliver new app versions to their end users. End users were not impacted.
Root Cause: Fontawesome invalidated the Build.One authentication token due to excessive bandwidth usage, which caused Gitpod workspace starts and CI builds to fail.
Trigger: We were alerted to the issue by XXX (Customer XZ).
The issue was remediated by providing a new authentication token.
Bandwidth consumption was high for both Gitpod prebuilds and CI/CD loops.
All other npm packages use the public registry, so no further action is needed for those. We rely on the public registry (registry.npmjs.org) for uptime and reliability.
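To put the consumption in perspective, here is a rough sketch of how quickly uncached downloads can exhaust a bandwidth quota such as the 80GB hard limit recorded in the timeline. The function names are illustrative, and any tarball size or daily download count passed in is an assumption to be replaced with measured values.

```python
def downloads_until_limit(limit_gb: float, tarball_mb: float) -> int:
    """Number of full, uncached tarball downloads that fit in the quota.

    Uses decimal units (1 GB = 1000 MB). tarball_mb is an assumed input;
    measure the real package size before relying on the result.
    """
    if tarball_mb <= 0:
        raise ValueError("tarball_mb must be positive")
    return int((limit_gb * 1000) // tarball_mb)


def days_until_limit(limit_gb: float, tarball_mb: float, downloads_per_day: int) -> float:
    """Days until the quota is exhausted at a constant daily download count."""
    if downloads_per_day <= 0:
        raise ValueError("downloads_per_day must be positive")
    return downloads_until_limit(limit_gb, tarball_mb) / downloads_per_day
```

For example, with a hypothetical 20 MB tarball, an 80GB quota covers 4,000 uncached downloads; at a hypothetical 200 downloads per day across prebuilds and CI, that is 20 days.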
FAQ
- Could we make Fontawesome a non-hard-coded dependency? - The npm dependencies are general dependencies that must be downloaded when the code is built; the mitigation is to host third-party dependencies on our own server (CloudSmith). Fontawesome was the only third-party dependency that we did not download from https://registry.npmjs.org/. The public registry only enforces a rate limit (reduced bandwidth) and does not block hosts trying to download packages. Whether we run into those rate limits is currently not known and warrants further documentation; however, no specific license or token is needed, and limits apply on a per-host basis.
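The claim that Fontawesome was the only dependency fetched from outside the public registry can be re-checked against a lockfile. The sketch below assumes a Yarn classic `yarn.lock` with `resolved "<url>"` lines and treats `registry.yarnpkg.com` (Yarn's default registry host) as equivalent to the public registry. The function names and the `PUBLIC_HOSTS` set are illustrative and not part of any existing build script.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches the indented `resolved "<url>"` lines of a Yarn classic lockfile.
RESOLVED_RE = re.compile(r'^\s+resolved\s+"([^"]+)"', re.MULTILINE)

# Assumption: these two hosts count as "the public registry".
PUBLIC_HOSTS = {"registry.npmjs.org", "registry.yarnpkg.com"}


def hosts_in_lockfile(lock_text: str) -> Counter:
    """Count resolved tarball URLs per host."""
    return Counter(urlparse(url).hostname for url in RESOLVED_RE.findall(lock_text))


def non_public_hosts(lock_text: str) -> dict:
    """Return hosts (with package counts) that are not the public registry."""
    return {
        host: count
        for host, count in hosts_in_lockfile(lock_text).items()
        if host not in PUBLIC_HOSTS
    }
```

Running `non_public_hosts(open("yarn.lock").read())` in each project would list every host that could invalidate a token or block downloads independently of the public registry.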
Action Items
- Wolfgang - In Progress: Finish the follow-up with Fontawesome on what triggered their action at exactly that point in time. Complete the Fontawesome timeline details.
- Open: Dataforce to change their WebUI build.
- Open: Customer CI jobs to switch to CloudSmith; the caching is in the script itself.
- Open: Should all customers have their own token? Confirm whether we ask customers to buy a Fontawesome license or buy one on their behalf. Confirm the legal/license dimension.
- Open: Check build scripts to verify CloudSmith downloads: frequency, time, source IP, and location.
- Open: Ensure each customer has their own CloudSmith token.
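For the items about switching CI jobs to CloudSmith with one token per customer, the following minimal sketch shows how a build script might render npm-style `.npmrc` lines that route the `@fortawesome` scope to a private registry. The `scope:registry=` and `//host/path/:_authToken=${VAR}` forms are standard `.npmrc` syntax. The registry URL and environment variable name in the usage note are placeholders; the real CloudSmith URL must come from the CloudSmith repository settings.

```python
from urllib.parse import urlparse


def render_scoped_npmrc(scope: str, registry_url: str, token_env_var: str) -> str:
    """Render .npmrc lines that send one npm scope to a private registry.

    The token is referenced as ${ENV_VAR} so that each customer pipeline
    can inject its own secret instead of sharing a hard-coded token.
    """
    if not scope.startswith("@"):
        raise ValueError("scope must start with '@', e.g. '@fortawesome'")
    registry_url = registry_url.rstrip("/") + "/"
    parsed = urlparse(registry_url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("registry_url must be an https URL")
    # The auth key is the registry URL without the scheme.
    auth_key = f"//{parsed.netloc}{parsed.path}"
    return (
        f"{scope}:registry={registry_url}\n"
        f"{auth_key}:_authToken=${{{token_env_var}}}\n"
    )
```

Calling `render_scoped_npmrc("@fortawesome", "https://registry.example.invalid/acme/icons", "ICONS_TOKEN")` yields two lines: one mapping the scope to the registry, and one referencing the per-customer token through the hypothetical `ICONS_TOKEN` environment variable.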
Completed
- Changed WebUI builds to use CircleCI
- Gitpod prebuilds have no caching mechanism, but Gitpod will fetch from CloudSmith once customer pipelines have switched.
- Consider long term: Could we use a more lightweight icon library?
Timeline
Timeline in UTC
01:24 CI/CD automation that pushes the nightly release to OSIVnet failed: token invalidated by Fontawesome after the 80GB hard limit was hit
06:30 Max raised the issue in the #dev channel
11:38 Max asked in #product whether we can get a new token on a new plan
12:04 Max created a new token and stored it in Secureable
12:14 Max validated that this solves our need with Optiwork
12:26 Max updated the internal CircleCI Project
12:32 Max resolved the issue for the following customers and noted that Gitpod prebuilds have to be rerun if the issue persists.
Update: I've adjusted the token here:
- Igus CircleCI project
- Igus gitpod project
- Optiwork CircleCI project
- Optiwork gitpod project
- Hydro gitpod project
13:23 Atul confirmed he had updated the following workspaces:
- portal
- swat
- offer
- doj
- cstrainingdeve
- moj2
Supporting Information
Technical detail: the following error appears during Gitpod workspace start or in CI:
[2/4] Fetching packages...
error An unexpected error occurred: "https://npm.fontawesome.com/@fortawesome/fontawesome-pro/-/6.4.0/fontawesome-pro-6.4.0.tgz: Request failed \"401 Unauthorized\"".
info If you think this is a bug, please open a bug report with the information provided in "/home/circleci/project/tmp/frameworks/swat-webui/yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/install for documentation about this command.
Exited with code exit status 1
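A failure like the one above could be classified automatically so that an invalidated token is flagged sooner. The sketch below parses Yarn output for a `Request failed` message and extracts the registry host and HTTP status. The regular expression is written against the single log excerpt shown here, and the names `RegistryFailure`, `find_registry_failure`, and `is_auth_failure` are hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class RegistryFailure(NamedTuple):
    host: str
    status: int
    reason: str


# Tolerates both escaped (\") and plain (") quotes around the status text.
_FAILURE_RE = re.compile(
    r'"(?P<url>https?://[^\s"]+): Request failed '
    r'\\?"(?P<status>\d{3}) (?P<reason>[^"\\]+)\\?"'
)


def find_registry_failure(log_text: str) -> Optional[RegistryFailure]:
    """Return the first failed registry request found in the log, if any."""
    match = _FAILURE_RE.search(log_text)
    if match is None:
        return None
    host = urlparse(match["url"]).hostname or ""
    return RegistryFailure(host, int(match["status"]), match["reason"])


def is_auth_failure(failure: Optional[RegistryFailure]) -> bool:
    """True when the failure looks like a rejected or missing token."""
    return failure is not None and failure.status in (401, 403)
```

A CI step could pipe the install log through `find_registry_failure` and raise a dedicated alert when `is_auth_failure` is true, rather than surfacing only a generic exit status 1.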
Open work
Actions Taken
- new token used, existing token restored
- changed download to be fetched from CloudSmith
- Post Mortem 2023-05-26 completed
- CSO Process introduced