Thanks, Mark.
Yes, the forum outage was caused by a Discourse update. The admin panel indicated that the update could be applied directly from within the hub, so I went ahead—but that turned out to be a mistake. Somewhere in the process (either due to the update itself or my early attempts to fix things), the Docker container was removed from the droplet, effectively taking the forum offline.
At first, this looked pretty bad. However, thanks to Brahn’s original setup, where the database files were stored outside of the Discourse container, all the data remained intact. That was a huge relief because it meant that, in theory, everything could be restored once I figured out how to rebuild the system properly.
On top of that, I had peace of mind knowing that daily backups to AWS S3 were in place, configured within Discourse itself. So even if the worst had happened, I could have restored the forum from a recent backup.
What Went Wrong?
The biggest issue appeared to be PostgreSQL, but not in the way I originally thought. Discourse installs PostgreSQL 15 inside the Docker container when the image is built, so in principle the PostgreSQL version available on the host OS should not have mattered.
However, the droplet was still running Ubuntu 16.04, which is long past end-of-life. A container ships its own userland libraries (glibc included), but it shares the host’s kernel and depends on the host’s Docker engine. The rebuilt Discourse container most likely needed kernel or Docker features that such an old host simply didn’t provide.
The only way forward was a full system upgrade, which meant:
- Upgrading Ubuntu 16.04 → 18.04 → 20.04 → 22.04, one LTS release at a time (the release upgrader doesn’t skip LTS versions).
- Ensuring Docker and all dependencies still worked properly after each upgrade.
- Rebuilding the Discourse container from scratch, making sure it functioned in the new environment.
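The hop-by-hop upgrade path in the first bullet can be sketched in a few lines of Python. This is only an illustration: `LTS_RELEASES` and `upgrade_path` are names made up for this post, and the release list is hard-coded rather than read from Ubuntu's release metadata.

```python
# Hypothetical helper: list the LTS releases to step through, in order.
# The list is hard-coded for illustration and covers only the releases
# mentioned in this post.
LTS_RELEASES = ["16.04", "18.04", "20.04", "22.04"]


def upgrade_path(current: str, target: str) -> list[str]:
    """Return every LTS release from current to target, inclusive."""
    # list.index raises ValueError for a release that is not in the list.
    start = LTS_RELEASES.index(current)
    end = LTS_RELEASES.index(target)
    if end < start:
        raise ValueError(f"cannot downgrade from {current} to {target}")
    return LTS_RELEASES[start:end + 1]


if __name__ == "__main__":
    print(" -> ".join(upgrade_path("16.04", "22.04")))
```

Each adjacent pair in the returned list is one release upgrade, and Docker should be re-checked after every hop before moving on to the next.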
Why Was This Such a Mess?
Honestly, Discourse shouldn’t have allowed the update in the first place. A well-designed update system should:
- Check the host OS version and refuse to proceed if it’s unsupported.
- Warn in advance if the update requires newer system libraries.
- Provide clear error messages explaining what needs to be done first (e.g., upgrading Ubuntu).
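To make the wish list above concrete, here is a minimal sketch of what such a pre-flight check could look like. It is not how Discourse's updater works. It assumes an Ubuntu host, reads the standard `ID` and `VERSION_ID` fields from the contents of `/etc/os-release`, and compares them against `MIN_SUPPORTED`, a threshold invented for this example rather than a value published by Discourse. The function names are hypothetical too.

```python
# Sketch of a pre-flight host check. MIN_SUPPORTED is an assumed threshold
# for illustration only, not an official Discourse requirement.
MIN_SUPPORTED = "20.04"


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '16.04' into (16, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_os_release(text: str) -> dict[str, str]:
    """Parse the KEY=value lines of an /etc/os-release file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def check_host(os_release_text: str, minimum: str = MIN_SUPPORTED) -> tuple[bool, str]:
    """Return (ok, message) for the host described by os_release_text."""
    fields = parse_os_release(os_release_text)
    distro = fields.get("ID", "unknown")
    version = fields.get("VERSION_ID", "")
    try:
        too_old = version_tuple(version) < version_tuple(minimum)
    except ValueError:
        # Missing or non-numeric version: refuse rather than guess.
        return False, f"Could not read an OS version ({version!r}); refusing to update."
    if too_old:
        return False, (
            f"Host runs {distro} {version}, but this update needs {minimum} or newer. "
            "Upgrade the OS first."
        )
    return True, f"Host runs {distro} {version}; OK to proceed."


if __name__ == "__main__":
    sample = 'ID=ubuntu\nVERSION_ID="16.04"\n'
    ok, message = check_host(sample)
    print(message)
```

On a real host the text would come from reading `/etc/os-release`. The point is only that the check is cheap and can run before anything is torn down.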
Instead, the update process proceeded blindly, only to fail because the host OS wasn’t up to scratch. It’s frustrating that something so easily avoidable ended up causing such a major disruption.
How I Fixed It
Once I upgraded the droplet to Ubuntu 22.04, I was able to:
- Rebuild the Discourse container with the updated dependencies.
- Reattach the existing database files and confirm that all content was intact.
- Verify that everything was working as expected before bringing the forum back online.
Finally—boom, we were live again!
Lessons Learned & Reassurance
This was a tough lesson in making sure system compatibility is checked before hitting “update.” If you’re running Discourse, learn from my pain—make sure your OS is up to date before attempting an upgrade!
That said, thanks to the daily AWS S3 backups, I always had a fallback if something went wrong. Even though I didn’t need to restore from a backup this time, knowing they were there gave me confidence throughout the recovery process.
Moving forward, I’ll be more cautious about updates and will review the process to prevent similar issues in the future. I’ve also set up a cron job to take snapshots of the droplet every weekend as an additional safeguard. Lesson learned!
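For anyone wanting a similar safeguard, a weekly snapshot job could look roughly like the sketch below. It is not the script used here. It assumes the droplet is hosted on DigitalOcean and uses the droplet actions endpoint of the DigitalOcean API with a `snapshot` action. The function name, the `forum-` snapshot prefix, and the `DROPLET_ID` and `DO_API_TOKEN` environment variables are all hypothetical.

```python
import json
import os
import urllib.request
from datetime import date

API_BASE = "https://api.digitalocean.com/v2"


def build_snapshot_request(droplet_id: str, token: str, today: date) -> urllib.request.Request:
    """Build (but do not send) a request asking for a droplet snapshot."""
    # The snapshot name is arbitrary; a dated name makes old ones easy to spot.
    body = json.dumps({"type": "snapshot", "name": f"forum-{today.isoformat()}"})
    return urllib.request.Request(
        url=f"{API_BASE}/droplets/{droplet_id}/actions",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical environment variable names; nothing is sent without them.
    droplet_id = os.environ.get("DROPLET_ID")
    token = os.environ.get("DO_API_TOKEN")
    if droplet_id and token:
        request = build_snapshot_request(droplet_id, token, date.today())
        with urllib.request.urlopen(request) as response:
            print(response.status)
    else:
        print("Set DROPLET_ID and DO_API_TOKEN to request a snapshot.")
```

A crontab entry such as `0 3 * * 6 /usr/bin/python3 /opt/scripts/snapshot.py` (the path is made up) would run it early every Saturday. Old snapshots still need pruning separately, since this sketch only creates them.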