Hi, I’m Mathieu and I’m a Software Engineer at PayFit in the HRIS squad. I’ll talk a bit about how my team and I manage errors in production.
Different sources of errors 🐛
Feature bugs reported by Customer Teams 🚒
We use a Trello board, shared with the Customer Success and Support teams, to list the issues our customers encounter. As soon as an issue is added to the board, a product manager qualifies it and decides whether it’s a bug that needs to be fixed ASAP. They then tag the card with the engineering squad in charge of that scope.
To manage the bugs that should be fixed ASAP, we created a role called Superman. The developer in this role fixes the validated bugs, can reassign them to another developer who owns the relevant scope, and also monitors errors coming from Sentry.
We built a small app that maps Trello users to Slack channels and users, so we receive an alert as soon as any activity on a Trello card concerns us.
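As a rough illustration of that mapping, here is a minimal sketch. Everything in it is an assumption made for the example: the `CardActivity` shape is not Trello's real webhook payload, and the channel and user names are hypothetical.

```typescript
// Hypothetical shape of a Trello card activity (not Trello's actual payload).
interface CardActivity {
  cardName: string;
  description: string; // what happened on the card
  labels: string[];    // squad tags added by the product manager
  members: string[];   // Trello usernames assigned to the card
}

// Hypothetical mapping tables between Trello and Slack.
const squadChannels: Record<string, string> = {
  hris: "#squad-hris",
  payroll: "#squad-payroll",
};
const slackUsers: Record<string, string> = {
  mathieu_trello: "@mathieu",
};

// Resolve every Slack channel or user that should be alerted for an activity.
function slackTargets(activity: CardActivity): string[] {
  const targets: string[] = [];
  const add = (target: string | undefined): void => {
    if (target && targets.indexOf(target) === -1) targets.push(target);
  };
  for (const label of activity.labels) add(squadChannels[label.toLowerCase()]);
  for (const member of activity.members) add(slackUsers[member]);
  return targets;
}
```

Unknown labels or members are simply ignored, so a card tagged for another squad does not generate noise for us.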
How can I be Superman? 👩🚒
You are eligible for the Superman role once you have worked at PayFit for at least 4 months (the average time needed to properly know the tech stack, its different features and the global architecture).
Sentry is a SaaS for error tracking. It reports application errors and notifies any concerned developers on Slack.
Sentry is open source, so we can host it on our own servers and keep control over the security and privacy of the data it handles.
Some Sentry configurations we made to improve our daily lives
A release is a version of the code deployed to an environment. Tracking releases in Sentry lets us:
- discover errors and regressions introduced with a new release
- determine the commit and the author responsible for it
Implement Sentry release with CircleCI
We added a job to our build workflow that automatically creates a new release, identified by the SHA1 of the latest commit on the master branch (the branch in sync with our production environment).
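A minimal sketch of what such a job could run is shown below. It only builds the `sentry-cli` command lines for a given commit; the helper name `sentryReleaseCommands` is made up for this example, and it assumes `sentry-cli` is installed in the CI image with its auth token, organization and project configured through environment variables.

```typescript
// Builds the sentry-cli invocations a CI job could run for one release.
// Assumption: the release name is the full 40-character commit SHA-1.
function sentryReleaseCommands(sha: string): string[] {
  if (!/^[0-9a-f]{40}$/.test(sha)) {
    throw new Error(`Expected a 40-character SHA-1, got "${sha}"`);
  }
  return [
    `sentry-cli releases new ${sha}`,                // create the release
    `sentry-cli releases set-commits ${sha} --auto`, // associate commits from the repository
    `sentry-cli releases finalize ${sha}`,           // mark the release as finalized
  ];
}
```

On CircleCI, the SHA would typically come from the built-in `CIRCLE_SHA1` environment variable.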
To link our apps to that release, we pass the commit hash, which CircleCI provides as an environment variable, to Sentry at initialization:
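A minimal sketch of that initialization, assuming the build exposes `CIRCLE_SHA1` (CircleCI's built-in commit variable) to the app. `buildSentryOptions`, `SENTRY_DSN` and the use of `NODE_ENV` are names chosen for this example, not necessarily the ones we use.

```typescript
// Subset of the options accepted by Sentry.init that matter here.
interface SentryOptions {
  dsn: string;
  release: string | undefined;
  environment: string | undefined;
}

// Derive the Sentry options from environment variables.
// Assumption: the DSN is provided as SENTRY_DSN (hypothetical variable name).
function buildSentryOptions(env: Record<string, string | undefined>): SentryOptions {
  const dsn = env["SENTRY_DSN"];
  if (!dsn) throw new Error("SENTRY_DSN is not set");
  return {
    dsn,
    release: env["CIRCLE_SHA1"],  // must match the release name created in CI
    environment: env["NODE_ENV"], // hypothetical source for the environment name
  };
}

// In the app entry point, with the Sentry SDK imported:
//   Sentry.init(buildSentryOptions(process.env));
```

The important part is that the `release` value is exactly the name used when the release was created; otherwise Sentry cannot tie errors back to it.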
Now, when an error occurs, Sentry shows which release introduced it, along with the suspect commit and its author.
Slack Notification 🚨
At PayFit, we use Slack for many things. We ended up deciding to use it for error alerts as well! We configured one rule for the production environment so that we can have alerts on a dedicated channel which the Superman monitors.
Sentry lets you define tags, which surface key information about an error at a glance. Tags are very important because they give the Superman a minimum of context on the error and help them debug it locally. You can define tags on the page
Settings -> Tags
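On the code side, tags are set on a Sentry scope with `setTag(key, value)`. The sketch below shows the idea; the tag names and the `ErrorContext` shape are hypothetical, and `TagScope` is a local stand-in for the SDK's scope so the snippet stays self-contained.

```typescript
// Minimal stand-in for the part of a Sentry scope used here.
interface TagScope {
  setTag(key: string, value: string): void;
}

// Hypothetical context we might want to see on every error.
interface ErrorContext {
  squad: string;
  locale: string;
  userRole: string;
}

// Copy the context onto the scope as tags.
function applyTags(scope: TagScope, context: ErrorContext): void {
  scope.setTag("squad", context.squad);
  scope.setTag("locale", context.locale);
  scope.setTag("user_role", context.userRole);
}

// With the SDK imported, this could be called inside Sentry.withScope
// before capturing the exception.
```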
We love Redux here on the Tech team, and one of its benefits is that it gives us a history of past actions. We developed a small middleware to collect this information before sending it to Sentry:
This gives us a high-level overview in Sentry of what happened before the error.
The middleware serves its purpose, but we plan to improve it by also attaching the relevant parts of the store’s state.
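A minimal sketch of such a middleware might look like the following. It is not our production code: the Redux types are re-declared locally so the snippet is self-contained, `createActionRecorder` is a name invented for the example, and the cap on the number of actions kept is an assumption.

```typescript
// Local stand-ins for Redux's types, so the sketch has no dependencies.
interface Action { type: string; }
type Dispatch = (action: Action) => Action;
interface Store { dispatch: Dispatch; getState(): unknown; }
type Middleware = (store: Store) => (next: Dispatch) => Dispatch;

// Creates a middleware that remembers the types of the last `maxActions` actions.
function createActionRecorder(maxActions: number) {
  const history: string[] = [];
  const middleware: Middleware = () => (next) => (action) => {
    history.push(action.type);
    if (history.length > maxActions) history.shift(); // drop the oldest entry
    return next(action);
  };
  // Return a copy so callers cannot mutate the internal history.
  const getHistory = (): string[] => history.slice();
  return { middleware, getHistory };
}

// The history could then be attached to a report as extra data, for example
// with scope.setExtra("lastActions", recorder.getHistory()).
```

Only the action types are recorded, not the payloads, which keeps the report small and avoids sending user data by accident.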
Issue Owners & assignment ✅
We also set up routing rules for errors, based on the file in which the error occurred or on the current URL. With these rules, we can assign an error to a team.
You can create these rules on the page
Settings -> Issue Owners and add some rules like this:
path:../node_modules/@jetlang/* #payroll
url:https://api.payfit.com/payroll/* #payroll
This configuration lets us be as fine-grained as we want to be, letting us route errors directly to developers depending on the specific rules added matching their scope.
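To make the idea concrete, here is a simplified model of how such rules route an error to owners. It is only an illustration: Sentry evaluates the real rules on its side and its matching semantics may differ. The `#hris` rule and its path are hypothetical, while the URL rule is the one shown above.

```typescript
interface OwnerRule { kind: "path" | "url"; pattern: string; owner: string; }
interface ReportedError { paths: string[]; url: string; }

// Turn a glob where "*" matches anything into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Collect every owner whose rule matches the error (simplified model).
function ownersFor(error: ReportedError, rules: OwnerRule[]): string[] {
  const owners: string[] = [];
  for (const rule of rules) {
    const matcher = globToRegExp(rule.pattern);
    const candidates = rule.kind === "path" ? error.paths : [error.url];
    let matched = false;
    for (const candidate of candidates) {
      if (matcher.test(candidate)) matched = true;
    }
    if (matched && owners.indexOf(rule.owner) === -1) owners.push(rule.owner);
  }
  return owners;
}

const exampleRules: OwnerRule[] = [
  { kind: "url", pattern: "https://api.payfit.com/payroll/*", owner: "#payroll" },
  { kind: "path", pattern: "src/hris/*", owner: "#hris" }, // hypothetical rule
];
```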
It’s important to put this kind of process in place: it helps you solve production problems as quickly as possible, which in turn improves the image you convey to your customers.
Our process is constantly changing, and is not perfect. Feel free to give us feedback on how you handle this topic in your team.
The Engineering Team is scaling; have a look here if you want to join the PayFit rocket 🚀
-- Le Tyrant Mathieu, Software Engineer @PayFit