Check SMTP connection in health check

Question

I have created a web service which can also send emails (using Gmail SMTP). This is deployed using Kubernetes and a health check is run every 30 seconds.

Since a team member changed the login credentials we use for sending emails, a check for the SMTP connection has been added as part of the health check. Now the health check tends to fail with the following error:

Invalid login: 454 4.7.0 Too many login attempts, please try again later. - gsmtp

Should verifying the SMTP connection be part of a health check? Or would it be better to just skip this check? Most of the system can still function properly without using SMTP.

Berin Loritsch · Accepted Answer · 2019-07-15T17:01:41.083

In general, it is a good idea to perform smoke tests and confirm your assumptions regularly. However, this is not something that needs to occur every 30 seconds. I am not surprised you are running into "454 4.7.0 Too many login attempts" errors.

Your checking policy should let every successfully sent email count as a check.

First, your policy

Your application/business can tolerate email being down for a certain period of time before it becomes a problem. Define that period.
Have the automatic check run with enough lead time to fix the problem before that period is up.
Count normal use of the service as a check.

So for example, let's say your application can go 8 hours without sending email before it's a problem. Let's also say that in this scenario you want to budget 3 hours for any last-minute fixes. That means you can use 5 hours as your periodic check interval (8 - 3 = 5).

Now let's say the last official check was 4 hours ago, but your application just successfully sent an email. You can safely reset the clock for the next check because you know the SMTP servers are set up correctly. So instead of checking again in 1 hour, you wait for another 5 hours.
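
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the constant names and the next_check_due function are assumptions made up for this sketch, and the 8 and 3 hour values are the example numbers from above, not recommendations.

```python
from datetime import datetime, timedelta, timezone

# Example values from the scenario above; pick your own.
TOLERATED_OUTAGE = timedelta(hours=8)   # how long email may be down
FIX_BUDGET = timedelta(hours=3)         # time reserved for fixing problems
CHECK_INTERVAL = TOLERATED_OUTAGE - FIX_BUDGET  # 5 hours


def next_check_due(last_verified: datetime) -> datetime:
    """Return when the next check is due.

    last_verified is the time of the last successful check OR the last
    successfully sent email, whichever is more recent.
    """
    return last_verified + CHECK_INTERVAL


# The last official check ran at 08:00 UTC, so the next one is due at 13:00.
last_check = datetime(2019, 7, 15, 8, 0, tzinfo=timezone.utc)
due = next_check_due(last_check)

# An email goes out successfully 4 hours later: the clock resets,
# and the next check moves to 5 hours after that send.
email_sent = last_check + timedelta(hours=4)
due = next_check_due(email_sent)
```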

Second, keeping track

You'll need a runtime value to keep track of your periodic testing. Key-value stores like Redis are perfect for this use case. You don't need persistence, which makes it really quick to query and update. The actual key/value management system you use doesn't really matter, but how you use it does.

For this to work, you need the following:

A key that stores the timestamp of the last successful check
That timestamp must use the same time zone for all services that reference it. I recommend UTC
The monitoring code will check the value on start-up
- If there is no value, it will do the first check
- If there is a value, but it has expired, the monitoring code will do a check
- If there is a value and it has not expired, the monitoring code will set a timer for the next check
The code that sends the emails will set that value every time it successfully sends an email
The code that checks the connection will
- Check the value to see whether a test is due
- If a test is due, run it and make sure the connection is properly closed afterwards (if possible, send the SMTP command QUIT)
- Set the timer again based on the timestamp in the lookup value plus the wait time
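
The steps above could be sketched roughly as follows. This is a minimal sketch under several assumptions: a plain dict stands in for the key-value store (with Redis you would use its get/set calls instead), the key name and all function names are made up for illustration, and the probe assumes a server that accepts STARTTLS followed by a login. Timestamps are Unix epoch seconds, which do not depend on the local time zone.

```python
import smtplib
import time

CHECK_INTERVAL_SECONDS = 5 * 60 * 60   # the 5 hours from the example policy
LAST_OK_KEY = "smtp:last_ok"           # hypothetical key name


def record_success(store, now=None):
    """Call after every successful send or check to reset the clock."""
    store[LAST_OK_KEY] = time.time() if now is None else now


def seconds_until_next_check(store, now=None):
    """Return 0 if a check is due (no value or expired), else the wait."""
    now = time.time() if now is None else now
    last_ok = store.get(LAST_OK_KEY)
    if last_ok is None:
        return 0.0
    return max(0.0, float(last_ok) + CHECK_INTERVAL_SECONDS - now)


def probe_smtp(host, port, user, password):
    """Log in once and close cleanly; raises an exception on failure."""
    # Leaving the with block sends QUIT and closes the connection.
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(user, password)


def run_check_if_due(store, probe, now=None):
    """Run probe only if due; return seconds to wait before the next call."""
    wait = seconds_until_next_check(store, now)
    if wait > 0:
        return wait                    # recent success, nothing to do
    probe()                            # an exception here means SMTP is down
    record_success(store, now)
    return CHECK_INTERVAL_SECONDS
```

The monitoring code would call run_check_if_due on start-up and again whenever the returned wait has elapsed, while the email-sending code calls record_success after each successful send. Passing the probe in as a parameter keeps the scheduling logic independent of how the connection is actually tested.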

I'm trying to outline this without assuming the monitoring code is baked into the email notification service.

The bottom line is that no test is necessary if we just sent an email. Exercising all your connections has an impact, so make sure that impact is not detrimental to your system.
