
Like many websites, we use email addresses as user identifiers for logins.

RFC 5321 (section 2.3.11) states that the local parts of email addresses:

MUST be interpreted and assigned semantics only by the host specified in the domain part of the address

This means that applications processing email addresses can't second-guess, for example, the way that Gmail ignores dots when determining usernames. It also means that applications must treat the local parts of email addresses as case-sensitive, since that's how the local part is specified.
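Taken literally, that rule means a comparison should leave the local part untouched and fold only the case of the domain, since domains follow DNS rules. A minimal Python sketch of that strict reading; the function names are hypothetical, and it assumes a simple `local@domain` address split at the last `@`:

```python
def split_address(address: str) -> tuple[str, str]:
    """Split an address into (local_part, domain) at the last '@'."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return local, domain


def same_mailbox_strict(a: str, b: str) -> bool:
    """Strict RFC 5321 reading: the local part must match exactly,
    while the domain is compared case-insensitively."""
    local_a, domain_a = split_address(a)
    local_b, domain_b = split_address(b)
    return local_a == local_b and domain_a.lower() == domain_b.lower()


print(same_mailbox_strict("Smith@Example.com", "Smith@example.COM"))  # True
print(same_mailbox_strict("Smith@example.com", "smith@example.com"))  # False
```

Under this reading, the user in the case below really did type a different address, which is exactly the friction the question is about.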

However, users are stupid, and in a recent case a user who had signed up with an uppercase email address (and had successfully received and clicked the verification email) found they were unable to log in because they were now using the lowercase form of their email address.

I've never come across an email server that enforced case-sensitivity on its inboxes. I don't doubt that some exist somewhere, but I'm questioning whether the benefit of dropping case-sensitivity for our email usernames outweighs the problems of keeping it.

I wouldn't go so far as to emulate Gmail's dot processing or any kind of plus-addressing, because those don't seem to be as ubiquitous or as automatically assumed as case-insensitivity. However, I'm aware it's not a black-and-white scenario, so I'm interested if anyone knows how widespread case-sensitive email addresses actually are.

Gareth
  • Two answers already from the same RFC. Next time, make sure you read the whole thing :) – yannis Jan 04 '12 at 12:53
  • There are two answers to this question. Either don't use the email address as the username, or use the same string manipulation on both the user's input and the value stored in the database; if they don't match at that point, then they are not equal. – Ramhound Jan 04 '12 at 14:40
  • Yahoo used to have case sensitive user names in the really, really old days. Like mid 90s. Made hacking yahoo games lots of fun . . . – Wyatt Barnett Jan 04 '12 at 15:15
  • Why not just enforce case insensitivity on your user identifiers by converting them to a standard internal case during login? Your database can still store the case-sensitive e-mail address for when you need to e-mail the user (for password recovery etc.). It would seem to be *exceedingly unlikely* that you would get two different users whose e-mail addresses only differ by the case of their local part. – Mark Booth Jan 04 '12 at 16:16
  • @YannisRizos I did read it, but a SHOULD is always trumped in RFCs by a MUST. The fact that hosts SHOULD do something means that they *might* not, and I just wanted reassurance that, in reality, the SHOULD is a bit more widely adopted than its definition implies. – Gareth Jan 04 '12 at 17:43
  • "Like many websites, we use email addresses as user identifiers for logins." - No, you do not, and neither do any of those other websites. What you use are login identifiers which are _assigned based on_ the user-provided email address. RFC 5321 is inapplicable to this case (though it still applies to when you are sending email to it obviously). – Random832 Jan 04 '12 at 17:53
  • @YannisRizos Er, the local part is part of the email address; AAA@example.com is therefore a different email address from aaa@example.com. – Random832 Jan 04 '12 at 17:55
  • @YannisRizos Huh? I'm talking about the substring that comes before "@" in the address. That is what "local part" means, nothing more nothing less. And as for the other comment, my point was that it's not an email address, it's a user identifier - it just _looks like_ an email address. – Random832 Jan 04 '12 at 18:33
  • @Random832 Right, I failed to read the `RFC 5321 is inapplicable to this case` part of your comment, I was referring to smtp local parts, in contrast with what users think of email. – yannis Jan 04 '12 at 18:56
  • As @Random832 points out, the SMTP RFC is only applicable if you are building an SMTP server; from a host's perspective the MUST you're referring to doesn't apply. The other quotes deal with the host's perspective, so you should treat them as MUST. – yannis Jan 04 '12 at 18:57

3 Answers

https://www.rfc-editor.org/rfc/rfc5321#page-42 (emphasis added):

While the above definition for Local-part is relatively permissive, for maximum interoperability, a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form or where the Local-part is case-sensitive.

The RFC discourages case-sensitivity. Also, I personally have never seen a host with case-sensitive local parts.

Case-insensitive email addresses are a de facto standard.

Therefore, I think you are correct to say there is a bigger benefit to dropping case-sensitivity for your email usernames than there are problems with keeping it.
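One way to act on this, along the lines of Mark Booth's comment on the question, is to keep the address exactly as entered for sending mail and to match logins on a case-folded copy. A minimal in-memory sketch; `UserStore` and its methods are hypothetical names, and a real site would use a database column (or a unique index on the folded value) instead of a dict:

```python
from typing import Dict, Optional


class UserStore:
    """Toy account store: logins match on a case-folded key, but the
    address is kept exactly as entered so outgoing mail uses it verbatim."""

    def __init__(self) -> None:
        self._by_login_key: Dict[str, str] = {}

    @staticmethod
    def login_key(address: str) -> str:
        # Fold the case of the whole address, local part included.
        return address.strip().casefold()

    def register(self, address: str) -> None:
        key = self.login_key(address)
        if key in self._by_login_key:
            # Addresses that differ only by case count as the same account.
            raise ValueError("an account already exists for this address")
        self._by_login_key[key] = address.strip()

    def find(self, typed: str) -> Optional[str]:
        """Return the stored, as-entered address for a login attempt."""
        return self._by_login_key.get(self.login_key(typed))


store = UserStore()
store.register("Jane.Doe@Example.com")
print(store.find("jane.doe@example.com"))  # Jane.Doe@Example.com
```

`casefold()` behaves like `lower()` on plain ASCII; it only differs for some non-ASCII characters, which is where you may want to think harder about what "the same address" means.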

quentin-starin

What you should be doing is using the email address as provided when sending email, and transforming it into something less error-prone when using it as an account identifier, or at least using the transformed form as a fallback when you don't find an exact match.

This would be a very rough analogue to what the Soundex algorithm does for words or names in English by removing the things that create ambiguities. For example, you might convert the entire address to lowercase and remove subaddresses and non-alphanumeric symbols from the local part (e.g., Lance.Boyle+sometag@Example.com would strip down to lanceboyle@example.com). Applying the same transformation to any address used during login would get you a match for more variants, and the user is none the wiser because any other use of the address would be the as-provided version.
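The transformation described above can be sketched in Python. This is an illustration under stated assumptions, not a standard algorithm: it assumes `+` is the subaddress separator and an ASCII local part (the regex drops any non-ASCII letters), and `canonical_login` is a hypothetical name:

```python
import re


def canonical_login(address: str) -> str:
    """Reduce an address to a looser form used only for matching accounts.

    Lowercases everything, drops a '+tag' subaddress, and removes
    non-alphanumeric characters from the local part. Never send mail
    to the result; keep the as-provided address for that.
    """
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    local = local.lower().split("+", 1)[0]   # drop the subaddress, if any
    local = re.sub(r"[^a-z0-9]", "", local)  # drop dots, dashes, etc.
    if not local:
        raise ValueError(f"nothing left of the local part: {address!r}")
    return f"{local}@{domain.lower()}"


print(canonical_login("Lance.Boyle+sometag@Example.com"))  # lanceboyle@example.com
```

The key point is that the same function runs on the stored address and on whatever the user types at login, so both sides are compared in the same reduced form.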

From a technical standpoint, this does make your logins marginally easier to brute force. If that's a concern, you can always require an exact match for logins and provide a "forgot my login or password" option that's tolerant when looking up the account since any email you send will be to the "right" address.
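A sketch of that split between strict login and tolerant recovery, assuming a toy in-memory store; `loose_key`, `Accounts`, and the account IDs are hypothetical, and `loose_key` is deliberately simpler than the fuller transformation above (lowercase plus dropping a `+tag`), so substitute whatever normalization you actually use:

```python
from typing import Dict, List, Optional


def loose_key(address: str) -> str:
    # Hypothetical looser form: lowercase and drop a '+tag' subaddress.
    local, _, domain = address.strip().rpartition("@")
    return f"{local.lower().split('+', 1)[0]}@{domain.lower()}"


class Accounts:
    """Toy store: exact matching for login, tolerant matching for recovery."""

    def __init__(self) -> None:
        self._exact: Dict[str, str] = {}        # as-entered address -> account id
        self._loose: Dict[str, List[str]] = {}  # loose key -> account ids

    def add(self, account_id: str, address: str) -> None:
        self._exact[address] = account_id
        self._loose.setdefault(loose_key(address), []).append(account_id)

    def login_lookup(self, typed: str) -> Optional[str]:
        """Only the exact address the account was registered with."""
        return self._exact.get(typed)

    def recovery_lookup(self, typed: str) -> List[str]:
        """Every account whose address loosely matches. This is only safe if
        the reset mail then goes to each account's stored address, never to
        the typed string."""
        return list(self._loose.get(loose_key(typed), []))


accounts = Accounts()
accounts.add("u1", "Lance.Boyle@Example.com")
print(accounts.login_lookup("lance.boyle@example.com"))       # None
print(accounts.recovery_lookup("lance.boyle+x@example.com"))  # ['u1']
```

An attacker who guesses a loose variant gains nothing here: the recovery mail still lands in the inbox the account was verified with.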

If an address like the one above makes it all the way through sign-up and verification, I don't think it would be unreasonable to reject others that transform the same way as duplicates. I'd posit that any site that assigns local addresses like Lance.Boyle, LanceBoyle, lance.boyle and lanceboyle and treats them as different might be the source of other kinds of trouble.

Blrfl
  • I'm not sure I'd go so far as to strip the dots - I think I'd stick to being case insensitive when checking logins - but fundamentally this is what I would suggest too. – Murph Jan 04 '12 at 16:32
  • I definitely wouldn't strip the dots. Googling finds people named, for instance, "Robert Oot" and "Richard Oot"; they could well have email addresses like "r.oot@somedomain.com". Strip the dots out of that and you could well have a conflict. :-) – Carson63000 Jan 05 '12 at 04:45
  • That particular conflict isn't likely since the superuser is probably not out signing up for accounts on web sites, but you have a point. One possibility would be to only strip dots from domains where it's known not to matter and might be the source of a lot of users, like `gmail.com`. – Blrfl Jan 05 '12 at 15:04
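Blrfl's compromise of stripping dots only for domains where they are known not to matter might look like this. The allow-list is an assumption: only `gmail.com` is named in the discussion, and which providers really ignore dots is something to confirm per provider rather than take from this sketch. `login_key` and `DOT_INSENSITIVE_DOMAINS` are hypothetical names:

```python
# Hypothetical allow-list of domains assumed to ignore dots in the local part.
DOT_INSENSITIVE_DOMAINS = frozenset({"gmail.com"})


def login_key(address: str) -> str:
    """Lowercase the address; strip dots only for allow-listed domains."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    local, domain = local.lower(), domain.lower()
    if domain in DOT_INSENSITIVE_DOMAINS:
        local = local.replace(".", "")
    return f"{local}@{domain}"


print(login_key("R.Oot@gmail.com"))       # root@gmail.com
print(login_key("r.oot@somedomain.com"))  # r.oot@somedomain.com
```

This keeps Carson63000's "r.oot" example distinct on arbitrary domains while still collapsing dot variants where the provider treats them as one inbox.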

Section 2.4, "General Syntax Principles and Transaction Model", of RFC 5321 states (emphasis mine):

Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged. Mailbox domains follow normal DNS rules and are hence not case sensitive.

AFAIK all popular mail hosts avoid case-sensitive email addresses, and you should too. I haven't used a service that assumed case sensitivity for email addresses, and I hate to be surprised.

yannis