I've seen some pretty darned restrictive 'chat systems' in kids' games that make it nigh-on impossible to get much of a message across whilst typing.
So, I looked at some ideas, and there were two main ones, the whitelist and the blacklist:
- Whitelisted words are the only words you can use.
  - Certainly safe language without a doubt.
  - Very, very restricted conversation.
  - Large data needed for freer conversation.
- Blacklisted words are the only words you can't use.
  - Easy detection of specific words.
  - May not be a comprehensive list.
  - Doesn't deal with 5tr4n9e w4y5 0f typ1n9.
Clearly, the whitelist is not the way to go for easy, free conversation, but the blacklist could easily be bypassed.
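To make the trade-off concrete, here is a rough Python sketch of both checks. The word lists (`ALLOWED`, `BANNED`) and the function names are made up for illustration; a real system would need far larger lists and better tokenising.

```python
import re

# Hypothetical word lists, purely for illustration.
ALLOWED = {"hello", "nice", "game", "you", "are"}
BANNED = {"darn"}

# Treat runs of letters, digits and apostrophes as one word.
TOKEN = re.compile(r"[A-Za-z0-9']+")


def passes_whitelist(message):
    """True only if every word in the message is explicitly allowed."""
    return all(word.lower() in ALLOWED for word in TOKEN.findall(message))


def passes_blacklist(message):
    """True unless some word in the message is explicitly banned."""
    return not any(word.lower() in BANNED for word in TOKEN.findall(message))


print(passes_whitelist("Nice game"))      # True
print(passes_whitelist("Nice strategy"))  # False: harmless word, but not listed
print(passes_blacklist("darn"))           # False: exact match is caught
print(passes_blacklist("d4rn"))           # True: the respelling slips through
```

The whitelist rejects a perfectly innocent word simply because it isn't listed, while the blacklist misses the same banned word once a single character is swapped.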
Furthermore, how should disallowed words be dealt with? For instance, if I were to send the string "You're truly elucidating!" through the whitelist, which probably wouldn't have a benign but very complex word such as "elucidate" in its database, and it instead broadcast "You're truly ***********!", I doubt many people would take it as a compliment. The blacklist has a similar problem - should I blank out curse words, or just entirely prevent such a message from being sent?
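The two handling options I'm weighing could be sketched roughly like this in Python. Everything here is an assumption for illustration: `mask` and `block` are names I made up, and `ALLOWED` and `BANNED` are toy lists, not a real database.

```python
import re

# Hypothetical word lists, purely for illustration.
ALLOWED = {"you're", "truly"}
BANNED = {"darn"}

# Treat runs of letters, digits and apostrophes as one word.
TOKEN = re.compile(r"[A-Za-z0-9']+")


def mask(message, is_disallowed):
    """Replace each disallowed word with asterisks of the same length."""
    def replace(match):
        word = match.group(0)
        return "*" * len(word) if is_disallowed(word.lower()) else word
    return TOKEN.sub(replace, message)


def block(message, is_disallowed):
    """Return the message unchanged, or None if any word is disallowed."""
    if any(is_disallowed(word.lower()) for word in TOKEN.findall(message)):
        return None
    return message


# Whitelist-style masking turns the compliment into something suspicious.
print(mask("You're truly elucidating!", lambda w: w not in ALLOWED))

# Blacklist-style handling: blank the word out, or drop the whole message.
print(mask("That darn boss fight", lambda w: w in BANNED))
print(block("That darn boss fight", lambda w: w in BANNED))
```

Masking keeps the rest of the message but can make innocent text look rude; blocking avoids that, at the cost of silently eating the whole message.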
So: what method should I use to detect and handle obscene language in user input without restricting conversation, but keeping things 100% safe?
As a sidenote, an API for a dictionary that marks out obscene language would be useful for both a whitelist and a blacklist.