7

There are now lots of WYSIWYG editors; however, whenever we use one on a CMS-based website, we consistently run into issues.

The biggest is users pasting content from Word or other online sources, which drags in all sorts of formatting rules "behind the scenes".

How do you deal with these editors on a live production website?

I love Markdown, however its target market is most definitely the tech industry.

Dan McGrath
LiamB

6 Answers

9

Pessimistically, I don't think there is, or ever will be, a solution that replaces hand-typed HTML without creating a mess. Look at WYSIWYG software products: for years, their advertising has told us that the new version finally produces very clean code; in reality, it was crap years ago, and it's still crap.

For online WYSIWYG editors (like those in a CMS), it's nearly the same thing, except that in those cases I believe they don't even care about clean code.


It seems that WYSIWYG editors are good for some things and not for others, and however much users demand it, they never will be good at the rest. They are good at providing a very intuitive interface for people who don't care about HTML and don't even know what it is. That makes them harder to adopt on websites that need to be optimized, etc.

WYSIWYG editors with more features will have more difficulty producing clean content. It's nearly the same for beginner programmers: they will easily create a W3C-valid HTML page that contains only titles, bold and italic text, but will probably fail to produce something clean and valid when they try to implement more features and interactivity.

This means that a WYSIWYG editor with only a few options is the best choice if you care about clean code and copy-paste issues. For example, removing the ability to change the font and to set the font size freely (except through titles) can eliminate problems with different fonts copy-pasted from Word (if the intent is to have a uniform font in every message).

The editor of this website is a good example, except that it has the same problem for non-IT people as phpBB: when you click a button, instead of seeing the result live, you see some Markdown syntax added to your plain-text message.

Arseni Mourzenko
  • 1
    +1 for 'I don't think there is or would be a solution which will replace by-hand typing of HTML code without creating a mess.' I think that we may see better and better code generation, but computers (at least, as we know them) will never be able to match the optimizing power of the human brain. – Michael K Jan 03 '11 at 15:08
5

Limit the functionality, just like the editor I'm using right now to write this answer.

Recent WYSIWYG editors provide that ability, including automatically stripping formatting on paste.

4

I use formatting stripping whenever I incorporate a WYSIWYG interface into a web site. Given that the site will have a style of its own, I tend to want to restrict what a user can add to it, preferring to have the details of the style derive from the CSS. I have a bit of code that more or less strips attributes off tags (with a few exceptions), strips out tags that repeat themselves more than twice, and removes a few tags that will only cause trouble (font, anyone?), and I end up using it frequently. It works well because it gives users the impression of power without giving them enough rope to hang themselves.
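
As a rough illustration of that kind of stripping (not the answerer's actual code), here is a minimal Python sketch built on the standard library's `html.parser`. The allowlists `ALLOWED_TAGS` and `ALLOWED_ATTRS` are hypothetical policy choices, the "repeated tags" rule is not covered, and this is not a security-grade sanitizer (it does not check URL schemes, for instance).

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: the tags to keep and the few attributes allowed on them.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a", "h2", "h3"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}                      # tags that never get a closing tag
DROP_WITH_CONTENT = {"script", "style"}  # removed together with their content


class AllowlistSanitizer(HTMLParser):
    """Rebuilds the markup, keeping only allowed tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself (font, span, o:p, ...) but keep its text
        kept = [
            (name, value)
            for name, value in attrs
            if name in ALLOWED_ATTRS.get(tag, ()) and value is not None
        ]
        rendered = "".join(f' {n}="{escape(v, quote=True)}"' for n, v in kept)
        self.parts.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            if self._skip_depth:
                self._skip_depth -= 1
            return
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(escape(data, quote=False))


def strip_formatting(html_text):
    """Return html_text reduced to the allowlisted tags and attributes."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


pasted = '<p class="MsoNormal" style="margin:0"><font face="Arial">Hello <strong style="x">world</strong></font></p>'
print(strip_formatting(pasted))
```

Because presentation attributes never survive, the look of the content comes entirely from the site's CSS, which is the point made above.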

glenatron
3

I restrict the editor to the simplest options (like those used to post on Stack Exchange), and run the result through HTMLPurifier (PHP) on the server.
Since the actual output might differ from what the user created, you must somehow show them the result. You might even want to store two versions, clean and unclean.

HTMLPurifier guarantees valid HTML, but the result might still not be what the user wanted, nor totally clean markup (as in minimal or ideal).
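
To make the "store two versions and show the result" idea concrete, here is a small Python sketch. It is only an illustration under assumptions: `Submission`, `process_submission` and `toy_sanitize` are hypothetical names, and `toy_sanitize` is a deliberately trivial stand-in for a real purifier such as HTMLPurifier, not a substitute for it.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Submission:
    """Keeps both what the user sent and what will actually be published."""
    raw_html: str
    clean_html: str

    @property
    def was_modified(self) -> bool:
        # True when sanitizing changed the markup, i.e. the user should be
        # shown a preview of the cleaned version before it goes live.
        return self.raw_html.strip() != self.clean_html.strip()


def process_submission(raw_html: str, sanitize: Callable[[str], str]) -> Submission:
    """Run the sanitizer once and keep both versions side by side."""
    return Submission(raw_html=raw_html, clean_html=sanitize(raw_html))


def toy_sanitize(html_text: str) -> str:
    """Stand-in sanitizer for the demo: only removes inline style attributes."""
    return re.sub(r'\s+style="[^"]*"', "", html_text)


submission = process_submission('<p style="color:red">Hi</p>', toy_sanitize)
if submission.was_modified:
    print("Preview of what will be published:", submission.clean_html)
```

Keeping the raw version around also means the content can be re-cleaned later if the sanitizing rules change.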

Gipsy King
2

I needed a form-based front end for users to create and update XML content. I did this by breaking the major XML sections into separate input fields or WYSIWYG textareas. Most of the in-browser editors seem to rely on the same basic contentEditable set of functions, so I just use the lightest-weight one that I've found so far: WhizzyWig. I use PHP's strip_tags() on the input fields, and I pass the rich text content through PHP's tidy tool with the output-xml option in effect (among others, for removing spurious style attributes and normalizing the semantic phrases). I have also used the htmLawed library with equal success. Tidy has a word-2000 cleanup option that may be useful as well.

With well-formed XML as the output from this step, I then use an XSLT transform to finish the cleanup. This is the stage where you can apply business rules to your process... the fault of both editors and the various purifiers is that they assume direct reuse of the HTML content, whereas if you need to apply any kind of QA check or business rule compliance check on the content, these checks are best done by writing appropriate XSL templates that can reorder or retag the content to fit the rules. This is effectively the benefit you get from using Markdown-to-HTML transforms, but this process gives users the more appealing WYSIWYG front end plus Word-pasting as a benefit.

These steps go a long way toward cleaning up most things that can be pasted from Word into HTML while still maintaining a directed order for the eventually-saved result. It is very similar to the process used by the iFixit web site for rich text content authoring. They produce oManual XML format at the end of the save step; I'm producing a simple form of DITA, FWIW.
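
For readers who want a feel for the rule-applying stage, here is a rough Python sketch of the same idea. It swaps XSLT for the standard library's `xml.etree.ElementTree`, so it is an analogy rather than the pipeline described above, and the rules themselves (`RETAG`, `FORBIDDEN_ATTRS`, `MAX_HEADING_LEVEL`) are hypothetical examples of business rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical business rules for the cleaned-up, well-formed content.
RETAG = {"b": "strong", "i": "em"}   # presentational tag -> semantic tag
FORBIDDEN_ATTRS = {"style", "class"}  # attributes that must not be saved
MAX_HEADING_LEVEL = 3                 # deeper headings are reported


def apply_rules(xml_text):
    """Retag and clean well-formed XML; return (new_xml, list_of_problems)."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    problems = []
    for element in root.iter():
        # Retagging only renames the element; the tree structure is untouched.
        element.tag = RETAG.get(element.tag, element.tag)
        for name in FORBIDDEN_ATTRS & set(element.attrib):
            del element.attrib[name]
        tag = element.tag
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            if int(tag[1]) > MAX_HEADING_LEVEL:
                problems.append(f"heading <{tag}> is deeper than h{MAX_HEADING_LEVEL}")
    return ET.tostring(root, encoding="unicode"), problems


cleaned, problems = apply_rules(
    '<div><h4>Title</h4><p class="x">Some <b>bold</b> text</p></div>'
)
print(cleaned)
print(problems)
```

The key property is the same as with XSL templates: the checks run on well-formed XML, so a rule can rename, drop, or flag content without guessing at broken markup.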

Don Day
2

We use a WYSIWYG web editor in our intranet quote system.

Instead of limiting functionality, as others mentioned, I limited the users and warned them about its use. Also, the output of those HTML entries is restricted to PDF, so I added a preview button so they could see whether it is WYSIWYrG (what you see is what you really get).

They are not programmers, not even tech-savvy, but they learned to play with the beast without getting hurt. There's a "Word clean" toolbar button in the editor, but they often got better results by cutting and pasting into Notepad before pasting into the editor and then reformatting it properly.

It's a great tool for simple things, and they are realizing it. There is an HTML tab in this editor, and some of them have started to play with it to fix some issues. Necessity is a great motivator.

Bad HTML, yes, but if it produces good final output, who cares?
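
The Notepad detour works because it throws away every bit of formatting and keeps only the text. A server-side equivalent could look roughly like the Python sketch below; this is an assumption-laden illustration (the names `PlainTextExtractor`, `to_plain_text` and the `BLOCK_TAGS` set are hypothetical), not something the editor in question provides.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that should start a new line in the plain text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style"}  # their content is never shown to the reader


class PlainTextExtractor(HTMLParser):
    """Collects text only, inserting line breaks around block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def to_plain_text(html_text):
    """Reduce pasted HTML to plain text, one non-empty line per block."""
    parser = PlainTextExtractor()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


pasted = '<p class="MsoNormal"><span style="font-family:Calibri">Quote for <b>ACME</b></span></p><p>Valid 30 days</p>'
print(to_plain_text(pasted))
```

As with the Notepad approach, the user then reapplies the few bits of formatting that matter using the editor's own buttons.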

DavRob60