
So what happened to XHTML5?

http://www.w3.org/TR/html5/

Is that page a draft for both XHTML5 and HTML5? So is there no difference between these doctypes?

Michael Borgwardt
W3C
  • It appears as of 2014-12-08 that W3C is still working on the standard. http://www.w3.org/TR/html5/ and http://www.w3.org/TR/html5/the-xhtml-syntax.html were updated 2014-10-28. – Russel Winder Dec 08 '14 at 14:27
  • As of 2015, **XHTML is a W3C standard!** See [updated discussion](http://programmers.stackexchange.com/a/272619/84349) – Peter Krauss Feb 09 '15 at 15:01



At the time of writing (2012), it was clear that the W3C had decided to abandon XHTML in favor of HTML 5. This decision was motivated by several reasons:

  • Only a few people were really interested in XHTML. Most websites were written in plain HTML.

  • Even fewer really understood what XHTML is about and how to use it. Too many websites that claimed to serve XHTML sent the wrong header (usually text/html) instead of Content-Type: application/xhtml+xml.

  • Even when you fully understood what XHTML is and which headers to send, things were really tricky, because some crappy browsers did not accept or support the application/xhtml+xml content type. This meant that you had to change the header depending on the browser.

  • The XML part of XHTML also caused some weird situations that developers had to work around. One is the INVALID_STATE_ERR: DOM Exception 11 message that appears when you assign text containing HTML entities (like &eacute;) to an element within an XHTML page. When you encounter this error, with its very helpful message, in a large web application after doing an AJAX request, you really have no idea whether it's the fault of jQuery, AJAX, or something else.

  • Writing HTML 5 code doesn't mean mixing up tags all around. If you're passionate about XML and XHTML, you can still write HTML 5 code which will look very close to XML.

  • In the early days of mobile phones, XHTML was interesting for mobile devices, which were not very powerful: parsing XML is much easier than parsing HTML. Now, with dual-core mobile devices, it really doesn't matter whether they have to parse clean, valid XML or dirty HTML full of hacks and mixed-up tags.
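The "change the header according to the browser" point was usually handled by content negotiation on the `Accept` request header. Below is a minimal Python sketch under simplifying assumptions: the function names `accepts_xhtml` and `choose_content_type` are hypothetical, only an explicit `application/xhtml+xml` entry counts (a bare `*/*` is deliberately not trusted), and q-values are only checked for being zero.

```python
def accepts_xhtml(accept_header: str) -> bool:
    """True if the Accept header explicitly lists application/xhtml+xml with q > 0."""
    for part in accept_header.split(","):
        media_type, *params = [field.strip() for field in part.split(";")]
        if media_type.lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    # A wildcard like */* is not trusted: clients could send it without real XHTML support.
    return False


def choose_content_type(accept_header: str) -> str:
    """Pick the Content-Type header value for the same XHTML-compatible page."""
    if accepts_xhtml(accept_header):
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

A real server doing this would also send `Vary: Accept`, so that caches do not hand the XHTML response to a client that only asked for HTML.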

The spec of October 2014 does mention an XHTML syntax. For the moment, it is unclear whether there is such a thing as a new XHTML language (as opposed to a syntax), what the position of XHTML would be if there is, and whether mainstream browsers will adopt the new XHTML standard.

Arseni Mourzenko
  • I think the only thing your answer is missing is a reference to [polyglot markup](http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html) – yannis May 23 '12 at 15:21
  • The JSF specification declares conformity to XHTML standards and many JSF based web applications still use the extension. Note this doesn't preclude some clever JSF component development occurring with HTML5 as well, but it isn't entirely abandoned as some think. – maple_shaft May 23 '12 at 16:52
  • @maple_shaft However the JSF specification is not the W3C's problem, is it? – Roc Martí May 23 '12 at 19:20
  • You can serve HTML5 as XML, however, and gain the benefit of stricter syntax. – Erik Reppen Mar 07 '13 at 01:49
  • @ErikReppen but you'll lose the benefit of entity references like `&nbsp;` – Mr Lister Dec 17 '13 at 15:54
  • One more reason why XHTML was problematic in practice: XML's draconian error handling. Most server-side tools are oriented around outputting gobs of text, instead of serializing a node tree. When you use the gobs of text approach, that yellow "xml parsing error" page is pretty much a given. Whatever benefits accrued due to XML's strictness were wiped out by XML's strictness. – Joeri Sebrechts Aug 26 '14 at 07:28
  • A terrible decision. We threw away the XSL tooling that would have been available with XML. – Mihai Danila Oct 28 '14 at 12:26
  • This statement is false. The W3C has not abandoned XHTML in HTML5: [The XHTML syntax, vocabulary and APIs](http://www.w3.org/TR/html5/the-xhtml-syntax.html) – Rob Dec 08 '14 at 14:50
  • The W3C did not abandon XHTML. The W3C has suspended work on XHTML2 in favor of HTML5. XHTML still exists, and the latest version is XHTML5, which is part of the HTML5 specification. There are, overall, three possible syntaxes for HTML5: one is HTML5 proper, which is not an XML-compatible syntax and must only be served as `text/html`; one is XHTML5, which is not an HTML/SGML-compatible syntax and must only be served as `application/xhtml+xml`; and one is the polyglot syntax, which is HTML5 in XML and can be served as both. XHTML5 is anything but dead or abandoned. – Christian Hujer Dec 31 '14 at 14:44
  • @MrLister You lose the benefit of entities if you do not declare a DOCTYPE. You can provide a DOCTYPE which has a public identifier supported by the browser or a system identifier which declares the entities in which you're interested. – Christian Hujer Dec 31 '14 at 14:48
  • @ChristianHujer Well, to be pedantic, if you include a DOCTYPE declaration, then it's technically no longer XHTML5, but XHTML1. Unless you create a system identifier of your own, but then you may confuse the browsers (in a worst case scenario, to the point where they will use Quirks mode) and the W3 validator. – Mr Lister Dec 31 '14 at 15:54
  • @MrLister The declaration `<!DOCTYPE html>` (no system identifier, no public identifier) is a DOCTYPE declaration but it's *not* XHTML1. And the XHTML 1.1 DOCTYPE declaration denotes XHTML 1.1, not XHTML1. Similar for XHTML Basic. Most browsers always render `application/xhtml+xml` in standards mode. If I create my own DOCTYPE-based XHTML version, use the (X)HTML namespace, serve it as `application/xhtml+xml` and follow HTML5, I'm on the safe side, even with IE9. – Christian Hujer Dec 31 '14 at 16:06
  • @ChristianHujer I know all that. My point was simply that if you include the HTML5 DOCTYPE in an XML file, the browsers won't recognise entity names. And if you include a classic XHTML doctype, most browsers want it to be one of the standard ones (1.0 strict/transitional/frameset or 1.1), and if you invent one of your own, browsers don't always react in the same manner! – Mr Lister Dec 31 '14 at 17:59
  • @ChristianHujer See my own site, http://examples.strictquirks.nl/errors/errors.xhtml#doctype, for examples of how browsers differ when confronted with non-standard doctypes. – Mr Lister Dec 31 '14 at 18:02

XHTML5 is a synonym for "HTML5 serialized as XML".

There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.

...

The second concrete syntax is the XHTML syntax, which is an application of XML. When a document is transmitted with an XML MIME type, such as application/xhtml+xml, then it is treated as an XML document by Web browsers, to be parsed by an XML processor. Authors are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors will prevent a document labeled as XML from being rendered fully, whereas they would be ignored in the HTML syntax. This specification defines version 5.0 of the XHTML syntax, known as "XHTML 5".
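To make the "minor syntax errors" point concrete, here is a small Python sketch (standard library only) that feeds the same slightly broken fragment to an XML parser and to a tolerant HTML tokenizer. Note that `html.parser` is not the HTML5 tree-construction algorithm browsers use; it only illustrates that HTML-style parsing carries on where XML parsing must stop. `MARKUP`, `TagCollector`, `parse_as_xml` and `parse_as_html` are illustrative names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A fragment with one "minor" error: the <b> element is never closed.
MARKUP = "<p>Hello <b>world</p>"


class TagCollector(HTMLParser):
    """Records start tags; HTMLParser does not raise on mismatched tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(markup: str) -> str:
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as err:
        # An XML processor must stop at the first well-formedness error.
        return f"fatal: {err}"


def parse_as_html(markup: str) -> list:
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parse_as_xml(MARKUP))   # prints a line starting with "fatal:"
print(parse_as_html(MARKUP))  # ['p', 'b']
```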

Also, there's a nice document on writing HTML5 polyglots (pages that can be serialized both as regular HTML5 and as XML) here:

http://dev.w3.org/html5/html-polyglot/html-polyglot.html#bib-HTML5

And even a validator!

http://html5.validator.nu/
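A rough local approximation of the polyglot idea, assuming Python's standard library: one document, written with the XHTML namespace, self-closed void elements and numeric character references instead of named ones, is accepted by both an XML parser and an HTML tokenizer. This only checks that both parsers accept the markup; real polyglot conformance (identical DOMs in both modes) is what the guide and validator above are for. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_NS = "http://www.w3.org/1999/xhtml"

POLYGLOT = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="UTF-8" />
<title>Polyglot example</title>
</head>
<body>
<p>Caf&#233; &amp; bar<br />next line</p>
</body>
</html>"""


class TextCollector(HTMLParser):
    """Collects character data; convert_charrefs (on by default) decodes references."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_via_xml(doc: str) -> str:
    # Namespace-qualified lookup, because every element is in the XHTML namespace.
    root = ET.fromstring(doc)
    return root.find(f".//{{{XHTML_NS}}}p").text


def text_via_html(doc: str) -> str:
    collector = TextCollector()
    collector.feed(doc)
    collector.close()
    return "".join(collector.chunks)


print(text_via_xml(POLYGLOT))                   # Café & bar
print("Café & bar" in text_via_html(POLYGLOT))  # True
```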

It's rarely called XHTML5 nowadays (and probably even more seldom used), since it's basically still HTML5, but it's still there.

Simply put: every change to HTML5 spec is also an implicit, corresponding change to XHTML5.


HTML5 is a de facto and de jure standard! XHTML is there too, as a standard.

HTML5 - A vocabulary and associated APIs for HTML and XHTML

W3C Recommendation 28 October 2014

The title of the standard contains the string "and XHTML", so we are talking about a final decision by the W3C to merge HTML and XHTML into one single standard; and this standard shows how to serialize an HTML file as an XHTML file and vice versa.

XHTML parts and important notes:


Understanding and using

As summarized by LF Sikos:

XHTML5 is the XML serialization of HTML5. The syntax is described by the HTML5 specification. However, one shouldn't be confused: XHTML5 is an application of XML. In other words, HTML5 and XHTML5 have an identical vocabulary but different parsing rules.

HTML5 documents might also be valid XML documents. This markup is often referred to as a “polyglot” language: the overlap language of documents that are HTML5 and XML documents at the same time. HTML5 and XHTML5 serializations are cross-compatible. However, XHTML5 has a stricter syntax. Furthermore, some parts of XHTML5 are not valid in HTML5, e.g., processing instructions.
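One visible consequence of the stricter syntax is entity handling: without a DTD, an XML processor only knows the five predefined entities (`&amp;`, `&lt;`, `&gt;`, `&apos;`, `&quot;`), so HTML named references such as `&nbsp;` are fatal errors, while numeric references work in both serializations. A short Python sketch using only `xml.etree.ElementTree`; the helper name `xml_accepts` is made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xml_accepts(markup: str) -> bool:
    """True if the markup is well-formed XML (no DTD, so no HTML named entities)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# &nbsp; is an HTML named reference: undefined for an XML parser without a DTD.
NAMED = f'<p xmlns="{XHTML_NS}">a&nbsp;b</p>'
# &#160; is the same character (U+00A0) written as a numeric reference.
NUMERIC = f'<p xmlns="{XHTML_NS}">a&#160;b</p>'

print(xml_accepts(NAMED))                 # False
print(xml_accepts(NUMERIC))               # True
print(repr(ET.fromstring(NUMERIC).text))  # 'a\xa0b'
```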

So, strictly speaking (as emphasized by @vaxquis), "XHTML is just a syntax for XML serialization": there is no DTD or other kind of XML schema.

Some people do not like to say "XHTML5 is XHTML". The question must be split into a mini-FAQ about "when can I use it as XHTML?". This is a wiki; please correct any "misunderstanding"...


FAQ

Can I use XHTML5 as the "2014's version of XHTML standard"?

There are some problems with a "perfect and generic HTML5-to-XHTML5/XHTML5-to-HTML5 conversion": you must make "personal choices" and may lose information. Depending on the context, the answer differs:

  • Loosely speaking: YES. There are a lot of (simple) examples where the mapping is perfect and reversible.

  • Strictly speaking: NO. See also @vaxquis's comment below and the older answers on this page. Some typical problems:

Can I use XHTML5 serialization (fearlessly!) with XSLT, XPath, etc.?

Yes, you can. Even serializing fragments.
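A minimal sketch of XPath-style querying over an XHTML5 serialization, assuming Python's `xml.etree.ElementTree`, which supports only a limited XPath subset (a full XPath/XSLT processor would be a separate library). The main practical point: XHTML elements live in the `http://www.w3.org/1999/xhtml` namespace, so queries must be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# The prefix "h" is our own choice; only the namespace URI has to match.
NS = {"h": "http://www.w3.org/1999/xhtml"}

DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Links</title></head>
<body>
<a href="/one">One</a>
<p><a href="/two">Two</a></p>
</body>
</html>"""

root = ET.fromstring(DOC)

# Descendant search with a namespace prefix, in document order.
hrefs = [a.get("href") for a in root.iterfind(".//h:a", NS)]
print(hrefs)  # ['/one', '/two']

# Unqualified names match nothing, because every element is {namespace}a.
print(root.findall(".//a"))  # []

# Attribute predicates are also part of the supported subset.
second = root.find(".//h:a[@href='/two']", NS)
print(second.text)  # Two
```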

Can I validate XHTML5?

Yes, but not as quickly and easily as with the old DTDs... See more complex validators, such as validator.nu.
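What a generic XML parser can check on its own is well-formedness, not conformance. A minimal Python sketch, where `check_xhtml5` is a hypothetical helper: it accepts any element names (so HTML5 elements such as `section` pass) and only verifies well-formedness plus an `html` root in the XHTML namespace. It is not a substitute for validator.nu.

```python
import xml.etree.ElementTree as ET

XHTML_ROOT = "{http://www.w3.org/1999/xhtml}html"


def check_xhtml5(markup: str) -> list:
    """Return a list of problems found; an empty list means both checks passed."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError as err:
        # Well-formedness errors are fatal in XML, so stop at the first one.
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != XHTML_ROOT:
        problems.append(f"root is {root.tag!r}, expected {XHTML_ROOT!r}")
    # No schema is consulted, so element names and nesting are not checked at all.
    return problems


print(check_xhtml5('<html xmlns="http://www.w3.org/1999/xhtml"><body><section/></body></html>'))  # []
print(check_xhtml5("<html><body></body></html>"))
print(check_xhtml5('<html xmlns="http://www.w3.org/1999/xhtml"><body>'))
```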

Can I use XHTML5 as non-terminal output in an XSLT chain?

Yes, you can. Let's explain how.

Some frameworks, like Cocoon, use "XSLT chains". HTML5 and XHTML5 outputs can both be used as the "last output in the chain"... Of course, in intermediate steps HTML5 cannot be used because it is not XML, but XHTML5 can.

The above problem of validation reappears here: there is no strong convention, so the "standard XHTML structure" is sometimes less clear. In that situation you must pay attention to your own conventions and be consistent.

When using a DOMDocument of an HTML5 page, can I use the saveXML() method?

Yes. This is a typical situation where the serialization recommendations are used. The XML will be valid, and the XHTML5 code is mapped from the original HTML5 and DOM state... But for some structures, some information can be lost, as noted above.
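The question mentions PHP's `DOMDocument::saveXML()`. As a loose analogue in Python (the standard library has no HTML5 parser, so this sketch builds the tree by hand instead of parsing an HTML5 page), serializing the tree with `xml.etree.ElementTree` produces XML syntax: an escaped `&amp;`, a self-closed `<br />`, and an explicit `xmlns` declaration. The helper `q` is an illustrative name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Map the XHTML namespace to the default (empty) prefix, so the output uses
# xmlns="..." instead of generated prefixes like ns0:. This is global state.
ET.register_namespace("", XHTML_NS)


def q(tag: str) -> str:
    """Qualified ElementTree name for an element in the XHTML namespace."""
    return f"{{{XHTML_NS}}}{tag}"


html = ET.Element(q("html"))
body = ET.SubElement(html, q("body"))
p = ET.SubElement(body, q("p"))
p.text = "Fish & chips"
ET.SubElement(p, q("br"))  # void element, serialized in XML (self-closed) form

xml_text = ET.tostring(html, encoding="unicode")
print(xml_text)

# The serialization is well-formed XML and parses back to the same text.
assert ET.fromstring(xml_text).find(f".//{q('p')}").text == "Fish & chips"
```

Because `register_namespace` changes a module-wide mapping, a larger program would do it once at startup rather than per document.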

Peter Krauss
  • nope. XHTML is just a *syntax* for XML serialization of HTML5, which is a topic I already covered in my answer about a year ago. There is no "merge", because it was already "merged" by the early W3C/WHATWG HTML5 drafts after shelving XHTML 2.0; You're misunderstanding the context here. Also, XPath & XSLT are only tangentially related to the matter; also, how is a well-known MIME type "XHTML part and/or note"? Also, you basically *can't* serialize HTML *into* XHTML - the proposed solution is to write polyglots serializing as both, not "re-serialize" it. –  Feb 09 '15 at 15:19
  • hum... @vaxquis, ok, I edited, please help. And here in the comments, let's speak the same language: you use "strictly speaking" and I used "broadly speaking" in the introduction... Now we can point to the places in the answer's text that you want to correct. – Peter Krauss Feb 09 '15 at 15:45
  • What sense does it make to have a XML file without a schema? – ceving Sep 01 '21 at 07:52
  • @ceving, XML is used as a final format too, not only for data interchange. See for example the [use of XML in PostgreSQL](https://www.postgresql.org/docs/current/functions-xml.html): **after** the [*ingestion process*](https://en.wikipedia.org/wiki/Extract,_transform,_load) (when some validation can be helpful), **no validation is needed**, but a lot can be done with XML. And don't be confused about "XHTML5 validation": it is not a "classic validation", but it is useful to block bad inputs. ... And, in general, the best approach is to use domain-specific validation for your input, such as a specialized DTD. – Peter Krauss Sep 04 '21 at 11:17
  • @user88637 I don't agree 100% with your assertions, but it is a wiki (!), so you can **add a section like "CRITICISM" and explain better**, citing examples and/or reliable sources... like in Wikipedia. – Peter Krauss Sep 04 '21 at 11:22

Yes, unfortunately XHTML is gone.

Adding 1 more reason to MainMa's great answer:

When XHTML was created, it was meant to be used by WebApps to serve structured content that could be understood by non-browser software that would not have tag-soup HTML parsers. For screen readers XHTML is still great, but for any other kind of software, WebServices fit that need, and they mostly use XML or JSON. SOAP itself has its own XML Schema, simpler than XHTML and operation-oriented.

As far as I know, there's not even one WebApp in the world that serves the same HTTP message to both browsers and other clients. Even the REST architecture, which was meant to serve the same representation of a piece of content in multiple content types based on the client's preference, isn't used to serve XHTML to browsers.

In Java EE, for example, using Eclipse we can deploy a single war file holding Servlets+JSPs to serve HTML, together with Axis2 to serve a WebService. It's simply easier to develop separate software for browsers and for WebServices than to have a single, complex piece of software that serves them all.

The major reason REST is rejected is exactly the complexity (and it was meant to be simple!) of developing a server that serves the same content to any type of client without knowing anything about it. It's also hard to handle the Web's need for fast evolution while keeping a stable definition that doesn't force non-browser clients to be updated every time the XHTML changes, for example to keep the XHTML valid when it's built by many different modules.

In the same way, it's very hard to develop a non-browser client that parses an XHTML document, even a valid one, because of all the XML elements that are meant to structure the browser-rendered layout rather than to hold content.

If REST adopters already complain about SOAP's XML complexity, which is WAY simpler than an XHTML meant for a browser, imagine how hard it is to handle XHTML for multiple client types, server and client-side.

In practice: use HTML, XML-like if you want, to build WebSites for browsers, and any kind of WebService solution for non-browser clients.

BUT, I also think that XHTML5 must be created. XHTML 1.1 (ok, 1.0; 1.1 is unusable) will become outdated with HTML5, and we still need a validator that accepts HTML5's elements and validates XML well-formedness.

Pang
Hikari
  • Maybe I'm a bit late, but how is XHTML 1.1 unusable compared to 1.0? If anything, its DTD contains more elements. Unless you're talking about framesets and things like that? – Mr Lister Dec 17 '13 at 15:58
  • [Vocabulary and associated APIs for XHTML5](http://www.w3.org/TR/html5/the-xhtml-syntax.html) – Rob Dec 08 '14 at 14:48