How is rendering a Word document different from rendering a website?

Question

It doesn't necessarily have to be Word. For ease of comparison, let's use ODT, which is based on XML, which in turn is pretty similar to HTML. To my mind, that would make rendering an ODT document almost like rendering an HTML website.

With ODT and HTML+CSS basically being two ways of describing a page's layout, what are the differences in rendering them?
Is it simply that HTML+CSS is more flexible and thus requires more complex rendering? A complicated website can have countless nested elements, all with relative positioning, custom styling etc. Compared to that, an ODT has a far simpler/more predictable structure, which I think should be easier to render.

They are both markup types. I would think that at least the concept is similar, but the implementations could be drastically different. – bakoyaro, Jan 14 '16 at 14:41
@bakoyaro - with a couple more sentences, your comment would make a good answer – Dan Pichelman, Jan 14 '16 at 14:43
I suspect the closest thing to a fundamental difference is that a Word document renderer gets to assume everything is on the local filesystem, while a website renderer has to assume that a potentially large part of the content will require waiting for network requests. – Ixrec, Jan 14 '16 at 14:49
Websites are more than just HTML rendering, as CSS and JavaScript also have to be handled. – JB King, Jan 14 '16 at 20:28
Imagine what the Internet would be like today if we had had "Microsoft Word browsers" instead of "HTML (Web) browsers". – Brandin, Jan 14 '16 at 23:45
@Brandin, there was FrontPage once upon a time from Microsoft. – JB King, Jan 15 '16 at 21:23

score 4 · Answer 1 · answered Jan 14 '16 at 14:59


When you're talking about rendering engines, they are very different. For one thing, HTML documents link to external resources and are meant to provide a way to navigate between pages. That's what "hypertext" is. Word documents are meant to represent the markup of a printed page; they are almost a typesetting tool.
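To make the "links to external resources" point concrete, here is a minimal Python sketch using only the standard library's `html.parser`. It collects the URLs a browser would have to fetch (stylesheets, scripts, images) before it can finish laying out a page. The `RESOURCE_ATTRS` table and the `external_resources` helper are hypothetical and deliberately small; real browsers recognize many more cases.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small table of (tag, attribute) pairs that point
# at resources a renderer must fetch; real browsers recognize many more.
RESOURCE_ATTRS = {
    ("img", "src"),
    ("script", "src"),
    ("link", "href"),
    ("iframe", "src"),
}


class ResourceCollector(HTMLParser):
    """Collects URLs of external resources referenced by an HTML document."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value:
                self.resources.append(value)


def external_resources(html_text):
    """Return resource URLs in document order (<a href> links are skipped)."""
    collector = ResourceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.resources


page = """
<html><head>
  <link rel="stylesheet" href="https://example.com/site.css">
  <script src="/app.js"></script>
</head><body>
  <p>Hello <a href="/next.html">next page</a></p>
  <img src="photo.png">
</body></html>
"""
print(external_resources(page))
# ['https://example.com/site.css', '/app.js', 'photo.png']
```

The `<a href>` link is left out on purpose: it is navigation to another page, not something needed to render this one, which is exactly the "hypertext" distinction.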

HTML has to work and convey the information regardless of the output device (screen, printer, screen reader, TTS, or others). A Word document's output is either an emulated 8.5 x 11 inch page or a real one (or some other paper size).
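The page-versus-device distinction can be sketched with a toy layout model. This assumes fixed-width characters and uses the standard library's `textwrap` for greedy line breaking; `flow_layout`, `page_layout` and the widths chosen are hypothetical, and real engines measure glyphs, not characters.

```python
import textwrap


def flow_layout(text, viewport_chars):
    """Flow layout: wrap to whatever width the output device offers.

    There is no notion of a page; a narrower window just yields more lines.
    """
    return textwrap.wrap(text, width=viewport_chars)


def page_layout(text, page_chars, lines_per_page):
    """Page layout: wrap to a fixed page width, then cut into fixed-size pages.

    The page geometry comes from the document (e.g. a US Letter sheet),
    not from the device it happens to be displayed on.
    """
    lines = textwrap.wrap(text, width=page_chars)
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]


text = "one two three four five six seven eight nine ten"
print(flow_layout(text, 20))
# ['one two three four', 'five six seven eight', 'nine ten']
print(page_layout(text, 10, 4))
```

In the flow case the caller (the device) picks the width; in the page case the document fixes both width and page height, and the device merely shows those pages.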

The very jobs of HTML and Word documents are fundamentally different. It's like comparing cars and boats: there are similarities, but there are far more differences.

coteyr

Note that Word has, for a long time, had flow/web layouts that make no pretense of being page-oriented. Likewise, HTML is perfectly capable of being used for page layout, and routinely *is*, whenever printable pages are designed. Their differences can therefore easily be exaggerated. – Nathan Tuggy Jan 14 '16 at 20:45
How does internal vs. external storage influence rendering an image? I'm guessing that fetching an image from a separate file could take more time than using one that's embedded in the document, but other than that? – uryga Jan 14 '16 at 21:15

Depending on what you are looking at, a car and a boat can have more similarities than differences; e.g. if you are interested in their engines and power systems, they share quite a lot. The exact same thing is going on with HTML vs. typesetting. In principle, they are doing very similar tasks, but for different design goals. – whatsisname Jan 15 '16 at 18:34

bakoyaro · Answer 2 · 2016-01-15T15:22:26.313

Let's cut to the chase: we are talking about markup languages and how they are displayed in a browser.

HTML is data coupled with instructions on how to display the data. Other technologies, such as CSS and JavaScript, can be used to make changes to the document after it is rendered in a browser.

XML is primarily data, generally without instructions on how to display it. XSLT stylesheets and similar tools can be used in conjunction with the XML to display that data in a chosen format.

ODT is XML, but based on its file extension and properties, its text and binary resources can be transformed into a graphical display, much like an HTML document is rendered in a browser.
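One detail worth adding: an `.odt` file is not a single XML file but a ZIP package whose members include `mimetype`, `content.xml`, `styles.xml`, and embedded binaries such as images under `Pictures/`. Its images therefore live in the same file rather than behind network requests. The sketch below builds a toy package in memory with the standard library's `zipfile` and sorts its members; the helper names and placeholder contents are hypothetical, and a real package also needs `META-INF/manifest.xml`, which is omitted here.

```python
import io
import zipfile

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def make_toy_package():
    """Build an in-memory ZIP laid out like an ODT package.

    Member names follow OpenDocument packaging conventions, but the contents
    are placeholders, so an office suite would not open this file.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # 'mimetype' goes first and uncompressed, as ODF packaging expects.
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", f'<office:document-content xmlns:office="{OFFICE_NS}"/>')
        zf.writestr("styles.xml", f'<office:document-styles xmlns:office="{OFFICE_NS}"/>')
        zf.writestr("Pictures/logo.png", b"\x89PNG placeholder bytes")
    return buf.getvalue()


def describe_package(data):
    """Split package members into XML parts and embedded binary resources."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        mimetype = zf.read("mimetype").decode("ascii")
    xml_parts = [n for n in names if n.endswith(".xml")]
    binaries = [n for n in names if n.startswith("Pictures/")]
    return mimetype, xml_parts, binaries


print(describe_package(make_toy_package()))
# ('application/vnd.oasis.opendocument.text', ['content.xml', 'styles.xml'], ['Pictures/logo.png'])
```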

As with anything in CS, there will be exceptions, such as an API or some other tool that can make changes that were not envisioned by the authors of the specifications.

Browsers are designed to take HTML (text), identified by its content type or file extension, and turn it, along with the text and binary resources it references, into a graphical display.

Most browsers are also designed to take XML, likewise identified by its content type or file extension, and display a hierarchical tree of the data. That is where transformations such as XSLT come in: they are designed to take data in a specific format and transform it into something else, such as HTML, text, or more XML.
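For XML with no stylesheet attached, what a browser shows is essentially an outline of the element hierarchy. A rough approximation with the standard library's `xml.etree.ElementTree` prints each element indented by depth, followed by its text. The `tree_view` helper is hypothetical, ignores attributes, and does not reproduce any particular browser's formatting.

```python
import xml.etree.ElementTree as ET


def tree_view(xml_text):
    """Return an indented outline of an XML document's element hierarchy."""
    root = ET.fromstring(xml_text)
    lines = []

    def walk(elem, depth):
        # Show the tag plus any direct text; attributes are ignored here.
        text = (elem.text or "").strip()
        label = f"<{elem.tag}>" + (f" {text}" if text else "")
        lines.append("  " * depth + label)
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines)


doc = "<catalog><book><title>Dune</title><year>1965</year></book></catalog>"
print(tree_view(doc))
```

Nothing in the data says what a `title` should look like, so the only generic thing a viewer can do is show structure; a transformation is needed to get anything more.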

XML is primarily concerned with storing data; by design, XML itself doesn't embed instructions that define how the data should be displayed. Custom XML schemas sometimes throw this idea right out the window and define custom elements and attributes to create their own markup language for their own interpreter. Such is the case with ODT and the other OpenDocument formats.

Since ODT is also XML, a browser could recognize it by its extension and process the data using a specific set of instructions.
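What such "specific set of instructions" might look like can be sketched in a few lines. XSLT is the usual tool, but Python's standard library has no XSLT processor, so this hypothetical `odt_content_to_html` does the transformation by hand with `xml.etree.ElementTree`. The namespace URIs and the `text:h`, `text:p` and `text:outline-level` names come from the OpenDocument specification; everything else (styles, lists, tables, images) is ignored, so this is a toy, not a real converter.

```python
import html
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
TEXT_H = f"{{{NS['text']}}}h"
TEXT_P = f"{{{NS['text']}}}p"
LEVEL = f"{{{NS['text']}}}outline-level"


def odt_content_to_html(content_xml):
    """Translate headings and paragraphs from an ODT content.xml into HTML."""
    root = ET.fromstring(content_xml)
    body = root.find("office:body/office:text", NS)
    out = []
    for elem in body:
        # itertext() flattens nested inline elements such as text:span.
        text = html.escape("".join(elem.itertext()))
        if elem.tag == TEXT_H:
            # Clamp the outline level to the h1..h6 range HTML offers.
            level = min(max(int(elem.get(LEVEL, "1")), 1), 6)
            out.append(f"<h{level}>{text}</h{level}>")
        elif elem.tag == TEXT_P:
            out.append(f"<p>{text}</p>")
    return "\n".join(out)


sample = f"""<office:document-content xmlns:office="{NS['office']}" xmlns:text="{NS['text']}">
  <office:body><office:text>
    <text:h text:outline-level="2">Intro</text:h>
    <text:p>Hello <text:span>world</text:span> &amp; friends</text:p>
  </office:text></office:body>
</office:document-content>"""
print(odt_content_to_html(sample))
# <h2>Intro</h2>
# <p>Hello world &amp; friends</p>
```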

Check out these links for more information on the HTML and XML specifications:

Link to HTML 5 Specification at W3C

Link to the XML Specification at W3C

Incorrect. An XML language may or may not contain information about presentation. It depends on the specific language. ODT does contain presentational information. – JacquesB, Jan 14 '16 at 20:32
Also, HTML does not necessarily contain presentation rules. CSS does that, and you can easily set "b" tags to use any font weight. Also, the HTML standard has shifted from presentational tags ("i", "b", "div", etc.) to semantic ones ("em", "strong", "header", "section", etc.) precisely to get rid of presentation details. – scriptin, Jan 14 '16 at 21:13
Down votes, really? Do that many people need the critic badge around here? – bakoyaro, Jan 15 '16 at 15:24

Jon Raynor · Answer 3 · 2016-01-14T20:24:01.600

-1

HTML documents have predefined tags, whilst XML does not. Because the tags are defined, browsers can be built to render them.

A <BODY> tag has a specific meaning in HTML and is treated as such.

Now consider this XML fragment:

<BODY>
 <ARM></ARM>
 <EYECOLOR></EYECOLOR>
</BODY>

Here the <BODY> tag has a different context, and is therefore treated differently from the HTML <BODY> tag.
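In XML, the usual way to tell these two BODY elements apart is namespaces: a tag name alone carries no meaning, and the namespace tells the consuming program which vocabulary an element belongs to. The hypothetical `interpret` function below, built on the standard library's `xml.etree.ElementTree`, only treats an element as a document body when it is `body` in the XHTML namespace (`http://www.w3.org/1999/xhtml`); the anatomy fragment above falls through to "no rules".

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def interpret(xml_text):
    """Say how a hypothetical XHTML-aware renderer would treat the root element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as '{namespace-uri}localname'.
    if root.tag == f"{{{XHTML_NS}}}body":
        return "XHTML body: lay out its children as page content"
    return f"unknown element {root.tag!r}: no built-in rendering rules"


xhtml = f'<body xmlns="{XHTML_NS}"><p>Hi</p></body>'
anatomy = "<BODY><ARM></ARM><EYECOLOR></EYECOLOR></BODY>"
print(interpret(xhtml))    # XHTML body: lay out its children as page content
print(interpret(anatomy))  # unknown element 'BODY': no built-in rendering rules
```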

As indicated by @MichealIT, ODT also has a specifically defined format for its tags, so any software or browser needs to adhere to that definition in order to render the document. There are probably many differences between an HTML document and an ODT one.

edited Jan 14 '16 at 20:24

answered Jan 14 '16 at 20:10

Jon Raynor


An ODT file has a [defined format that can be validated](https://wiki.oasis-open.org/office/How_to_Validate_an_ODF_document). Each tag has a specific meaning. – Jan 14 '16 at 20:16
@JonRaynor I'm not quite understanding the down votes on your answer or mine, they both seem alright to me. – bakoyaro Jan 15 '16 at 15:25

@Bakoyaro - I was wondering why as well. Basically, you have format X and format Y: although both formats use markup and tagging to describe the document, the specifications are different. – Jon Raynor Jan 15 '16 at 15:57
