109

I've been wondering why XML has an L in its name.

By itself, XML doesn't "do" anything. It's just a data storage format, not a language! Languages "do" things.

The way you get XML to "do" stuff, to turn it into a language proper, is to add xmlns attributes to its root element. Only then does it tell its environment what it's about.
One example is XHTML. It's active, it has links, hypertext, styles etc, all triggered by the xmlns. Without that, an XHTML file is just a bunch of data in markup nodes.

So why then is XML called a language? It doesn't describe anything, it doesn't interpret, it just is.

Edit: Maybe my question should have been broader. Since the answer is currently "because XML was named after SGML, which was named after GML, etc" the question should have been, why are markup languages (like XML) called languages?

Oh, and WRT the close votes: no, I'm not asking about the X. I'm asking about the L!

Mr Lister
  • 130
    On what do you base your requirement that a language has to "do" something? I don't see that in any of the definitions at [dictionary.com](http://www.dictionary.com/browse/language?s=t). – kdgregory Apr 03 '16 at 13:06
  • 3
    @kdgregory Those are human languages, not computer languages. However, the page makes a big deal of the "communication" part. A language is meant to be understood by both parties, the sender and the receiver. This is what xml does _only_ if it contains namespace information! – Mr Lister Apr 03 '16 at 13:10
  • 10
    Just like Swahili is only understood if both understand it. Or a medical journal article is understood if the reader understands that part of the language. It's no different. And people make up the definitions. – Sami Kuhmonen Apr 03 '16 at 13:47
  • 3
    http://programmers.stackexchange.com/questions/88405/why-is-xml-not-called-eml?rq=1 contains the answer: because the people who named it had a vote on various names and XML won. – James Snell Apr 03 '16 at 13:59
  • 46
    Markup language is a common term https://en.wikipedia.org/wiki/Markup_language – paparazzo Apr 03 '16 at 15:35
  • 38
    @MrLister: _"Those are human languages, not computer languages"_ A language is a language. At its most extreme, even English requires contextual information (which dialect is being used) to understand unambiguously. Doesn't stop it from being a language. Your question simply has a false premise. – Lightness Races in Orbit Apr 03 '16 at 16:24
  • 3
    [On another note...](http://programmers.stackexchange.com/questions/28098/why-does-it-matter-that-html-and-css-are-not-programming-languages) – Robert Harvey Apr 03 '16 at 16:28
  • 3
    Because the L in "XML" stands for "language". (And XML **does** describe something -- it describes the "markup".) – Daniel R Hicks Apr 03 '16 at 18:36
  • 3
    @DanielRHicks Ehm, you are answering my question about _why XML has an L in its name_ with _because the L stands for "language"_? – Mr Lister Apr 03 '16 at 19:07
  • 1
    Note that nothing would prevent anyone from inventing an XML interpreter that takes any well-formed XML document and executes it. You'd have to define the semantics of the XML markup in terms of operations, so you'd end up with a programming language that is completely unreadable, but it could be done. And you can do the opposite: think of programs as simply describing data. See denotational semantics for example. With that you could take a Java program and see it as a description of a mathematical (continuous) function between certain semantic domains. – Bakuriu Apr 03 '16 at 19:18
  • 3
    @Bakuriu You're describing XSLT there. Semantics, check. Completely unreadable, check. – Mr Lister Apr 03 '16 at 19:21
  • 72
    Languages don't *do* things, they *express* and *communicate* things – Hagen von Eitzen Apr 03 '16 at 20:42
  • 4
    If the only reason to call it a "language" was to annoy the pedants, I think that would be reason enough. – Karl Bielefeldt Apr 03 '16 at 21:57
  • @KarlBielefeldt OK, but please note that a) the tags to my post saying "programming" were not added by me, and b) most of us are also pedants! – Mr Lister Apr 04 '16 at 07:19
  • 6
    `It doesn't describe anything, it doesn't interpret, it just is.` So? Doesn't mean it isn't a language. This is a silly question. – JᴀʏMᴇᴇ Apr 04 '16 at 09:15
  • 3
    @JᴀʏMᴇᴇ That may well be, but it was something I was wondering about, and asking in here got me not just the answer, but much other food for thought as well, so I'm glad I did. – Mr Lister Apr 04 '16 at 09:23
  • 3
    The definition of a formal language is a set of finite strings (from some alphabet). The set of all possible XML documents is a language. – RemcoGerlich Apr 04 '16 at 09:44
  • Languages can express *what to do*. Note that *what to do* is a recipe to create almost anything, provided someone understands the language. – S.D. Apr 04 '16 at 13:36
  • 5
    Lots of languages don't "do" anything. Query languages describe a model of desired information and constraints required to retrieve it -- but it's up to the query engine to decide how to use them. Markup languages alter content to add context, content handling advice or categorization -- which makes that content easier for an engine to digest. One could even argue most high level programming languages don't "do" anything outside the context of a compiler or interpreter -- Java won't run on V8, for example. – Matthew Mark Miller Apr 04 '16 at 14:32
  • 6
    related (possibly a duplicate): [Programming Language vs Markup Language vs Scripting Language](http://programmers.stackexchange.com/questions/241104/programming-language-vs-markup-language-vs-scripting-language) – gnat Apr 04 '16 at 15:25
  • 5
    In a computer science context, "language" means a sequence of symbols, following a set of rules. – Vatine Apr 04 '16 at 16:25
  • 2
    Related : http://stackoverflow.com/q/5175840/327083 – J... Apr 05 '16 at 17:34
  • 4
    I think you are getting the concept of a language mixed up with a **Turing complete language**, which are a subset of languages - https://en.wikipedia.org/wiki/Turing_completeness – Max Williams Apr 07 '16 at 14:47
  • @Vatine To refine that more, computer science defines a language as a subset of all possible strings over a set of symbols. Viewed another way, the only basic requirement of a language is to be able to tell which strings are inside the language and which strings are outside. The definition does not require the language to have meanings, actions, or rules. – Nayuki Jul 31 '20 at 07:02

6 Answers

245

The real answer is that XML has an L in its name because a guy named Raymond Lorie was among the designers of the first "markup language" at IBM in the 1970s. The developers had to find a name for the language, so they chose GML because it was the initials of the three developers (Goldfarb, Mosher and Lorie). They then created the backronym Generalized Markup Language.

This later became standardized as SGML (Standard Generalized Markup Language), and when XML was created, the developers wanted to retain the ML suffix to indicate the family relationship to SGML, and they added the X in front because they thought it looked cool. (Even though it doesn't actually make sense: XML is a metalanguage which allows you to define extensible languages, but XML is not really extensible itself.)

As for your second question if XML can legitimately be called a language:

Any structured textual (or even binary) format which can be processed computationally can be called a language. A language doesn't "do" anything as such, but some software might process input in the language and "do" something based on it.

You note that XML is a "storage format", which is true, but a textual storage format can also be called a language; these terms are not mutually exclusive.

Programming languages are a subset of languages. E.g. HTML and CSS are languages but not programming languages, while JavaScript is a real programming language. That said, there is no formal definition of programming language either, and there is a large grey zone of languages which could be called either data formats or programming languages depending on your point of view.

Given this, XML is clearly a language, just not a programming language, though it can be used to define programming languages like XSLT.

Your point about namespaces is irrelevant. Namespaces are an optional feature of XML and do not change the semantics of an XML vocabulary. They are only needed to disambiguate element names if a document may contain multiple vocabularies.
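
To illustrate the disambiguation role, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The two vocabularies and their namespace URIs (`http://example.com/ns/book`, `http://example.com/ns/person`) are made up for this example; the only point is that both define a `title` element and the namespace keeps them apart.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies that both define a <title> element.
# The namespace URIs are invented identifiers, not real schemas.
DOC = """
<library xmlns:book="http://example.com/ns/book"
         xmlns:person="http://example.com/ns/person">
  <book:title>Dune</book:title>
  <person:title>Dr.</person:title>
</library>
"""

root = ET.fromstring(DOC.strip())

# ElementTree expands each prefix to "{namespace-uri}local-name",
# so the two <title> elements end up with distinct tag names.
tags = [child.tag for child in root]
print(tags)
```

Note that nothing here makes the document "do" anything: the parser only reports which name belongs to which vocabulary, and any meaning is still up to the consuming software.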


Edit: reinierpost pointed out that you might have meant something different with the question than what I understood. Maybe you meant that specific vocabularies like XHTML, RSS, XSLT etc. are languages because they associate elements and attributes with particular semantics, but the XML standard itself does not define any semantics for specific elements and attributes, so it does not feel like a "real language".

My answer to this would be that XML does define both syntax and semantics, it just defines it at a different level. For example it defines the syntax of elements and attributes and rules about how to process them. XML is a "metalanguage" which is still a kind of language (just like metadata is still data!). As an example EBNF is also clearly a language, but its purpose is to define the syntax of other languages, so it is also a metalanguage.

JacquesB
  • 33
    There is a [formal definition of a language in computing](https://en.wikipedia.org/wiki/Formal_language). –  Apr 03 '16 at 18:01
  • 19
    @Snowman: A "formal language" does not necessarily correspond to what is usually called a language in computing. For example a "formal language" does not need to be textual - machine code is a formal language, as are most binary formats and protocols. So I wouldn't say the term "formal language" covers the same meaning as "language" in computing. – JacquesB Apr 03 '16 at 18:14
  • 15
    I am not aware of any requirement that a language needs to be textual or not be textual. The idea of constructing a sentence from terminals has nothing to do with an arbitrary interpretation of the bits in those terminals, or which types of computers (silicon or carbon-based) are capable of reading them. –  Apr 03 '16 at 18:16
  • 5
    @NicolBolas: Good point, machine code is definitely a language. I just think it is more common to call binary languages "formats", e.g. you say the GIF-format not the GIF-language. – JacquesB Apr 04 '16 at 07:25
  • Let's keep it at this. – Mr Lister Apr 04 '16 at 09:19
  • It's worth pointing out that, just as XML represents a tree structure, all programming languages (including machine code) are also tree-like in structure. Terms such as "language" and "format" seem to be more colloquial rather than strictly technical. – Ben Cottrell Apr 04 '16 at 09:44
  • 1
    @BenCottrell: How is machine code tree-like in structure? – JacquesB Apr 04 '16 at 11:16
  • @JacquesB Sorry, to be clear I mean in terms of decision-making, each decision or `jump` could be represented as a path in a 'tree' – Ben Cottrell Apr 04 '16 at 12:55
  • 3
    @BenCottrell: Wouldn't it be a graph then, since it may have loops? – JacquesB Apr 04 '16 at 13:05
  • 3
    Fun fact: XML can in fact be used as a programming language: See e.g. [o:XML](http://www.o-xml.org/), which is an object-oriented programming language based on XML. You have just gazed into the abyss, and it gazed back into you. – errantlinguist Apr 05 '16 at 16:49
  • 1
    Minor nitpick - "...in 1969, together with Ed Mosher and Ray Lorie, I invented Generalized Markup Language (GML) " according to Goldfarb (source: http://www.sgmlsource.com/history/roots.htm) – James Snell Apr 05 '16 at 20:19
  • @Ben Cottrell '"all programming languages" are tree-like' - only at the context free level of analysis, what we usually call parsing. Once you tie symbol usages to definitions, which is essential for interpreting or compiling, they become graph structures. As for machine code - you can force it into a tree, but it's a pretty degenerate tree. Machine code, unlike higher level languages, isn't built around recursively nesting expressions. – James Iry Apr 05 '16 at 21:30
  • 1
    Anybody ever used [struts logic tags](http://www.jajakarta.org/struts/struts1.1/documentation/ja/target/userGuide/printer/struts-logic.html), that proves that XML can be a programming language and it also proves that it's a really bad idea to do logic with XML. – Ruan Mendes Apr 06 '16 at 13:15
  • why does everyone who uses the word "backronym" link to the definition of the word... – xdhmoore Apr 06 '16 at 19:20
  • 1
    The definition of a language in formal language theory is very narrow and crafted for a particular purpose - it does *not* cover how the term is typically used in IT or in computer science - only one aspect of it. In IT, languages usually have both a form (syntax) and meaning (semantics). The questioner seems to argue that XML doesn't have a semantics, only a syntax. – reinierpost Apr 07 '16 at 08:40
  • @reinierpost: Good point, I read the question differently, but your interpretation makes sense. I have added a bit to the answer. – JacquesB Apr 07 '16 at 11:59
  • I think it's a great answer! – reinierpost Apr 07 '16 at 12:37
  • The use of "storage format" is interesting. I think that is how a lot of written language came about. Everything was word of mouth, then people made notes of numbers. Then they made labels to remind them what the numbers were counting. Then I guess someone worked out that if "three baskets of corn" could be done as symbols it would not be hard to make a symbol for "give me" or similar. – TafT Apr 08 '16 at 07:29
180

Because it is a language. A markup language, not a programming language.

Notice that natural human languages like English and Spanish don't "do" anything either. In fact, technically C++ and Java and the like don't "do" anything until they're fed into a compiler and the output gets executed. Doing stuff and being a language are largely orthogonal to each other.

Ixrec
  • 44
    Substitute "interpreter" for "compiler". Being fed to a compiler doesn't make them "do" anything, either, it just translates them into a different language, which, again, doesn't "do" anything. All execution is interpretation. Sometimes, the interpreter might be extremely simple and implemented in silicon, in which case we call it an "execution unit", but it's still an interpreter. Anyway, good answer! – Jörg W Mittag Apr 03 '16 at 19:00
  • 8
    @JörgWMittag Good point. Since I randomly chose languages that are normally compiled, added "and the output gets executed". – Ixrec Apr 03 '16 at 19:02
  • 1
    An *extensible* markup language, if you will. – doppelgreener Apr 04 '16 at 05:58
  • Also if a DSL uses XML as a transport then it can become a programming language as well. It's eXtensible that way. – Den Apr 04 '16 at 11:53
  • 1
    I'd argue that human languages do "do" things. See Speech-Act Theory... – Ray Apr 04 '16 at 14:53
  • 2
    Sweet, sweet orthogonality. Execute the language in a different algebra, and a whole new set of actions unfolds. Under theory, anyway. – Kenogu Labz Apr 04 '16 at 21:28
  • @Ray I'd refute that on the grounds that mimes can do all these things without the use of language. That and it's the person doing the thing, not the language. – Pharap Apr 05 '16 at 00:37
  • @Pharap Not that I agree with Ray, but mimes do use language. It just isn't spoken language. Of course, this still goes against what Ray suggests (as do facts like the existence of humans who were deaf all their life). Does the language you use affect the way you act and think? Maybe, IIRC that's still a "hot" debate in some circles. But overall, people affect language more than language affects people - if I find there isn't a good word for something I want to express, I just make a new word. It happens all the time. – Luaan Apr 05 '16 at 08:16
  • @Luaan Arguably body language doesn't count as a natural language because there's no fixed system. Unless mimes are communicating with some sort of sign language. – Pharap Apr 05 '16 at 12:24
  • 1
    @Pharap It's a lot more flexible than "spoken language", sure. But ultimately, the goal is to enable communication between two people. And it does that, even though it's imperfect (What language is perfect, though? Even Lojban allows for ambiguities, although it makes them more obvious). If you see a mime miming climbing a rope, and you understand it to mean "climbing a rope"... why wouldn't that be a language? – Luaan Apr 05 '16 at 12:53
  • @Ray: A language will not "do" anything if there is no one listening/observing. – JacquesB Apr 06 '16 at 10:41
  • William Shatner does things with human language. – emory Apr 07 '16 at 15:37
103

Let Σ be a non-empty, finite set of symbols, called an alphabet. Then Σ* is the countably infinite set of finite words that can be formed by concatenating zero or more symbols from Σ. Any well-defined subset L ⊆ Σ* is a language.

Let's apply this to XML. Its alphabet is the Unicode character set U, which is non-empty and finite. Not every concatenation of zero or more Unicode characters is a well-formed XML document, for example, the string

<tag> soup &; not <//good>

is clearly not. The subset XML ⊂ U* of well-formed XML documents is decidable (or “recursive”): there exists a machine (an algorithm or computer program) that takes any word w ∈ U* as input and, after a finite amount of time, outputs 1 if w ∈ XML and 0 otherwise. Such an algorithm is a sub-routine of any XML processing software. Not all languages are decidable. For example, the set of valid C programs that terminate in a finite amount of time is not (this is known as the halting problem). When one designs a new language, an important decision to make is whether it should be as powerful as possible or whether its expressiveness should be restricted in favor of decidability.
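
As a rough sketch of such a decision procedure, one can simply ask an existing parser whether it accepts the word. The helper name `is_well_formed` is made up for this illustration, and it assumes that Python's standard `xml.etree.ElementTree` parser is a good enough stand-in for the well-formedness rules of the XML specification; it is not meant as a full conformance checker.

```python
import xml.etree.ElementTree as ET

def is_well_formed(word: str) -> bool:
    """Decide membership of `word` in the language of well-formed XML.

    Hypothetical helper: it delegates the decision to the standard
    library parser, which always halts on a finite input string.
    """
    try:
        ET.fromstring(word)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<tag> soup &; not <//good>"))  # False
print(is_well_formed("<root>all evil</root>"))       # True
```

The function always returns after a finite amount of time with a yes or no, which is all that decidability asks for.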

Some languages can be defined by means of a grammar that is said to produce the language. A grammar consists of

  • a finite set of literals (also called terminal symbols),
  • a disjoint finite set of variables of the grammar (also called non-terminal symbols),
  • a distinguished starting symbol, taken from the set of variables and
  • a finite set of rules (so-called productions) that allow certain kinds of replacements.

Any word that consists exclusively of literals and can be derived by starting with the starting symbol and then applying the given rules belongs to the language produced by the grammar.

For example, the following grammar (in rather informal notation) lets you derive exactly the non-negative integers in decimal notation, without leading zeros.

  1. The literals of the grammar are the digits 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0.
  2. The variables are the symbols S and D.
  3. S is the starting symbol.
  4. Any occurrence of the variable S may be replaced
    • with the literal 0 or
    • by any of the literals other than 0 followed by the variable D.
  5. Any occurrence of the variable D may be replaced
    • by any of the literals followed by another instance of the variable D or
    • by the empty string.

Here is how we derive 42:

S —(apply rule 4, 2nd variant)→ 4 D —(apply rule 5, 1st variant)→ 42 D —(apply rule 5, 2nd variant)→ 42.
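
The grammar above can be turned into a small recognizer almost rule by rule. This is only a sketch; the function names `derives_s` and `derives_d` are invented here to mirror the variables S and D, and each function answers whether a word can be derived from the corresponding variable.

```python
DIGITS = set("0123456789")  # the literals of the grammar (rule 1)

def derives_d(word: str) -> bool:
    """Rule 5: D may become a literal followed by D, or the empty string."""
    if word == "":
        return True  # 2nd variant: the empty string
    # 1st variant: any literal followed by another D
    return word[0] in DIGITS and derives_d(word[1:])

def derives_s(word: str) -> bool:
    """Rule 4: S may become the literal 0, or a non-zero literal followed by D."""
    if word == "0":
        return True  # 1st variant: the literal 0
    # 2nd variant: any literal other than 0, followed by D
    return word != "" and word[0] in DIGITS - {"0"} and derives_d(word[1:])

print(derives_s("42"))   # True, matching the derivation shown above
print(derives_s("007"))  # False, leading zeros cannot be derived
```

Because the grammar is regular, the same language could also be written as the regular expression `0|[1-9][0-9]*`.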

Depending on how elaborate the rules you allow in your grammar are, differently sophisticated machines are required to prove that a given word can actually be produced by the grammar. The example given above is a regular grammar, which is the simplest and least powerful kind. The next more powerful class of grammars is called context-free. These grammars are also very simple to verify. XML (unless I'm overlooking some obscure feature I'm not aware of) can be described by a context-free grammar. The classification of grammars forms the Chomsky hierarchy of grammars (and therefore languages). Every language that can be described by a grammar is at least semi-decidable (or “recursively enumerable”). That is, there exists a machine that, given a word that actually belongs to the language, derives within finite time a proof that it can be produced by the grammar, and will never output a wrong proof. Such a machine is called a verifier. Note that the machine may never halt when given a word that doesn't actually belong to the language. Clearly, we want our programming languages to be described by less powerful grammars for the benefit of being able to reject invalid programs within finite time.

Schemata are an addition to XML that allow refining the set of well-formed documents. A well-formed document that follows a certain schema is called valid according to that schema. For example, the string

<?xml version="1.0" encoding="utf-8" ?>
<root>all evil</root>

is a well-formed XML document but not a valid XHTML document. There exist schemata for XHTML, SVG, XSLT and many other vocabularies. Schema validation can also be done by an algorithm that is guaranteed to halt after a finite number of steps for every input. Such a program is called a validator or a validating parser. Schemata are defined by so-called schema definition languages, which are a way to formally define grammars. XSD is the official schema-definition language for XML and is, itself, XML-based. RELAX NG is a more elegant, much simpler and slightly less powerful alternative to XSD.

Because you can define your own schemata, XML is called an extensible language, which is the origin of the “X” in “XML”.

You can define a set of rules that gives XML documents an interpretation as descriptions of computer programs. XSLT, mentioned earlier, is an example of such a programming language built with XML. More generally, you can serialize the abstract syntax tree of almost any programming language quite naturally into XML, if this is what you want.
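
To make the last point concrete, here is a minimal sketch that serializes the abstract syntax tree of a tiny Python program into XML, using only the standard `ast` and `xml.etree.ElementTree` modules. The mapping is an arbitrary choice made for this illustration (node types become elements, child nodes are wrapped in an element named after the field, plain values become attributes), and the function name `to_xml` is made up; it is not any standard interchange format.

```python
import ast
import xml.etree.ElementTree as ET

def to_xml(node: ast.AST) -> ET.Element:
    """Convert a Python AST node into an XML element, recursively."""
    element = ET.Element(type(node).__name__)
    for field, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            # A single child node: wrap it in an element named after the field.
            ET.SubElement(element, field).append(to_xml(value))
        elif isinstance(value, list):
            # A list of child nodes; non-node list items are skipped in this sketch.
            wrapper = ET.SubElement(element, field)
            for item in value:
                if isinstance(item, ast.AST):
                    wrapper.append(to_xml(item))
        elif value is not None:
            # Plain values (identifiers, constants) are stored as attributes.
            element.set(field, repr(value))
    return element

tree = ast.parse("answer = 6 * 7")
root = to_xml(tree)
print(ET.tostring(root, encoding="unicode"))
```

The resulting document is ordinary well-formed XML. It only becomes a "program" again once some software interprets the `Assign`, `BinOp` and `Constant` elements accordingly.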

5gon12eder
  • What do you mean by well-defined subset? And why do you require that a subset of Σ* be well-defined in order for it to be a language? Wouldn't any subset be called a language? – Giorgio Apr 03 '16 at 20:33
  • 7
    @Giorgio: In mathematics, "well-defined" is largely just an intensifier: everything that mathematically exists is already well-defined. – Kevin Apr 03 '16 at 21:01
  • 9
    @Giorgio With “well-defined” I mean that there is a formal predicate that tells whether an item belongs to the set or not. This predicate will in general not be computable but it has to be clearly specified without contradiction. Otherwise, [bad things might happen](https://en.wikipedia.org/wiki/Russell%27s_paradox). “The pairs of strings (*w*, *M*) where *M* is the smallest description of a Turing machine that outputs *w* and then halts” is a well-defined but non-computable (see [Kolmogorov complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity)) predicate. … – 5gon12eder Apr 03 '16 at 21:02
  • 1
    … “The set containing all strings that are not contained in the set” is self-contradicting and not useful. – 5gon12eder Apr 03 '16 at 21:02
  • 2
    @5gon12eder: That set does not exist under ZFC (because the axiom schema of separation is not powerful enough to describe it); if you're using some other set theory, you should specify it. – Kevin Apr 03 '16 at 21:03
  • @Kevin To which set are you referring? The Kolmogorov tuples? I'm pretty sure the language exists, though I have to admit that I've only been taught it informally and didn't work through a formal proof. – 5gon12eder Apr 03 '16 at 21:08
  • 5
    @5gon12eder: “The set containing all strings that are not contained in the set” does not exist. The term "well-defined" is ironically not well-defined. – Kevin Apr 03 '16 at 21:09
  • @Kevin Yes, it was meant as an example of a silly definition of something that doesn't actually exist. Maybe a better (less obvious) one could be given. – 5gon12eder Apr 03 '16 at 21:11
  • 1
    @5gon12eder: “The set containing all strings that are not contained in the set” is not a set. Any set is, by definition, well-defined. IMO the term well-defined is redundant here. – Giorgio Apr 04 '16 at 05:41
  • 1
    @kevin: Exactly: any mathematical object for which there is a proper definition is well-defined, by definition. Normally, you do not say: let's consider a well-defined graph, with a well-defined node, etc. You only say well-defined if there is a chance to have an ill-defined one. So, since in the answer there is no mention of what could be an ill-defined subset of Σ*, I do not see why one should stress that the subset should be well-defined. If I speak of a set of strings, it is obvious that it should be a well-defined one. – Giorgio Apr 04 '16 at 05:47
  • While your answer is not wrong that is a very scary answer to pose to a layperson with no knowledge of set theory and little experience with fancy mathematical terms like 'concatenate' and 'well-defined'. – Pharap Apr 05 '16 at 00:39
  • 3
    The *well-formed* property or *validation* is performed by a **grammar**. This answer would have been perfect if you had mentioned that. – Thibault D. Apr 06 '16 at 08:10
  • 1
    @ThibaultD. Good point, I've added that. It didn't exactly make the answer simpler, though. – 5gon12eder Apr 06 '16 at 15:48
  • Haha good job @5gon12eder , I did not expect you to change that much. You do things thoroughly :) – Thibault D. Apr 06 '16 at 18:31
  • The first paragraph made me feel I was back in theory of computation class. – cst1992 Apr 07 '16 at 13:45
  • @Kevin in mathematics, "well-defined", is a welcome reminder that you're still in mathematics. – candied_orange Apr 13 '16 at 21:42
31

In computer science, a formal language is just a set of strings, usually infinite and often described using rules (two common versions of those rules are regular expressions and formal grammars).

Note that this means that all a language needs is syntax; a language doesn't need to describe what each valid string means (that's called semantics).

Now, this means that programming languages are formal languages that also have semantics, which describes some computation. And for example XHTML is a formal language, whose semantics describe (roughly and informally) how a hypertext document looks and behaves.

XML is still a language, even though it doesn't have semantics itself (but many languages derived from XML do, like XHTML and XAML).

Technically, binary formats are also languages, but they're not called that way. The term "language" is reserved for human-readable formats.

svick
  • 1
    Then why aren't other storage formats called languages too? The format used in BMP or JFIF files? Database files? – Mr Lister Apr 03 '16 at 13:26
  • 11
    @MrLister Because they're not human-readable. When they aren't human-readable we tend to call them *formats* or *data formats* instead. – Mason Wheeler Apr 03 '16 at 13:43
  • @MrLister - It's worth pointing out that XML (like HTML) borrowed heavily from SGML which borrowed heavily from IBM's GML (generalized markup language) which dates to the 60's. Having the ML demonstrates that there is some relationship between those languages. Plus given the dates involved, the definition you use for what denotes a language in computing is something that has changed to a great extent since the use was coined. – James Snell Apr 03 '16 at 13:56
  • 4
    @JamesSnell Not to be confused with the other ML language family, of course. Yay for over-crowded acronyms! – Mason Wheeler Apr 03 '16 at 14:07
  • Mason is correct, but it is important to note that formats such as BMP _do_ have languages. The difference is solely in how we refer to them. –  Apr 03 '16 at 17:58
  • 4
    If one is using formal tools to build a parser (or especially a validator) for JFIF etc. then the engineers may indeed refer to it as a "language". More likely though as a "grammar". – JDługosz Apr 03 '16 at 20:24
  • 3
    @MrLister: Well, they _are_ languages, but because they define reusable data structures they have a special name: _formats_. But, yes, these are languages too. – Lightness Races in Orbit Apr 04 '16 at 00:58
  • @svick But there are lots of human-readable data formats that aren't called languages. RTF springs to mind. Why wouldn't that be called RTL then? Still not sure I like any of the answers enough to accept one. – Mr Lister Apr 04 '16 at 07:21
  • 5
    @MrLister: Naming a format is more a question of marketing. The XML people named it XML because "*ML" indicates a family relationship to predecessor formats like GML and SGML, and because they thought it looked cool with an X in the front. And GML was called GML because it was a generalized markup language, but also because it was the initials of the three language designers. So basically the L in XML is because a guy called Raymond **L**orie was among the designers of the first markup language. – JacquesB Apr 04 '16 at 08:19
  • @JacquesB That's the best answer so far, but I'm afraid this means the question is asking for opinions. It was just a design decision based on some arbitrary criteria! Oh well. – Mr Lister Apr 04 '16 at 08:27
  • @MrLister: It is not really an opinion question since it is known why the developers chose the name. It would be an opinion question if you asked if it is a good name. – JacquesB Apr 04 '16 at 09:17
  • 1
    @MrLister But all those formats are *also* languages. I'm not sure if the two are orthogonal or inclusive, but they definitely aren't exclusive. BMP is a language used to describe a two-dimensional set of picture elements. JFIF is a language used to describe a two-dimensional set of picture elements - note that there's many different JFIFs and specific algorithms, but only one language. Is XML a format? Maybe. But it's definitely a language, especially in the full self-descriptive form. Anyway, XMF was already used for music files, so... :D – Luaan Apr 04 '16 at 15:34
  • 1
    I had to dig far too damn deep to read the right answer, containing the word **grammar**. – Thibault D. Apr 06 '16 at 08:11
  • @ThibaultD. same here, except that I could have been content with *syntax* and *semantics*. – Tobia Tesan Apr 06 '16 at 17:22
12

A language is a method of conveying information.

A programming language is a method of conveying algorithms.

A markup language like XML is a language for conveying data.

Philipp
  • ... and that data may very well be a description of an algorithm. – Luaan Apr 05 '16 at 08:18
  • @Luaan ...and a programming language can also be abused to convey data. Like with JSON, for example. – Philipp Apr 05 '16 at 08:34
  • 2
    You can even recurse. I've seen NAnt scripts (an XML-based language) that contained C# code, which was only used for data storage. Using string literals that contained XML. Yes, it's the kind of thing that makes grown men cry :P – Luaan Apr 05 '16 at 08:43
2

XML is a meta-language. You use it to define specific languages. Languages never do anything, they just allow us to express things. Also, it is not true that XML is a "storage language". Just the opposite, in fact. You can store XML docs however you please. XML is better thought of as a transfer language. PS. If you don't think XML "does" anything, you'll have to explain how it is that many systems (e.g. jetty) use XML as a (bad) programming language. It's a lamentable abuse of XML, but it exists in the wild, and that's just one of many examples.