58

Given the amount of material that tries to explain what a context-free grammar (CFG) is, I found it surprising that very few (in my sample, fewer than 1 in 20) explain why such grammars are called "context-free", and to my mind, none succeeds in doing so.

My question is, why are context-free grammars called context-free? What is "the context"? I had an intuition that the context could be other language constructs surrounding the currently analyzed construct, but that seems not to be the case. Could anyone provide a precise explanation?

rick
  • 1,945
  • 2
  • 19
  • 16
  • 4
    look up "most vexing parse" for C++; that will teach you why context-freeness is handy – ratchet freak Aug 15 '14 at 18:28
  • 6
    I thought I knew what a context-free grammar was until I just read some Googled definitions. Now I wish I had an etch-a-sketch and a soft blanky...maybe I'll just go outside... +1 for a good question. Looking forward to some intelligible answers! – BrianH Aug 15 '14 at 18:29
  • Your intuition is what I understand it to be, even if the formal definition of "other language constructs surrounding the currently analyzed construct" is suitably arcane. But I'm not _sure_ enough to post that as an answer. – Telastyn Aug 15 '14 at 18:39
  • 1
    See wikipages on [Context-free grammar](http://en.wikipedia.org/wiki/Context-free_grammar) and [Chomsky hierarchy](http://en.wikipedia.org/wiki/Chomsky_hierarchy). In practice programming language [parsing](http://en.wikipedia.org/wiki/Parsing) has some context, often handled "outside" of "context-free" (LR or LL) parsing, e.g. by some symbol table, attributes, or environment – Basile Starynkevitch Aug 15 '14 at 19:07
  • 1
    Here, have an xkcd reference: http://xkcd.com/1090/ – CaptainCodeman Aug 16 '14 at 12:09
  • http://cs.stackexchange.com/q/68231/755 – D.W. Jan 04 '17 at 00:19

5 Answers

63

It means all of its production rules have a single non-terminal on their left-hand side.

For example, this grammar which recognizes strings of matched parentheses ("()", "()()", "(())()", ...) is context-free:

S → SS
S → (S)
S → ()

The left-hand side of every rule consists of a single non-terminal (in this case it's always S, but there could be more).
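To see the context-freeness concretely, here is one derivation of "(())()"; at every step, any S may be rewritten, no matter what surrounds it:

S → SS → (S)S → (())S → (())()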

Now consider this other grammar which recognizes strings of the form {a^n b^n c^n : n >= 1} (e.g. "abc", "aabbcc", "aaabbbccc"):

S  → abc
S  → aSBc
cB → WB
WB → WX
WX → BX
BX → Bc
bB → bb

If the non-terminal B is preceded by the terminal c, you rewrite cB to WB, but if it's preceded by b, you expand bB to bb instead. This is presumably what the context-sensitivity of context-sensitive grammars is alluding to: which rule applies depends on what surrounds the non-terminal.
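To make that concrete, here is a derivation of "aabbcc", with the rule applied at each step on the right:

S → aSBc      (S → aSBc)
  → aabcBc    (S → abc)
  → aabWBc    (cB → WB)
  → aabWXc    (WB → WX)
  → aabBXc    (WX → BX)
  → aabBcc    (BX → Bc)
  → aabbcc    (bB → bb)

The four middle rules jointly swap cB into Bc, letting the B migrate left until a b precedes it.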

A context-free language can be recognized by a push-down automaton. Whereas a finite state machine makes use of no auxiliary storage, i.e. its decisions are based only on its current state and input, a push-down automaton also has a stack at its disposal and can peek at the top of the stack when making decisions.

To see that in action, you can parse nested parentheses by moving left to right and pushing a left parenthesis onto a stack each time you encounter one, and popping each time you encounter a right parenthesis. If you never end up trying to pop from an empty stack, and the stack is empty at the end of the string, the string is valid.
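As a minimal sketch of that procedure (my Python, not part of the original answer):

def balanced(s):
    stack = []
    for ch in s:
        if ch == '(':
            stack.append(ch)      # push on a left parenthesis
        elif ch == ')':
            if not stack:         # tried to pop an empty stack: reject
                return False
            stack.pop()
        else:
            return False          # only parentheses allowed
    return not stack              # accept iff the stack ends up empty

print(balanced("(())()"))  # True
print(balanced("())("))    # False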

For a context-sensitive language, a PDA isn't enough. You'll need a linear-bounded automaton which is like a Turing Machine whose tape isn't unlimited (though the amount of tape available is proportional to the input). Note that that describes computers pretty well - we like to think of them as Turing Machines but in the real world you can't grab arbitrarily more RAM mid-program. If it's not obvious to you how an LBA is more powerful than a PDA, an LBA can emulate a PDA by using part of its tape as a stack, but it can also choose to use its tape in other ways.

(If you're wondering what a finite state machine can recognize, the answer is the regular languages, i.e. what you can describe with regular expressions. But not the regexes on steroids with capture groups and look-behind/look-ahead that you see in programming languages; I mean the ones you can build with operators like [abc], |, *, +, and ?. You can see that abbbz matches the regex ab*z just by keeping your current position in the string and the regex, no stack required.)
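For illustration, here is a sketch (mine, in Python) of the finite state machine for ab*z; note that it keeps nothing but a single current state:

def matches_ab_star_z(s):
    # states: 0 = expect 'a', 1 = reading 'b'* then 'z', 2 = accepted
    state = 0
    for ch in s:
        if state == 0 and ch == 'a':
            state = 1
        elif state == 1 and ch == 'b':
            state = 1                 # loop: any number of b's
        elif state == 1 and ch == 'z':
            state = 2
        else:
            return False              # no valid transition: reject
    return state == 2

print(matches_ab_star_z("abbbz"))  # True
print(matches_ab_star_z("az"))     # True
print(matches_ab_star_z("abzx"))   # False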

Doval
  • 15,347
  • 3
  • 43
  • 58
  • 15
    Very nice explanation. Although, a Turing machine's tape does not need to be infinite, only unlimited. There can be a tape-factory at either end that, when the machine bumps into it, simply makes more tape. That way, at any point in time, it is finite. – Mike Dunlavey Aug 15 '14 at 19:36
  • 2
    @MikeDunlavey Thanks for the clarification, fixed it. – Doval Aug 15 '14 at 19:41
  • 10
    But the tape factory would need infinite tape making materials, or infinite tape making materials making materials, or ... [stack overflow] – flamingpenguin Aug 15 '14 at 22:08
  • Kind of a tangent, but what happens if you give a PDA one or two extra stacks? Can it recognize an interesting, broader class of languages in that case? – user541686 Aug 16 '14 at 08:46
  • 8
    @Mehrdad: You can simulate any number of stacks using two stacks: keep all the stacks stacked on top of each other on one stack and when you need to access some stack further down pop the upper stacks off and push them onto the second stack. This proves that n>2 stacks are not more powerful than 2 stacks. Now, whether 2 stacks are more powerful than 1 stack, that I don't know. My intuition says no, but that might depend on exactly what the stack primitives are. – Jörg W Mittag Aug 16 '14 at 08:58
  • 10
    @JörgWMittag: two stacks is as good as a tape. Hand-wavily: use one stack as the left hand side of the tape and the other stack as the right hand side, relative to your current position. So a 2-PDA is a Turing machine. For primitives you just need to be able to pop a value from one stack and push it on the other, which is how you move along your tape. – Steve Jessop Aug 16 '14 at 13:07
  • @MikeDunlavey good explanation: I had actually forgotten that caveat until now. I had a professor many years ago say the same thing, except he also said you're really just delaying the inevitable. Such a factory would need an infinite amount of trees to build the "unlimited tape" so in the end, the number of atoms in the universe is still relevant. –  Aug 19 '14 at 04:21
24

The other answers are quite long, though accurate and correct. This is the short version.

If you have a string of characters (terminals and nonterminals) and you wish to replace a nonterminal in the string, a context-free grammar allows you to do that regardless of the characters surrounding the nonterminal.

Consider the following rules (lowercase are terminals, uppercase are nonterminals):

A -> a
AB -> a

In the first rule, you can replace an A regardless of what appears around it (its context). In the second rule, you cannot rewrite anything unless the A is followed by a B. While both nonterminals get replaced in that case, the important point is that the symbols surrounding the A matter: you cannot replace BA with a, or B alone with a, only an A followed by a B, because the order, i.e. the context, of the nonterminals is significant. That means the context of a nonterminal matters in the second rule, making it context-sensitive, while the first rule is context-free.
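A small sketch (my Python, not the answer's) makes the difference tangible: the first rule fires wherever an A appears, the second only where an A is immediately followed by a B:

def apply_rule(s, lhs, rhs):
    # rewrite the first occurrence of lhs in s, if there is one
    i = s.find(lhs)
    return s[:i] + rhs + s[i + len(lhs):] if i >= 0 else s

print(apply_rule("BA", "A", "a"))   # 'Ba'  -- A -> a applies regardless of context
print(apply_rule("BA", "AB", "a"))  # 'BA'  -- AB -> a cannot apply: no B after the A
print(apply_rule("AB", "AB", "a"))  # 'a'   -- here the context is right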

  • 1
    This is a really good explanation, though I'm not qualified to vouch for its accuracy or completeness. Is it all there is to it? – rick Aug 19 '14 at 02:57
  • 1
    Computer grammars are part of the [Chomsky hierarchy](http://en.wikipedia.org/wiki/Chomsky_hierarchy). That article is a good place to start. Also, this topic _should_ be part of any baccalaureate program in computer science. At the very least, universities should teach regular and context-free grammars since those comprise the overwhelming majority of languages that we programmers are likely to encounter. –  Aug 19 '14 at 04:17
  • @Snowman: Very crisp. It would be better to say "you can't derive `a` from `AB` unless `A` is followed by `B`" instead of "you can't replace `A`", which might not be accurate, because you're actually replacing `AB`, aren't you? – justin Oct 20 '15 at 10:41
  • @justin correct. I updated my answer to be more clear on this. –  Oct 20 '15 at 16:06
  • @Snowman: Do you mean to replace `A` or `AB` in the second (context-sensitive) rule? I think you're still trying to replace `A`, as your answer says. – justin Oct 26 '15 at 06:34
  • @justin `AB`, I thought I explained it in my answer. –  Oct 26 '15 at 14:10
  • @Snowman: That's right. But you still haven't updated your post. Maybe it's my fault for pestering you. – justin Oct 27 '15 at 06:46
7

To understand the distinction and the terminology better, it's a good idea to contrast a context-free language like a^nb^n with a context-sensitive one like a^nb^nc^n. (Notation: a, b, and c are literals here, and the exponent n means repeating the literal n times, n > 0, say.) For instance, aabbc or aabbbcc is not in the latter language, whereas aabbcc is.

An acceptor for the context-free language a^nb^n can contract a pair of a and b regardless of what's around it (i.e. regardless of the context in which ab appears) and it will function correctly, accepting only strings in the language and rejecting anything else; the grammar is S -> aSb | ab. Note that there are no terminals on the left side of the productions. (There are two production rules, but we're just writing them compactly.) The acceptor can basically make a local, context-free decision.

In contrast, you can't do something like that for the context-sensitive language a^nb^nc^n, because for the latter you must somehow remember the context you were in, i.e. how many contractions of ab you did, in order to match them with contractions of bc. A grammar for the latter language is

S -> abc | aBSc
Ba -> aB
Bb -> bb

Note that you have both terminals and non-terminals on the left in the last two rules. The terminals on the left are the context in which the non-terminals can be expanded.
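To see those rules cooperate, here is a derivation of aabbcc, with the rule applied at each step on the right:

S => aBSc      (S -> aBSc)
  => aBabcc    (S -> abc)
  => aaBbcc    (Ba -> aB)
  => aabbcc    (Bb -> bb)

The Ba -> aB rule pushes each B rightward past the a's until a b follows it; only then can Bb -> bb fire.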


Bootnote regarding "contract" vs. "expand" terminology: although formal grammars are [formally, hah] generative, the way they're actually implemented in parsers is reductionist, i.e. you contract everything to a non-terminal, basically applying the rules "in reverse". That is why even the first grammar given above is not practical in a program (it would give you the famous shift-reduce conflict, because you can't decide which rule to apply), but the two grammars above suffice for illustrating the distinction between context-free and context-sensitive. The issue of ambiguity in context-free grammars is rather complicated and not really the topic of this question, so I'm not going to say more here, especially since it turns out that Wikipedia has a decent article on that. In contrast, its articles on context-free and especially on context-sensitive languages are !@#$@!#$, especially if you are new to the topic... I guess that's more for my TODO list.

Fizz
  • 588
  • 6
  • 13
5

The answers above give a pretty good definition of what it is. Let's see if I can put it in my own words, so that you have 23 explanations instead of 20. The whole purpose of a grammar, any grammar, is to figure out whether a particular sentence is a sentence in the given language. However, what we really use grammars and parsing for is to figure out what the sentence means. It's like the old diagramming of a sentence you may or may not have done back in English class: a sentence is made of a subject part and a predicate part; a subject part has a noun and maybe some adjectives; a predicate part has a verb and perhaps an object noun, with some more adjectives; etc.

If there were a grammar for English (and I don't think there is, not in the computer science sense) then it would have rules of the following form, called productions.

Sentence -> SubjectPart PredicatePart
SubjectPart -> Adjective Noun

etc...

You could then write a program and hand it any sentence, and the program could use the grammar to figure out which part of the sentence each word is, and what relation they have to each other.
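For instance (a toy example of mine, assuming an obvious PredicatePart rule alongside the two above): handed "big dog barks", such a program could reduce "big dog" to a SubjectPart via the Adjective Noun rule, take "barks" as the PredicatePart, and so recognize the whole string as a Sentence, recovering the diagram.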

If in every production there is only one thing on the left side, then whenever you see the right side in the sentence, you are allowed to substitute in the left side. For instance, whenever you saw "adjective noun", you could say "that's a SubjectPart" without paying any attention to anything outside of that phrase.

However, English (even the simplified description of English I gave above) is context-sensitive. "Adjective noun" isn't always a SubjectPart, it could be a NounPhrase in a PredicatePart. It depends on the context. Let's expand our pseudo-English grammar a bit:

Sentence -> SubjectPart PredicatePart
SubjectPart -> Adjective Noun
PredicatePart -> VerbPhrase ObjectNounPhrase
VerbPhrase ObjectNounPhrase -> VerbPhrase Adjective Noun

You can only make an "adjective noun" into an ObjectNounPhrase if it comes right after a VerbPhrase.

Basically, if you have a production and you can apply it any time you want, no matter what surrounds it, it is context-free.

You can always tell easily whether a grammar is context-free: just check whether any rule has more than a single non-terminal on the left side of the arrow.
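That check is mechanical; a rough sketch (my Python, with rules given as (left, right) string pairs and upper-case letters standing for non-terminals):

def is_context_free(rules):
    # context-free iff every left side is exactly one non-terminal
    return all(len(lhs) == 1 and lhs.isupper() for lhs, rhs in rules)

print(is_context_free([('S', 'aSb'), ('S', 'ab')]))    # True
print(is_context_free([('S', 'abc'), ('Ba', 'aB')]))   # False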

Any language might be described by more than one grammar. If some grammar for a language is context-free, the language is context-free. It can be proven for some languages that there is no context-free grammar possible. I suppose there might be a context-free grammar for the simplified pseudo-English subset I am describing above.

As for why it matters, it requires a simpler kind of program to parse a context-free grammar. As noted in the other answers, it doesn't require the full power of a Turing machine to parse a context-free grammar. A lookahead LR(1) parser (which is a kind of pushdown machine) for a particular context-free grammar can parse any sentence in that grammar in time and space linear in the length of the sentence. If the sentence is in the language, the parser will produce a structure tree identifying what each symbol in the sentence means (or at least what part it plays in the structure). If the sentence is not in the grammar, the parser will notice and stop at the first symbol which is impossible to reconcile with the grammar and the preceding symbols (at the first "error").
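Building a real LR(1) table is too much to show here, but a hand-written recursive-descent sketch (my Python; not LR, though also linear for this grammar) over the earlier parenthesis language shows both behaviors: a structure tree on success, and a stop at the first irreconcilable symbol on failure:

def parse_group(s, i):
    # parse one "( ... )" group starting at s[i]; return (tree, next index)
    if i >= len(s) or s[i] != '(':
        raise SyntaxError("expected '(' at position %d" % i)
    i += 1
    children = []
    while i < len(s) and s[i] == '(':
        child, i = parse_group(s, i)
        children.append(child)
    if i >= len(s) or s[i] != ')':
        raise SyntaxError("expected ')' at position %d" % i)
    return ('S', children), i + 1

def parse(s):
    trees, i = [], 0
    while i < len(s):
        tree, i = parse_group(s, i)
        trees.append(tree)
    return trees

print(parse("(())()"))   # [('S', [('S', [])]), ('S', [])]
parse("(()")             # SyntaxError: expected ')' at position 3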

What's even better is that there are programs which you can give a description of a grammar, and a list of instructions about what to do with each part (in a sense attaching a "meaning" to each production) and the program will write the parser for you. The program will parse the sentence, find the structure, and run your instructions on each part of the structure. This kind of program is called a parser-generator or compiler-compiler.

This kind of language analysis was invented for automatic analysis of natural language (such as English) but it turns out that this is most useful for analyzing computer languages. A language designer can write a grammar which captures his new language, then run it through the parser-generator to get a program which parses his language, and translates, interprets, compiles, executes, etc if he wants.

In fact, in most cases you can't quite do this for the whole language. For instance, balanced parentheses are a context-free language, but a language where all variables must be declared before you use them is context-sensitive. The parser is one part of the compiler, and additional logic is required to enforce these other requirements. What you then have to do is write a grammar which captures as much of your language as possible, run that through a parser-generator, then write code which enforces the rest of the requirements (a symbol table handler, etc.).
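As a minimal sketch of that extra logic (my Python, over a hypothetical stream of already-parsed statements), the "symbol table" here is just a set:

def check_declarations(statements):
    # statements: ('declare', name) or ('use', name) pairs,
    # in the order a parser might emit them
    declared = set()                          # the "symbol table"
    for kind, name in statements:
        if kind == 'declare':
            declared.add(name)
        elif name not in declared:
            raise NameError(name + " used before declaration")

check_declarations([('declare', 'x'), ('use', 'x')])  # passes silently
check_declarations([('use', 'y')])                    # raises NameError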

We don't generally use context-sensitive grammars because they are much more poorly supported. I don't know if there is an equivalent to an LR(k) parser-generator for context-sensitive languages. Yes, a Turing machine (or linear-bounded automaton) can parse one, but I don't know if there is a general algorithm for turning a context-sensitive grammar into a program for a Turing machine, in the sense that an LR(1) generator makes parse tables for a pushdown machine. My guess is that the tables which underlie such a parser would be exponentially larger. In any case, CS students (like myself, back in the day) are usually taught context-free grammars and LR(1) parser generators such as YACC.

kwan3217
  • 59
  • 1
0

Context-free grammars do not consider any context for production rules. The context consists of the terminals or non-terminals surrounding a non-terminal.

So: Context-free grammars only have a single non-terminal on the left side of the production rules.

Martin Thoma
  • 557
  • 5
  • 17
  • 3
    What does this add to the existing answers? Also, a production rule with two or more *non-terminals* on the left hand side isn't context free either. –  Aug 18 '14 at 18:47
  • I think the given answers are much too long. If one would add a TL;DR, I would delete this one. – Martin Thoma Aug 18 '14 at 18:51
  • 1
    Nice! Would you say that the "context" is the extra characters that qualify when each production rule could be applied? – rick Aug 19 '14 at 03:02
  • 1
    This answer concisely answers the question, though the other answers explain it further. – Rod Maniego Jun 27 '22 at 11:30