The answers above give a pretty good definition of what it is. Let's see if I can put it in my own words, so that you will have 21 explanations instead of 20. The whole purpose of a grammar, any grammar, is to figure out whether a particular sentence is a sentence in the given language. However, what we really use grammars and parsing for is to figure out what the sentence means. It's like the old diagramming of sentences you may or may not have done back in English class: a sentence is made of a subject part and a predicate part; a subject part has a noun and maybe some adjectives; a predicate part has a verb and perhaps an object noun, with some more adjectives; and so on.
If there were a grammar for English (and I don't think there is, not in the computer-science sense), then it would have rules, called productions, of the following form:
Sentence -> SubjectPart PredicatePart
SubjectPart -> Adjective Noun
etc...
You could then write a program and hand it any sentence, and the program could use the grammar to figure out which part of the sentence each word is and how the words relate to each other.
If, in every production, there is only one symbol on the left side, then whenever you see that production's right side in the sentence, you are allowed to substitute the left side for it. For instance, whenever you saw "Adjective Noun", you could say "that's a SubjectPart" without paying any attention to anything outside of that phrase.
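Here is a minimal sketch of that substitution idea (my own toy Python, not any standard algorithm): given a sentence already tagged with parts of speech, it rewrites the first "Adjective Noun" pair it finds without looking at anything around it.

# Toy illustration: apply SubjectPart -> Adjective Noun in reverse,
# replacing the right side with the left side wherever it appears,
# regardless of the surrounding symbols.
def reduce_once(symbols, rhs=("Adjective", "Noun"), lhs="SubjectPart"):
    for i in range(len(symbols) - len(rhs) + 1):
        if tuple(symbols[i:i + len(rhs)]) == rhs:
            return symbols[:i] + [lhs] + symbols[i + len(rhs):]
    return symbols  # no match; leave the sentence unchanged

tagged = ["Adjective", "Noun", "Verb", "Adjective", "Noun"]
print(reduce_once(tagged))
# ['SubjectPart', 'Verb', 'Adjective', 'Noun']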
However, English (even the simplified description of English I gave above) is context-sensitive. "Adjective Noun" isn't always a SubjectPart; it could be a NounPhrase in a PredicatePart. It depends on the context. Let's expand our pseudo-English grammar a bit:
Sentence -> SubjectPart PredicatePart
SubjectPart -> Adjective Noun
PredicatePart -> VerbPhrase ObjectNounPhrase
VerbPhrase ObjectNounPhrase -> VerbPhrase Adjective Noun
You can only make an "Adjective Noun" into an ObjectNounPhrase if it comes right after a VerbPhrase.
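Continuing the toy Python sketch from above (again, my own illustration), this rule means the same "Adjective Noun" pair reduces differently depending on the symbol to its left:

def reduce_with_context(symbols):
    out = list(symbols)
    for i in range(len(out) - 1):
        if out[i:i + 2] == ["Adjective", "Noun"]:
            # The surrounding symbol decides which left side applies.
            if i > 0 and out[i - 1] == "VerbPhrase":
                lhs = "ObjectNounPhrase"
            else:
                lhs = "SubjectPart"
            return out[:i] + [lhs] + out[i + 2:]
    return out

print(reduce_with_context(["VerbPhrase", "Adjective", "Noun"]))
# ['VerbPhrase', 'ObjectNounPhrase']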
Basically, if every production can be applied any time you want, no matter what surrounds it, the grammar is context-free.
You can always tell easily whether a grammar is context-free: just check whether any production has more than one symbol on the left side of the arrow. If every left side is a single symbol, the grammar is context-free.
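In a toy Python encoding of my own (each production as a pair of left-side and right-side tuples), that check is a one-liner:

def is_context_free(productions):
    # Context-free means exactly one symbol on every left side.
    return all(len(lhs) == 1 for lhs, rhs in productions)

cf = [(("Sentence",), ("SubjectPart", "PredicatePart")),
      (("SubjectPart",), ("Adjective", "Noun"))]
cs = cf + [(("VerbPhrase", "ObjectNounPhrase"),
            ("VerbPhrase", "Adjective", "Noun"))]
print(is_context_free(cf))  # True
print(is_context_free(cs))  # False: two symbols on one left side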
Any language might be described by more than one grammar. If some grammar for a language is context-free, the language is context-free. For some languages it can be proven that no context-free grammar is possible. I suppose there might be a context-free grammar for the simplified pseudo-English subset I am describing above.
As for why it matters: it requires a simpler kind of program to parse a context-free grammar. As noted in the other answers, parsing a context-free grammar doesn't require the full power of a Turing machine. An LR(1) parser (which is a kind of pushdown machine) for a particular context-free grammar can parse any sentence in that language in time and space linear in the length of the sentence. If the sentence is in the language, the parser will produce a structure tree identifying what each symbol in the sentence means (or at least what part it plays in the structure). If the sentence is not in the language, the parser will notice and stop on the first symbol which is impossible to reconcile with the grammar and the preceding symbols (on the first "error").
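To make the "pushdown machine" idea concrete, here is a minimal sketch (my own toy Python, a bare stack recognizer rather than a real LR(1) parser) for the balanced-parentheses language I mention further down; it runs in time linear in the input length and stops on the first impossible symbol:

def balanced(text):
    stack = []  # the pushdown store
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                raise SyntaxError(f"position {i}: unmatched ')'")
            stack.pop()
    if stack:
        raise SyntaxError("end of input: unclosed '('")
    return True

print(balanced("(()())"))  # True
# balanced("())") would stop at position 2, the first "error"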
What's even better is that there are programs to which you can give a description of a grammar and a list of instructions about what to do with each part (in a sense, attaching a "meaning" to each production), and the program will write the parser for you. The generated parser will parse the sentence, find the structure, and run your instructions on each part of the structure. This kind of program is called a parser-generator or compiler-compiler.
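For instance, with the Python parser-generator library lark (assuming it is installed; the rule names and word lists are my own toy choices), the pseudo-English grammar above looks like this:

from lark import Lark

grammar = r"""
sentence: subject_part predicate_part
subject_part: ADJECTIVE NOUN
predicate_part: VERB ADJECTIVE NOUN

ADJECTIVE: "big" | "red"
NOUN: "dog" | "ball"
VERB: "chased"

%import common.WS
%ignore WS
"""

# lark builds an LALR pushdown parser from the grammar at runtime.
parser = Lark(grammar, start="sentence", parser="lalr")
print(parser.parse("big dog chased red ball").pretty())
# A sentence not in the language, such as "big dog big ball",
# raises an error at the first symbol that can't be reconciled.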
This kind of language analysis was invented for the automatic analysis of natural languages (such as English), but it turns out to be most useful for analyzing computer languages. A language designer can write a grammar which captures his new language, then run it through the parser-generator to get a program which parses his language and translates, interprets, compiles, executes, etc., as he wants.
In fact, in most cases you can't really do this, because a grammar alone can't capture everything about a language. For instance, balanced parentheses are a context-free language, but a language in which all variables must be declared before you use them is context-sensitive. The parser is a part of the compiler, but additional logic is required to enforce these other requirements. What you then have to do is write a grammar which captures as much of your language as possible, run that through a parser-generator, then write code which enforces the rest of the requirements (a symbol table handler, etc.).
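Here is a minimal sketch of that extra code (my own illustration; the statement format is made up): after parsing, a separate pass keeps a symbol table and rejects any use of a variable before its declaration:

def check_declarations(statements):
    declared = set()  # the symbol table
    for kind, name in statements:  # e.g. ("declare", "x") or ("use", "x")
        if kind == "declare":
            declared.add(name)
        elif kind == "use" and name not in declared:
            raise NameError(f"variable {name!r} used before declaration")

check_declarations([("declare", "x"), ("use", "x")])  # fine
# check_declarations([("use", "y")]) would raise NameError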
We don't generally use context-sensitive grammars because they are much more poorly supported. I don't know of an equivalent to an LR(k) parser-generator for context-sensitive languages. Yes, a Turing machine (or a linear bounded automaton) can parse one, but I don't know of a general algorithm for turning a context-sensitive grammar into a program for a Turing machine, in the sense that an LR(1) generator builds parse tables for a pushdown machine. My guess is that the tables underlying such a parser would be exponentially larger. In any case, CS students (like me, back in the day) are usually taught context-free grammars and LR(1) parser-generators such as YACC.