Let me ask you a completely serious counter-question: What, in your view, is the difference between "data" and "code"?
When I hear the word "data", I think "state". Data is, by definition, the thing that the application itself is designed to manage, and therefore the very thing that the application can never know about at compile time. It is not possible to hard-code data, because as soon as you hard-code it, it becomes behaviour - not data.
The type of data varies by application; a commercial invoicing system may store customer and order information in a SQL database, and a vector-graphics program might store geometry data and metadata in a binary file. In both of these cases and everything in between, there is a clear and unbreakable separation between the code and data. The data belongs to the user, not the programmer, so it can never be hard-coded.
What you seem to be talking about is, to use the most technically accurate description available to my current vocabulary: information governing program behaviour which is not written in the primary programming language used to develop the majority of the application.
Even this definition, which is considerably less ambiguous than just the word "data", has a few problems. For example, what if significant parts of the program are each written in different languages? I have personally worked on several projects which are about 50% C# and 50% JavaScript. Is the JavaScript code "data"? Most people would say no. What about the HTML, is that "data"? Most people would still say no.
What about CSS? Is that data or code? If we think of code as being something that controls program behaviour, then CSS isn't really code, because it only (well, mostly) affects appearance, not behaviour. But it isn't really data, either; the user doesn't own it, the application doesn't even really own it. It's the equivalent of code for a UI designer. It's code-like, but not quite code.
I might call CSS a kind of configuration, but a more practical definition is that it is simply code in a domain-specific language. That's what your XML, YAML, and other "formatted files" often represent. And the reason we use a domain-specific language is that, generally speaking, it's simultaneously more concise and more expressive in its particular domain than coding the same information in a general-purpose programming language like C or C# or Java.
Do you recognize the following format?
{
  "name": "Jane Doe",
  "age": 27,
  "interests": ["cats", "shoes"]
}
I'm sure most people do; it's JSON. And here's the interesting thing about JSON: In JavaScript, it's clearly code, and in every other language, it's clearly formatted data. Almost every single mainstream programming language has at least one library for "parsing" JSON.
If we use that exact same syntax inside a function in a JavaScript file, it can't possibly be anything other than code. And yet, if we take that JSON, shove it in a .json file, and parse it in a Java application, suddenly it's "data". Does that really make sense?
I argue that the "data-ness" or "configuration-ness" or "code-ness" is inherent to what is being described, not how it's being described.
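To make that concrete, here is a minimal sketch in Python rather than JavaScript or Java (my choice, purely for brevity). It assumes the strict JSON spelling with double-quoted keys and strings, which also happens to be a valid Python dictionary literal. The variable names are made up for illustration.

```python
import json

# The characters written directly in the source file: most people call this "code".
person_as_code = {
    "name": "Jane Doe",
    "age": 27,
    "interests": ["cats", "shoes"],
}

# The same characters held as text and handed to a parser: most people call this "data".
person_as_text = """
{
  "name": "Jane Doe",
  "age": 27,
  "interests": ["cats", "shoes"]
}
"""
person_as_data = json.loads(person_as_text)

# Both routes describe exactly the same thing.
print(person_as_code == person_as_data)  # True
```

The program ends up with the same structure either way; only the route it took to get there differs.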
If your program needs a dictionary of some 170,000 words in order to, say, generate a random passphrase, do you want to code it like this:
var words = new List<string>();
words.Add("aa");
words.Add("aah");
words.Add("aahed");
// snip 172836 more lines
words.Add("zyzzyva");
words.Add("zyzzyvas");
Or would you just shove all those words into a line-delimited text file and tell your program to read from it? It doesn't really matter if the word list never changes. This isn't a question of hard-coding versus soft-coding (which many rightly consider to be an anti-pattern when inappropriately applied); it's simply a question of which format is most efficient and makes it easiest to describe the "stuff", whatever the "stuff" is. It's fairly irrelevant whether you call it code or data; it is information that your program requires in order to run, and a flat-file format is the most convenient way to manage and maintain it.
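The flat-file alternative might look roughly like the following Python sketch. The file name words.txt and the helper names load_words and random_passphrase are hypothetical, and the demo writes a tiny stand-in file so that the example runs on its own; a real word list would simply live next to the source in version control.

```python
import secrets
import tempfile
from pathlib import Path


def load_words(path):
    """Read one word per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def random_passphrase(words, count=4):
    """Join `count` randomly chosen words with spaces."""
    return " ".join(secrets.choice(words) for _ in range(count))


# Demo only: create a five-word stand-in for the real list.
with tempfile.TemporaryDirectory() as tmp:
    word_file = Path(tmp) / "words.txt"  # hypothetical file name
    word_file.write_text("aa\naah\naahed\nzyzzyva\nzyzzyvas\n", encoding="utf-8")
    words = load_words(word_file)

print(len(words))
print(random_passphrase(words))
```

Nothing here makes the list "configuration". It is just a more readable place for developers to keep the same information.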
Assuming you follow proper practices, all of this stuff is going into source control anyway, so you might as well call it code, just code in a different and perhaps very minimalistic format. Or you can call it configuration, but the only thing that truly distinguishes code from configuration is whether or not you document it and tell end users how to change it. You could perhaps invent some bogus argument about configuration being interpreted at startup time or runtime and not at compile time, but then you'd be starting to describe several dynamically-typed languages and almost certainly anything with a scripting engine embedded inside of it (e.g. most games). Code and configuration are whatever you decide to label them as, nothing more, nothing less.
Now, there is a danger to externalizing information that isn't actually safe to modify (see the remarks on soft-coding above). If you externalize your vowel array in a configuration file, and document it as a configuration file to your end users, you are giving them an almost foolproof way to instantly break your app, for example by putting "q" as a vowel. But that is not a fundamental problem with "separation of code and data", it's simply bad design sense.
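For contrast, here is how I'd sketch the vowel case in Python: the set stays in the source as an immutable constant, because no environment or user should ever need a different answer. The names VOWELS and count_vowels are illustrative only.

```python
# The vowel set is part of the program's logic, not something to expose
# for editing, so it stays in the source as an immutable constant.
VOWELS = frozenset("aeiou")


def count_vowels(text):
    """Count the vowels in text, ignoring case."""
    return sum(1 for ch in text.lower() if ch in VOWELS)


print(count_vowels("Externalize"))  # 5
```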
What I tell junior devs is that they should always externalize settings that they expect to change per environment. That includes things like connection strings, user names, API keys, directory paths, and so on. They might be the same on your dev box and in production, but probably not, and the sysadmins will decide how they want it to look in production, not the devs. So you need a way of having one group of settings applied on some machines, and other settings applied on other machines - ergo, external configuration files (or settings in a database, etc.).
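One common way to do this, sketched here in Python, is to read per-environment settings from environment variables and fall back to development defaults. The variable names (APP_CONNECTION_STRING, APP_API_KEY, APP_DATA_DIR), the default values, and the Settings class are all assumptions made up for this example, not a prescribed layout.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    connection_string: str
    api_key: str
    data_dir: str


def load_settings(env=os.environ):
    """Build settings from a mapping, defaulting to dev-box values."""
    return Settings(
        connection_string=env.get("APP_CONNECTION_STRING", "sqlite:///dev.db"),
        api_key=env.get("APP_API_KEY", ""),
        data_dir=env.get("APP_DATA_DIR", "./data"),
    )


# On a dev box nothing needs to be set; in production the sysadmins
# decide the values without touching the code.
production = load_settings({"APP_DATA_DIR": "/srv/app/data"})
print(production.data_dir)
```

The same code ships everywhere, and only the values supplied by each environment differ.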
But I stress that simply putting some "data" into a "file" isn't the same as externalizing it as configuration. Putting a dictionary of words into a text file doesn't mean that you want users (or IT) to change it, it's just a way of making it much easier for developers to understand what the hell is going on and, if necessary, make occasional changes. Likewise, putting the same information in a database table does not necessarily count as externalization of behaviour, if the table is read-only and/or DBAs are instructed never to screw with it. Configuration implies that the data is mutable, but in reality that is determined by process and responsibilities rather than the choice of format.
So, to summarize:
"Code" is not a rigidly-defined term. If you expand your definition to include domain-specific languages and anything else which affects behaviour, a lot of this apparent friction will simply disappear and it will all make sense. You can have non-compiled, DSL "code" in a flat file.
"Data" implies information that is owned by the user(s) or at least someone other than the developers, and not generally available at design time. It could not be hard-coded even if you wanted to do so. With the possible exception of self-modifying code, the separation between code and data is a matter of definition, not personal preference.
"Soft-coding" can be a terrible practice when over-applied, but not every instance of externalization necessarily constitutes soft-coding, and many instances of storing information in "flat files" are not necessarily a bona fide attempt at externalization.
Configuration is a special type of soft-coding that is necessary because of the knowledge that the application may need to run in different environments. Deploying a separate configuration file along with the application is far less work (and far less dangerous) than deploying a different version of the code to every environment. So some types of soft-coding are actually useful.