How did separation of code and data become a practice?

Question

Please read the question carefully: it asks how, not why.

I recently came across this answer, which suggests using a database to store immutable data:

It sounds like many of the magic numbers you describe - particularly if they are part dependent - are really data, not code. [...] It may mean an SQL type database, or it may simply mean a formatted text file.

It would seem to me that if you have data that is part of what your program does, then the thing to do is to put it in the program. For example, if your program's function is to count vowels, what's wrong with having vowels = "aeiou" in it? After all, most languages have data structures designed for precisely this use. Why would you bother to separate data by putting it in a "formatted text file", as suggested above? Why not just make that text file formatted in your programming language of choice? Now is it a database? Or is it code?

I'm sure some will think this is a dumb question, but I ask it in all seriousness. I feel like "separate code and data" is emerging culturally as some sort of self-evident truth, along with other obvious things like "don't give your variables misleading names" and "don't avoid using whitespace just because your language considers it insignificant".

Take for example, this article: The Problem with Separating Data from Puppet Code. The Problem? What problem? If Puppet is a language for describing my infrastructure, why can't it also describe that the nameserver is 8.8.8.8? It seems to me that the problem isn't that code and data are mingled,¹ but that Puppet lacks sufficiently rich data structures and ways to interface to other things.

I find this shift disturbing. Object oriented programming said "we want arbitrarily rich data structures", and so endowed data structures with powers of code. You get encapsulation and abstraction as a result. Even SQL databases have stored procedures. When you sequester data into YAML or text files or dumb databases as if you are removing a tumor from the code, you lose all of that.

Can anyone explain how this practice of separating data from code came to be, and where it's going? Can anyone cite publications by luminaries, or provide some relevant data that demonstrates "separate code from data" as an emerging commandment, and illustrates its origin?

¹ If one can even make such distinctions. I'm looking at you, Lisp programmers.

Feel free to bury all of the html & css in your language of choice. — JeffO, Feb 19 '14 at 06:08
I think what the author of the quote meant is that the magic numbers aren't really immutable. — Pieter B, Feb 19 '14 at 09:28
There's nothing wrong with hard-coding the vowels. If your application will only ever be used to count the vowels in English. — Michael Paulukonis, Feb 24 '14 at 22:06
A big technical reason for separating code and data is to not have to recompile the code when the data changes. Therefore, I'd question whether it applies to the same extent to scripting languages. — user16764, Feb 24 '14 at 23:22
@MichaelPaulukonis: And putting it in a database is a fake solution. Changes needed for Dutch? Zero (not even a DB change). Changes needed for French/German? At least ISO-8859-1 support. (More than DB). Changes needed for Greek/Russian? Unicode support (more than DB). In fact, I can't think of any language where that DB is of any help. — MSalters, Feb 25 '14 at 11:30
Changes needed for Hungarian? There are 12 vowels; I could just add them to my UTF-16 config file, and Bob's yer uncle. I have no idea where your database straw man came from; you can store data external to code without using a database. Then there's the whole LISP-y code-as-data, which we see hinted at with JSON (elsewhere on this page). But separation of concerns is not a bad practice. — Michael Paulukonis, Feb 25 '14 at 14:22
I would have to spend time to find citations, so I answer short here: At school when they start teaching people, they say you shouldn't have magic numbers and strings not because you haven't externalized them into a DB or flat file or config file. It is because 'magic' numbers and strings are just magic. When people see them they can't infer why they have whatever value they have. Either externalizing into a variable or put into whatever mechanism of preferences, the programmer would have some idea about why the values are those values. — InformedA, Jul 13 '14 at 07:22
... after that, people continue with more ways to externalize to make things more convenient (thus more clues to the question of why this value is the value comes together naturally). That is the how for you. — InformedA, Jul 13 '14 at 07:26
the magic word here is "invariant" - vowels for instance don't change, as long as your application only supports english language, but things like π or 'e' are unlikely to change in your lifetime (or that of your planet) so it's nonsense to externalise them. Defining as symbolic constants though helps with readability. — robert, Dec 01 '14 at 09:40
Let's assume your user adds a new language with totally different vowels. Should you add those to the code as well, or add it to a database along with other languages and vowels? I frequently meet the same problem btw. and I did not find a "clean" solution yet. I always wanted to ask this question but could not find the words to describe it, especially not in English. — inf3rno, Oct 31 '17 at 22:43
I think the main reason for separating data and code is that otherwise you need to change the code whenever you change your data, and data tends to change frequently. I thought about this, and it seems there is sometimes a strong coupling between data and code, which you should loosen somehow. — inf3rno, Oct 31 '17 at 23:49
I think the proper way is to provide an init script that generates the config file, instead of shipping a config file with the vowels written out explicitly. That way you can save your vowels to the database and bind them to the code through the generated config. Everybody is happy, because you can always change the config in the database or run the init script again if you want to change something, and you don't have to touch the code. Or you can use the exact same code for a German and for an English site; only the init script and the config will change. — inf3rno, Oct 31 '17 at 23:52
Vowel counting is probably a trivial program and you don't need to think too hard about code and data being separate for something that's less than 20 lines long. But if you were to translate your vowel counting program into Welsh you'll need to include `w` and `y` as vowels. All rules should be broken if they don't make sense in your use case, but even here there are use cases that suggest using a datafile. — Nic, Sep 12 '20 at 02:37
No software is future-proof. Well, maybe very little is, but usually, data survives most of the software we code. For many companies, the capacity to change data and behaviour at different paces is valuable. There're design models strongly based on this premise: data-driven designs. Data is centric and everything else is circumstantial. It might change, come and go but the data must remain minable regardless of the software in use. — Laiv, Oct 18 '22 at 15:24

david.pfx · Answer 1 · 2014-02-20T04:37:38.370

There are many good reasons to separate data from code, and some reasons not to. The following come to mind.

Timeliness. When is the data value known? Is it when the code is written, or when it is compiled, linked, released, licensed, configured, started, or while it is running? For example, the number of days in a week (7) is known early, but the USD/AUD exchange rate will be known quite late.

Structure. Is this a single data item set according to a single consideration, or might it be inherited or part of a larger collection of items? Languages like YAML and JSON enable combining values from multiple sources. Perhaps some things that initially seem immutable are better made accessible as properties in a configuration manager.
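
To make the point about combining values from multiple sources concrete, here is a minimal sketch in Python. The layer names (defaults, site, host), the keys, and the merge_layers helper are all made up for illustration; real configuration managers have their own precedence rules.

```python
import json


def merge_layers(*layers):
    """Merge dictionaries left to right; later layers override earlier ones.

    Nested dictionaries are merged recursively, any other value is replaced.
    """
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_layers(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical layers: built-in defaults, a site-wide file, one host's overrides.
defaults = json.loads('{"dns": {"nameserver": "8.8.8.8", "timeout_s": 5}, "week_days": 7}')
site = json.loads('{"dns": {"nameserver": "10.0.0.53"}}')
host = json.loads('{"dns": {"timeout_s": 2}}')

config = merge_layers(defaults, site, host)
print(config)
```

The value known earliest (days in a week) sits in the defaults and is never overridden, while the nameserver and timeout are decided later by whoever owns the site and the host.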

Locality. If all the data items are stored in a limited number of places it is far easier to manage them, particularly if some might need to be changed to new (immutable) values. Editing source code just to change data values introduces the risk of inadvertent changes and bugs.

Separation of concerns. Getting algorithms to work correctly is best separated from consideration of what data values to use. Data is needed to test algorithms, not to be part of them. See also http://c2.com/cgi/wiki?ZeroOneInfinityRule.

In answer to your question this is not a new thing. The core principles have not changed in more than 30 years, and have been written about repeatedly over that time. I can recall no major publications on the topic as it is generally not considered controversial, just something to explain to newcomers. There is a bit more here: http://c2.com/cgi/wiki?SeparationOfDataAndCode.

My personal experience is that the importance of this separation in a particular piece of software becomes greater over time, not less. Values that were hard-coded are moved into header files, compiled-in values are moved into configuration files, simple values become part of hierarchical and managed structures.

As to trends, I haven't seen any major changes in attitude amongst professional programmers (10+ years), but the industry is increasingly full of youngsters and many things I thought were known and decided keep getting challenged and reinvented, sometimes out of new insights but sometimes out of ignorance.

Could you expand on the history and trend of this practice? If everyone gave these considerations, I wouldn't have asked the question. The premise of the question is that people aren't carefully considering where their data should go (compiled constants, external databases, YAML...) but rather they are thinking only "CODE AND DATA MIXED BAD! HULK SMASH!" Why or when did this become a thing? — Phil Frost, Feb 19 '14 at 19:13
It's not part of my experience, so I can't tell you. I've added a couple of paras to my answer. — david.pfx, Feb 20 '14 at 04:38
I think "influx of youngsters" is a valid explanation, but I'm holding off on accepting because I'd like to hear from some of these youngsters to see where they got the idea. Clearly they got the "separate code and data" part, but I don't think they got the rest. Did they read it in a blog post? A book? Where and when? — Phil Frost, Feb 20 '14 at 12:31
You will always get "_____ BAD! HULK SMASH!" - that doesn't mean it's true. Often this sort of thing (e.g. "'GOTO' BAD! HULK SMASH!") is taught to beginners, without teaching them why, or what the exceptions are. — AMADANON Inc., Feb 25 '14 at 03:44
`Locality` also works in reverse: We ended up with a sort-of plugin-type system due to custom requirements for different clients, and through several years of trial and error learned to keep their constants (even tables, by way of lists of dicts) out of the database and in the code. Both because using it anywhere other than that "plugin" is incorrect, and because changes are automatically versioned when changes happen. — Izkata, Feb 25 '14 at 04:50

score 9 · Answer 2 · edited Oct 18 '22 at 14:49

Data scales much better, and can be queried and modified with much more ease, when it is separated from the code. Even if your data is codish in nature - for example, your data represents rules or commands - if you can represent that code as structured data, you can enjoy the benefits of storing it separately:

permissions

If the data is hard-coded, you'll need to edit the source file in order to edit that data. That means that either:

Only developers can edit data. This is bad - data entry is not something that requires a developer's skills and knowledge.
Non-developers can edit the source file. This is bad - they might break the source file without even knowing it!
The data is hard-coded into separate source files, and non-developers have access only to those files. But this doesn't really count - now the data is separated from the code and stored in its own files...

editing

So, regarding who can edit the data, it's best to store it separately. How about how they'll edit the data? If you have lots of data, typing it by hand is tedious and error-prone. Having some UI for this is much better! Even if you still have to type everything, you won't have to type the boilerplate of the format, so there's less chance you'll mess up the format and break the whole file!

If the data is hard-coded, creating that UI will mean that an automated tool will edit your hand-written source files. Let that sink in - an automated tool will open your source files, attempt to find where the data should be, and modify that code. Brrr... Microsoft introduced partial classes to C# just to avoid those things...

If the data is separate, your automated tool will just have to edit datafiles. I'd rather believe that computer programs editing datafiles is not that uncommon nowadays...

scaling

Code and data scale very differently. As your code grows, you want to separate it into more classes and methods (or data structures and functions), but your data - no matter how much it grows - you want to keep in one place. Even if you have to split it across multiple files, you want to bundle those files together somehow, so it'll be easier to access that data from the code.

So, imagine that you have thousands of lines of data inside a source file. The compiler/interpreter has to go through all that data each time it reads the file, and parse it with its expensive lexer and parser, even if you are not going to access that data in this particular run of the program. Also, when you edit the actual code in that file, you have to work around the data, which encumbers the whole process. Also, datafiles can be indexed. Hard-coded data? Not so much...

searching

You have tons of data - it's only natural you'll want to search through it.

If you store it in a database - you can use the database query language.
If you store it in an XML file - you can use XPath.
If you store it in JSON/YAML - you can load it in your favorite scripting language's REPL and search it.
Even if you store it in a plain old text file, since it has a structure your program can recognize, you can use grep/sed/awk to search it.

While it's true that you can also grep/sed/awk through hard coded data in a source file, it doesn't work as well, since your query can match with other, non-related lines, or miss lines that were written differently because the programming language's data representation syntax allows it.

There are tools for searching through code, but they are good for finding declarations, not hard coded data.
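
As a rough illustration of the JSON case, this is what such an ad-hoc search might look like in a Python session. The part records and field names are invented for the example, and the data is inlined as a string only so the snippet runs on its own; in practice it would come from a data file.

```python
import json

# Hypothetical contents of a data file, inlined so the sketch is self-contained.
raw = """
[
  {"part": "A-100", "torque_nm": 12.5, "material": "steel"},
  {"part": "A-200", "torque_nm": 8.0,  "material": "aluminium"},
  {"part": "B-300", "torque_nm": 15.0, "material": "steel"}
]
"""
parts = json.loads(raw)

# Once parsed, the data is ordinary lists and dicts, so a query is one line.
steel_parts = [p["part"] for p in parts if p["material"] == "steel"]
high_torque = [p["part"] for p in parts if p["torque_nm"] > 13]

print(steel_parts)
print(high_torque)
```

The same query against values scattered through source code would have to cope with whatever syntax each programmer happened to use for them.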

That being said...

It's very important to distinguish between data and code. Just because something is written as code doesn't mean it can't be data. And just because something is written with a data representation doesn't mean it isn't, in fact, code.

I had a class where we had very strict rules about "magic numbers" - we couldn't have any numbers in our code. That means we had to do things like:

#define THE_NUMBER_ZERO 0
//....
for (int i = THE_NUMBER_ZERO; i < count; ++i) {
//....

which is outright ridiculous! Yes, 0 is technically "data", but it's just as much a part of the code as the rest of the for loop! So even though we can represent it as data and separate it from the code, that doesn't mean we should. Not because we want to leave data inside the code, but because it's not really data - not any more than the rest of the code, which is also compiled to ones and zeros...

score 7 · Answer 3 · answered Feb 19 '14 at 08:02

I think there is some confusion going on. You are mixing two things together: "separating code and data" and "expressing a program's behavior as data".

In your case, you are actually worried about the second and mixing the first into it. When you express a program's behavior as data, it becomes easier to extend. In your example with vowels = "aeiou", adding a new vowel is as simple as adding a character. If you keep this data externally, you can change the behavior without having to recompile the program.
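
A small sketch of what that could look like in Python, assuming a hypothetical vowels.json file with a single "vowels" key. The file name, the key, and both helper functions are made up for illustration; the built-in default stays in the code, and the external file only overrides it when present.

```python
import json
import os
import tempfile

DEFAULT_VOWELS = "aeiou"


def load_vowels(path):
    """Return the vowel set from a JSON file, or the built-in default if the file is absent."""
    if not os.path.exists(path):
        return set(DEFAULT_VOWELS)
    with open(path, encoding="utf-8") as f:
        return set(json.load(f)["vowels"])


def count_vowels(text, vowels):
    """Count the characters of text (lower-cased) that are in the vowel set."""
    return sum(1 for ch in text.lower() if ch in vowels)


# Demo in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vowels.json")
    english = count_vowels("cwm", load_vowels(path))  # file missing: default set
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"vowels": "aeiouwy"}, f)
    welsh = count_vowels("cwm", load_vowels(path))    # file present: overrides default

print(english, welsh)
```

The counting code is identical in both calls; only the data it was handed changed.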

And when you think about it, OOP is an extension of this thinking. Binding data and behavior together allows you to change the program's behavior based on the program's data.

Euphoric

Cause naturally, the list of vowels is going to change. – cHao Feb 19 '14 at 08:43
@cHao As soon as i18n steps in, it *is.* – Angew is no longer proud of SO Feb 19 '14 at 08:55
i18n can break your head - see some perverse examples in Java in http://www.javaspecialists.eu/archive/Issue209.html – Rory Hunter Feb 19 '14 at 09:35
@Angew: As soon as i18n steps in, though, *you're screwed anyway*. You need code for this; the naïve solution is incapable of handling every case even in English. (Forget the `ï` for a second; let's talk about `y` and `w`!) Moving the list out to a database isn't going to fix that, and is actually harmful -- it's complexity that'll be worthless if done wrong, but you won't even know what "wrong" *is* unless you're designing for i18n from the ground up. At which point you're already realizing that a list of vowels just isn't going to cut it anyway. – cHao Feb 19 '14 at 14:54
@cHao That was just an example. Of course I meant "adding new vowel" with slight sarcasm. – Euphoric Feb 19 '14 at 15:54
@cHao, forget about i18n and the vowels example. The point is you'd be surprised how often something you think is constant to your program and will never change turns out to be something you need to be able to modify or configure. – Ben Lee Feb 24 '14 at 21:27
@BenLee: I wouldn't be a bit surprised, actually. I'm currently working on changing some code like that as we speak. But outsourcing everything to the database is fortune-telling of a whole other sort. If you don't already know whether something will need to be modified -- and more importantly, if you don't yet know *how* it will need to be modified -- then IMO it's better to wait til you need that flexibility before adding it. – cHao Feb 24 '14 at 22:43
@cHao, I actually agree with you entirely. I was just trying to clarify Euphoric's point, since the discussion was sidetracked by a poor example. Even though I don't really agree with said point, it is worthy of consideration, which is why I thought it was a good idea to clarify it. – Ben Lee Feb 24 '14 at 23:46
@cHao +1, moving everything to the database does not mean that you need to make them editable for anybody. It is just a storage mechanism. I think high level "constants", which are relevant from app perspective should be in the database, for example default language. While low level "constants" should be separated from the data, for example database username and password. The latter one can stay in a config file or can go to a different database to a maintenance subdomain if we want to automate the installation, version update, etc.. – inf3rno Nov 04 '17 at 00:52

score 6 · Answer 4 · edited Apr 12 '17 at 07:31

For example, if your program's function is to count vowels, what's wrong with having vowels = "aeiou" in it?

Storing configuration externally allows you to have one version of the code that is expected to work with many configurations. The alternative is to maintain many versions of the software that differ only by configuration.

You mention vowels = "aeiou". What if I sometimes want "y"? Should I have to rebuild the entire program? Can I upgrade versions easily now that I have modified the code? If there is an error, did I cause it, or is the program broken?

If this is inside your program, it implies that your program does not expect users to change the definition of vowels without scanning the code to see the possible side effects. If the definition is stored externally, it implies that the program should not break for any reasonable value set in the configuration.
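
One way to read "should not break for any reasonable value" is that the program checks what it loads before using it. A minimal sketch in Python follows; validate_vowels and its rules are assumptions made up for this example, not something any particular program does.

```python
def validate_vowels(raw):
    """Return a clean set of vowels, or raise ValueError with a readable message."""
    if not isinstance(raw, str) or not raw:
        raise ValueError("vowels must be a non-empty string")
    bad = [ch for ch in raw if not ch.isalpha()]
    if bad:
        raise ValueError(f"vowels may only contain letters, got {bad!r}")
    return set(raw.lower())


vowels = validate_vowels("aeiouY")  # accepted and normalised to lower case

try:
    validate_vowels("ae1ou")        # malformed value from a configuration file
except ValueError as err:
    problem = str(err)

print(sorted(vowels), problem)
```

A check like this only catches malformed values. It cannot tell whether a well-formed value is a sensible choice of vowels.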

When you sequester data into YAML or text files or dumb databases as if you are removing a tumor from the code

Some see it as the opposite: you are removing the tumor of code from your precious data. See Torvalds' quote about good programmers.

answered Feb 19 '14 at 04:39

FMJaguar

The Torvalds quote refers to data structures, not data. – user949300 Feb 19 '14 at 04:50
The OP states: "Object oriented programming said "we want arbitrarily rich data structures", and so endowed data structures with powers of code." – FMJaguar Feb 19 '14 at 05:30
If you make a fundamental change to the definition of what a vowel is, you would at the least need to rerun all automated tests. Systems rarely if ever have the ability to rerun tests when a configuration file changes on a deployed system. So such definitions need to be built into the system; perhaps as two hard coded sets with a configuration option to select between them. – soru Feb 19 '14 at 09:40
+1 for the Torvalds' quote. I agree with this sentiment: in the example of puppet, I think the issue is that puppet doesn't have a good data structure to represent the information people want to put in it. Rather than fix the data structures, the puppet developers asserted that "data in code" is the problem (why? That's the question!) and developed [hiera](http://docs.puppetlabs.com/hiera/1/), which I see as little more than moving the problem somewhere else, and additionally making it impossible to associate behavior with data. – Phil Frost Feb 19 '14 at 19:18
Yes, if you want to change what the program does, then you have to recompile the program. Why is that surprising? – user253751 Oct 28 '20 at 11:44

score 2 · Answer 5 · answered Feb 19 '14 at 06:46

I was on one project where the lead insisted on putting reference data into little tables, and I thought that was silly. But since we already had our persistence infrastructure and connectivity set up, it ended up being a pretty low cost on top of the other persistence operations we were doing.

Now, I still think that was a silly decision, and if we didn't have the infrastructure on hand, I just wouldn't have done it.

But some of the arguments in favor I see are:

If you have a database mindset, then putting reference data into the SQL database allows you to join on it for reporting.
If you have an admin utility, or access to the database, then you can tweak the values at runtime. (Although that can be playing with fire.)

Also, sometimes policy gets in the way of coding practices. For example, I've worked at several shops where pushing an .xml file is A-OK, while touching a line in code requires a full regression cycle, and maybe a load test. So there was one team I was on where my .xml files for the project were extremely rich (and maybe -heh- might have contained some code).

I always ask myself if I'm going to enjoy the benefit of pushing data out of the code into an external data store, even if it's just a text file, but I've worked with people who just see it that way as their first impulse.

Good comment about shop procedures, where editing XML is "o.k." but editing the same thing in code is a big hassle. — user949300, Feb 19 '14 at 07:07
worked in one shop where everything was in the database that could be, down to the screen texts. Apart from the user interface code, the only thing not in the database was the database location and credentials... — jwenting, Feb 19 '14 at 07:27
it always sounds silly until, one day, someone asks "can we reconfigure this for user X who is demanding it", and then it doesn't seem so silly after all. Damn customers :) — gbjbaanb, Feb 19 '14 at 11:33
...and if that day is "never", then that's a long time feeling silly — sea-rob, Feb 19 '14 at 15:56

score 2 · Answer 6 · answered Feb 25 '14 at 00:39

Let me ask you a completely serious counter-question: What, in your view, is the difference between "data" and "code"?

When I hear the word "data", I think "state". Data is, by definition, the thing that the application itself is designed to manage, and therefore the very thing that the application can never know about at compile time. It is not possible to hard-code data, because as soon as you hard-code it, it becomes behaviour - not data.

The type of data varies by application; a commercial invoicing system may store customer and order information in a SQL database, and a vector-graphics program might store geometry data and metadata in a binary file. In both of these cases and everything in between, there is a clear and unbreakable separation between the code and data. The data belongs to the user, not the programmer, so it can never be hard-coded.

What you seem to be talking about is, to use the most technically accurate description available to my current vocabulary: information governing program behaviour which is not written in the primary programming language used to develop the majority of the application.

Even this definition, which is considerably less ambiguous than just the word "data", has a few problems. For example, what if significant parts of the program are each written in different languages? I have personally worked on several projects which are about 50% C# and 50% JavaScript. Is the JavaScript code "data"? Most people would say no. What about the HTML, is that "data"? Most people would still say no.

What about CSS? Is that data or code? If we think of code as being something that controls program behaviour, then CSS isn't really code, because it only (well, mostly) affects appearance, not behaviour. But it isn't really data, either; the user doesn't own it, the application doesn't even really own it. It's the equivalent of code for a UI designer. It's code-like, but not quite code.

I might call CSS a kind of configuration, but a more practical definition is that it is simply code in a domain-specific language. That's what your XML, YAML, and other "formatted files" often represent. And the reason we use a domain-specific language is that, generally speaking, it's simultaneously more concise and more expressive in its particular domain than coding the same information in a general-purpose programming language like C or C# or Java.

Do you recognize the following format?

{
    "name": "Jane Doe",
    "age": 27,
    "interests": ["cats", "shoes"]
}

I'm sure most people do; it's JSON. And here's the interesting thing about JSON: In JavaScript, it's clearly code, and in every other language, it's clearly formatted data. Almost every single mainstream programming language has at least one library for "parsing" JSON.

If we use that exact same syntax inside a function in a JavaScript file, it can't possibly be anything other than code. And yet, if we take that JSON, shove it in a .json file, and parse it in a Java application, suddenly it's "data". Does that really make sense?

I argue that the "data-ness" or "configuration-ness" or "code-ness" is inherent to what is being described, not how it's being described.

If your program needs a dictionary of 1 million words in order to, say, generate a random passphrase, do you want to code it like this:

var words = new List<string>();
words.Add("aa");
words.Add("aah");
words.Add("aahed");
// snip 172836 more lines
words.Add("zyzzyva");
words.Add("zyzzyvas");

Or would you just shove all those words into a line-delimited text file and tell your program to read from it? It doesn't really matter if the word list never changes, it's not a question of whether you're hard-coding or soft-coding (which many rightly consider to be an anti-pattern when inappropriately applied), it's simply a question of what format is most efficient and makes it easiest to describe the "stuff", whatever the "stuff" is. It's fairly irrelevant whether you call it code or data; it is information that your program requires in order to run, and a flat-file format is the most convenient way to manage and maintain it.
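
For comparison, here is roughly what the flat-file version could look like in Python. The file name words.txt and the helper names are hypothetical, and an in-memory stream stands in for the real file so the sketch runs as-is.

```python
import io
import secrets


def load_words(lines):
    """Read one word per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def passphrase(words, count=4):
    """Join `count` randomly chosen words using a cryptographically strong RNG."""
    return " ".join(secrets.choice(words) for _ in range(count))


# Stand-in for open("words.txt", encoding="utf-8"); name and contents are hypothetical.
word_file = io.StringIO("aa\naah\naahed\n\nzyzzyva\nzyzzyvas\n")
words = load_words(word_file)
phrase = passphrase(words)

print(len(words), phrase)
```

The loader is a few lines no matter how long the list grows, and the list itself can be edited, diffed, and sorted with ordinary text tools.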

Assuming you follow proper practices, all of this stuff is going into source control anyway, so you might as well call it code, just code in a different and perhaps very minimalistic format. Or you can call it configuration, but the only thing that truly distinguishes code from configuration is whether or not you document it and tell end users how to change it. You could perhaps invent some bogus argument about configuration being interpreted at startup time or runtime and not at compile time, but then you'd be starting to describe several dynamically-typed languages and almost certainly anything with a scripting engine embedded inside of it (e.g. most games). Code and configuration are whatever you decide to label them as, nothing more, nothing less.

Now, there is a danger to externalizing information that isn't actually safe to modify (see the "soft coding" link above). If you externalize your vowel array in a configuration file, and document it as a configuration file to your end users, you are giving them an almost foolproof way to instantly break your app, for example by putting "q" as a vowel. But that is not a fundamental problem with "separation of code and data", it's simply bad design sense.

What I tell junior devs is that they should always externalize settings that they expect to change per environment. That includes things like connection strings, user names, API keys, directory paths, and so on. They might be the same on your dev box and in production, but probably not, and the sysadmins will decide how they want it to look in production, not the devs. So you need a way of having one group of settings applied on some machines, and other settings applied on other machines - ergo, external configuration files (or settings in a database, etc.)
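
A minimal sketch of per-environment settings in Python, assuming they arrive as environment variables. The variable names (APP_DB_URL and so on), the defaults, and the Settings class are invented for the example; a configuration file or a database table would serve the same purpose.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_url: str
    api_key: str
    data_dir: str


def load_settings(env=os.environ):
    """Build Settings from a mapping of environment variables, with dev defaults."""
    return Settings(
        db_url=env.get("APP_DB_URL", "sqlite:///dev.db"),
        api_key=env.get("APP_API_KEY", "dev-key"),
        data_dir=env.get("APP_DATA_DIR", "./data"),
    )


# The same code, two environments: an empty mapping and a production-like one.
dev = load_settings({})
prod = load_settings({"APP_DB_URL": "postgresql://db.internal/app", "APP_API_KEY": "secret"})

print(dev)
print(prod)
```

The developers own the defaults, and whoever runs the production machines owns the overrides. Neither has to touch the other's files.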

But I stress that simply putting some "data" into a "file" isn't the same as externalizing it as configuration. Putting a dictionary of words into a text file doesn't mean that you want users (or IT) to change it, it's just a way of making it much easier for developers to understand what the hell is going on and, if necessary, make occasional changes. Likewise, putting the same information in a database table does not necessarily count as externalization of behaviour, if the table is read-only and/or DBAs are instructed never to screw with it. Configuration implies that the data is mutable, but in reality that is determined by process and responsibilities rather than the choice of format.

So, to summarize:

"Code" is not a rigidly-defined term. If you expand your definition to include domain-specific languages and anything else which affects behaviour, a lot of this apparent friction will simply disappear and it will all make sense. You can have non-compiled, DSL "code" in a flat file.
"Data" implies information that is owned by the user(s) or at least someone other than the developers, and not generally available at design time. It could not be hard-coded even if you wanted to do so. With the possible exception of self-modifying code, the separation between code and data is a matter of definition, not personal preference.
"Soft-coding" can be a terrible practice when over-applied, but not every instance of externalization necessarily constitutes soft-coding, and many instances of storing information in "flat files" are not bona fide attempts at externalization.
Configuration is a special type of soft-coding that is necessary because of the knowledge that the application may need to run in different environments. Deploying a separate configuration file along with the application is far less work (and far less dangerous) than deploying a different version of the code to every environment. So some types of soft-coding are actually useful.

score 1 · Answer 7 · answered Feb 24 '14 at 20:44

I suggest reading this classic article by Oren Eini (a.k.a Ayende Rahien)

http://ayende.com/blog/3545/enabling-change-by-hard-coding-everything-the-smart-way

My own takeaway from it is to focus on simplicity and readability. This can mean that things that are unlikely to be reconfigured are best left hard-coded (readably). This allows you to use a programming language's full syntax to express the parameters, as well as gain beneficial side effects like code completion and compiler errors on misuse.

This way you potentially avoid the complexity of parsing and interpreting. (The objection "but someone else parses my YAML/JSON" overlooks that mapping parsed text onto the specific API calls can itself be a form of interpreting.) You also avoid the complexity of another step between the "data" and its use.
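
A minimal sketch of what readable hard-coding can look like, here in Python. This is an illustration of the idea rather than anything taken from the linked article; `RetryPolicy` and `DEFAULT_RETRY` are hypothetical names. Python has no compile step, so the "compiler errors" show up when the module is loaded, or earlier if you run a static type checker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical parameters that are unlikely to ever be reconfigured."""
    attempts: int
    delay_seconds: float


# Hard-coded, but named, typed, and checked by the language itself:
# a misspelled field name raises TypeError as soon as this line runs,
# and frozen=True rejects later reassignment of the fields.
DEFAULT_RETRY = RetryPolicy(attempts=3, delay_seconds=0.5)
```

There is no parser, no schema and no mapping layer here: the "data" is already the object the rest of the code uses, and editor completion works on its fields.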

Some cases do lend themselves to being expressed in data even in a scenario like this: for example, specifying thousands of points in 3D space may be better suited to a text file than to code, although in some languages, including C using struct initializers, code can be appropriate even for that.

score 1 · Answer 8 · answered Jul 13 '14 at 06:41

OK, let's assume you want to write some kind of C++ program for your leisure. You know exactly what it has to do and what it will never need to do. Now take any book on "modern software design". Here's the rule of the game: for every class in your project and every case, however tiny, you have to implement each and every fancy pattern you find described in that book in order to make your code a "clean design". Well, "dependency injection" will be enough for many people, I guess. (It's C++, not Java!) Programming is taught from a more and more theoretical point of view. It is not sufficient that you get the job done; you have to write code that is maintainable, foolproof... all fine and right. The problem starts when people stop thinking about the actual reason design patterns were invented and become dogmatic.

Let me stop you from writing your letter-counting tool by (over)using a single simple design principle: when you write code that does a certain job on input data of a certain type, make sure it is able to perform that task for any given input data of that type.

- When you want to write a letter-counting tool, it clearly makes sense to write it so that it can count not only vowels but "any letter".
- Since you might not know what the corpus you are parsing actually is, you may as well choose a very general encoding (UTF-16) and cover most (all?) written languages and their symbols.

Up to that point, we have a function with two arguments (the corpus and the letters to be counted). Our only remaining concern is to find a reasonably general "type" or "class" that the letters belong to: we certainly can do better than ASCII symbols!
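
The two-argument function described here might look roughly like this in Python. It is only a sketch: `count_letters` is a hypothetical name, the matching is case-sensitive, and Python strings are sequences of Unicode code points rather than UTF-16 units, which is one way of doing "better than ASCII".

```python
from collections import Counter


def count_letters(corpus, letters="aeiou"):
    """Count how often each character in `letters` occurs in `corpus`."""
    wanted = set(letters)
    counts = Counter(ch for ch in corpus if ch in wanted)
    # Report every requested letter, including those that never occur.
    return {letter: counts[letter] for letter in letters}
```

The vowels from the question survive as the default argument, so `count_letters(text)` still counts vowels, while `count_letters(text, "xyz")` reuses the same code for other letters without any further machinery.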

Enter a demon wielding the "generalisation and reusability" dogma:

- Why not count any symbol of any class in an input stream of that class? (Abstract from letters to bit sequences of arbitrary but finite length, as that is the most general you can get with a computer...)
- Wait, even then we are still counting in natural numbers. However, counting can be generalized as a mapping from a countable set to itself fulfilling the axioms ... [you get the idea]

Now that example might be silly, but if you consider more complex design tasks than a counting tool, you may well find every opportunity to introduce additional abstraction that is supposedly required by some design pattern you found in your book.

Separation of "data" and "code" will probably be either trivial (function arguments) or you will find yourself treating invariants as variables ("data").

If there is ever any confusion, it is likely about "interfaces" and "services" and all the class specifics (e.g. types) suddenly being "data", that is, dependencies to be injected from outside. I feel that informatics courses taught at university have become much like lectures in philosophy, and there is less time for real projects in which students can gain experience in making software that works. If you ever wonder why you are required to use an insanely complex pattern instead of an obvious solution, this development is (likely) how that requirement was "created"...

To your specific problem: if you could 1.) write a program with a maximum of hard-coding for your specific case and then 2.) generalize from that code in a straightforward way, e.g. by introducing more function arguments and using other "trivial patterns", you can be sure you are separating code and data the obvious way, as has been done since functional programming was invented. (Of course, you skip 1. and do 2. right away...)

Anything non-obvious here is likely a case of "theory-deadlock": like writing an interface referring to an interface and yet another interf... and at the end you have a neat little XML file to configure all these interfaces and the dependencies to be injected into your class-interface clutter.

Let's just hope the XML parser you then require does not need an XML config in order to work...