
I just started reading The Pragmatic Programmer (2nd edition). I came across the topic of avoiding Duplication with Data Sources, which I did not fully understand. The paragraph reads as follows:

Many data sources allow you to introspect on their data schema. This can be used to remove much of the duplication between them and your code. Rather than manually creating the code to contain this stored data, you can generate the containers directly from the schema. Many persistence frameworks will do this heavy lifting for you.

I did not understand that. How can introspection be used to avoid duplication? And what does he mean by 'generate the code directly from the schema'? Is there a practical example?

There's another option, and one we often prefer. Rather than writing code that represents external data in a fixed structure (an instance of a struct or class, for example), just stick it into a key/value data structure (your language might call it a map, hash, dictionary, or even object). On its own this is risky ... we recommend adding a second layer to this solution: a simple table-driven validation suite that verifies that the map you've created contains at least the data you need. Your API documentation tool might be able to generate this.

Does this mean I should not create a model representation of the tables in an RDBMS? How will the key/value data be represented? And how does this help avoid duplication with data sources?

I am from a Java background, but I have not been coding for a while now. I am not sure if that is why I cannot understand this. I'd appreciate it if you could provide an explanation with a practical example.

Hawk

2 Answers


If you have a table with three columns, say Name, Address, and Age, you need somewhere to put this data when reading it into your program. For static languages this is frequently a map of some sort.
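
For illustration, a minimal sketch of that key/value approach in C# might look like the following (the required column names are just placeholders, and the small table-driven check mirrors the validation suite the book quote suggests):

using System;
using System.Collections.Generic;
using System.Data;

public static class RowMapper
{
    // Read one result-set row into a key/value structure instead of a fixed class.
    public static Dictionary<string, object> ReadRow(IDataRecord record)
    {
        var row = new Dictionary<string, object>();
        for (int i = 0; i < record.FieldCount; i++)
            row[record.GetName(i)] = record.GetValue(i); // column name -> value
        return row;
    }

    // A simple table-driven validation suite: the "table" is just the list of
    // keys we require, so a schema change means editing one entry here.
    private static readonly string[] RequiredColumns = { "Name", "Address", "Age" };

    public static void Validate(IReadOnlyDictionary<string, object> row)
    {
        foreach (var column in RequiredColumns)
            if (!row.ContainsKey(column))
                throw new InvalidOperationException($"Row is missing column: {column}");
    }
}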

You can also write an object with three fields, dto.name, dto.address, and dto.age, which then needs code for reading from the database and putting the values into the DTO, and code for getting the values out of the DTO and writing them back to the database. Doing this rapidly becomes tedious, especially if the database table changes, because you need to keep the code in sync.
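
To make the tedium concrete, here is a rough sketch of what that hand-written mapping code tends to look like (the PersonDto class and the parameter names are invented for illustration):

using System.Data;

public class PersonDto
{
    public string Name { get; set; }
    public string Address { get; set; }
    public int Age { get; set; }
}

public static class PersonMapper
{
    // Reading: every column is spelled out once more in the mapping code...
    public static PersonDto FromRecord(IDataRecord record) => new PersonDto
    {
        Name    = (string)record["Name"],
        Address = (string)record["Address"],
        Age     = (int)record["Age"]
    };

    // ...and writing: and once more here. Add a column to the table and the
    // table, the DTO, and this mapper all have to change in lock-step.
    public static void BindParameters(IDbCommand command, PersonDto dto)
    {
        AddParameter(command, "@Name", dto.Name);
        AddParameter(command, "@Address", dto.Address);
        AddParameter(command, "@Age", dto.Age);
    }

    private static void AddParameter(IDbCommand command, string name, object value)
    {
        var parameter = command.CreateParameter();
        parameter.ParameterName = name;
        parameter.Value = value;
        command.Parameters.Add(parameter);
    }
}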

That work can be done automatically for you by one of the persistence frameworks the book mentions, so you don't have to do it yourself. That is what is being talked about here, and it is usually considered a benefit.


So, you have an application, and you have a database. Your database table looks like this:

CREATE TABLE People (
    Id int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);

And your application takes data from this database and works with it. Regardless of how your code is structured, you'll always find that the code needs to know exactly which columns exist on the database table.

Whether it's via data tables:

// Assumes a System.Data.DataTable named "table", filled from the People table.
foreach (DataRow row in table.Rows)
{
    int id           = row.Field<int>("Id");
    string lastName  = row.Field<string>("LastName");
    string firstName = row.Field<string>("FirstName");
    string address   = row.Field<string>("Address");
    string city      = row.Field<string>("City");
}

Or, much more commonly nowadays, enshrining the table structure in an entity class:

public class Person
{
    public int Id { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Address { get; set; }
    public string City { get; set; }
}

Now let's assume this application works, but we need to make a change to it. In this change, we end up adding, changing, or removing fields from Person. Now we are forced to update both the database and the code, in both cases manually, and hope that we make the exact same change twice so that the system keeps working.

This is the duplication that's being talked about, i.e. the fact that you have to do the same job (e.g. adding a field) multiple times.

Wouldn't it be much nicer if we only needed to do this job once (whether in the database or in the code), and then have the system figure out how to change the other based on what we did?

And then you realize that this can be automated. Suppose I only gave you the above Person class and told you to create a table that can contain that information. You'd have all the information you need to create your [People] table and its columns. Therefore, if we write a tool that interprets a given class and is able to generate the equivalent database structure, that means we don't have to manually do both anymore.
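
A toy version of such a tool is easy to sketch with reflection; the class itself becomes the schema, and the type mapping below is deliberately simplified for illustration:

using System;
using System.Linq;

public static class SchemaGenerator
{
    // Deliberately simplified C#-to-SQL type mapping.
    private static string SqlType(Type type) =>
        type == typeof(int) ? "int" : "varchar(255)";

    public static string CreateTableFor(Type entityType, string tableName)
    {
        var columns = entityType.GetProperties()
            .Select(p => $"    {p.Name} {SqlType(p.PropertyType)}");
        return $"CREATE TABLE {tableName} (\n" + string.Join(",\n", columns) + "\n);";
    }
}

// SchemaGenerator.CreateTableFor(typeof(Person), "People") produces a statement
// equivalent to the CREATE TABLE shown at the top of this answer.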

This is precisely what ORMs like EF do for you. Note that I'll use Entity Framework as an example here, but the answer applies to any ORM with the discussed features.

EF even allows you to do it in either direction.

  • In Code First, you write the C# classes that match your intended database structure, and then you tell EF to create a new database (or alter an existing database) to fit the structure described in your classes (a minimal sketch follows this list).
  • In Database First, you provide an existing database structure, and EF will generate the C# classes that match the structure.
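
As a rough Code-First sketch (the context name and connection string are made up, and details vary between EF versions):

using Microsoft.EntityFrameworkCore;

// Code First: the C# classes are the single source of truth; EF derives the
// database schema from this context and its entity classes via migrations.
public class PeopleContext : DbContext
{
    public DbSet<Person> People { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options.UseSqlServer("connection string goes here"); // placeholder
}

From here, running "dotnet ef migrations add InitialCreate" followed by "dotnet ef database update" creates or alters the People table to match the Person class.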

Additionally, because EF is now closely aware of how your classes and your database tables are structured (and which class property is represented by which database column), EF is able to write your SQL statements for you.

You give EF a Person object and tell it to add it to the database, and EF will use that information to generate a complete INSERT SQL statement to do that for you. This almost completely cuts out having to manually write any SQL.
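
For example, using the hypothetical PeopleContext from the sketch above, the insert becomes ordinary C# with no hand-written SQL:

using (var db = new PeopleContext())
{
    // Id is the key by convention and is generated by the database on insert.
    var person = new Person
    {
        LastName  = "Doe",
        FirstName = "Jane",
        Address   = "1 Main St",
        City      = "Springfield"
    };

    db.People.Add(person); // EF tracks the new entity...
    db.SaveChanges();      // ...and generates and runs the INSERT for us.
}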

Many ORMs exist, and not all of them work the same way. But overall, their goal is to automate the boring mapping work that is inherently required for data from a database to be used by an application.

Flater
  • Can we mention somewhere that EF encourages tight coupling of the storage schema to the application model? For example, a common antipattern is using DTOs as domain objects. – Basilevs Apr 21 '21 at 13:03
  • @Basilevs: The coupling is as tight as you develop it to be. It's perfectly possible to loosely couple EF to the domain/application, but it requires an extra layer, which requires extra effort. As much as I'm not a fan of this extra layer, it's not inherently an anti-pattern just because I don't like it. It only becomes an anti-pattern when it disproportionately brings more drawbacks to the table than it does benefits - but that can be said about anything and isn't specific to EF. – Flater Apr 21 '21 at 13:06