7

Right now in my current company I must "parse" a csv and extract some data out of it (in the sense of data mining).

I surprised myself by defining the following data structure (Java 8):

static List<Map<String, Map<String, LocalDate[]>>> intervalStructure

Is this considered bad practice? Should I use a self-define class instead to store some data?

gnat
  • 21,442
  • 29
  • 112
  • 288
lJoSeTe4ever
  • 296
  • 2
  • 6
  • 6
    When the format of the CSV file changes, how many places will you have to adjust the way it references the ninth date of the third item in the second map that is second the first map which is seventh in the list? – TZHX Jan 27 '16 at 16:33

2 Answers2

13

This is a dumb, stringly typed data structure. It is a:

  • List
    • of Maps
      • keyed by String
      • of Maps
        • keyed by String
        • of Local date array

This makes it difficult to create, traverse, and could very well get you into code that looks like:

foo.get(1).get("bar").get("qux")[42]

And guess what... you've got a null pointer exception that was thrown on that line.

Setting up the structure isn't much fun either. That this is a rawish data structure means that you have to do all of the adds and deletes by hand working on the interfaces (that can be rather deep) rather than working with a Constructor or Builder.

You've linked the representation that is passed around in your code to the representation of the file. When one needs to change, the other needs to change - and possibly in multiple locations. Think of the joys you will have when you have an xml or json or yaml data file behind it and need to go through and change things.

This all in all is a bad idea.

A better idea would be to have one or more classes that exposes the logic of fetching the data. You have a Structure. It exposes a get(String bar, String qux) and returns back a List. This allows you to isolate the traversal of the structure (if it is that) in private fields so that it can change as needed. It also exposes an add(String bar, String qux, LocalDate date) which allows you to clearly isolate the logic for building it in a testable way.

You may find that you want other questions of that data that you want answered. Maybe you want a isInRange(...) or contains(...) or something else that you ask. Failing to have this as a nice class with corresponding methods to answer those questions, you will find that you either have a static method in a helper class (this is beginning to smell) that acts on that data.

public static boolean contains(List<Map<String, Map<String, LocalDate[]>>> arg, LocalDate value)

That looks as ugly as it is. Don't do that.

If you don't do that, you will have a copy of that logic each time you use it which is not at all DRY.

Either way, working on raw data structures like this is undesirable.

  • 1
    `stringly typed` -- not sure if that is clever pun or typo? – TZHX Jan 27 '16 at 16:56
  • 3
    @TZHX it is a [well accepted term](http://c2.com/cgi/wiki?StringlyTyped). –  Jan 27 '16 at 16:58
  • @MichaelT I clearly do not spend enough time on the Internet. – TZHX Jan 27 '16 at 17:04
  • @TZHX I should have warned you though... some consider [c2.com](http://c2.com/cgi/wiki?WelcomeVisitors) to be the equivalent of TVTropes for developers and software architects. Follow that link at your own risk of loss of productivity. –  Jan 27 '16 at 17:08
  • @MichaelT The site timed out for me, so I googled the term and ended up on Jeff Atwood's blog. :) Will check it out some other time. – TZHX Jan 27 '16 at 17:10
  • @MichaelT Do you mean "traverse" and "traversal" instead of "transverse" and "transversal". I think in english "transverse" is not a verb. I didn't just do an edit bc, below the min and not 100% sure you don't use some new cool term I never heard of. – joshp Jan 28 '16 at 00:21
  • @joshp thank you for catching them. I blame my poor spelling and the nearly correct word along with autoincorrect in the browser fixing it to what I didn't mean. –  Jan 28 '16 at 00:25
4

I'm going to respectfully disagree with Michael here, and some rather smart programmers, like Rich Hickey, tend to think along the same lines as me. The key here is that you take care not to pass this data structure all over the place, but you use it as a base layer from which you build the sort of abstractions that Michael talks about in his answer.

These sorts of data structures get a bad rap. They are extremely easy to manipulate, with highly reusable tools. Without knowing anything about your application other than that one line of code defining a data structure, I could write an interface to convert it to JSON, HOCON, or XML. I could store it in a database. I don't have to write any custom serialization/deserialization code, which lets me reuse generic libraries. I can easily write queries and filters to give me nice strongly-typed class-based layers above this one. When my CSV file format changes, I have this much more flexible and dynamic layer of isolation between it and my more rigid and static classes above it.

In summary, don't make it your sole data structure, but using it as an intermediate data structure can solve a lot of maintainability problems.

Karl Bielefeldt
  • 146,727
  • 38
  • 279
  • 479
  • Agree. As you describe, have this structure largely hidden, with a nice facade layer with abstractions a la @MichaelT for the common getters and setters. But, perhaps, add a `getRawMapKidsDontTryThisAtHome()` method, perhaps `protected`, in case you need to get down and dirty with some JSON etc... – user949300 Jan 28 '16 at 00:16
  • Hmmmm... now _that_ aspect of it (i.e. concern for flexibility) never occurred to me. Of course, the reasonableness of this approach depends on how much that will apply in the given situation. – Joshua Grosso Reinstate CMs Jan 28 '16 at 02:44