5

I am learning design patterns in Java and also working on a problem where I need to handle huge number of requests streaming into my program from a huge CSV file on the disk. Each CSV line is one request, and the first field in each line indicates the message type. There are 7 types of messages, each of which should be handled differently.

The code is something like the following:

class Handler {
    private CustomClass {….}
    Map<String, CustomClass> map = new HashMap<String, CustomClass>();
    Public void runFile() {
        // Read the huge CSV file with millions of records, line by line
        // for each line, get the string before the first comma (say X)
        switch (X) {
        case 1 : myMethod1(….); break;
        case 2 : myMethod2(….); break;
        case 3 : myMethod3(….); break;
        // ...
        default: // ...
        }
    }
    // Methods 1, 2, 3 declarations
}

Note 1: Some of the methods affect the map and others don't.
Note 2: Each request (method) uses different variables from the CSV line and executes a different logic.
Note 3: requests/methods are NOT connected; i.e. myMethod2() does not logically follow myMethod1().

Now my question – What is an appropriate design pattern for this problem? Is it fine if I keep the whole logic in one class (similar to the code above) without changing?

Shad
  • 61
  • 1
  • 6
  • What is CustomClass role? – Tulains Córdova Jun 24 '17 at 01:45
  • @TulainsCórdova - The Custom Class holds some data captured from the input file. Say something like: Map – Shad Jun 24 '17 at 01:49
  • Do myMethod1, myMethod2, etc. need access to the Map? – Tulains Córdova Jun 24 '17 at 01:51
  • @TulainsCórdova Yes, all of the methods need access to the map – Shad Jun 24 '17 at 01:53
  • Some of the methods modify the map, others just use its data – Shad Jun 24 '17 at 02:01
  • Possible duplicate of [Choosing the right Design Pattern](https://softwareengineering.stackexchange.com/questions/227868/choosing-the-right-design-pattern) – gnat Jun 24 '17 at 02:26
  • Millions of requests means use an Executor of some sort. – user949300 Jun 24 '17 at 04:31
  • NB: Do not write code to parse the CSV file. Use an existing CSV library that will parse the file into records, and write code to handle the records. – kevin cline Jun 24 '17 at 06:47
  • For C#, I adapted a custom ORM I made for myself to be able to work with CSVs as if they were databases, using a bit of Reflection Magic. My technique accepts the contents of the CSV and the types of objects I want back, and it pops out a list with the parsed contents in strong typed lists - without a bunch of specific parsers. I have no idea if that is even possible in java. – T. Sar Jun 25 '17 at 01:55

2 Answers2

5

Keep It Simple, Stupid (KISS)

What is an appropriate design pattern for this problem? Is it fine if I keep the whole logic in one class (similar to the code above) without changing?

Yes, just leave all the logic in that class. Don't go looking to apply exotic design patterns just because design patterns are nifty. The only thing you'll accomplish with a design pattern is making things complicated unneccessairly.

Winston Ewert
  • 24,732
  • 12
  • 72
  • 103
  • One should always assume that the sample code is just a simplified example made to illustrate the real problem/code which is probably much longer or complex. – Tulains Córdova Jun 24 '17 at 02:37
  • 2
    @TulainsCórdova, maybe. I usually find people going for the complexity far too quickly. – Winston Ewert Jun 24 '17 at 03:33
  • What "simple" does mean for you? :-). I have found the Tulains approach to be simple. – Laiv Jun 24 '17 at 12:12
  • @Laiv, in the original code, it was trivial to see that if the original column was a 1, it would end up invoking myMethod1. Your IDE will even let you jump to the function with a single keypress. In Tulains approach, you've got to look at the processLine() call figure out its for an interface, figure out where the map is populated, find the line that inserts the value for 1, go to that class, and then follow the logic there. – Winston Ewert Jun 24 '17 at 16:33
  • @Laiv, Tulain's approach advantage is that it eliminates the duplication that would have been present in a switch statement. But it replaces it with an almost equally complicated piece of code setting up the hashmap. The savings, in my opionion, aren't there unless there are many many cases. – Winston Ewert Jun 24 '17 at 16:36
  • In that case, I agreed. Not a big deal improvememnt. But it's less procedural than the original code. And less prone to become a big ball of mud. Anyways, so far, the only problem I see is that the OP didn't state the goals to achive with the re-design. So any solution is fine as soon as it works as expected – Laiv Jun 24 '17 at 16:53
  • @Laiv - My main goal is learning the real-word practice for such a problem. The original code perfectly worked, and the re-engineered version with strategy pattern did the same with the same run time. The problem is about doing some calculations using the CSV data and presenting the results. I already got the correct results but I am currently trying to take my "programming" skills to the "engineering" level. All comments on this posts are helping me so thanks all! – Shad Jun 24 '17 at 21:33
5

I assume the code sample you show is just a simplified example and that the real problem is more complex, so as to deserve using a pattern.

  • Make CustomClass an external class(*).
  • Have several processors that implement the same interface
  • Have a map of processors using the integer that identifies the format of the CSV line as the key (you call it x).
  • Retrieve a processor from the map (with the correspoding key) and make it process the line.
  • This similar to strategy pattern, it defines a family of algorithms,encapsulates each algorithm, and makes the algorithms interchangeable within that family.

Advantages: flexibility, if you create the map of processors outside the handler and pass it in the constructor, more processors can be added later and Handler will not need to be changed (for example to add a new case the switch control structure).

enter image description here

(*) You can achieve the same results having the interface and the processors as well as the customclass as inner classes/interfaces inside Handler, but it would pollute the solution a lot.

==> CustomClass.java <==

public class CustomClass {}

==> IMessageProcessor.java <==

import java.util.Map;

public interface IMessageProcessor {
    public void processLine(Map<String, CustomClass> map, String line);     
}

==> ProcessorA.java <==

import java.util.Map;

public class ProcessorA implements IMessageProcessor {
    @Override
    public void processLine(Map<String, CustomClass> map, String line) {
        // TODO Auto-generated method stub
    }
}

==> ProcessorB.java <==

import java.util.Map;

public class ProcessorB implements IMessageProcessor {
    @Override
    public void processLine(Map<String, CustomClass> map, String line) {
        // TODO Auto-generated method stub
    }
}

==> ProcessorC.java <==

import java.util.Map;

public class ProcessorC implements IMessageProcessor {
    @Override
    public void processLine(Map<String, CustomClass> map, String line) {
        // TODO Auto-generated method stub
    }
}

==> Handler.java <==

import java.util.HashMap;
import java.util.Map;

public class Handler {
    private Map<String, CustomClass> map = new HashMap<String, CustomClass>();
    private Map<Integer,IMessageProcessor> processors = new HashMap<Integer,IMessageProcessor>();
    public processFile(){
        // store the processors in their map with the appropiate keys 
        processors.put(1, new ProcessorA());
        processors.put(2, new ProcessorB());
        processors.put(3, new ProcessorC());

        // Read the huge CSV file with millions of records, line by line
        // for each line, get the string before the first comma (say x)
        processors.get(x).processLine(map,line);
    }
}

Note: You might wish to validate first whether the processor for the key x exists, and it it doesnt, fall back to a default processor, say, stored with key -1, or any other value garanteed not to exist in the CSV file.

Tulains Córdova
  • 39,201
  • 12
  • 97
  • 154
  • 1
    This is great as I first thought about the Strategy pattern. My only concern was creating too many objects but your code solved it using the processors map. – Shad Jun 24 '17 at 02:44
  • Question - What if not all methods have the same signature? i.e. some of them want a map with the line and others want the map, the line, and another data structure (Say a List)? – Shad Jun 24 '17 at 02:46
  • 1
    @Shad Well, you would be violating Liskov Substitution Principle, and polymorphism would not be possible. To avoid that, I'd suggest a not-so-elegant solution, which is adding the List to the interface signature, even when some implementors (processors) would happily accept null or completely ignore the list. – Tulains Córdova Jun 24 '17 at 02:50
  • @Shad, if the different processors naturally take different parameters, then I strongly recommend that you drop the whole processor map that is being suggested here. It isn't buying you anything in that case. – Winston Ewert Jun 24 '17 at 03:40
  • @WinstonEwert, By sending the String line, they all need the same parameters. They need different parameters only if I parse the CSV line before calling the methods. In this case I need to send elements extracted from the line to the methods. – Shad Jun 24 '17 at 03:55
  • Do you think it might be good if I initialize the map using a static initializer? or -instead- in the class constructor? – Shad Jun 24 '17 at 04:01
  • @TulainsCórdova "or any other value guaranteed not to exist in the CSV file." - wanna bet on that, in a file millions of lines long and probably outside the OP's control? I just love pretty solutions (stuffed with "ceremonial" code) that still leave a nice big hole for the bugs to get in! – alephzero Jun 24 '17 at 04:10
  • @Shad if you are going to initialize the map once and use it millions of times, even making the initialization 100 or 1000 times more efficient will probably not have any measurable *global* effect on the system performance. – alephzero Jun 24 '17 at 04:13
  • @alephzero I didn't get it. What do you suggest and what's the buggy part of the solution? I would like to learn – Shad Jun 24 '17 at 04:25
  • @alephzero - Are you suggesting using the simple code with no design pattern? – Shad Jun 24 '17 at 04:26
  • @alephzero OP refers to the very first datum in a comma separated line, not the whole string. Most probably that first column is the ID from the source data origin. So findind a dummy number to map the default processor should not be impossible. Another option is having the default processor as a class member. Also, my answer is not a full app that passes tests and so on, it's just an answer to a question. There's a lot of code missing, like the loop that reads the lines from the file, for example. – Tulains Córdova Jun 24 '17 at 04:56
  • @shad A static initializer would be a good idea. – Tulains Córdova Jun 24 '17 at 04:57
  • `IMessageProcessor` already exists in the java standard library, and it's called `BiConsumer` – corvus_192 Jun 24 '17 at 14:06
  • @corvus_192 Only in Java 8 or newer. – Tulains Córdova Jun 24 '17 at 20:42
  • @TulainsCórdova Yes, I should've added that, but if you're using Java 8, there's no reason to reinvent this class. – corvus_192 Jun 24 '17 at 21:12