Best approach parsing input in multiple scenarios

Question

I am developing an API that has one call that accepts a big JSON object.

Based on this object, there are 10 possible parsing scenarios, i.e. if field xxxx.xxx is present, go with scenario 5.

We currently have around 10 if statements in a controller function of over 200 lines, which decides which scenario to choose. The cyclomatic complexity is over 5000 and the nPath complexity is over 10 million

As this is not scalable, I finally made time to refactor this and make it more loosely coupled

What would be the best possible design pattern for this case?

I was thinking of having multiple parser classes that I loop through and each of them has a function called canHandle($jsonObject), and if certain fields are present, canHandle will return true and that class can then do it's logic.

Is this similar to the strategy pattern?

More background:

It is a booking tool for point to point travel.

So say I want to travel from an airport to a trainstation, the JSON would have the following fields:

route.pickuppoint.airport.iata
route.dropoffpoint.trainstation.id

There are multiple entities that can exist under pickuppoint and dropoffpoint(trainstations/airports/addresses/flight numbers) and they have to be parsed differently into our database

So if anyone calls the API with an airport in the pickup, I need to parse the airport and translate it to an address using some external api's and put it in the DB, the same goes for the other entities

Code example:

if (isset($data['Address']['Street'])) {
        // Parse address
    } elseif (isset($data['Location']['Latitude'])) {
        // Parse Location
    } elseif (isset($data['Airport']['Id'])) {
        // Parse airport
    } elseif (isset($data['Flight']['FlightNumber'])) {
        // Resolve flight number and parse
    } elseif (isset($data['Trainstation']['id'])) {
        // Resolve train station
    }

As you can imagine, if we were to add more options, this code would only get uglier

Possible duplicate of [Approaches to checking multiple conditions?](http://softwareengineering.stackexchange.com/questions/191208/approaches-to-checking-multiple-conditions) — gnat, Feb 06 '17 at 15:33
How are you processing the incoming JSON? Did you write code that actually looks at every JSON element? If you're using Java, you might want to look at something like JSONPath (https://github.com/jayway/JsonPath) which lets you query a JSON file in a way similar to XPath. You could also use jq (https://stedolan.github.io/jq/) to more easily process the JSON. — FrustratedWithFormsDesigner, Feb 06 '17 at 16:06
I'm having trouble understanding why you'd want multiple parsers for what seems like a ***unified input*** language. Why not use a single parser that can handle all the constructs/scenarios (and a grammar/language that can be checked for ambiguities). Splitting up the logic between multiple parsers (`canHandle`), means you also have to have to have parser priority ordering, and, potential ambiguities are not necessarily analyzed (not necessarily analyzable in one place). I'm sure you have reasons, but I don't get them. — Erik Eidt, Feb 06 '17 at 17:41
`We currently have around 10 if statements in a controller function of over 200 lines, which decides which scenario to choose. The Cyclomatic Complexity is over 5000 and the nPath complexity is over 10 million` -- That doesn't make any sense, unless there are 10! possible combinations. If there are only 10, or a few hundred possible combinations, you're doing something wrong. Have a look at *guard clauses.* Can you post your code? — Robert Harvey, Feb 06 '17 at 19:48
Robert Harvey is correct. The best solution is probably to change the format of your JSON to make it easier to parse. But we have no idea why it would be so complex in the first place. — Frank Hileman, Feb 06 '17 at 20:50
@RobertHarvey I have edited the code and provided more examples — Mazzy, Feb 07 '17 at 09:57
@ErikEidt Because every parser behaves differently based on what is in the input, it seems cleaner to have the logic that resolves an airport not in the same class/function as the logic that resolved a trainstation — Mazzy, Feb 07 '17 at 10:00
@Mazzy given the additional information, I think your original strategy is the best (canHandle, etc). — Frank Hileman, Feb 08 '17 at 22:29

score 1 · Answer 1 · answered Feb 07 '17 at 16:08

1

Can every piece of ode that creates the JSON in question add a uniform, unambiguous marker for the type of the information it produces?

Consider:

kind = $data['kind']  // always present.
if (kind == POSTAL_ADDRESS) {
  ...
} 
else if (kind == COORDINATES) {
  ...
}
// etc

answered Feb 07 '17 at 16:08

9000

24,162
4
51
79

that is somewhat what we are doing now, but the problem is the amount of if statements I don't find pretty/scalable. Also where should I place the logic inside those if statements? right now it is in functions in the same file, which makes quite a mess, hence why I want to make a structure for the if's and the logic inside the if's – Mazzy Feb 08 '17 at 13:58
@Mazzy The point here is: findig the right abstraction. You have heterogenous data which you have to put into abstract categories. From your examples above your abstraction is bad, because it uses a too finegrained heuristics to decide which parser to use for what. If you divide your model into 5 `kinds` or so, you only have to choose between 5 different parsers. So your conditional becomes much smaller. I think @9000 points you into the right direction. – Thomas Junk Apr 09 '17 at 12:27

score 0 · Answer 2 · answered Apr 09 '17 at 16:45

Here's a general approach: write each parser as a function into some appropriate domain that can fail. I would tend to use an Optional return value to signal failure, but it may be more convenient to use exceptions depending on the language / situation. You would then have

parser1 : JSON -> Optional[Type1]
parser2 : JSON -> Optional[Type2]
...
parserA : JSON -> Optional[TypeA]

And then write

parser : JSON -> Optional[Type1 | Type2 | ... | TypeA]
parser json = parser1(json).or(parser2(json))...or(parserA(json))

The reason I think this is preferable to using canHandle functions is that these sorts of guards tend to introduce duplication between the check and execution (e.g. your canHandle has to check the same paths exist as what your parser reads), introduce temporal coupling in your consumers (your consumer must remember to always check canHandle before parsing) and often lead to wasted cycles (canHandle doing some preliminary parsing work on its own).

If the question is what pattern fits here, then I agree that a strategy pattern is appropriate.

Best approach parsing input in multiple scenarios

2 Answers2