-3

In the Netflix series Suits, Season 1, Episode 8 (Identity Crisis), the legal team, with the help of a hacker, is tasked with proving that a business magnate embezzled funds, splitting them and opening several different accounts across different overseas banks (i.e. to launder the funds via structuring). They know the amount stolen down to the penny. Their solution for bringing this magnate down was to search every transaction against every account across the various banks to locate a series of deposits that summed exactly to the embezzled amount. (The team ends up discovering seven different accounts, each with a different bank, whose opening deposits sum to this number - which enables them to trace the accounts/transactions back to the perpetrator.)

Of course, I don't really expect Hollywood's portrayal of hacking/mathematics to be accurate. But this one stood out to me as an obvious dismissal of a well-known NP-complete problem. Unless these banks serviced a trivially small number of accounts/customers, each comprising a trivial number of transactions - or if they knew to look only for accounts' opening deposits - this would be next-to-impossible to solve in the real world, as it is a classic example of the subset sum problem, which is related to the knapsack problem.

Am I overlooking something here? Did this show actually ignore the difficulty of the subset sum problem? Or am I failing to apply an accurate understanding of this problem?

Cade Bryant
  • 185
  • 3
  • 2
    From your description, I think it sounds like the subset sum problem. But this SE site is definitely not the right place to discuss. Experts for the subset sum can be found here: https://cs.stackexchange.com . Discussions about TV series are best placed here: https://movies.stackexchange.com/ Before you ask there, please delete your question here, for not creating a crosspost (but even if you don't ask there, self-deletion of the question would be a good idea). – Doc Brown Aug 06 '23 at 06:27
  • 4
    I’m voting to close this question because it is about a movie series and a CS problem, and for both exist dedicated Stackexchange sites. – Doc Brown Aug 06 '23 at 06:29
  • 3
    "an obvious dismissal of a well-known NP-complete problem" I think you may be dramatically overestimating how much the general public knows **or cares** about this. – Philip Kendall Aug 06 '23 at 07:27

3 Answers3

1

NP-complete problems are only unsolvable in the limit. As you state, exponential effort is entirely achievable if the basis is not too large. The question how large it can get depends on the state of current computing technology, which tends to advance rapidly. So in principle, there definitely is a number of accounts for which this plot is reasonably realistic. (The question why the suspect wouldn't leave out some penny or other, just to defeat this type of forensic accounting, is of course another matter.)

But what should this number be? We can model this roughly as a combinatorial problem: if you have N transactions in an account, then the effort of checking every possible subset sum is 2n. 2100 is roughly 1030, which is about a billion billion billions. Large Language Models such as ChatGPT take about a billion billion operations to train, and this is a multi-month undertaking (unfortunately I can't remember a reference for this and may be wrong!), and 100 seems low as the number of transactions in a business account.

So it seems that the producers have exaggerated the capabilities of computing technology for the sake of the plot. It could be saved by transferring the combinatorial issue to another entity, e.g. by combining only the balances of various accounts, of which there is a reasonable number. (Although this probably wouldn't come over as impressive enough for an crime series, since people are notoriously bad at appreciating how large exponential expressions are.)

Kilian Foth
  • 107,706
  • 45
  • 295
  • 310
1

I know that 30 years ago, if you made a payment to the German Inland Revenue, they would check if that payment matched exactly the sum of 1, 2 or 3 amounts that you owed them, and in that case marked these debts as paid. If there was no exact match, then they considered the oldest, second oldest etc. debt as paid, as long as your payment had enough money. Since the number of different debts is usually small, that was a simple subset of an NP-complete problem.

Now things depend on the numbers. If you have hundreds and hundreds of payments, then the number of possible subsets far exceeds the number of possible dollar amounts, so it is very likely that a match is found, even by sheer coincidence. If there are not enough payments then it is easy to prove that there is no match. If there are too many payments then it is easy to find matching subsets. The range in between is where the problem gets really, really hard.

But reading the problem again, if there are n bank accounts, n large, and just 7 opening numbers added up to the embezzled amount, that would be just n^7 / 7! possible combinations. That's not very hard to solve. You make a table with all values that can be created with 3 accounts, and another table with all values that can be created with 4 accounts as a bitmap, then check for two numbers that match up to the embezzled amount quite quickly. With say 12 accounts the problem would have been an awful lot harder; on the other hand it would have been very likely to find 12 random accounts.

gnasher729
  • 42,090
  • 4
  • 59
  • 119
-1

Got it now: subset sum has a pseudo-polynomial solution in the number of possible sums. Say you are looking for a sum of exactly 1,000,000 from n items, that can be done in 1,000,000 n steps.

Say a billion dollars was embezzled. Then n accounts can be handled in n billion steps.

gnasher729
  • 42,090
  • 4
  • 59
  • 119