
For clarification, let's review the following example:

Consider that we have the following array of `Int`s:

let array = [1, 2, 3, 4, 5]

and we've been asked to generate a new array from `array` that satisfies the following two conditions:

  • Only the elements of `array` that are > 3 should be included.
  • Each generated element should be a `String` instead of an `Int`, in the form: "This is \(element * 2)".

By chaining higher-order functions, we could achieve it like this:

let newArray = array.filter { $0 > 3 }.map { "This is \($0 * 2)" }

At this point, it should iterate through all of the elements for the `filter`, and then iterate once more through the filtered elements for the `map`.
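
We can verify those two passes by adding a counter to each closure (a quick sketch; `filterCalls` and `mapCalls` are illustrative names, and the counts assume the default, non-lazy collection APIs):

var filterCalls = 0
var mapCalls = 0
let result = array
    .filter { x -> Bool in filterCalls += 1; return x > 3 }
    .map { x -> String in mapCalls += 1; return "This is \(x * 2)" }
print(result)      // ["This is 8", "This is 10"]
print(filterCalls) // 5 (called once per element)
print(mapCalls)    // 2 (called once per filtered element)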

However, when doing it with a standard for-loop:

var transformed = [String]()
for i in 0..<array.count {
    if array[i] > 3 {
        transformed.append("This is \(array[i] * 2)")
    }
}

it should iterate through the array's elements only once.

On the other hand, AFAIK, using higher-order functions lets us get rid of mutability (we declared `newArray` as `let` instead of `var`).
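
For completeness, the standard library's `compactMap` keeps the immutability while doing the filtering and mapping in a single pass. A sketch, with no claim about which variant is actually faster:

// Elements mapping to nil are dropped, so filter and map happen in one pass.
let singlePass = array.compactMap { $0 > 3 ? "This is \($0 * 2)" : nil }
// singlePass == ["This is 8", "This is 10"], still declared with `let`.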

In such a case, which approach is better in terms of performance?

Ahmad F
  • See [Is micro-optimisation important when coding?](https://softwareengineering.stackexchange.com/questions/99445/is-micro-optimisation-important-when-coding) – Doc Brown Mar 27 '19 at 01:48
  • Are you sure it iterates twice? I would expect it to iterate once, applying each function in turn, as your expanded method does. – Ewan Mar 27 '19 at 09:48
  • @Ewan I'm not that familiar with Swift, but it appears it does, unless you explicitly ask for laziness: https://stackoverflow.com/questions/51917054/why-and-when-to-use-lazy-with-array-in-swift – jk. Mar 27 '19 at 12:04
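
To illustrate jk.'s point, here is a sketch using the standard library's `lazy` views to fuse both steps into a single pass (the variable name is illustrative):

// .lazy makes filter and map return lazy wrappers, so no intermediate
// array is built; each element flows through both closures in one pass
// when the result is materialized by Array(...).
let fused = Array(array.lazy.filter { $0 > 3 }.map { "This is \($0 * 2)" })
// fused == ["This is 8", "This is 10"]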

2 Answers

7
  1. Performance depends on many factors: the language; compiler optimizations (a compiler may rewrite the first version into the second one, or the other way around, or transform both versions into the exact same code); the ecosystem (JIT, etc.). There is no answer that holds for every situation.

  2. Don't trust anyone who tells you that one piece of code is faster than another. Test and see for yourself (a rough timing sketch follows this list).

  3. Why would it matter? If you had profiled your code and the profiler had told you that this piece of code is the actual bottleneck, it would have been easier and faster for you to simply rewrite it differently and rerun the profiler than to write the question in the first place. Did you do the profiling?

  4. In general, your primary focus should be on the readability of the code. In your case, the first version is more readable (if your teammates disagree, their opinion matters, and mine is then irrelevant).
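
Following up on point 2, a crude timing sketch might look like this (illustrative only: `measure`, the array size, and the labels are made up for the example, and a real profiler such as Instruments gives far more reliable numbers):

import Foundation

let big = Array(1...1_000_000)

// Illustrative helper: times a closure and prints the elapsed seconds.
func measure(_ label: String, _ body: () -> Int) {
    let start = Date()
    let count = body()
    print("\(label): \(Date().timeIntervalSince(start))s, \(count) elements produced")
}

measure("filter + map") {
    big.filter { $0 > 3 }.map { "This is \($0 * 2)" }.count
}

measure("for-loop") {
    var transformed = [String]()
    for element in big where element > 3 {
        transformed.append("This is \(element * 2)")
    }
    return transformed.count
}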

Arseni Mourzenko
  • Thanks for the answer. Honestly, I did not profile it; however, I can see that using higher-order functions iterates more than the standard `for-loop` (that's what is displayed in the playground)... – Ahmad F Mar 26 '19 at 21:58
  • @AhmadF: what you or I see is irrelevant, because we would *usually* be wrong, and the profiler would *always* be right when it comes to finding the bottleneck. Let the machine do what it does best; profilers are good at spotting problematic points, and you are good at deciding whether the code is readable, and at making it fast enough while keeping readability high. – Arseni Mourzenko Mar 26 '19 at 22:04
  • @AhmadF you may be underestimating the compiler: a compiler can often inline such higher-order functions and fuse multiple loops so that the original array is only iterated once. In this case, you probably have to use `array.lazy` to make loop fusion possible (see the [assembly](https://godbolt.org/z/ss62YY)). Note that multiple iteration passes are not always slower than a single pass: both are O(n), and which is faster in practice often depends on subtle cache effects. – amon Mar 27 '19 at 09:21
0

As you say, you need two passes through the array: one to do the filtering, another to do the transformation, albeit on a smaller set of elements.

That's fine with only 5 elements in the array. You probably wouldn't see much of a difference whichever way you did it.

How would it cope with 5,000 elements?
Or 5 million?

It's not even the repeated iterating that's really the issue here; it's the intermediate [memory] storage required for each step in the process.
To borrow a plumbing analogy: your array-iterating method is a slim pipeline, where the data flows through the filtering and transforming processes only once. Your filtering-and-mapping method involves pumping the data out of one tank into another and, only once it has all arrived, pumping it again into another tank, and then another, and then another, depending on the complexity of the processing required.
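
Spelling the chained version out step by step makes those tanks visible (a sketch; `tank1` and `tank2` are illustrative names):

// Roughly what array.filter { ... }.map { ... } does without .lazy:
let tank1 = array.filter { $0 > 3 }           // first tank: an intermediate [Int]
let tank2 = tank1.map { "This is \($0 * 2)" } // second tank: the final [String]
// Each step allocates and fills a whole new array before the next step begins.

As noted in the comments above, a `lazy` view trades those intermediate arrays for per-element evaluation.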

We see this a lot with Object-Relational Mapping tools (ORMs), which pull a slab of data out of the database, then do something with the whole slab, then something else with the result of that, and so on. Sure, it's just memory allocation and de-allocation, which is fast, but only finitely so. Life is good when the amount of data involved is small (say, on a development machine), but performance can drop like a stone as the volume of data (the size of those intermediate "water" tanks) increases.

Phill W.