Optimisations should not be kept in a permanent, long-term branch.
They can be developed in a branch and merged back into the release branch once complete, but it should always be possible to test both versions side by side, and to switch easily between the optimised and unoptimised versions of any given algorithm, ideally at run time.
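As a rough sketch (in Python, with entirely hypothetical names) of what "switchable at run time" can look like, keeping both variants callable behind a single entry point:

```python
# Hypothetical sketch: keep both implementations and select one at run time.

def sort_points_naive(points):
    """Straightforward reference implementation, easy to verify."""
    return sorted(points, key=lambda p: (p[0], p[1]))

def sort_points_optimised(points):
    """Optimised variant; it must produce exactly the same results as the naive one."""
    # ... the optimised implementation would go here ...
    return sorted(points, key=lambda p: (p[0], p[1]))

IMPLEMENTATIONS = {
    "naive": sort_points_naive,
    "optimised": sort_points_optimised,
}

def sort_points(points, variant="optimised"):
    """Dispatch to the chosen variant; a config flag or environment
    variable could drive 'variant' instead of a parameter."""
    return IMPLEMENTATIONS[variant](points)
```

Because both versions stay in the release branch, they can be benchmarked and regression-tested against each other at any time.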
Full answer
Remember that there is more than one form of optimisation.
Algorithmic optimisation (replacing an O(n^2) algorithm with an O(n log n) algorithm, for instance) almost always has the potential to provide more significant savings than implementation optimisation (hand-unrolling a loop or ensuring your inner loop's data fits in a certain level of cache).
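To make the distinction concrete, here is a small illustrative example (not from the original answer) of an algorithmic optimisation, answering the same question in O(n^2) and in O(n log n):

```python
# Illustrative only: the same question answered in O(n^2) and in O(n log n).

def has_duplicates_quadratic(values):
    """O(n^2): compare every pair of elements."""
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates_n_log_n(values):
    """O(n log n): sort once, then any duplicates are adjacent."""
    ordered = sorted(values)
    return any(a == b for a, b in zip(ordered, ordered[1:]))
```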
Unfortunately, when people think about optimisation they tend to think of implementation optimisation, which specialises software so that it runs more efficiently on one architecture, possibly at the expense of execution speed on another.
I have already answered many other aspects related to your question in my answer to the question How to document and teach others “optimized beyond recognition” computationally intensive code?
Essentially though, it boils down to the golden rules of optimisation:
The First Rule of Program Optimisation: Don't do it.
The Second Rule of Program Optimisation (for experts only!): Don't do it yet.
— Michael A. Jackson
Knowing whether now is the time to optimise requires benchmarking and testing. You need to know where your code is inefficient, so that you can target your optimisations.
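In Python, for example, the standard-library profiler is often enough to find the hot spots (the main() here is just a stand-in workload):

```python
import cProfile
import pstats

def main():
    # Stand-in for the real workload you want to profile.
    return sum(i * i for i in range(1_000_000))

# Profile the entry point and report the ten most expensive calls.
cProfile.run("main()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```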
To determine whether the optimised version of the code is actually better than the naive implementation at any given time, you need to benchmark them side by side on the same data.
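A minimal benchmarking sketch, reusing the hypothetical has_duplicates_* functions from above, might look like this:

```python
import random
import timeit

# Same input for both variants, generated once.
data = [random.randrange(1_000_000) for _ in range(5_000)]

# The two versions must agree before their timings mean anything.
assert has_duplicates_quadratic(data) == has_duplicates_n_log_n(data)

for name, func in [("naive", has_duplicates_quadratic),
                   ("optimised", has_duplicates_n_log_n)]:
    seconds = timeit.timeit(lambda: func(data), number=5)
    print(f"{name}: {seconds:.3f} s for 5 runs")
```

Real measurements should use representative data sizes and be repeated, but the point is that both variants are driven by exactly the same harness and the same input.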
Also, remember that just because a given implementation is more efficient on the current generation of CPUs doesn't mean it will always be so. My answer to the question Is micro-optimisation important when coding? details an example from personal experience where an obsolete optimisation resulted in an order-of-magnitude slowdown.