In most cases you don't have to sacrifice readability to write well-performing code.
It's all a matter of choosing the right data structures and algorithms for the job. Most cases of poorly performing code come down to the developer not understanding the data structures they are using.
A good example of this that I commonly see is the overuse of List in C# and vector in C++: for a lot of programmers these two structures are the only way to express "a bunch of T" or a "collection of T", even in situations where other structures such as linked lists or trees, available in well-abstracted and conveniently implemented classes, would do the same job with a lower order of complexity.
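As a hypothetical illustration (the collection contents and sizes are mine, not from any real codebase), repeatedly asking "is X in my bunch of ints?" costs O(n) per lookup on a List<T>, while a tree-backed SortedSet<T> answers the same question in O(log n):

```csharp
using System;
using System.Collections.Generic;

class MembershipDemo
{
    static void Main()
    {
        // The same "bunch of ints", stored two ways.
        var asList = new List<int>();
        var asTree = new SortedSet<int>();   // balanced tree under the hood

        for (int i = 0; i < 100_000; i++)
        {
            asList.Add(i);
            asTree.Add(i);
        }

        // Each Contains on the List scans up to 100,000 elements (O(n));
        // the SortedSet finds the answer in O(log n).
        bool inList = asList.Contains(99_999);
        bool inTree = asTree.Contains(99_999);

        Console.WriteLine($"{inList} {inTree}");
    }
}
```

The readable, well-abstracted structure is also the faster one here; nothing about the calling code had to get uglier.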
Another example is using huge lock statements / large critical sections, where in reality locking only the one or two lines of code that actually touch shared state would be more than enough.
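A minimal sketch of what I mean, with invented names (SharedCounter and DoExpensiveWork are placeholders, not real code I've optimized): the expensive computation does not need to sit inside the critical section, only the mutation of shared state does.

```csharp
using System.Threading;

class SharedCounter
{
    private readonly object _gate = new object();
    private long _total;

    // Bad: the whole method body is inside the critical section,
    // so threads serialize on work that touches no shared state.
    public void AddSlow(int input)
    {
        lock (_gate)
        {
            long result = DoExpensiveWork(input); // doesn't need the lock
            _total += result;
        }
    }

    // Better: compute outside, lock only the line that mutates shared state.
    public void AddFast(int input)
    {
        long result = DoExpensiveWork(input);
        lock (_gate)
        {
            _total += result;
        }
    }

    private static long DoExpensiveWork(int input)
    {
        Thread.Sleep(10);          // stand-in for real computation
        return (long)input * input;
    }
}
```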
Yet another example is something I had to deal with last week while optimizing some code: the original code needed to import about 5000 XML files into a database, but it only had to import a single field, located within the first 100 bytes of each file. Instead of using XmlReader and reading just until that field was reached, the code loaded the whole XML into an XmlDocument and ran an XPath query on it. Switching to the streaming approach cut the execution time to about 1/30 of the original, and I would say the rewritten version was also more readable and maintainable.
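Here is a sketch of the two approaches, with a made-up element name (ImportantField) standing in for the real one. The streaming version stops as soon as it has what it needs instead of parsing the whole document:

```csharp
using System.Xml;

static class XmlImportDemo
{
    // Original approach: parse the entire file into a DOM, then query it.
    public static string ReadFieldWithDom(string path)
    {
        var doc = new XmlDocument();
        doc.Load(path);                                   // reads and parses everything
        XmlNode node = doc.SelectSingleNode("//ImportantField");
        return node?.InnerText;
    }

    // Optimized approach: stream forward and stop at the first match,
    // which in my case lived in the first ~100 bytes of the file.
    public static string ReadFieldWithReader(string path)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            if (reader.ReadToFollowing("ImportantField"))
                return reader.ReadElementContentAsString();
        }
        return null;
    }
}
```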
Writing more performant code is mostly a matter of understanding the fundamental algorithms and data structures behind the abstractions your framework provides. It is very rare (and I mean really rare) to find a job that can only be done faster by resorting to highly unreadable code such as hand-written assembly.