Additional assumptions
This answer is based on the following additional assumptions:
- you can easily determine the timestamp for the beginning of the log, and
- it is feasible to store the positions of the hiccups (optional)
Search algorithm
The search really splits into two different algorithms. If you search for a timestamp at or after the beginning of the log, you know the entry cannot be among the hiccups, so you use the non-hiccup search below. If you search for a timestamp before the beginning of the log, you use the hiccup search instead. If you search not by timestamp but by some other criterion, you first try the non-hiccup search because of its 95% coverage, and fall back to the hiccup search if nothing is found.
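The dispatch above can be sketched as follows. This is a minimal illustration, assuming the log is just a list of plain numeric timestamps and `log_start` is the known timestamp of the beginning of the log; the two branches here are trivial linear stand-ins, whose faster versions are described in the sections below. All names are illustrative, not from the question.

```python
def is_hiccup(ts, log_start):
    # A hiccup is an entry timestamped before the beginning of the log.
    return ts < log_start

def search(timestamps, log_start, target):
    if target >= log_start:
        # A valid timestamp can only live in the non-hiccup data.
        candidates = (i for i, ts in enumerate(timestamps)
                      if not is_hiccup(ts, log_start))
    else:
        # A timestamp before the log's start can only be a hiccup.
        candidates = (i for i, ts in enumerate(timestamps)
                      if is_hiccup(ts, log_start))
    return next((i for i in candidates if timestamps[i] == target), None)
```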
Optionally, you can speed up the non-hiccup search by a preprocessing step.
Preprocessing (optional)
If possible, pre-analyze your data with a linear scan to find the positions of the hiccup data. Whether this is worthwhile depends entirely on the feasibility of storing these ranges, which may be possible given that they only amount to 5% of your logs (or it may not, in which case performance degrades accordingly).
Note that the corresponding data structure should be updated whenever you write new logs, or at least it should be able to tell you up to which point of the logs the preprocessing has been performed.
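The preprocessing pass might look like this. It is a sketch under the same assumptions as before (timestamps as plain numbers, `log_start` known); it collects the hiccup positions as half-open `(start, end)` index ranges, which is one possible shape for the cached data structure.

```python
def find_hiccup_ranges(timestamps, log_start):
    """Linear scan collecting half-open (start, end) index ranges of
    hiccup entries, i.e. entries timestamped before the log's start."""
    ranges = []
    start = None
    for i, ts in enumerate(timestamps):
        if ts < log_start:
            if start is None:
                start = i                  # a new hiccup run begins here
        elif start is not None:
            ranges.append((start, i))      # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, len(timestamps)))  # run extends to the end
    return ranges
```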
Non-hiccup search
Searching the non-hiccup data is possible with a combination of binary and linear search. You perform a normal binary search; however, when your pivot element is timestamped before the beginning of the log, i.e. the pivot element is a hiccup, you need to determine the nearest preceding log entry that is not a hiccup and use that as the real pivot element of your binary search.
This first log entry with a timestamp at or after the beginning of the log is found via a linear search starting from the hiccup pivot element. If you already know from preprocessing, or from incremental updates to your cached hiccup ranges, where the relevant pivot element is positioned, you can jump there in constant time. If you had to run the full linear search, update the data structure to record that these positions are covered by hiccup data, so that next time you can determine the right pivot element quickly.
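A minimal sketch of this modified binary search, again assuming plain numeric timestamps. When the pivot lands on a hiccup, it scans backward to the nearest non-hiccup entry; if the entire left half of the current window is hiccup data, the target can only be to the right. (Caching the scanned ranges, as described above, is omitted for brevity.)

```python
def non_hiccup_search(timestamps, log_start, target):
    """Binary search over the non-hiccup entries; returns the index of
    target, or None if no non-hiccup entry matches."""
    lo, hi = 0, len(timestamps) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        pivot = mid
        # If the pivot landed on a hiccup, scan backward (linearly) to
        # the first non-hiccup entry and use that as the real pivot.
        while pivot >= lo and timestamps[pivot] < log_start:
            pivot -= 1
        if pivot < lo:
            # Everything in [lo, mid] is hiccup data; a valid timestamp
            # can only be to the right of mid.
            lo = mid + 1
            continue
        if timestamps[pivot] == target:
            return pivot
        if timestamps[pivot] < target:
            lo = mid + 1          # also skips the hiccups in (pivot, mid]
        else:
            hi = pivot - 1        # entries in (pivot, mid] are hiccups anyway
    return None
```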
Hiccup search
This depends on whether you have done the preprocessing. If not, it is a linear search through all of your data (and you can perform the preprocessing at the same time). Otherwise, your preprocessed data structure can tell you the positions of the hiccup data and you can search those directly, i.e. perform a linear search through only the 5% of hiccup data.
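Given the `(start, end)` index ranges from the preprocessing sketch above, the hiccup search reduces to a linear scan over just those ranges, i.e. roughly 5% of the entries. A hedged sketch, same assumptions as before:

```python
def hiccup_search(timestamps, hiccup_ranges, target):
    """Linear scan restricted to the precomputed hiccup index ranges;
    returns the index of target among the hiccups, or None."""
    for start, end in hiccup_ranges:      # half-open ranges [start, end)
        for i in range(start, end):
            if timestamps[i] == target:
                return i
    return None
```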
Performance
For the non-hiccup search with fully preprocessed data you get almost the performance of a plain binary search. Instead of a single comparison at each step, though, you need an additional comparison to determine whether you hit a hiccup element, and if so, an access to your data structure to find the real pivot element. This additional work is slightly alleviated by the fact that each step rules out not only half of the remaining dataset, but also any hiccup data it contains.
Of course, if you have to resort to linear search, this degrades accordingly.
The hiccup search is a bad case if you have no information about existing hiccups available and need to search linearly through all the data.
Finally, if you search not for a timestamp but for some other criterion, and no matching log entry exists, this is your worst case. In fact, if you have no data structure for the hiccups, it ends up slower than a single linear search, as both search runs may linearly scan the same hiccup positions (though it is still O(n)).
If you do have the data structure available and fully preprocessed, the worst-case runtime comes down to O(max(log(M)*log(N), M)), where N is the total amount of data and M is the amount of hiccup data (which is searched linearly), assuming you can look up the end of a hiccup run, given any hiccup position, from your data structure in O(log(M)).