
I have been using buffers for quite a long time whenever I need to copy a stream or read a file.

Every time, I set my buffer size to 1024 or 2048. The way I see it, a buffer is like a "bucket" that carries my "sand" (the stream) from one part of my land (memory) to another.

So, in theory, increasing my bucket's capacity should let me make fewer trips. Is increasing the buffer size a good thing to do in programming?
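
For illustration, here is a minimal sketch of the kind of loop I mean (copy_stream and the default size are just names I made up for the example, not any standard API):

    #include <cstddef>
    #include <istream>
    #include <ostream>
    #include <vector>

    // Minimal "bucket" loop: carry the stream in buffer-sized scoops.
    void copy_stream(std::istream& in, std::ostream& out,
                     std::size_t buffer_size = 2048) {
        std::vector<char> buffer(buffer_size);  // the "bucket"
        while (in) {
            in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
            out.write(buffer.data(), in.gcount());  // write only what was read
        }
    }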

Cyrbil (edited by gnat)
  • Please be more specific. Do you refer to a specific programming language? Can you give an example? Buffer for what - network transfer, streaming video, copying a file? – Patkos Csaba Sep 06 '12 at 16:36
  • When I had this question I was using buffers for network transfer in C++, but my question is general: is there a specific reason why 1024 is so widely used? @gnat: thanks for the edit, English is not my native language. – Cyrbil Sep 06 '12 at 17:34
  • The best buffer size depends on the application. Where does the data come from, where does it go, and what kind of processing do you do on it in between? 1KB seems relatively small and cheap by general-purpose CPU standards -- in that context, it makes sense as a lower bound. – comingstorm Sep 06 '12 at 18:42
  • A bigger buffer will always make transfers faster... right up until it gets big enough that it breaks your cache, and then things get slow again. You really need to measure this. – Mason Wheeler Sep 06 '12 at 18:50
  • Also make sure your "bucket" is not smaller than your "shovel." For example, a disk I/O buffer which is much smaller than your cluster size means rereading the same hard disk clusters over and over again, like filling a thimble with a shovel and then dumping most of the sand back onto the beach. – Brian Sep 06 '12 at 18:57
  • For the record, 1 MB was the largest size that helped in my case; going bigger did not increase speed (but did not decrease it either). Tests showed that for a 100 MB random file my download now takes 5 s less (which is not that much, but better than nothing ;-) ). – Cyrbil Sep 07 '12 at 14:37
  • Thanks for asking your first question. Please, as much as possible, make it relevant for all programmers, as described in the FAQ (http://programmers.stackexchange.com/faq). Welcome. – DeveloperDon Sep 15 '12 at 14:03

4 Answers


There is an optimal size to a buffer. Too small a buffer can trigger more system calls than necessary, while too big a buffer can trigger unnecessary reloads of the CPU cache. The best way to answer this question for your specific situation is to use a profiler.
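
For instance, a rough timing harness along these lines can show where the sweet spot lies (the file names are placeholders, and a real measurement should repeat each run to account for the OS file cache):

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <fstream>
    #include <vector>

    // Copy the same file with several buffer sizes and print elapsed time.
    int main() {
        for (std::size_t size : {512u, 4096u, 65536u, 1u << 20}) {
            std::ifstream in("input.bin", std::ios::binary);
            std::ofstream out("output.bin", std::ios::binary | std::ios::trunc);
            std::vector<char> buffer(size);

            auto start = std::chrono::steady_clock::now();
            while (in) {
                in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
                out.write(buffer.data(), in.gcount());
            }
            std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;
            std::printf("%8zu bytes: %.3f s\n", size, elapsed.count());
        }
    }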

Sergey Kalinichenko

The answer is: it depends. Unfortunately, there is NO single answer to your question. The many variables involved (the speed of the hardware, the source of the stream, the type of disk the file is being read from, the memory available, the OS file-caching algorithm, and so on) all affect the answer.

For your particular situation, I advise measuring performance to see whether a bigger buffer helps.

Michael Kohne

Let's pretend you are copying a data structure from one file to another, and you use a buffer to store the data between the time you read it and the time you write it.

There is an overhead when you read and write data. On disk, the head has to find the sector and read or write the track. In memory, it takes a processor instruction to move a chunk of memory (usually 1-8 bytes at a time), plus a bus operation to move data from one part of memory to another, or between memory and the processor, or between memory and disk. Each chunk that you read is processed in a loop somewhere, and the smaller the chunks, the more times the loop has to be executed.

If your buffer is a single byte, you will incur this overhead every time you read or write a byte of data. In our example, the disk can't read and write simultaneously, so the write may have to wait until the read is finished. For a one-byte file, this is the best you can do, but for a 1MB file, this will be extremely slow.

If you have a 10MB buffer and want to copy a 10MB file, you can read the whole thing into your buffer, then write it all out again in one step.

Now, if you want to copy a 20GB file, you probably don't have that much memory. Even if you do, if every program allocated 20GB of memory for buffers, there wouldn't be anything left! When you allocate memory, you have to release it, and both the allocation and release can take time.

If a client of some kind is waiting for whole chunks of data, smaller chunks are sometimes better. If the client gets a few chunks and decides it doesn't want the rest, it can abort; or it can display what it has while waiting for more, so that a human user can see that something is going on.

If you know how much data you are copying before you have to allocate your buffer, you can make a buffer that's the ideal size for the data: either the exact size of all your data, or big enough for the data to be copied in a reasonable number of chunks. If you have to guess, something around 1 MB is reasonable for an unknown purpose.
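
A sketch of such a sizing rule (choose_buffer_size and the 1 MB cap are just an illustration of the guideline above, not a recommendation for every workload):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // Use the exact size for small data; otherwise cap the buffer at 1 MB.
    std::size_t choose_buffer_size(std::uintmax_t data_size) {
        constexpr std::uintmax_t kMaxBuffer = 1u << 20;  // the 1 MB guess
        return static_cast<std::size_t>(std::min(data_size, kMaxBuffer));
    }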

To create the perfectly sized buffer, you need to study the data you are going to use it for. If you are copying files, how big are most of the files people copy? Guess at a good buffer size and time it; tweak the size and time it again. Your total available memory may limit the maximum size. Eventually you will arrive at the ideal buffer size for your specific goal.

GlenPeterson
  • Thanks for the long answer; it clarified many things for me. I appreciate that you took the time to write it down. – Cyrbil Sep 07 '12 at 14:33

It all depends on what you are doing and with what machinery and so forth. Try different numbers and see what happens.

However, I have found that the bigger the buffer, the faster the reads and writes. I mention this because you talk about 1024 and 2048; try some really big buffers instead. In one case I found I was reading 8 times as fast by switching from an 8 KB buffer up to 100 KB, and I got noticeable improvements up to 1 MB.

I'm no hardware expert, but I've found that computers generally do sequential byte copies many times faster than individual byte copies. Maybe they do things in parallel, maybe data moves through the caches faster, maybe it's magic. But using big buffers and array copies (or loops that optimizers can turn into array copies) can save a lot of time.
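
To illustrate, here is a rough micro-benchmark sketch comparing a byte-at-a-time loop with a single block copy. Results will vary by machine, and the optimizer may well turn the loop into a block copy itself, which is rather the point:

    #include <chrono>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main() {
        std::vector<char> src(100u << 20, 'x');  // ~100 MB of data
        std::vector<char> dst(src.size());

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i];
        auto t1 = std::chrono::steady_clock::now();
        std::memcpy(dst.data(), src.data(), src.size());
        auto t2 = std::chrono::steady_clock::now();

        std::printf("byte loop: %.3f s, memcpy: %.3f s (check: %c)\n",
                    std::chrono::duration<double>(t1 - t0).count(),
                    std::chrono::duration<double>(t2 - t1).count(),
                    dst.back());  // use dst so the copies aren't optimized away
    }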

RalphChapin
  • It's not magic, it's never magic.... – jmoreno Sep 06 '12 at 20:30
  • -1: I had a situation where a bigger buffer decreased performance by a factor of 2-3. One of my colleagues refused to believe me; even after I showed him the results, he could not get his head around it - surely a bigger buffer is less work and therefore faster? Rare - yes. Specific - yes. The only way to know - measure. – mattnz Sep 06 '12 at 22:53
  • @mattnz: I suggested measuring in the first line. The OP seemed to be using very small buffers, and I wanted to suggest very large buffers _might_ be the best bet. I would not expect a larger buffer to give _worse_ performance, but after a point the improvement will be irrelevant or nonexistent. I do know of general problems that can arise in Java/C#, but C++ ought to be free of them. My best guess is your problem was how your buffer aligned w/ the hardware memory cache. Apt to be different in different machines/compiles/links. Makes measuring and then using the resulting data somewhat tricky. – RalphChapin Sep 07 '12 at 13:24
  • @jmoreno: I don't really believe in magic, but after watching an array outperform a linked list on the linked list's own turf (inserts into the middle of a million-element list), I'm starting to wonder. – RalphChapin Sep 07 '12 at 13:28
  • @RalphChapin - I read "try different numbers..." as a casual approach rather than a purposeful action. Your best guess is wrong - you have proved my point and justified the -1 I gave you. Maybe your boss is happy with guesses, but mine is not: if I guess wrong, people can die. You really don't want me playing a crap-shoot and guessing. – mattnz Sep 09 '12 at 21:54