I actually have some relevant experience on this very subject. Many, many years ago I grabbed a bunch of PS2505 optoisolators which turned out to be PS2506s. No big deal, right? It turns out the PS2506s are INCREDIBLY slow compared to the PS2505s. My friend and mentor, Don Shepherd, gave me this sage advice:

Choose R1 so that about half of the available collector current (from the opto) flows through it. This ensures that the PNP transistor won't turn on from noise but still leaves enough current for the transistor to do its work. The available collector current is set by the opto's Current Transfer Ratio (CTR), which is listed in its datasheet. CTR is the ratio of the phototransistor's collector current to the LED's forward current, so Ic = If * CTR.
Since the B-E junction of a conducting transistor drops about 0.7V, and R1 sits across the PNP's B-E junction, there will be about 0.7V across R1.
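Combining the half-current rule with the 0.7V drop gives a closed-form starting value for R1. The symbols here are my own shorthand, not from the datasheet:

```latex
I_C = \mathrm{CTR}\cdot I_F, \qquad
I_{R1} \approx \tfrac{1}{2}\, I_C, \qquad
R_1 = \frac{V_{BE}}{I_{R1}} \approx \frac{2\,V_{BE}}{\mathrm{CTR}\cdot I_F}, \qquad V_{BE} \approx 0.7\,\mathrm{V}
```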
This circuit speeds up the optoisolator by keeping the change in the phototransistor's collector voltage to a minimum. Every device has parasitic collector-base capacitance, so any change in collector voltage drives a current (proportional to how quickly the voltage is changing) through that capacitance back into the internal base of the opto's phototransistor. This coupled current opposes the photocurrent generated by the LED's light and slows down the opto's switching because of the "infighting." With the PNP transistor holding the opto's collector voltage nearly constant, as much of the photocurrent as possible goes into switching the phototransistor instead of fending off the coupled current.
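To put rough numbers on the "infighting," here is a first-order Python sketch. It treats the parasitic collector-base capacitance as a fixed C_CB and asks how long a fixed photocurrent needs to supply the charge C_CB * dV when the collector swings by dV. The capacitance, the photocurrent, and the 5V resistor-load swing are hypothetical values for illustration, not PS2505/PS2506 datasheet figures; the 0.7V swing assumes the PNP limits the opto's collector movement to about one V_BE.

```python
# First-order estimate of the extra switching delay caused by charging the
# collector-base parasitic capacitance. ALL VALUES ARE HYPOTHETICAL and chosen
# only to illustrate the scaling; they are not datasheet numbers.

C_CB = 5e-12      # hypothetical collector-base capacitance, farads
I_PHOTO = 20e-6   # hypothetical photocurrent reaching the internal base, amps


def miller_charge(swing_v: float, c_cb: float = C_CB) -> float:
    """Charge (coulombs) that flows through C_CB when the collector moves swing_v volts."""
    return c_cb * swing_v


def slew_delay(swing_v: float, i_photo: float = I_PHOTO) -> float:
    """Time (seconds) the photocurrent spends supplying that charge
    instead of driving the base."""
    return miller_charge(swing_v) / i_photo


# Plain resistor load (hypothetical 5 V swing) versus the PNP helper,
# which keeps the opto's collector swing to roughly one V_BE.
for label, swing in [("resistor load", 5.0), ("PNP helper", 0.7)]:
    print(f"{label:13s}: swing {swing:.1f} V -> "
          f"{miller_charge(swing) * 1e12:.1f} pC, "
          f"~{slew_delay(swing) * 1e9:.0f} ns")
```

The absolute numbers mean little; the point is that the delay scales directly with the collector swing, so shrinking the swing from a few volts to about 0.7V cuts this part of the delay by the same ratio.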
For my specific problem, the PS2506 datasheet states that the CTR is 80% to 600%, with 300% being typical. This is measured with If at 5mA and Vce at 5V. In my circuit I see 3.52V across the 476 ohm resistor in series with the opto's LED, so If is about 7mA (3.52V / 476 ohms is roughly 7.4mA). On the other side of the opto, Ic = If * CTR, so 300% of 7mA is 21mA (typically).
With 0.7V across R1 and about half of that 21mA (10.5mA) flowing through it, R1 = V/I = 0.7V / 10.5mA, or about 67 ohms.
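The same arithmetic can be checked with a short Python sketch. The helper name r1_for_half_current is hypothetical, and the low-CTR check at the end is an extra sanity check, not part of the advice above. Keep in mind that CTR is specified at If = 5mA and Vce = 5V, so applying it at about 7mA and a lower Vce is only an approximation.

```python
# Size R1 from the measured LED current and the datasheet CTR (PS2506 example).
# CTR is specified at I_F = 5 mA, V_CE = 5 V, so using it here is approximate.

V_BE = 0.7                  # approximate PNP base-emitter drop, volts
V_SERIES = 3.52             # measured across the LED series resistor, volts
R_SERIES = 476.0            # LED series resistor, ohms
CTR_MIN, CTR_TYP = 0.80, 3.00   # datasheet: 80% minimum, 300% typical


def r1_for_half_current(i_f: float, ctr: float, v_be: float = V_BE) -> float:
    """Hypothetical helper: R1 that draws about half of the opto's
    collector current when one V_BE is across it."""
    i_c = ctr * i_f
    return v_be / (i_c / 2)


i_f = V_SERIES / R_SERIES                        # ~7.4 mA (rounded to 7 mA in the text)
r1_exact = r1_for_half_current(i_f, CTR_TYP)     # ~63 ohms with the unrounded I_F
r1_rounded = r1_for_half_current(7e-3, CTR_TYP)  # ~67 ohms, matching the text

# Sanity check at the low end of the CTR range: with the PNP off, all of the
# opto's collector current flows through R1. If that cannot develop one V_BE,
# the PNP never gets properly turned on.
i_c_min = CTR_MIN * i_f
v_r1_min = i_c_min * r1_rounded

print(f"I_F              = {i_f * 1e3:.2f} mA")
print(f"R1 (unrounded)   = {r1_exact:.1f} ohm")
print(f"R1 (I_F = 7 mA)  = {r1_rounded:.1f} ohm")
print(f"V(R1) at CTR_MIN = {v_r1_min:.2f} V (vs V_BE = {V_BE} V)")
```

Carrying the unrounded If through gives closer to 63 ohms; the 67 ohm figure comes from rounding If to 7mA, and both are well inside the spread of the CTR. The last line shows that a part at the 80% minimum CTR would only put about 0.4V across a 67 ohm R1, so designing to the typical CTR assumes a reasonably typical device.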
I built this circuit and measured pulse rise and fall times of about 100ns across the 100 ohm resistor. That's quite a speedup compared to the same opto without this helper transistor.
BTW: The difference between the PS2505 and PS2506 is that the latter uses a photodarlington output. Since Darlington transistors have such insanely high gain, my guess is that between the Darlington's more complex structure (more parasitic capacitance) and its higher gain, the PS2506 spends much more of its available current fighting the parasitics.