I don't have an answer, as I don't know the internal details of Xilinx slices. I do have some pointers.
First, you can't store data in a LUT6. You can only store data in registers (there are two per LUT6), in distributed RAM, or in block RAM. That somewhat undermines your assumption that a LUT6 equals 64 bits of storage.
Distributed RAM and shift registers are somewhat unrelated to LUT6 primitives. If you look at the Spartan-6 FPGA Configurable Logic Block User Guide, you will see that there are three types of slices on that architecture:
- SLICEX, the basic slice, which has only LUT6s (it does have other things, just nothing relevant to this discussion).
- SLICEL, which has LUT6s and fast-carry logic.
- SLICEM, which has what SLICEL has, plus distributed RAM.
Shift registers (and distributed RAM) are only available in SLICEM slices, which means they need the distributed RAM block as well as the LUT6. Xilinx doesn't give details on the RAM block, but we know from the documentation that a LUT6 + RAM can implement a 64x1-bit single-port distributed RAM or an SRL32.
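To make the SRL32 behaviour concrete, here is a minimal behavioral sketch in Python. It only models what the primitive does from the outside (a 32-deep, 1-bit-wide shift register with an addressable tap); the class and method names are hypothetical, and it says nothing about how the silicon is actually built.

```python
class SRL32:
    """Behavioral model of a 32-deep, 1-bit-wide addressable shift register.

    Hypothetical names; this models behaviour only, not the real circuit.
    """

    DEPTH = 32

    def __init__(self, init=0):
        # Bit i of `init` becomes the value visible at tap address i.
        self.bits = [(init >> i) & 1 for i in range(self.DEPTH)]

    def clock(self, d, ce=True):
        """One rising clock edge: shift `d` in at position 0 when enabled."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def read(self, addr):
        """Combinational read of the tap selected by a 5-bit address."""
        if not 0 <= addr < self.DEPTH:
            raise ValueError("address must be in 0..31")
        return self.bits[addr]


srl = SRL32()
for bit in [1, 0, 1, 1]:
    srl.clock(bit)
# Address 0 is the most recently shifted-in bit, address 3 the oldest of the four.
taps = [srl.read(a) for a in range(4)]
```

Reading at address 0 returns the newest bit, so the address effectively selects the delay length.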
A VLSI guru could probably deduce the internal details of the RAM, but I find it reasonable that:
- 1 LUT6 + 1 RAM block = SRL32
- 1 LUT6 + 1 RAM block = 64x1b single port RAM
- 2 LUT6s + 2 RAM blocks = 64x1b simple dual port RAM
This is because a shift register is probably implemented as a 32-bit dual-port RAM, with specific control logic provided by the LUT6.
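As a rough illustration, the per-LUT capacities listed above can be turned into a resource estimate. This is a sketch under the assumption that those capacities hold and that nothing else is needed; the function names are made up for the example, and the result ignores any extra logic for address decoding or output multiplexing in deeper memories, so treat it as a lower bound.

```python
import math

# Capacities per SLICEM LUT6, taken from the list above.
SRL_BITS_PER_LUT = 32   # one SRL32 per LUT6
RAM_BITS_PER_LUT = 64   # one 64x1 single-port RAM per LUT6


def luts_for_shift_register(depth, width=1):
    """LUT6s needed for a `depth`-stage, `width`-bit shift register."""
    return width * math.ceil(depth / SRL_BITS_PER_LUT)


def luts_for_single_port_ram(depth, width=1):
    """LUT6s needed for a `depth` x `width` single-port distributed RAM."""
    return width * math.ceil(depth / RAM_BITS_PER_LUT)


def luts_for_dual_port_ram(depth, width=1):
    """LUT6s needed for a dual-port distributed RAM (two LUT6s per 64x1)."""
    return 2 * width * math.ceil(depth / RAM_BITS_PER_LUT)


# Example: a 64-stage, 8-bit-wide delay line versus a 64x8 single-port RAM.
srl_luts = luts_for_shift_register(64, 8)
ram_luts = luts_for_single_port_ram(64, 8)
```

With these numbers, the 64-stage, 8-bit delay line needs twice as many LUT6s as a 64x8 single-port RAM, which is consistent with a shift register using only 32 of the 64 bits per LUT6.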