To illustrate how this works, let's start with a classic 32-bit processor like the good old 68020.
For various reasons (compatibility, usability for ASCII characters, ...), even 32-bit and 64-bit CPUs have names (called "addresses") for individual bytes. If the CPU needs to name a bigger chunk of memory, e.g. 4 bytes, it uses the address of the first byte, and the remaining bytes are implicitly the ones at the immediately following addresses.
On the other hand, the bus can transfer 32 bits at once, because the CPU has 32 data lines D0...D31. So, of the 32 bits of an address, the lower 2 bits select a group of data lines (00 selects D0...D7, 01 selects D8...D15, 10 selects D16...D23, and 11 selects D24...D31). Only the upper 30 address bits exist as real address lines A2...A31. Instead of the lower 2 bits, the CPU has four distinct byte-select signals that it can assert individually or in different combinations.
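The split described above can be sketched in a few lines of Python. This is only an illustration of the simplified model in this answer, not of any real chip's pinout; the function name `decode_byte_address` and its return layout are made up for the example.

```python
def decode_byte_address(address):
    """Split a byte address into the parts described in the text.

    Returns (bus_address, lane, (first_data_line, last_data_line)).
    Hypothetical helper for illustration only.
    """
    bus_address = address & ~0x3   # upper 30 bits, shown with the low 2 bits as zero
    lane = address & 0x3           # lower 2 bits: which byte-select signal is used
    first_line = lane * 8          # lane 0 -> D0, lane 1 -> D8, lane 2 -> D16, lane 3 -> D24
    return bus_address, lane, (first_line, first_line + 7)


if __name__ == "__main__":
    for addr in (0x1000, 0x1001, 0x1002, 0x1003, 0x1004):
        bus, lane, (lo, hi) = decode_byte_address(addr)
        print(f"byte at {addr:#x}: address bus = {bus:#x}, "
              f"byte-select {lane}, data lines D{lo}...D{hi}")
```

For example, the byte at 0x1001 ends up on address bus value 0x1000 with byte-select 1, i.e. on data lines D8...D15.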
Now, if the CPU wants to read 32 bits from address 0x1000, it can use all 32 data lines in parallel. It places the higher 30 bits on the address bus, and sets all four byte-select signals, so it transfers 4 bytes.
Here is an overview:
- read byte from 0x1000: Address bus = 0x1000, Byte-Select 0
- read byte from 0x1001: Address bus = 0x1000, Byte-Select 1
- read byte from 0x1002: Address bus = 0x1000, Byte-Select 2
- read byte from 0x1003: Address bus = 0x1000, Byte-Select 3
- read byte from 0x1004: Address bus = 0x1004, Byte-Select 0
- read short from 0x1000: Address bus = 0x1000, Byte-Select 0+1
- read short from 0x1002: Address bus = 0x1000, Byte-Select 2+3
- read int from 0x1000: Address bus = 0x1000, Byte-Select 0+1+2+3
- read (unaligned) int from 0x1002: (first cycle) Address bus = 0x1000, Byte-Select 2+3, then (second cycle) Address bus = 0x1004, Byte-Select 0+1
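The overview above follows a simple rule that can be written down as a small Python sketch. It assumes the simplified 32-bit bus from this answer; `bus_cycles` is a hypothetical name, and real hardware may handle unaligned accesses differently (or not allow them at all).

```python
def bus_cycles(address, size):
    """Split an access of `size` bytes at `address` into 32-bit bus cycles.

    Returns a list of (bus_address, byte_selects) tuples, one per cycle.
    byte_selects lists the active byte-select signals (0..3).
    """
    cycles = []
    end = address + size
    while address < end:
        bus_address = address & ~0x3        # upper 30 bits go on the address bus
        first_lane = address & 0x3          # lower 2 bits pick the first byte lane
        last = min(end, bus_address + 4)    # one cycle cannot cross a 32-bit boundary
        lanes = list(range(first_lane, first_lane + (last - address)))
        cycles.append((bus_address, lanes))
        address = last                      # continue in the next 32-bit word, if any
    return cycles


if __name__ == "__main__":
    for addr, size in ((0x1001, 1), (0x1002, 2), (0x1000, 4), (0x1002, 4)):
        for bus, lanes in bus_cycles(addr, size):
            selects = "+".join(str(lane) for lane in lanes)
            print(f"{size} byte(s) at {addr:#x}: "
                  f"address bus = {bus:#x}, byte-select {selects}")
```

Calling `bus_cycles(0x1002, 4)` yields two cycles, first 0x1000 with byte-selects 2+3 and then 0x1004 with byte-selects 0+1, matching the unaligned case in the list.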
With current CPUs the principle stays the same, but because of the internal cache you can no longer observe the CPU core's individual byte transfers from the outside. They happen between the CPU core and the cache (both on-chip). The memory now only talks to the cache, and those transfers are always done in chunks bigger than a byte, typically a whole cache line.