First of all, GSM technology is more than 30 years old. What wasn't possible back then isn't necessarily impossible for handsets today. Modern phones can receive and transmit at the same time, i.e. the premise of your question is outdated.
Now, let's restrict ourselves to why GSM, in the late 1980s, specified that transmission and reception don't happen at the same time.
The idea is pretty simple: your receiver needs to be very sensitive – GSM handsets are supposed to still work with less than -100 dBm in received power (that's 10⁻¹³ watt, i.e. 1/10 of a picowatt), while they are allowed to send up to 2 W (that's 2·10¹² picowatt).
That means that if you want to receive while you're already transmitting, you need isolation that reduces the power leaking from your transmitter into your receiver by a factor of more than 10¹³, or you won't have any SNR left at which you can make out data in between the interference (and noise).
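To put numbers on that, here is a minimal Python sketch of the arithmetic. It uses only the two figures from above (-100 dBm sensitivity, 2 W transmit power); the helper names are mine, and it ignores everything a real link budget would add (filter selectivity, required SNR margin, noise figure).

```python
import math


def dbm_to_watt(dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10) / 1000


def watt_to_dbm(watt: float) -> float:
    """Convert a power in watts to dBm."""
    return 10 * math.log10(watt * 1000)


RX_SENSITIVITY_DBM = -100.0  # weakest signal the handset should still decode
TX_POWER_W = 2.0             # maximum handset transmit power

rx_power_w = dbm_to_watt(RX_SENSITIVITY_DBM)  # about 1e-13 W, i.e. 0.1 pW
tx_power_dbm = watt_to_dbm(TX_POWER_W)        # about 33 dBm

# Ratio between what you send and what you must still be able to hear.
power_ratio = TX_POWER_W / rx_power_w               # about 2e13
isolation_db = tx_power_dbm - RX_SENSITIVITY_DBM    # about 133 dB

print(f"TX power:           {tx_power_dbm:.1f} dBm")
print(f"RX sensitivity:     {RX_SENSITIVITY_DBM:.1f} dBm")
print(f"Isolation required: {isolation_db:.1f} dB (factor {power_ratio:.1e})")
```

Note that roughly 133 dB of isolation only brings the leaked transmit signal down to the same level as the wanted signal; for a usable SNR you need even more on top.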
That is very hard to achieve, technically. These are microwaves, and just like sand, they get everywhere: they travel between the layers of your PCB, and the only hope you have is that your receiver is much, much more sensitive to the carrier frequencies it is supposed to receive than to those the transmitter transmits on.
Of course, going half-duplex halves the available time for transmission and reception, and hence the net rate. However, that really doesn't matter at all for cellular systems like GSM: you don't have to plan for the case where a single handset gets 100% of the air time, since all phones need to share the uplink and downlink airtime anyway! So, even if you split airtime 50/50 between only two phones, you can just pick the time slots so that RX and TX of the same phone don't overlap. That solves the problem without reducing the available data rate. Thus, everyone does it in TDMA systems.
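As a sketch of that slot-picking idea: assume 8 time slots per TDMA frame and an uplink frame that is shifted by 3 slots relative to the downlink, which is how I understand GSM to do it. Both numbers and the function name are assumptions for illustration, not something derived above.

```python
SLOTS_PER_FRAME = 8  # assumed number of time slots per TDMA frame
UPLINK_OFFSET = 3    # assumed shift of the uplink frame, in slots


def assign_slots(num_phones: int) -> dict[int, tuple[int, int]]:
    """Give each phone one RX slot and one TX slot per frame.

    Returns {phone: (rx_slot, tx_slot)}, with slot positions counted on a
    common time axis. The TX slot is always UPLINK_OFFSET slots after the
    RX slot, so a phone never has to receive and transmit simultaneously.
    """
    if not 0 <= num_phones <= SLOTS_PER_FRAME:
        raise ValueError(f"at most {SLOTS_PER_FRAME} phones fit into one frame")
    plan = {}
    for phone in range(num_phones):
        rx_slot = phone
        tx_slot = (rx_slot + UPLINK_OFFSET) % SLOTS_PER_FRAME
        plan[phone] = (rx_slot, tx_slot)
    return plan


plan = assign_slots(SLOTS_PER_FRAME)
for phone, (rx_slot, tx_slot) in plan.items():
    print(f"phone {phone}: receives in slot {rx_slot}, transmits in slot {tx_slot}")
```

With all 8 phones active, every downlink slot and every uplink slot is in use, so the frame is fully loaded, and yet no individual phone ever transmits while it receives.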
Why can't I have cellular data and voice at the same time with 2G phones? What changed with 3G phones and made that possible?
The first question is simple: 2G doesn't have a data mode. There's CSD (circuit-switched data), which really is just using the exact same methods used for speech data (I mean, these are all-digital devices... anything you transmit is data, no matter whether it's voice or cat pictures...).
Later standards, such as EDGE/2.5G, added more or less separate physical layer specifications, and such restrictions no longer applied. From 3G on, the problem de facto doesn't exist anymore.