Atmel's AVR355 app note describes how to do this with an 8-bit microcontroller with an on-board A/D converter, an (optional) external SPI flash chip, and an LM324 quad op-amp that handles the microphone amplification, filtering of the PWM'd output, output amplification, and feedback prevention. They use an outdated AT90S8535, but you could do this with any 8-bit micro. You're going to store about 7,812 8-bit samples (half of your 15,625 Hz PWM frequency) for every second of sound, so even something big like an ATmega644 or 128 would only get you, assuming 8 KB or less of code, roughly 7 seconds in the 64 KB part or 16 seconds in the 128 KB part, and that's only if you can swing the self-flashing code. You'll likely want the external flash. Keep in mind that SPI flash sizes are quoted in megabits, not megabytes: less than $3 for a 64 Mbit (8 MB) SST25VF064C will get you about 18 minutes of talk time, and $0.68 for a 1 Mbit (128 KB) AT25FS010N will get you about 17 seconds.
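If you want to rerun the storage math for a different chip, here is a quick sketch in Python. It assumes one byte per sample at 7,812 samples per second, as described above. The function names and the 8 KB code reservation are only for illustration and do not come from the app note.

```python
KIB = 1024  # bytes per KiB


def mbit_to_bytes(megabits):
    """Convert a flash density quoted in megabits to bytes."""
    return megabits * 1024 * 1024 // 8


def recording_seconds(capacity_bytes, reserved_bytes=0, sample_rate_hz=7812):
    """Seconds of 8-bit mono audio that fit in the given storage.

    reserved_bytes is space that cannot hold samples (e.g. program code
    sharing the same on-chip flash).
    """
    usable = capacity_bytes - reserved_bytes
    if usable < 0:
        raise ValueError("reserved space exceeds capacity")
    return usable / sample_rate_hz  # one byte per sample


if __name__ == "__main__":
    # On-chip flash, with an assumed 8 KiB set aside for code.
    for flash_kib in (64, 128):
        secs = recording_seconds(flash_kib * KIB, reserved_bytes=8 * KIB)
        print(f"{flash_kib} KiB on-chip flash: {secs:.1f} s")

    # External SPI flash, sized by datasheet density in megabits.
    for density_mbit in (1, 64):
        secs = recording_seconds(mbit_to_bytes(density_mbit))
        print(f"{density_mbit} Mbit SPI flash: {secs:.0f} s ({secs / 60:.1f} min)")
```

With these assumptions, on-chip flash works out to about 7.3 s and 15.7 s once the 8 KB of code is subtracted, and a part specified as 64 Mbit holds roughly 18 minutes. Check whether the datasheet gives capacity in bits or bytes before plugging a number in, since an eightfold error is easy to make.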
However, the easiest way to do this is to buy a ready-made recordable greeting card and take it apart. Some places will even sell just the module, so you can put it in a custom card or enclosure. Here's one site for cards and bare modules, and here's another with more cards and modules in bulk (min qty 20). It looks like they're both using bare dies, but hey, if you can just buy it ready to go, why do you need to make your own? I know this paragraph isn't in the spirit of Chiphacker like the first one was, but it is a solution.
PS: If you do spring for a premade module and can figure out what chip is under that ubiquitous black blob [Images], and (optional, but it would be nice) can find a source for it in a hackable package, please let us know!