Generating speech using ICs

Question

I have seen many toys that use just discrete components and ICs to produce songs (with words and everything!). I want to give someone a piece of electronic art, and I want to use such a circuit in it. Does anyone know of these? I don't think those toys use microcontrollers to produce speech, so there has to be some cheap alternative. What could it be? Please suggest.

Edit

For anyone who reads this question later: I found all the answers very doable, and they are all great options. I marked Leon Heller's answer as accepted simply because it was very easy to do, and it can be implemented using the cute little ATtiny microcontrollers.

Are you looking for an audio recording-type IC or full-blown speech synthesis? – jeremy, Jul 18 '10 at 06:58
Not exactly speech synthesis; recordings will do. You know those greeting cards that say "Happy B'day to you" when you open them? Something like that. – Rick_2047, Jul 18 '10 at 08:06

Kevin Vermeer · Answer 1 · 2011-05-04T15:28:35.183

5

Atmel's AVR355 app note describes how to do this with an 8-bit microcontroller with an on-board A/D converter, an optional external SPI flash chip, and an LM324 quad op-amp for the microphone amplification, PWM output filtering, amplification, and feedback prevention. They use an outdated AT90S8535, but you could do this with any 8-bit micro. You're going to store about 7,812 8-bit samples (half of your 15,625 Hz PWM frequency) for every second of sound, so something big like a 644 or 128 would get you, assuming 8K or less of code, roughly 7 seconds on the 64K part or 15 on the 128K part, if you can swing the self-flashing code. You'll likely want the external flash. Less than $3 for a 64 Mbit (8 MB) SST25VF064C will get you about 18 minutes of talk time, or $0.68 for a 1 Mbit (128 KB) AT25FS010N will get you about 17 seconds.
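As a rough check on the storage arithmetic above, here is a minimal Python sketch. It assumes raw, uncompressed 8-bit samples at the 7,812 samples/s rate mentioned above, and that a fixed amount of on-chip flash is set aside for code. The function and constant names are illustrative only and do not come from the app note.

```python
# Rough playback-time estimate for raw 8-bit audio.
# All names here are illustrative assumptions, not taken from the app note.

SAMPLE_RATE_HZ = 7812   # samples stored per second of sound
BYTES_PER_SAMPLE = 1    # 8-bit samples
KIB = 1024


def playback_seconds(storage_bytes: int, reserved_bytes: int = 0) -> float:
    """Seconds of audio that fit in storage_bytes after reserving space for code."""
    usable = max(storage_bytes - reserved_bytes, 0)
    return usable / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


if __name__ == "__main__":
    # On-chip flash, with 8 KiB assumed to be taken by program code.
    for flash_kib in (64, 128):
        secs = playback_seconds(flash_kib * KIB, reserved_bytes=8 * KIB)
        print(f"{flash_kib} KiB on-chip flash: {secs:.1f} s of audio")
```

For an external flash chip, pass its capacity in bytes (datasheets usually quote megabits, so divide by 8 first) and leave `reserved_bytes` at zero.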

However, the easiest way to do this is to buy a ready-made recordable greeting card and take it apart. Some places will even sell just the module, so you can put it in a custom card or enclosure. Here's one site for cards and bare modules, and here's another with more cards and modules in bulk (min qty 20). It looks like they're both using bare dies, but hey, if you can just buy it ready to go, why do you need to make your own? I know this paragraph isn't in the spirit of Chiphacker like the first paragraph was, but it is a solution.

PS: If you do spring for a premade module and can figure out what chip is under that ubiquitous black blob, and (optionally, but it would be nice) can find a source for it in a hackable package, please let us know!

edited May 04 '11 at 15:28

answered Jul 18 '10 at 20:42

Kevin Vermeer

You can get away with a much lower sampling rate than 15 kHz. At 15 kHz you can reproduce audio up to 7.5 kHz, which is very high for just speech. If all you want is intelligible speech, you only need up to about 4 kHz, which drops your required sampling rate almost in half to 8 kHz, the rate used in many telephone systems. – Mark Jul 18 '10 at 22:16
I understand that you don't need to sample that fast. Read the app note for more info. If you want signals under 3,000 Hz (voice), put in a lowpass filter. Then, by the Nyquist–Shannon sampling theorem, you need to sample at 6,000 Hz or more. To get an analog output without a discrete DAC (added component) or a special micro with an onboard DAC (limited options), you need to use PWM. To get a smooth output from the PWM, you need to run it at a frequency at least 2x your sample rate. The PWM frequency is limited to (clock)/(2 × 2^resolution); e.g. 8 MHz/(2 × 2^8) = 15,625 Hz, which I approximated to 15,000, sorry. – Kevin Vermeer Jul 18 '10 at 23:30
You misunderstood the datasheet or are implying the wrong thing. "You're going to have 15,625 8-bit samples for every second of sound, so you'll likely want the external flash ($2 for 8-32MB will get you a lot of talk time)." This implies you need to store 15,625 samples per second, when you do not. You only need to store data at your needed sampling rate: 8 kHz, or whatever is closest to that and divides evenly into the uC clock rate. The PWM runs at twice this rate, but you do not need to store data for every PWM cycle, as each pair of cycles is cut in half based on the value of the sample. – Mark Jul 19 '10 at 02:15
Ah, I see that you're correct. Rereading, I understand that the PWM must run at twice the sample rate, because they're using a count up/count down PWM. Therefore, maximum resolution is achieved with 7,812 samples/second, as limited by the PWM maximum frequency. Is this correct? If so, I'll edit my answer. – Kevin Vermeer Jul 19 '10 at 12:36
yup, you've got it. – Mark Jul 19 '10 at 19:54
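The relationships worked out in this comment thread can be written down as a small Python sketch. It uses the simplified formula quoted in the comments for a count-up/count-down PWM (clock divided by twice the number of counter steps), so a real timer's exact frequency may differ slightly. The function names are made up for illustration.

```python
# Simplified PWM and sampling-rate relationships from the comments above.
# Function names are illustrative; check the timer section of your micro's
# datasheet for the exact PWM frequency formula.


def up_down_pwm_hz(clock_hz: int, resolution_bits: int) -> float:
    """PWM frequency when the counter counts up and then back down."""
    return clock_hz / (2 * 2 ** resolution_bits)


def max_sample_rate_hz(pwm_hz: float) -> float:
    """Highest sample rate if the PWM must run at twice the sample rate."""
    return pwm_hz / 2


def min_sample_rate_hz(highest_audio_hz: float) -> float:
    """Nyquist limit: sample at least twice the highest frequency kept."""
    return 2 * highest_audio_hz


if __name__ == "__main__":
    pwm = up_down_pwm_hz(8_000_000, 8)
    print(pwm)                       # 15625.0
    print(max_sample_rate_hz(pwm))   # 7812.5
    print(min_sample_rate_hz(3000))  # 6000
```

With an 8 MHz clock and 8-bit resolution, the available 7,812 samples/s sits comfortably above the 6,000 samples/s needed for 3 kHz voice.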

score 3 · Accepted Answer · answered Jul 18 '10 at 21:30

Here's a simple way to do it with a PIC etc.

Leon Heller

This can work. So I can just add the RC filter, connect the output to a speaker, and that will produce a close approximation of the original sound? So in principle, if I record myself talking, will the output sound something like my voice? At least the accent? – Rick_2047 Jul 19 '10 at 14:20
Yes, that's it. You need to record the speech as a WAV file and convert it with Roman's software. – Leon Heller Jul 19 '10 at 18:39
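To give a feel for what a "record a WAV and convert it" step involves, here is a generic Python sketch that turns 16-bit mono PCM into unsigned 8-bit samples and formats them as a C array for firmware. This is not Roman's software, and the linked method may well use a different, more compact encoding; the function names and the `speech_data` array name are hypothetical. No resampling is done, so the WAV is assumed to be recorded at the playback sample rate already.

```python
import wave


def pcm16_mono_to_u8(frames: bytes) -> bytes:
    """Convert little-endian signed 16-bit mono PCM to unsigned 8-bit samples."""
    out = bytearray()
    for i in range(0, len(frames) - 1, 2):
        sample = int.from_bytes(frames[i:i + 2], "little", signed=True)
        out.append((sample >> 8) + 128)  # keep the top 8 bits, shift to 0..255
    return bytes(out)


def to_c_array(samples: bytes, name: str = "speech_data") -> str:
    """Format samples as a C array that could be compiled into firmware."""
    body = ", ".join(str(b) for b in samples)
    return f"const unsigned char {name}[{len(samples)}] = {{ {body} }};"


def convert_wav(path: str) -> bytes:
    """Read a 16-bit mono WAV file and return unsigned 8-bit samples."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        return pcm16_mono_to_u8(wav.readframes(wav.getnframes()))
```

Calling `to_c_array(convert_wav("hello.wav"))` (with a file name of your own) would produce a table that a playback loop can step through one sample at a time.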

score 2 · Answer 3 · answered Jul 18 '10 at 11:11

Sparkfun sells an IC that can record up to 64 seconds of audio. However, these sorts of chips normally work by storing the audio in a bank of capacitors, so it may require a coin cell to keep the audio stored.

jeremy

DRAM is actually a capacitor bank. S/he might mean DRAM. – Kortuk Jul 18 '10 at 14:00
He, and I mean DRAM-ish. SRAM is flip-flop based, if I remember correctly. – jeremy Jul 18 '10 at 14:04
What are you two talking about? – Rick_2047 Jul 18 '10 at 15:47
They're talking about volatile memory. They're right: DRAM stores bits in a capacitor circuit (and as such must be Dynamically refreshed so the capacitors don't discharge), while SRAM stores data in a flip-flop (and is therefore Static). Basically, you want non-volatile memory (aka NVM), preferably Flash, or you can go with EEPROM, which allows you to, say, change the battery without losing everything. – Kevin Vermeer Jul 18 '10 at 20:19
I should probably add that calling it DRAM is not fully correct. DRAM is used to store ones and zeroes, whereas each capacitor in the bank on these ICs stores an analog-type voltage. For audio to be stored in DRAM, it must be encoded. – jeremy Jul 18 '10 at 20:55

score 2 · Answer 4 · answered Jul 18 '10 at 17:35

The easiest way to do this is to use an ISD17* IC, for example an ISD1730. Take a look at the datasheet.

You can buy it on eBay. It isn't cheap, but it does the job and should work well in your case.

Another way is to take one of these toys or other talking gadgets, pull out the PCB, and use it in your application.

score 0 · Answer 5 · answered Jul 23 '10 at 22:04

Radio Shack sells a 9 V Recording Module that stores up to 20 seconds of audio and plays it back at the touch of a button. Cat # 276-1323, $10.99. Currently out of stock online, but it might be available in stores.

tcrosley
