Posting here, as this has more to do with the human voice itself than electronics.
I am planning to build a simple human voice synth, but I couldn’t find much about it on the internet. The overall plan is to generate a source signal, pass it through about 10 bandpass filters, and adjust each filter’s gain to shape it into speech.
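A minimal sketch of that filter-bank plan, in plain Python with no NumPy/SciPy so it runs anywhere (slowly). It assumes RBJ “Audio EQ Cookbook” band-pass biquads (0 dB peak gain) with a fixed Q of 4; the function names, the Q value, and whatever band centers you pass in are illustrative choices, not something from the post.

```python
import math

def bandpass_coeffs(f0, q, fs):
    """RBJ Audio EQ Cookbook band-pass biquad with 0 dB gain at f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                 # feed-forward taps
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)     # feedback taps a1, a2
    return b, a

def biquad(x, b, a):
    """Direct Form I biquad over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

def filter_bank(source, centers, gains, fs, q=4.0):
    """Filter the source once per band center, scale each band by its gain, and sum."""
    mixed = [0.0] * len(source)
    for f0, g in zip(centers, gains):
        b, a = bandpass_coeffs(f0, q, fs)
        for i, y in enumerate(biquad(source, b, a)):
            mixed[i] += g * y
    return mixed
```

Changing `gains` over time (say, every 10 ms or so) is what turns a static buzz into changing speech sounds; a practical version would more likely use `scipy.signal` filters on NumPy arrays.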
For my source signal, I found this, which seems to be the sound generated by the larynx before passing through the throat and mouth. From what I read online, a relaxation oscillator or a sawtooth wave seems to be a close approximation of it.
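A sawtooth source is easy to produce with a phase accumulator. This is a naive (not band-limited) sketch, so it will alias somewhat at higher pitches; the function name and parameters are hypothetical.

```python
def sawtooth(f0, fs, n):
    """Naive sawtooth in [-1, 1) at f0 Hz, n samples at sample rate fs."""
    phase = 0.0
    step = f0 / fs          # fraction of a period advanced per sample
    out = []
    for _ in range(n):
        out.append(2.0 * phase - 1.0)
        phase += step
        if phase >= 1.0:    # wrap once per period (the sharp "reset" edge)
            phase -= 1.0
    return out
```

For example, `sawtooth(120.0, 16000, 16000)` gives one second of a 120 Hz buzz. A sawtooth’s harmonics fall off at about 6 dB per octave, while real glottal pulses fall off faster, so a gentle low-pass filter after the source may sound more natural.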
One thing I am struggling to find is which frequency components correspond to particular phonemes, though I suspect that is because I’m not using the right keywords, or because SEO has ruined the internet.
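The keyword to search for is probably “formants”: the resonant peaks of the vocal tract, whose first two or three frequencies largely identify a vowel. As an illustration, the sketch below stores average first-three-formant values for three vowels, as commonly quoted from Peterson and Barney’s 1952 measurements of adult male speakers, and turns them into gains for a filter bank. The mapping itself (a resonance-shaped bump with an assumed 120 Hz half-width around each formant) and the names are hypothetical choices, not a standard method.

```python
# Average formants F1, F2, F3 in Hz for adult male speakers,
# as commonly quoted from Peterson & Barney (1952).
VOWEL_FORMANTS = {
    "i": (270, 2290, 3010),   # as in "heed"
    "a": (730, 1090, 2440),   # as in "hod" (IPA: ɑ)
    "u": (300, 870, 2240),    # as in "who'd"
}

def band_gains(centers, formants, bandwidth=120.0):
    """Hypothetical mapping: each band's gain is a Lorentzian bump (peak 1.0)
    around whichever formant lies closest to the band's center frequency."""
    gains = []
    for fc in centers:
        g = max(1.0 / (1.0 + ((fc - f) / bandwidth) ** 2) for f in formants)
        gains.append(g)
    return gains
```

Consonants are harder: many are noise bursts or short transitions rather than steady formant patterns, so a noise source and time-varying gains come into play there.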
US2121142A is the patent for the Voder, the first electronic human voice synthesizer, built by Bell Labs. It has a similar structure to what I’ve been modeling in my head. Should I just use the frequency values from it for my bandpass filters, or should I use something else?
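One alternative to copying the patent’s values (not checked here) is to space the bands evenly on a logarithmic frequency axis, since hearing resolves frequency on a roughly logarithmic scale above a few hundred hertz. In this sketch the 100 to 8000 Hz range and the function name are assumptions; every band ends up with the same Q, which can be passed straight to a band-pass filter design.

```python
import math

def log_spaced_bands(f_low, f_high, n):
    """Split [f_low, f_high] into n bands of equal width on a log axis.
    Returns (lower_edge, center, upper_edge, q) per band, where center is the
    geometric mean of the edges and q = center / bandwidth."""
    ratio = (f_high / f_low) ** (1.0 / n)
    bands = []
    for k in range(n):
        lo = f_low * ratio ** k
        hi = lo * ratio
        center = math.sqrt(lo * hi)
        bands.append((lo, center, hi, center / (hi - lo)))
    return bands
```

For example, `log_spaced_bands(100.0, 8000.0, 10)` gives ten bands whose centers start near 124 Hz and rise by the same ratio each step.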


Wikipedia has a pretty good overview:
https://en.wikipedia.org/wiki/Speech_synthesis#Synthesizer_technologies
Basically there are concatenative approaches (unit selection: stitching together lots of small recorded sound snippets) and signal-based approaches. I think you want the signal-based ones, so read that page, find the right terms, then go to scholar.google.com and search for those terms plus “overview” or “toolkit”.
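One of the signal-based approaches on that page is formant synthesis. Its classic digital building block is the two-pole resonator from Klatt’s 1980 cascade/parallel formant synthesizer, y[n] = A·x[n] + B·y[n−1] + C·y[n−2], normalized to unity gain at DC. The sketch below implements that resonator and chains several in series; the function names are hypothetical and the values in the usage note are illustrative assumptions.

```python
import math

def resonator(x, freq, bw, fs):
    """Klatt-style two-pole resonator centred on freq Hz with bandwidth bw Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c                      # makes the gain exactly 1 at DC
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        y2, y1 = y1, yn
        out.append(yn)
    return out

def cascade_formants(source, formants, bandwidths, fs):
    """Run the source through one resonator per formant, in series (cascade)."""
    y = list(source)
    for f, bw in zip(formants, bandwidths):
        y = resonator(y, f, bw, fs)
    return y
```

For example, `cascade_formants(source, (730.0, 1090.0, 2440.0), (60.0, 90.0, 120.0), 16000)` pushes a buzzy source toward an “ah” vowel; the bandwidths there are guesses. Klatt’s parallel branch instead sums individually scaled resonator outputs, which is closer to the filter-bank-with-gains plan in the question.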