
Theory on Voice recognition, Speech to Text on the fly, misc brain fart...


digip


Ok. I was thinking about a program that could take a video or song and then turn the audio into a text file. I then thought, voice recognition software already kinda does this to begin with. Vista (and other OSes) can do voice recognition, and the exploits people have already tried are well known, sending various commands to people through mp3s, etc. But what I was thinking is, Vista is understanding these commands, which means it must be able to convert them to text, just like if a handicapped person was giving an oral dictation in Word, etc.

So, my question is, how hard is it to get this to work for you, in creating subtitles for a film, or dumping the lyrics from a song to a text file? There should be a way to have Vista (or any other OS) create Speech to Text files on the fly for you. If someone can find a way to make a modded VB app to take advantage of this in Vista, then theoretically you could have it dump the audio from an episode of Hak5 to a text file. And with a little counter system, time stamp each piece so it works like an .srt file to be merged with the video. It could then be translated into other languages so people from around the world could add subtitles to the episodes of Hak5.
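The time-stamping half of this is the easy part, and can be sketched separately from the recognition. Below is a minimal Python sketch (not the VB app described above) that turns a list of timed text pieces into .srt text. The names `srt_timestamp` and `to_srt` are made up for illustration, and it assumes some recognizer has already produced `(start_seconds, end_seconds, text)` tuples, which is the hard part this sketch does not cover.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm, the form .srt files use."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build .srt text from (start_seconds, end_seconds, text) tuples.

    The tuples are assumed to come from a speech recognizer; this function
    only numbers them and writes the time stamps.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates one numbered subtitle from the next.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0, 2.5, "Hello"), (3, 4, "World")]))
```

Since the text is kept apart from the timing, a translated version would only need the text field swapped out before calling `to_srt` again.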



Isn't that kind of like the PodZinger thing that they interviewed some guys from in one of the Hak.5 episodes? That recognises the speech from podcasts and transcribes it so you can search on the actual podcast content. I'd say that could easily be used for film, perhaps less so for music, since then it has to distinguish between music and lyrics, plus there are the problems of multiple singers/backup singers and recognising words that are pronounced strangely to make them fit the music.

Edit: Although it's not that good, here's a few bits from 2x10:

"...some kind of disk imaging software I'd chose Acronis true image nine point one. Let anything just mentioned Maureen Meehan like all of that is homily sailing -- reward lots it is and others partition..."

"...got -- org slash Wiki you can email me directly west of Hak five dot org. Now coming up we're going to have a prerecorded segment with our whole body move aches and his US..."

"...say. Almost looks militarily. Weren't able to use that link slipped one point yeah but it wasn't as easy wasn't disease or is fun because we have gradient around corners -- We'll instantly very cool..."


I have used PodZinger a few times to find specific segments, like the ssh tunneling episode...

That's sort of what I was thinking, only I want to see it generate the text to a file while watching the video, so it can be saved and then translated later. PodZinger is cool for searching through segments and finding what you want, but what I am talking about is getting a fully transcribed text file generated from the video's audio feed.

If PodZinger has full text transcripts from the show, the only thing left would be to time stamp them in an srt format and then translate them to other languages to be merged as subtitles, but I have yet to see where it gives a full transcript of the show.

Any idea what the underlying software is that transcribes the audio for PodZinger?



Yeah, I know what you mean, I was just using PodZinger as an example of the technology. Obviously PodZinger does have full transcripts of the show, since otherwise it wouldn't be very useful for searching podcast content, but whether they will give them to people is another matter (probably not).


I use VirtualDub to merge pre-made SSA files (converted from SRT to SSA, then run through VirtualDub to merge the subtitles), but it would be nice if it had a plugin to write the SRT/SSA file from the audio track of the video. That way each line is time stamped in the correct spot to line up with the video. The text can then be translated using something like Google, and then imported and merged back in with the desired language for the video. (/me needs to stop writing run-on sentences)

Anywho, just throwing out some ideas. If anyone knows of any Speech to Text programs that will capture from a video, or combinations of plugins to maybe do this, please post the links here. I have seen some paid Speech to Text programs, but they all seem to be based on standard voice recognition programs, and I don't need Darren or someone spouting out commands and my PC starting to go haywire executing them...


This would be really simple. Although, my way would involve a Jack to Jack Cord.

I assume you mean out of the sound card and back in? That works as long as it is only recording the video's audio and not any other source, otherwise you get stereo feedback. You could of course go PC to PC, and just use the voice recognition to record it from one PC to the other as text, but there would be a lot of manual setup involved, and I was looking more for an app or plug-in to do the whole process on the fly.

See, you can script a media player control into a VB app with Visual Basic, and I know you can do text to speech, but I haven't found any source code to do speech to text in a Visual Basic app. If it is possible, and I think it is, what I would do is have the app start the video and then create the text in a text box control on the form, with a time stamp every time there is new audio. So if there is silence, it starts counting, and when it hears audio, it places a time stamp at the beginning of the sentence. Then when done, you can save it as an srt file which can be used as a subtitle, or imported into an srt converter to make an ssa subtitle file to merge in VirtualDub. Either way, you would have the text from the show, and it could then easily be translated to other languages once it has been transcribed and time stamped.

Anyone have any ideas on how to do this? I think it would be a good piece of software if someone could make it work.
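The "count during silence, stamp when audio starts" idea can be sketched on its own, without any recognizer attached. This is a rough Python sketch rather than VB, and everything in it is an assumption for illustration: the function name `speech_spans`, the input (one loudness value per fixed-length frame of audio, however you measure it), and the threshold and pause length, which would need tuning against real audio.

```python
def speech_spans(levels, frame_seconds=0.1, threshold=0.05, min_silence_frames=3):
    """Return (start, end) times in seconds for each run of audible frames.

    levels: one loudness value per frame (hypothetical input, for example
    an RMS value per 100 ms of audio).
    A span is closed only after min_silence_frames quiet frames in a row,
    so a short pause inside a sentence does not split it in two.
    """
    spans = []
    start = None   # frame index where the current span began, if any
    quiet = 0      # consecutive quiet frames seen inside the current span
    for i, level in enumerate(levels):
        if level >= threshold:
            if start is None:
                start = i          # audio after silence: place a time stamp
            quiet = 0
        elif start is not None:
            quiet += 1             # silence: start counting
            if quiet >= min_silence_frames:
                end = i - quiet + 1   # index of the first quiet frame
                spans.append((start * frame_seconds, end * frame_seconds))
                start, quiet = None, 0
    if start is not None:
        # Audio ran to the end of the clip (minus any trailing quiet frames).
        end = len(levels) - quiet
        spans.append((start * frame_seconds, end * frame_seconds))
    return spans


if __name__ == "__main__":
    demo = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
    print(speech_spans(demo, frame_seconds=1.0, threshold=0.5))
```

Each span gives the start and end time for one subtitle line; the recognized text for that stretch of audio would still have to come from whatever speech engine is used.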

videotrans.gif

This is an example of what I would be doing if I get all the parts worked out. It's basic, and all it would do is load a video, play it, and record the audio to text in the text window with a time stamp. Then you can dump it to an srt file or whatever you want to do with it. This is a very basic concept, and I understand it would probably require some sort of programming beyond a simple voice recognition plugin, but hey, it's a start, or a concept of what I want to do.


Well, if anything, it's worth a good laugh to see what the speech recognition will come up with. The following is taken from a clip of Wes going over how to use a laser pointer to send audio to a stereo. It's almost gibberish, and unrecognizable. I have to laugh as I watch the video and read along with what it is typing out on the screen though. Funny stuff.

I needed Evelyn battery that have of user: NT my in the wall ones R's Anne Darcy were certainly I had I had anyone with a envoy-mile at my house was your yellow sawmills lesson on the side of Allied attack of this because the U.S. wages I knew I'd suggest and in the floss on this man's original design was that I want so why the medical lasers just all in all his illness going on on a handling of new name on it days/on off-the the family had to do is wrong or wires in different directions and recover Bracken's tradeoff of the outline of the battery pack your time and then this of this disaster accolades India laser pointer itself to serve the time being will just have trusted on right here on Asahara is connections here shortly that distance tainan meant anywhere that vision of the eyes of events today-music to pay attention to how that is.  One major player (say this is a direct line then from there with Internationale have runs to one side of our audience transform the right hand side that and then the other native inspiration from the battery pack to transform you will distresses connections and I have a 10ms in the NASA can take a lot more time-made needs to do not a lot has given me and I boxed radio shack in the middle of pre-villages that have always had sailing in this way

This is just too funny. I wonder what it would do if I gave it a speech by G.W. Bush? I can barely understand half of his ramblings as it is, so it's sure to be funny what the computer interprets his words as.


Which software did you use to transcribe that?

The Speech Recognition Howto refers to a number of voice recognition programs out there. No idea which one is best.

Thanks Cooper, but I am using WinXP at the moment. Haven't tried it in Linux. In fact, I haven't ever watched a video using Linux, so I guess it's a good time to start fiddling around with it...


I read the first post and no more...

My response is: metaphor and simile cannot be translated.

Language is a construct of culture (and more). In English we can understand how some words can mean other things (think "rod"), but in other languages you will not get the metaphor or simile that the word represents... words with multiple meanings. But it doesn't stop there. Think of how many word combinations or phrases have alternative meanings... it's endless...

So we can do our best... but the true problem is getting a program to understand the context of the whole. (Oh, and then add on things like sarcasm.)



Huh? Metaphor and simile can often be translated, especially simile. In cases where they can't, another phrase with a similar meaning can be used in their place. I'm not really sure what your argument is, since obviously hundreds of movies and TV shows are subtitled in other languages every day.
