Help me understand Voice Recognition tech

I am interested in getting an app that would allow me to make notes via voice-to-text. I work in a field with HIPAA protections. I’m having trouble figuring out the nuances of privacy related to these apps.

First off, is this kind of software considered “AI”? How does it even recognize that a sound equals a word? Does it use LLM tech? Does the tech learn to recognize my voice better over time? Does it use my recordings to learn to understand others’ voices? Is this all a black box? How can I take precautions so that no one except me hears the things I transcribe?

This is just such confusing tech! It seems like it’s fairly old and common but the more I think about it in relation to current age AI, the more creeped out I get! And yet my doctor uses one regularly… I’ll be asking her about it too, don’t worry.

Thank you!

3 points

My kid’s doctor had a service to transcribe the visits. Patients may opt out verbally. This is all through the hospital, so presumably it is HIPAA compliant.

Instead of creating your own solution that complies with HIPAA, it is probably easier to use one that already exists.

1 point

Well, that’s why I said I will be asking my doctor what she uses! And I likely won’t be transcribing anything professional, but I do still have my phone on me in those settings. It’s more about the fact that I don’t want my own personal notes to be automatically handed to an LLM and regurgitated out into the world without my knowledge. If it can recognize and transcribe my speech, what’s to stop it from using that to train an LLM, which in turn notoriously plagiarizes its training data?

4 points

There are offline solutions that never transmit the data
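If you want a rough sanity check that a Python-based tool really stays offline, one option is to make socket creation fail while it runs. This is only a sketch: `no_network` and `NetworkBlocked` are made-up names, and the trick only catches code that goes through Python's `socket` module. Airplane mode or an OS-level firewall is the stronger test.

```python
import socket
from contextlib import contextmanager


class NetworkBlocked(RuntimeError):
    """Raised when code tries to open a socket inside no_network()."""


@contextmanager
def no_network():
    """Make Python-level socket creation fail while the block is active.

    Native code that talks to the network directly would bypass this,
    so treat it as a quick check, not a guarantee.
    """
    original = socket.socket

    def blocked(*args, **kwargs):
        raise NetworkBlocked("network access attempted")

    socket.socket = blocked
    try:
        yield
    finally:
        # Always restore the real socket class, even after an error.
        socket.socket = original
```

Wrapping a transcription call in `with no_network():` would then raise `NetworkBlocked` if that code tried to open a connection through Python.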

2 points

I haven’t tried it, but I really like the “local only” approach they’ve been using in other apps.

I just wish their licenses were more open (even AGPL would be fine). Here’s the source code in case someone is interested.

15 points

It’s AI and your voice won’t be used for training if you use a local model.

Use Whisper speech-to-text (STT). It runs on your computer, so nothing leaves your machine. You can choose the model size based on how powerful your computer is: the bigger the model, the better it transcribes.
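To make the model-size choice concrete, here is a minimal sketch, assuming the open-source `openai-whisper` Python package (plus ffmpeg) is installed. The memory figures are the rough ones from the Whisper README, and `pick_model` and `transcribe_locally` are hypothetical helpers, not part of the library.

```python
# Approximate memory needs per model size, taken from the Whisper README
# (rough figures; treat them as assumptions, not guarantees).
MODEL_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_gb: float, english_only: bool = True) -> str:
    """Hypothetical helper: return the largest model that fits in memory."""
    chosen = "tiny"  # fall back to the smallest model if nothing fits
    for name, needed_gb in MODEL_MEMORY_GB:
        if needed_gb <= available_gb:
            chosen = name
    # English-only variants exist for every size except "large".
    if english_only and chosen != "large":
        chosen += ".en"
    return chosen


def transcribe_locally(audio_path: str, model_name: str = "base.en") -> str:
    """Transcribe a file on this machine using a local Whisper model."""
    import whisper  # imported lazily so pick_model works without it

    # The model weights are downloaded on first use, then cached locally.
    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()
```

For example, `transcribe_locally("note.wav", pick_model(4))` would use the `small.en` model on a machine with about 4 GB to spare.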

0 points

That sounds interesting. I was hoping for something that I could use on a mobile app. I’m not sure what “adapting the model size” means so this might be more complicated than I’m looking for.

2 points

> I was hoping for something that I could use on a mobile app.

Record now, then transcribe later? Or you can try https://whisper.ggerganov.com (this runs in your browser and nothing is sent anywhere, so it works even on your Android/iOS phone). The website owner is a trusted dev who made whisper.cpp and llama.cpp, the latter being the backbone of much of the local LLM ecosystem.

> I’m not sure what “adapting the model size” means so this might be more complicated than I’m looking for.

A bit of complexity is generally the price to pay for freedom from constant surveillance and data gathering. Plus, it’s actually super easy. A bigger model means better transcription quality, but the smaller ones are really good already. The base.en model is probably all you need anyway.

On a PC, you can generally try any app from GitHub. They basically all use the same backend.

I found a few: https://whishper.net/ and https://github.com/chidiwilliams/buzz
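The "record now, transcribe later" idea could look roughly like this: drop recordings in a folder, then run a script that writes a text file next to each one. It is a sketch only. `transcribe_pending`, `whisper_transcriber`, and the list of extensions are assumptions, and the Whisper part presumes the `openai-whisper` package is installed.

```python
from pathlib import Path
from typing import Callable

AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".ogg"}  # assumed set of formats


def transcribe_pending(folder: str, transcribe: Callable[[str], str]) -> list[str]:
    """Write a .txt next to every recording that does not have one yet.

    Returns the names of the recordings that were transcribed.
    """
    done = []
    for audio in sorted(Path(folder).iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        transcript = audio.with_suffix(".txt")
        if transcript.exists():
            continue  # already transcribed on an earlier run
        transcript.write_text(transcribe(str(audio)), encoding="utf-8")
        done.append(audio.name)
    return done


def whisper_transcriber(model_name: str = "base.en") -> Callable[[str], str]:
    """Build a transcribe function backed by a local Whisper model."""
    import whisper  # assumes openai-whisper; loaded only when needed

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"].strip()
```

Running `transcribe_pending("recordings", whisper_transcriber())` would then process only the new files, all on your own machine.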

1 point

Since this is for work, I would start by asking whoever does IT stuff. You really don’t want to be sending HIPAA data off to who knows where without permission.

1 point

It is not for work.

2 points

Okay, if it’s personal use for yourself or friends or family, then I don’t think HIPAA is a concern because you’re not a HIPAA Covered Entity (https://www.hhs.gov/hipaa/for-professionals/covered-entities/index.html). You should be able to use any of the recommendations here, or others you may find in app stores like Google Play or F-Droid.

1 point

Makes sense. I think my main concern is how I can be certain it doesn’t listen to stuff when I don’t let it.

Ever have that weird phenomenon when you’re discussing something and somehow what you were talking about is the first suggested search result?

