identify persons in audio

Open louis030195 opened this issue 1 year ago • 13 comments

louis030195 avatar Sep 11 '24 22:09 louis030195

I tried this myself with pyannote audio, and it works pretty well. Fully local, but Python-based. https://github.com/pyannote/pyannote-audio

Highly recommended. I tried several systems (including paid ones), and this one is really efficient and delivers good quality.

On top of this, once you have separate audio streams for microphone and display, this becomes even more promising for better-quality meeting notes. I've developed a Python tool that combines Whisper transcription and pyannote diarization to create a comprehensive meeting transcript. It transcribes the audio, identifies speakers, and integrates the results, laying the groundwork for AI-assisted prompts that generate good notes. I still have some issues on my side, but it's basically working and 100% local. So this is doable for sure. And it beats Rewind.ai / Limitless for sure :) Locally.
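
To make the "integrates the results" step concrete, here is a minimal sketch of one common way to merge the two outputs: label each transcript segment with the speaker whose diarization turns overlap it the most. This is not the tool described above. The type and function names (`TranscriptSegment`, `SpeakerTurn`, `assign_speakers`) are hypothetical, and the sketch assumes both the transcription and the diarization steps report start and end times in seconds.

```rust
/// A transcribed chunk of audio (for example one Whisper segment).
/// Hypothetical type; times are assumed to be in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptSegment {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

/// A diarization turn: one speaker talking from `start` to `end`.
/// Hypothetical type; times are assumed to be in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct SpeakerTurn {
    pub start: f64,
    pub end: f64,
    pub speaker: String,
}

/// Length of the overlap between two time intervals, or 0.0 if they are disjoint.
fn overlap(a_start: f64, a_end: f64, b_start: f64, b_end: f64) -> f64 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0.0)
}

/// Pair each segment's text with the speaker whose turns overlap it the most.
/// Segments that no turn overlaps are labeled `None`.
pub fn assign_speakers(
    segments: &[TranscriptSegment],
    turns: &[SpeakerTurn],
) -> Vec<(Option<String>, String)> {
    segments
        .iter()
        .map(|seg| {
            // Accumulate total overlap per speaker for this segment.
            let mut totals: Vec<(&str, f64)> = Vec::new();
            for turn in turns {
                let o = overlap(seg.start, seg.end, turn.start, turn.end);
                if o <= 0.0 {
                    continue;
                }
                let name = turn.speaker.as_str();
                if let Some(i) = totals.iter().position(|(s, _)| *s == name) {
                    totals[i].1 += o;
                } else {
                    totals.push((name, o));
                }
            }
            // Pick the speaker with the largest accumulated overlap.
            let best = totals
                .into_iter()
                .max_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(s, _)| s.to_string());
            (best, seg.text.clone())
        })
        .collect()
}
```

Keeping microphone and display audio as separate streams, as suggested above, would make this step easier, since the microphone stream could be assumed to mostly contain the local user.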

NicodemPL avatar Sep 12 '24 10:09 NicodemPL

/bounty 200

definition of done:

  • screenpipe-audio has some code that identifies speakers (I guess after the transcription?)
  • this is sent to screenpipe-server, which inserts the speakers into the DB
  • the speakers are then returned in DB queries & the API

rules:

  • use rust, local
  • do not use too much compute; screenpipe must still be usable on normal consumer hardware (eventually using the GPU/NPU if possible)
  • ideally separate file in screenpipe-audio
  • works on all OSes
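
As a rough illustration of the "identify speakers" step in the definition of done, the sketch below matches a per-segment voice embedding against previously seen speakers using cosine similarity and registers a new speaker when nothing is close enough. It is not the implementation that was later merged. `SpeakerRegistry`, `identify`, and the threshold value are hypothetical, and the sketch assumes some local embedding model (not shown) already produces a fixed-length `f32` vector per audio segment. The returned id is the kind of value that could be stored alongside a transcription row in the DB.

```rust
/// Cosine similarity between two vectors.
/// Returns 0.0 if the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// In-memory list of known speakers (hypothetical type).
/// The index into `speakers` serves as the speaker id.
pub struct SpeakerRegistry {
    threshold: f32,
    speakers: Vec<Vec<f32>>,
}

impl SpeakerRegistry {
    /// `threshold` is the minimum cosine similarity needed to reuse an existing speaker.
    pub fn new(threshold: f32) -> Self {
        Self { threshold, speakers: Vec::new() }
    }

    /// Return the id of the closest known speaker if it is similar enough;
    /// otherwise register the embedding as a new speaker and return the new id.
    pub fn identify(&mut self, embedding: &[f32]) -> usize {
        let best = self
            .speakers
            .iter()
            .enumerate()
            .map(|(id, known)| (id, cosine_similarity(known, embedding)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((id, score)) if score >= self.threshold => id,
            _ => {
                self.speakers.push(embedding.to_vec());
                self.speakers.len() - 1
            }
        }
    }
}
```

The matching itself is a handful of dot products per segment, so under these assumptions the compute cost would be dominated by whichever embedding model is used.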

louis030195 avatar Sep 13 '24 19:09 louis030195

~~## 💎 $200 bounty • Screenpi.pe~~

~~### Steps to solve:~~

~~1. Start working: Comment /attempt #306 with your implementation plan~~
~~2. Submit work: Create a pull request including /claim #306 in the PR body to claim the bounty~~
~~3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts~~

~~Thank you for contributing to mediar-ai/screenpipe!~~

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @vishwamartur | Nov 16, 2024, 4:09:31 PM | WIP |
| 🟢 @EzraEllette | | #672 |

algora-pbc[bot] avatar Sep 13 '24 19:09 algora-pbc[bot]

@louis030195 is this issue still open? I would love to work on this.

kernel-loophole avatar Nov 09 '24 10:11 kernel-loophole

@EzraEllette is on it, I believe

louis030195 avatar Nov 09 '24 20:11 louis030195

Doing this now. Almost ready for a PR. Writing Tests.

EzraEllette avatar Nov 09 '24 20:11 EzraEllette

So it seems like all of the hallucinations match to the same speaker, which could be useful for determining if part of a transcript is hallucination... I will update my branch, but I need to do performance improvements because right now I am segmenting speech then performing stt on each segment. @louis030195

EzraEllette avatar Nov 09 '24 22:11 EzraEllette

> So it seems like all of the hallucinations match to the same speaker, which could be useful for determining if part of a transcript is hallucination... I will update my branch, but I need to do performance improvements because right now I am segmenting speech then performing stt on each segment. @louis030195

?

louis030195 avatar Nov 11 '24 23:11 louis030195

#672 @EzraEllette

increased bounty to $200 for now

louis030195 avatar Nov 15 '24 18:11 louis030195

/attempt #306

vishwamartur avatar Nov 16 '24 16:11 vishwamartur

💡 @EzraEllette submitted a pull request that claims the bounty. You can visit your bounty board to reward.

algora-pbc[bot] avatar Nov 16 '24 23:11 algora-pbc[bot]

🎉🎈 @EzraEllette has been awarded $200! 🎈🎊

algora-pbc[bot] avatar Nov 18 '24 21:11 algora-pbc[bot]