screenpipe identify persons in audio

Sep 11 '24 22:09 louis030195

Sep 11 '24 22:09 linear[bot]

I tried this myself with pyannote-audio, and it works pretty well. It's fully local, but Python-based. https://github.com/pyannote/pyannote-audio

Highly recommended. I tried different systems (including paid ones), and this one is really efficient and delivers good quality.

On top of this, once you have separate audio streams for microphone and display, this is even more promising for better-quality meeting notes. I've developed a Python tool that combines Whisper transcription and pyannote diarization to create a comprehensive meeting transcript. It transcribes the audio, identifies speakers, and merges the results, laying the groundwork for AI-assisted prompts that generate good notes. I still have some issues on my side, but it's basically working and 100% local. So this is doable for sure. And it beats Rewind.ai / Limitless for sure :) Locally.
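
To illustrate the step of combining transcription and diarization output mentioned above, here is a minimal sketch in Rust (the language the bounty asks for). It is not the Python tool's actual code. The `TranscriptSegment` and `SpeakerTurn` types and the `assign_speakers` function are hypothetical names, and the max-overlap rule is just one simple way to merge the two outputs.

```rust
use std::collections::BTreeMap;

/// One transcribed span of audio, e.g. a Whisper segment (times in seconds).
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct TranscriptSegment {
    pub start: f64,
    pub end: f64,
    pub text: String,
}

/// One diarization turn: who spoke between `start` and `end` (seconds).
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct SpeakerTurn {
    pub start: f64,
    pub end: f64,
    pub speaker: String,
}

/// Length in seconds of the intersection of two intervals (0.0 if disjoint).
fn overlap(a_start: f64, a_end: f64, b_start: f64, b_end: f64) -> f64 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0.0)
}

/// Pairs each transcript segment with the speaker whose turns overlap it
/// the most, or `None` when no diarization turn overlaps the segment.
pub fn assign_speakers(
    segments: &[TranscriptSegment],
    turns: &[SpeakerTurn],
) -> Vec<(Option<String>, String)> {
    segments
        .iter()
        .map(|seg| {
            // Total overlap per speaker; BTreeMap keeps iteration deterministic.
            let mut totals: BTreeMap<&str, f64> = BTreeMap::new();
            for turn in turns {
                let shared = overlap(seg.start, seg.end, turn.start, turn.end);
                if shared > 0.0 {
                    *totals.entry(turn.speaker.as_str()).or_insert(0.0) += shared;
                }
            }
            // Strict `>` means ties go to the first speaker in sorted order.
            let mut best: Option<(&str, f64)> = None;
            for (speaker, total) in totals {
                if best.map_or(true, |(_, b)| total > b) {
                    best = Some((speaker, total));
                }
            }
            (best.map(|(s, _)| s.to_string()), seg.text.clone())
        })
        .collect()
}
```

A segment that straddles two turns goes to whichever speaker covers more of it. A real tool might instead split such segments at the turn boundary.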

Sep 12 '24 10:09 NicodemPL

/bounty 200

definition of done:

screenpipe-audio has some code that identifies speakers - I guess after the transcription?
this is sent to the screenpipe-server which would insert speakers into DB
which is then returned in db queries & api

rules:

use rust, local
do not use too much compute, screenpipe must still be usable on normal consumer hardware (eventually if it's possible to use GPU/NPU...)
ideally separate file in screenpipe-audio
works on all OSes
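
As a rough illustration of the "identify speakers" step in the definition of done, here is a minimal std-only Rust sketch. It assumes some model (not shown) already produces a fixed-length embedding per speech segment, and it matches each embedding against known speakers by cosine similarity. `SpeakerRegistry`, `identify`, and the threshold value are hypothetical, not screenpipe's actual API. The returned id is the kind of value the server could store next to a transcription row.

```rust
/// Cosine similarity of two vectors; 0.0 if either has zero length.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// In-memory list of known speakers, one reference embedding each.
/// Hypothetical type for illustration; a real system would persist this.
pub struct SpeakerRegistry {
    threshold: f32,
    known: Vec<Vec<f32>>,
}

impl SpeakerRegistry {
    /// `threshold` is the minimum similarity that counts as "same speaker".
    pub fn new(threshold: f32) -> Self {
        Self { threshold, known: Vec::new() }
    }

    /// Number of distinct speakers seen so far.
    pub fn len(&self) -> usize {
        self.known.len()
    }

    /// Returns the id of the best-matching known speaker, or registers the
    /// embedding as a new speaker when nothing reaches the threshold.
    pub fn identify(&mut self, embedding: &[f32]) -> usize {
        let mut best: Option<(usize, f32)> = None;
        for (id, reference) in self.known.iter().enumerate() {
            let score = cosine_similarity(embedding, reference);
            if best.map_or(true, |(_, s)| score > s) {
                best = Some((id, score));
            }
        }
        match best {
            Some((id, score)) if score >= self.threshold => id,
            _ => {
                self.known.push(embedding.to_vec());
                self.known.len() - 1
            }
        }
    }
}
```

The linear scan is cheap for the handful of speakers in a typical meeting, which fits the "normal consumer hardware" rule. The threshold would need tuning against whatever embedding model is used.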

Sep 13 '24 19:09 louis030195

~~## 💎 $200 bounty • Screenpi.pe~~

~~### Steps to solve:~~

~~1. Start working: Comment /attempt #306 with your implementation plan~~
~~2. Submit work: Create a pull request including /claim #306 in the PR body to claim the bounty~~
~~3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts~~

~~Thank you for contributing to mediar-ai/screenpipe!~~

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @vishwamartur | Nov 16, 2024, 4:09:31 PM | WIP |
| 🟢 @EzraEllette | | #672 |

Sep 13 '24 19:09 algora-pbc[bot]

@louis030195 is this issue still open? I would love to work on this.

Nov 09 '24 10:11 kernel-loophole

@EzraEllette is on it, I believe

Nov 09 '24 20:11 louis030195

Doing this now. Almost ready for a PR. Writing Tests.

Nov 09 '24 20:11 EzraEllette

So it seems like all of the hallucinations match to the same speaker, which could be useful for determining if part of a transcript is hallucination... I will update my branch, but I need to do performance improvements because right now I am segmenting speech then performing stt on each segment. @louis030195

Nov 09 '24 22:11 EzraEllette

> So it seems like all of the hallucinations match to the same speaker, which could be useful for determining if part of a transcript is hallucination... I will update my branch, but I need to do performance improvements because right now I am segmenting speech then performing stt on each segment. @louis030195

?

Nov 11 '24 23:11 louis030195

#672 @EzraEllette

increased bounty to $200 for now

Nov 15 '24 18:11 louis030195

/attempt #306

Nov 16 '24 16:11 vishwamartur

💡 @EzraEllette submitted a pull request that claims the bounty. You can visit your bounty board to reward.

Nov 16 '24 23:11 algora-pbc[bot]

🎉🎈 @EzraEllette has been awarded $200! 🎈🎊

Nov 18 '24 21:11 algora-pbc[bot]