identify persons in audio
I tried pyannote-audio myself and it works pretty well: fully local, but Python-based. https://github.com/pyannote/pyannote-audio
Highly recommended. I tried different systems (including paid ones), and this one is really efficient and delivers good quality.
On top of this, once you have separate audio streams (microphone and display), this is even more promising for higher-quality meeting notes. I've developed a Python tool that combines Whisper transcription and pyannote diarization to create comprehensive meeting transcripts. It transcribes audio, identifies speakers, and merges the results, laying the groundwork for AI-assisted prompts that generate good notes. I still have some issues on my side, but it's basically working and 100% local. So this is definitely doable. And it beats Rewind.ai / Limitless for sure :) Locally.
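The merge step described above (Whisper segments plus pyannote speaker turns) can be sketched in Rust, the language the bounty asks for. This is a minimal sketch, not the tool from the comment: it assumes both outputs have already been converted into simple timestamped structs (`TranscriptSegment` and `SpeakerTurn` are hypothetical names), and it labels each transcript segment with the speaker whose turns overlap it the most.

```rust
/// A transcribed span of text, times in seconds.
/// Hypothetical shape, e.g. converted from Whisper-style segment output.
#[derive(Debug, Clone)]
struct TranscriptSegment {
    start: f64,
    end: f64,
    text: String,
}

/// A diarization turn: `speaker` talked from `start` to `end` (seconds).
/// Hypothetical shape, e.g. converted from pyannote-style speaker turns.
#[derive(Debug, Clone)]
struct SpeakerTurn {
    start: f64,
    end: f64,
    speaker: String,
}

/// Length of the intersection of two time intervals (0.0 if they are disjoint).
fn overlap(a_start: f64, a_end: f64, b_start: f64, b_end: f64) -> f64 {
    (a_end.min(b_end) - a_start.max(b_start)).max(0.0)
}

/// Label each transcript segment with the speaker whose turns overlap it the most.
/// Segments that overlap no turn get `None`.
fn assign_speakers(
    segments: &[TranscriptSegment],
    turns: &[SpeakerTurn],
) -> Vec<(Option<String>, TranscriptSegment)> {
    segments
        .iter()
        .map(|seg| {
            // Sum overlap per speaker, since one speaker can have several turns.
            let mut totals: Vec<(&str, f64)> = Vec::new();
            for turn in turns {
                let o = overlap(seg.start, seg.end, turn.start, turn.end);
                if o <= 0.0 {
                    continue;
                }
                let name = turn.speaker.as_str();
                if let Some(i) = totals.iter().position(|(s, _)| *s == name) {
                    totals[i].1 += o;
                } else {
                    totals.push((name, o));
                }
            }
            // Pick the speaker with the largest total overlap (None if nothing overlapped).
            let best = totals
                .into_iter()
                .max_by(|a, b| a.1.total_cmp(&b.1))
                .map(|(s, _)| s.to_string());
            (best, seg.clone())
        })
        .collect()
}
```

Assigning by maximum overlap means the transcript can be produced in one pass and diarization in another, instead of running speech-to-text once per speaker turn.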
/bounty 200
Definition of done:
- screenpipe-audio has some code that identifies speakers (I guess after the transcription?)
- the result is sent to screenpipe-server, which inserts the speakers into the DB
- the speakers are then returned in DB queries & the API
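The definition of done leaves open how speakers are identified. One common low-compute approach, assumed here and not decided in this thread, is to compute a voice embedding for each speech segment with some speaker-embedding model (not shown) and compare it with the embeddings of speakers seen before. The sketch below uses a hypothetical `SpeakerRegistry` with an assumed cosine-similarity threshold; the integer id it returns is the kind of value screenpipe-audio could send to screenpipe-server and store in a hypothetical speaker column.

```rust
/// Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical in-memory registry of known speakers.
/// Each speaker is stored as (id, first embedding seen for that speaker).
struct SpeakerRegistry {
    threshold: f32, // assumed similarity cut-off for "same speaker"
    speakers: Vec<(i64, Vec<f32>)>,
    next_id: i64,
}

impl SpeakerRegistry {
    fn new(threshold: f32) -> Self {
        Self { threshold, speakers: Vec::new(), next_id: 1 }
    }

    /// Return the id of the most similar known speaker if the similarity reaches
    /// the threshold; otherwise register the embedding as a new speaker.
    fn identify(&mut self, embedding: &[f32]) -> i64 {
        let best = self
            .speakers
            .iter()
            .map(|(id, known)| (*id, cosine_similarity(embedding, known)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((id, sim)) if sim >= self.threshold => id,
            _ => {
                let id = self.next_id;
                self.next_id += 1;
                self.speakers.push((id, embedding.to_vec()));
                id
            }
        }
    }
}
```

Keeping one embedding per speaker keeps the comparison cheap on consumer hardware; a fuller version might average embeddings per speaker or persist them in the DB so ids stay stable across restarts.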
Rules:
- use Rust, local
- do not use too much compute: screenpipe must still be usable on normal consumer hardware (possibly using the GPU/NPU, if that's feasible)
- ideally in a separate file in screenpipe-audio
- works on all OSes
~~## 💎 $200 bounty • Screenpi.pe~~
~~### Steps to solve:~~
~~1. Start working: Comment /attempt #306 with your implementation plan~~
~~2. Submit work: Create a pull request including /claim #306 in the PR body to claim the bounty~~
~~3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts~~
~~Thank you for contributing to mediar-ai/screenpipe!~~
| Attempt | Started (GMT+0) | Solution |
|---|---|---|
| 🟢 @vishwamartur | Nov 16, 2024, 4:09:31 PM | WIP |
| 🟢 @EzraEllette | | #672 |
@louis030195 is this issue still open? Would love to work on this.
@EzraEllette is on it, I believe.
Doing this now. Almost ready for a PR. Writing tests.
So it seems like all of the hallucinations are assigned to the same speaker, which could be useful for detecting whether part of a transcript is a hallucination... I will update my branch, but I need to make performance improvements, because right now I am segmenting the speech and then running STT on each segment. @louis030195
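One possible way to reduce the cost of running STT on every segment (an assumption, not what the PR does) is to merge consecutive segments from the same speaker when the pause between them is short, so the STT model is called fewer times on longer chunks. `SpeechSegment` and the `max_gap` parameter are hypothetical names, and the input is assumed to be sorted by start time.

```rust
/// A diarized speech segment (hypothetical shape), times in seconds.
#[derive(Debug, Clone, PartialEq)]
struct SpeechSegment {
    start: f64,
    end: f64,
    speaker: i64,
}

/// Merge consecutive segments from the same speaker when the silence between
/// them is at most `max_gap` seconds. Assumes `segments` is sorted by `start`.
fn merge_segments(segments: &[SpeechSegment], max_gap: f64) -> Vec<SpeechSegment> {
    let mut merged: Vec<SpeechSegment> = Vec::new();
    for seg in segments {
        if let Some(last) = merged.last_mut() {
            // Same speaker and a short enough pause: extend the previous segment.
            if last.speaker == seg.speaker && seg.start - last.end <= max_gap {
                last.end = last.end.max(seg.end);
                continue;
            }
        }
        // Different speaker or a long pause: start a new segment.
        merged.push(seg.clone());
    }
    merged
}
```

Each merged segment would then be passed to STT once, trading slightly longer inputs for fewer model invocations.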
?
#672 @EzraEllette
Increased the bounty to $200 for now.
💡 @EzraEllette submitted a pull request that claims the bounty. You can visit your bounty board to reward.