[DMP 2024]: Voice API

Open suzinyou opened this issue 2 years ago • 22 comments

Ticket Contents

Description

Ask A Question (AAQ) is a free and open-source tool created to help non-profit organizations, governments in developing nations, and social sector organizations utilize Large Language Models for responding to citizen inquiries in their native languages.

Create new voice response API: the API will allow users to send questions and receive responses from AAQ using voice notes. This will increase the accessibility of AAQ to users for whom speaking/listening is easier than writing/reading.

Goals & Mid-Point Milestone

Goals

By mid-point

  • [ ] Develop an API endpoint in AAQ for sending queries in text and receiving responses in voice (text-to-speech, TTS). The first iteration may use an external TTS API
  • [ ] Develop a TTS service for AAQ using an open-source model that can replace an external TTS API

By project end

  • [ ] Develop an API endpoint for sending questions as voice notes and receiving responses as voice notes (speech-to-text, text-to-speech)
  • [ ] Integrate the TTS service into AAQ infrastructure on AWS
  • [ ] Publish a short blog post on AAQ website about the changes

For every goal listed, there will be a few rounds of design-feedback-implementation with support from the mentors and wider AAQ team.
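
The end-of-project goal chains three steps: speech-to-text, the existing AAQ question answering, and text-to-speech. A minimal sketch of that flow is below. Every name in it (`answer_voice_note`, `VoiceResponse`, the three callables) is hypothetical and not taken from the AAQ codebase; the real endpoints would wrap something like this in FastAPI routes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases; none of these names come from the AAQ codebase.
SpeechToText = Callable[[bytes], str]    # audio bytes -> transcript
AnswerQuestion = Callable[[str], str]    # question text -> answer text
TextToSpeech = Callable[[str], bytes]    # answer text -> audio bytes


@dataclass
class VoiceResponse:
    question_text: str
    answer_text: str
    answer_audio: bytes


def answer_text_query(
    question: str, answer: AnswerQuestion, tts: TextToSpeech
) -> VoiceResponse:
    """Mid-point goal: text question in, voice answer out."""
    question = question.strip()
    if not question:
        raise ValueError("empty question")
    answer_text = answer(question)
    return VoiceResponse(question, answer_text, tts(answer_text))


def answer_voice_note(
    audio: bytes, stt: SpeechToText, answer: AnswerQuestion, tts: TextToSpeech
) -> VoiceResponse:
    """End goal: voice note in, voice answer out (STT, then the text flow)."""
    return answer_text_query(stt(audio), answer, tts)
```

Passing the three steps in as callables keeps the flow independent of whether an external API or an in-house model does the transcription and synthesis.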

Setup/Installation

AAQ contribution guide is here: https://idinsight.github.io/aaq-core/develop/contributing/

You will be given access to our testing environment on AWS.

Expected Outcome

  1. AAQ users can query the voice endpoints for voice questions and/or voice response. This can be seamlessly integrated into AAQ’s chat flow manager of choice, Typebot.io.
  2. AAQ users have an option to use an open-source TTS/STT model instead of an external API.

Acceptance Criteria

No response

Implementation Details

You will build the APIs in our core_backend component, which is built in Python, using FastAPI.

Our database is PostgreSQL + pgvector for managing document embeddings (contents) as well as other transactional data.

For the TTS/STT service that serves open-sourced models, you will make it as platform-agnostic as possible, which often means using Docker, but the integration will be to AWS, as our demo environment sits in AWS. You will be able to lead the architecture design for such a service. Of course, our mentors and the wider AAQ team will be available to support and think it through together.
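
One way to keep the service platform-agnostic, and to let an open-source model replace an external API later, is to hide each provider behind a small common interface. The sketch below is an assumption about how that could look, not the design the team chose: `TTSBackend`, `SilentWavBackend`, and `get_tts_backend` are hypothetical names, and the placeholder backend only emits silence so the example runs without any model or network access.

```python
import io
import wave
from typing import Protocol


class TTSBackend(Protocol):
    """Hypothetical interface that every TTS provider would implement."""

    def synthesize(self, text: str, language: str) -> bytes: ...


class SilentWavBackend:
    """Placeholder backend: returns 0.1 s of silence per word as WAV bytes."""

    sample_rate = 16000

    def synthesize(self, text: str, language: str) -> bytes:
        n_frames = (self.sample_rate // 10) * max(1, len(text.split()))
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)      # mono
            wav.setsampwidth(2)      # 16-bit samples
            wav.setframerate(self.sample_rate)
            wav.writeframes(b"\x00\x00" * n_frames)
        return buf.getvalue()


# Registry of available backends; an external-API client or an in-house
# model wrapper would be registered here under its own key.
_BACKENDS = {"silent": SilentWavBackend}


def get_tts_backend(name: str) -> TTSBackend:
    """Look up a backend by name, e.g. from an environment variable."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

With this shape, switching providers becomes a configuration change rather than a code change in the endpoint.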

Mockups/Wireframes

No response

Product Name

Ask A Question

Organisation Name

IDinsight

Domain

Open Source Library

Tech Skills Needed

AWS, Database, Python

Mentor(s)

@amiraliemami @lickem22 are Data Scientists at IDinsight!

Category

API, Backend, Database, Deployment, AI

suzinyou avatar Mar 15 '24 08:03 suzinyou

Hi @amiraliemami @lickem22 ,

I'm very interested in contributing to your project to add voice response capabilities to the Ask A Question (AAQ) chatbot. As an experienced backend developer, with an internship at apnabot and expertise in integrating AI/ML models, databases, and cloud deployments, I believe I can help implement the text-to-speech, speech-to-text, and in-house TTS service you're looking to build. I'd welcome the chance to discuss how I can support the development and integration of these voice features into the existing AAQ infrastructure. Please let me know the best way for me to connect with your team and explore opportunities to collaborate on this exciting enhancement.

MustafaAkolawala avatar Apr 09 '24 15:04 MustafaAkolawala

Thanks @MustafaAkolawala ! Would love to see any proposed approach. Feel free to continue on this issue thread.

Also, this project is in fact part of Code4GovTech's Dedicated Mentoring Program -- see here.

suzinyou avatar Apr 10 '24 07:04 suzinyou

hello @suzinyou !

After thoroughly reviewing the various open-source TTS options, I'm convinced that ESPnet-TTS is the way to go for the AAQ voice response API project. You see, ESPnet-TTS is this super flexible, end-to-end speech processing toolkit that just fits the bill perfectly. Not only does it support the specific languages on AAQ's roadmap - Xhosa, Zulu, Hindi, and Igbo - but its modular architecture makes it easy to integrate and customize. That's crucial, given the project's need for a tailored TTS solution.

But what really seals the deal for me is that ESPnet-TTS is an actively maintained open-source project, backed by a strong community. That means you'll have ongoing improvements and the potential to expand language support down the line, as your user base grows. And because it's Python-based, just like the AAQ backend, it'll make the integration process a breeze and reduce the learning curve for the dev team.

In short, ESPnet-TTS ticks all the boxes, from language support to technical alignment, to be the optimal TTS solution for this project. Although ESPnet-TTS requires solid technical knowledge to implement, I will dive deep into it and get familiar with it.

And yes, I will be writing a detailed project proposal on this project for this year's C4GT :)

MustafaAkolawala avatar Apr 10 '24 08:04 MustafaAkolawala

Hello @suzinyou, I'm Ashutosh, a prefinal year student at IIT Jodhpur, specializing in Artificial Intelligence and Data Science. I have a strong proficiency in programming languages such as Python and C++. My experience includes working on diverse projects spanning machine learning and deep learning, including endeavors like Stock Price Prediction and Speech-to-Text Transcription. In addition to my programming skills, I have hands-on experience with various databases such as SQL (MySQL), Document-Oriented Databases (MongoDB), and Graph Databases (Neo4j). One notable project where I applied these skills is the development of a Video Search Engine.

I'm keenly interested in contributing to projects within your domain.

ashuashutosh2211 avatar Apr 13 '24 12:04 ashuashutosh2211

hey! @suzinyou @amiraliemami @lickem22

I just wanted to know: is there any way I can contribute to this project before C4GT starts? I am really keen to start working on this project as soon as possible.

MustafaAkolawala avatar Apr 13 '24 17:04 MustafaAkolawala

Hello @suzinyou

I'm KANNAN B, a second-year B.Tech student majoring in Information Technology at Veltech Hightech Engineering College. I'm thrilled about the opportunity to contribute to your project, particularly in enhancing the Ask A Question (AAQ) chatbot with voice response capabilities. With my experience in backend development and AI/ML integration, I'm confident in my ability to assist in implementing the text-to-speech, speech-to-text, and in-house TTS service. I'm particularly excited about leveraging ESPnet-TTS for its versatility and alignment with the project's goals. I'm eager to dive into the technical aspects and contribute to the project's success. Looking forward to collaborating with your team!

Best regards, KANNAN B

kannanb2745 avatar Apr 14 '24 14:04 kannanb2745

Hello @suzinyou , I am thrilled to have the opportunity to work on the Ask A Question (AAQ) project under your mentorship. As a cloud computing student with a passion for leveraging technology to address social challenges, I believe I bring a unique blend of skills and enthusiasm to the table.

Firstly, my academic background in cloud computing has equipped me with a solid understanding of AWS services, which will be crucial for integrating the voice response API into AAQ's infrastructure on AWS. I am confident in my ability to navigate AWS environments efficiently and effectively.

Moreover, my proficiency in Python aligns well with the project's tech stack, particularly in developing APIs using FastAPI and working with PostgreSQL databases. I have hands-on experience in building backend systems, which will be invaluable for implementing the API endpoints and integrating the TTS service seamlessly into AAQ's core_backend component.

DhruvLamba avatar Apr 19 '24 13:04 DhruvLamba

Hello @suzinyou, I am thrilled to have the opportunity to work on the Ask A Question (AAQ) project under your mentorship. As a cloud computing student with a passion for leveraging technology to address social challenges, I believe I bring a unique blend of skills and enthusiasm to the table.

Firstly, my academic background in cloud computing has equipped me with a solid understanding of AWS services, which will be crucial for integrating the voice response API into AAQ's infrastructure on AWS. I am confident in my ability to navigate AWS environments efficiently and effectively.

Moreover, my proficiency in Python aligns well with the project's tech stack, particularly in developing APIs using FastAPI and working with PostgreSQL databases. I have hands-on experience in building backend systems, which will be invaluable for implementing the API endpoints and integrating the TTS service seamlessly into AAQ's core_backend component.

I have already developed a voice chat bot in python using gemini api, so it will be beneficial for your reference too.

vivekkumarsoni123 avatar Apr 22 '24 10:04 vivekkumarsoni123

Hello KANNAN B, Thank you for your interest in AAQ. To be part of this project, you can apply to Code4GovTech's Dedicated Mentoring Program here. However, you are encouraged to raise a PR in addition to your official proposal for us to review.

Best, Carlos Samey

lickem22 avatar Apr 23 '24 08:04 lickem22

Hello Mustafa, Thank you for your interest in AAQ. To be part of this project, you can apply to Code4GovTech's Dedicated Mentoring Program here. However, you are encouraged to raise a PR in addition to your official proposal for us to review.

Best, Carlos Samey

lickem22 avatar Apr 23 '24 08:04 lickem22

Hello Ashutosh, Thank you for your interest in AAQ. To be part of this project, you can apply to Code4GovTech's Dedicated Mentoring Program here. However, you are encouraged to raise a PR in addition to your official proposal for us to review.

Best, Carlos Samey

lickem22 avatar Apr 23 '24 08:04 lickem22

Do not ask process-related questions about how to apply and who to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer to the instructions listed on Unstop, and any further queries can be taken up on our Discord channel titled DMP queries. Here's a Video Tutorial on how to submit a proposal for a project.

AbhimanyuSamagra avatar Apr 23 '24 10:04 AbhimanyuSamagra

Hello @lickem22 @amiraliemami, I'm happy to contribute to this project as I have already done many projects on voice recognition and TTS (text-to-speech) libraries. I have also designed a voice assistant that helps resolve user queries on their laptop, like opening apps, browsing, etc. This project is a bit similar to what I did. Also, I think I'm really good at making AAQ applications. Based on my previous experience I think I'm a good fit for this opportunity. I look forward to making this API useful for the organizations. If you're okay to continue, please assign me so that I can discuss it in detail.

Sunilstar-V avatar Apr 23 '24 18:04 Sunilstar-V

Hello @suzinyou , I am thrilled to have the opportunity to work on the Ask A Question (AAQ) project under your mentorship. As a Python developer (DevOps) student with a passion for leveraging technology to address social challenges, I believe I bring a unique blend of skills and enthusiasm to the table.

Firstly, my academic background in cloud computing has equipped me with a solid understanding of AWS services, which will be crucial for integrating the voice response API into AAQ's infrastructure on AWS. I am confident in my ability to navigate AWS environments efficiently and effectively.

Moreover, my proficiency in Python aligns well with the project's tech stack, particularly in developing APIs using FastAPI and working with PostgreSQL databases. I have hands-on experience in building backend systems, which will be invaluable for implementing the API endpoints and integrating the TTS service seamlessly into AAQ's core_backend component.

LuciferMorningstar33 avatar Apr 27 '24 10:04 LuciferMorningstar33

Hello @suzinyou ! I am thrilled to have the opportunity to work on the Ask A Question (AAQ) project under your mentorship. As a cloud computing student with a passion for leveraging technology to address social challenges, I believe I bring a unique blend of skills and enthusiasm to the table.

Firstly, my academic background in cloud computing has equipped me with a solid understanding of AWS services, which will be crucial for integrating the voice response API into AAQ's infrastructure on AWS. I am confident in my ability to navigate AWS environments efficiently and effectively.

Moreover, my proficiency in databases aligns well with the project's tech stack, particularly working with PostgreSQL databases. I have hands-on experience in building backend systems, which will be invaluable for implementing the API endpoints and integrating the TTS service seamlessly into AAQ's core_backend component.

nitish1804 avatar Apr 30 '24 18:04 nitish1804

Hello @Sunilstar-V , Thank you for your interest in AAQ. To be part of this project, you can apply to Code4GovTech's Dedicated Mentoring Program here. However, you are encouraged to raise a PR in addition to your official proposal for us to review.

lickem22 avatar May 03 '24 08:05 lickem22

Hello @nitish1804 , Thank you for your interest in AAQ. To be part of this project, you can apply to Code4GovTech's Dedicated Mentoring Program here. However, you are encouraged to raise a PR in addition to your official proposal for us to review.

lickem22 avatar May 03 '24 08:05 lickem22

Hello @LuciferMorningstar33 , Thank you for your interest in AAQ. To be part of this project, you can apply to Code4GovTech's Dedicated Mentoring Program here. However, you are encouraged to raise a PR in addition to your official proposal for us to review.

lickem22 avatar May 03 '24 08:05 lickem22

Hello @suzinyou , I am Aradhya Pitlawar, a third-year undergraduate studying Computer Science and Engineering at Walchand College of Engineering, Sangli. I am a proficient Python developer and have a lot of experience in developing APIs. I previously worked on text-to-speech functionality during an internship at a startup and have solid experience with it. Moreover, you can check out my project GIDEON, a rule-based C++ desktop assistant in which I used the eSpeak module for text-to-speech and speech-to-text functions.

I believe that I can successfully complete this project:

  1. First milestone: use the gTTS Python library, due to its recent stable releases and good community support because of Google's support.
  2. Second milestone: use ESPnet-TTS or the eSpeak module itself, depending on how customized we want the responses to be.

I assure you that I will complete this project through C4GT if I get selected!

ThunderSmoker avatar May 07 '24 05:05 ThunderSmoker

Weekly Goals

Week 1

  • [x] Go through the codebase and get yourself familiar with it
  • [x] Start first implementation of TTS endpoint with external API
  • [x] Finish technical design of TTS API

Week 2

  • [x] Finish technical design of STT API endpoint
  • [x] Progress implementation of TTS with external API
  • [x] Start implementation of STT endpoint with external API

Week 3

  • [x] Progress implementation of TTS with external API
  • [x] R&D on which in-house open-source TTS and STT model to use

Week 4

  • [x] Finish implementation of TTS with external API
  • [x] Start implementation of STT with external API

Week 5

  • [x] Continue implementation of STT with external API
  • [x] Start integrating it to STT with GCP

Week 6

  • [x] Raise PR for STT implementation with external API
  • [x] Research integrating Bhashini

Week 7

  • [x] Improve GA integration and tests
  • [x] Start working on GCP integration

Week 8

  • [x] Finish implementation of internal STT
  • [x] Raise PR for GHA workflow and separate STT tests
  • [x] Raise PR for GCP integration in speech workflow

Week 9

  • [x] Merge STT tests PR
  • [x] Merge GCP integration PR
  • [x] Finish the Speech-to-Speech design
  • [x] Start implementing the Speech-to-Speech workflow.

Week 10

  • [x] Merge the implementation of the new Speech-to-Speech endpoint design.
  • [x] Finish external TTS and STT implementation
  • [x] Support on Turn.io workflow.

Week 11

  • [x] Merge external TTS and STT implementation
  • [x] Start implementing internal TTS
  • [x] Work on documentation

Week 12

  • [ ] Finish implementation and raise PR
  • [ ] Finish implementation of docs and raise PR
  • [ ] Write blogpost for voice component

lickem22 avatar Jun 20 '24 11:06 lickem22

Weekly Learnings & Updates

Week 1

  • Understanding production codebase structure and best practices.
  • Learning about automated workflows and CI/CD processes.
  • Mastering modular and clean code writing techniques.
  • Creating technical designs for system architecture.
  • Developing skills for effective team collaboration and communication in a professional software development environment

Week 2

  • How to use pytest and write my own tests.
  • End-to-End Testing of functionalities

Week 3

  • Learnt how to use Vosk as an external API for STT
  • Learnt how to pre-process MP3/WAV files and convert them to mel-spectrograms for further processing
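
A mel-spectrogram places its frequency bands evenly on the mel scale rather than in Hz. As a rough illustration of that step only, here is a sketch of the Hz-to-mel mapping using one common (HTK-style) formula. The thread does not say which variant or library the project used, so the formula choice and the function names are assumptions.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centres(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of n_bands bands spaced evenly in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]
```

Because the spacing is even in mels, the band centres crowd together at low frequencies and spread out at high ones, which is why this representation is popular as input to speech models.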

Week 4

  • Learned about Hugging Face and how it could effectively assist in integrating Whisper.
  • Researched Bhashini and its potential integration for Indic languages
  • Studied the creation of separate Docker containers and their collaborative use to build a more efficient pipeline. Also learned about multi-stage builds

Week 5

  • Learnt how to create my own images and Dockerfiles and use shared volumes across Docker containers to persist data during runtime.
  • Learnt to create FastAPI apps and to receive and handle multipart/form-data.

Week 6

  • Learnt how to use monkeypatching and MagicMock to mock and patch external dependencies and I/O when writing unit and functional tests.
  • Studied integrating GCP cloud buckets for STT and TTS MP3 file storage.
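
The mocking approach from Week 6 can be illustrated with the standard library's `unittest.mock`. This is a sketch under assumptions: `SpeechClient` and `transcribe_upload` are hypothetical stand-ins for an external STT client and the code under test, and `patch.object` is used here where a pytest suite might use the `monkeypatch` fixture instead.

```python
from unittest.mock import MagicMock, patch


class SpeechClient:
    """Hypothetical wrapper around an external STT provider."""

    def transcribe(self, audio: bytes) -> str:
        raise RuntimeError("network call: not available in unit tests")


def transcribe_upload(client: SpeechClient, audio: bytes) -> str:
    """Code under test: validates input and normalises the transcript."""
    if not audio:
        raise ValueError("empty audio upload")
    return client.transcribe(audio).strip().lower()


def check_with_magicmock() -> None:
    # Replace the whole client with a mock that follows SpeechClient's API.
    client = MagicMock(spec=SpeechClient)
    client.transcribe.return_value = "  Hello World "
    assert transcribe_upload(client, b"abc") == "hello world"
    client.transcribe.assert_called_once_with(b"abc")


def check_with_patch() -> None:
    # Patch only the network-bound method for the duration of the block.
    with patch.object(SpeechClient, "transcribe", return_value="Hi"):
        assert transcribe_upload(SpeechClient(), b"abc") == "hi"
```

Using `spec=SpeechClient` makes the mock reject attribute names the real client does not have, which catches typos that a bare `MagicMock()` would silently accept.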

Week 7

  • Learnt how to write Makefiles to automate the execution of tests.
  • Learnt how to write GitHub Actions workflows to create my own CI/CD pipeline that executes unit tests whenever a commit is pushed to the speech_api directory.

Week 8

  • Studied Vertex AI and the LiteLLM proxy to integrate Google STT and TTS external APIs for demo day
  • Gained insights into UX design of endpoints for the speech workflow

Week 9

  • Read the Google Cloud TTS and STT documentation
  • Learnt how to handle different media types and convert between them

Week 10

  • Researched chat managers like Glific, Typebot, Turn.io, etc.
  • Learnt how to build voice chat flows on various types of chatbots to connect to AAQ

Week 11

  • Researched more internal TTS models for specific use cases
  • Learnt to write documentation using MkDocs for AAQ
  • Learnt to write blog posts

MustafaAkolawala avatar Jun 22 '24 12:06 MustafaAkolawala

Hi @amiraliemami, my name is Madhalasa, and I’ve recently completed my B.E. in AI & ML from RNSIT, Bangalore. I’ve done an internship at Infosys Springboard as an AI Intern and have skills in databases and Python. As a fresher, I'm eager to contribute to this project. Is there a preferred method for communicating with the mentors?

MadhalasaSJ avatar Aug 16 '24 15:08 MadhalasaSJ