NextChat [Feature] Support multimodal conversations: questions and replies support text, images, and audio

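What feature do you want, or what suggestions do you have? Support multimodal conversations: both questions and replies support text, images, and audio. With the launch of multimodality in the official ChatGPT, I hope ChatGPT-Next-Web will plan to support multimodal conversation input and output.

Are there any similar competing products to refer to? The multimodal features of the official ChatGPT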

Oct 29 '23 08:10 easeaico

Title: [Feature] Support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio

**What feature do you want, or what suggestions do you have?** Support multimodal conversations: both questions and replies support text, images, and audio. With the launch of multimodality in the official ChatGPT, we look forward to ChatGPT-Next-Web planning to support multimodal conversation input and output in the future.

**Are there any similar competing products that we can refer to?** The multimodal features of the official ChatGPT

Oct 29 '23 08:10 Issues-translate-bot

Hoping for multimodal conversation support: both questions and replies support text, images, and audio.

Nov 01 '23 03:11 chenminmin4

Hope to support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio.

Nov 01 '23 03:11 Issues-translate-bot

Hoping for support for generating images and speech.

Nov 07 '23 00:11 super999

Hope to support generating images and speech.

Nov 07 '23 00:11 Issues-translate-bot

GPT-4 Turbo with vision https://openai.com/blog/new-models-and-developer-products-announced-at-devday

Nov 07 '23 00:11 gutenye

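Hoping for multimodal conversation support: both questions and replies support text, images, and audio. Also support file upload and server-side file caching.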

Nov 07 '23 00:11 xiaosatai

Hope to support multi-modal dialogue function: dialogue questions and replies support text, pictures, and audio. It should also support file upload and server-side file caching.

Nov 07 '23 00:11 Issues-translate-bot

GitHub wishing well

Nov 07 '23 03:11 mountainguan

GitHub wishing well

Nov 07 '23 03:11 Issues-translate-bot

@Yidadaa would you appreciate any work on this (e.g. PoC) that you can leverage? Or are you in the middle of a major refactoring of the related app architecture?

Nov 09 '23 11:11 DirkSchlossmacher

Multimodality is so important; it will bring endless possibilities.

Nov 13 '23 10:11 Avey777

Multimodality is so important and will bring endless possibilities.

Nov 13 '23 10:11 Issues-translate-bot

Multi-modal support is desired: support for external file uploads and server caching, and support for conversations and replies in text, images, audio, and video.

Nov 16 '23 03:11 ikexue

gpt-4-vision-preview cannot display the complete reply, which is probably an OpenAI bug. You need to pass max_tokens: 4096.

See this commit for reference: feat: support gpt-4-vision-preview
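A minimal sketch of what passing that limit could look like when building a chat completions request for gpt-4-vision-preview. The helper name `buildVisionRequest` and the type names are hypothetical and are not taken from the NextChat codebase or from the commit referenced above; the field names (`model`, `messages`, `max_tokens`, and the `text` / `image_url` content parts) follow the OpenAI chat completions request format.

```typescript
// Hypothetical types for illustration; not NextChat's actual types.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type VisionMessage = { role: "user"; content: Array<TextPart | ImagePart> };

interface VisionRequestBody {
  model: string;
  messages: VisionMessage[];
  max_tokens: number;
}

// Build a request body that sets max_tokens explicitly, so the reply
// is not cut short by a small default output limit.
function buildVisionRequest(
  prompt: string,
  imageUrl: string,
  maxTokens: number = 4096,
): VisionRequestBody {
  return {
    model: "gpt-4-vision-preview",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
    max_tokens: maxTokens,
  };
}

// Example usage with a placeholder image URL.
const body = buildVisionRequest(
  "What is in this image?",
  "https://example.com/cat.png",
);
console.log(JSON.stringify(body, null, 2));
```

The body would then be sent as JSON in a POST to the chat completions endpoint; the request code itself is omitted here because it depends on how the app proxies API calls.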

Nov 17 '23 03:11 xcatliu

gpt-4-vision-preview cannot display the complete reply, which is probably an OpenAI bug. You need to pass max_tokens: 4096.

You can refer here:

https://github.com/xcatliu/Chatgpt-sxt/commit/64e893BC09BCFA5B62BAD461A488CBDCF1 #Diff-C92AE8BA73287976525D66897 EC86C7DA0A555C871123A0DEABA2F6R170

Nov 17 '23 03:11 Issues-translate-bot

Hoping for voice conversation support.

Nov 21 '23 06:11 yinbc

Hope to support voice conversation function.

Nov 21 '23 06:11 Issues-translate-bot

Same.

Nov 26 '23 13:11 zhuozhiyongde

Hoping DALL·E can be added, along with image support.

Nov 26 '23 18:11 jqjhl

I hope to add DALL·E and support for images.

Nov 26 '23 18:11 Issues-translate-bot

DALL·E needs its own storage service for generated images, because a generated image disappears after roughly 30 minutes to 2 hours.

Nov 26 '23 21:11 H0llyW00dzZ

Isn't "own storage" something the app already has built in? There is an Upstash Redis DB integration, currently used for chat message backup, that could be extended to persist images, at least if they are downscaled.

Nov 27 '23 07:11 DirkSchlossmacher

for image

In scenarios where an image is generated by DALL-E, the response is a JSON containing only the image URL, along with the date, time, and revised prompts (specifically for DALL-E 3). Since the image URL has a limited download duration, it would be beneficial if the image URL from DALL-E models could be automatically downloaded and then uploaded to a storage solution that allows for image display in markdown format.
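The flow described above could be sketched roughly as follows. This assumes the response shape mentioned in the comment (a `created` timestamp plus a `data` array with `url` and, for DALL-E 3, `revised_prompt`). The `ImageStore` interface, `storageKey`, `toMarkdown`, and `persistAsMarkdown` are hypothetical names made up for illustration; the actual storage backend is left abstract, and an in-memory stand-in is used in the example.

```typescript
// Fields of a DALL-E image response that the comment above mentions.
interface ImageResponse {
  created: number;
  data: Array<{ url: string; revised_prompt?: string }>;
}

// Hypothetical storage hooks: `download` fetches the short-lived image,
// `upload` stores it and returns a long-lived URL.
interface ImageStore {
  download(url: string): Promise<Uint8Array>;
  upload(key: string, bytes: Uint8Array): Promise<string>;
}

// Derive a storage key from the response timestamp and the image index.
function storageKey(created: number, index: number): string {
  return `dalle/${created}-${index}.png`;
}

// Render a markdown image; square brackets are stripped from the alt text
// so they cannot break the markdown syntax.
function toMarkdown(
  item: { revised_prompt?: string },
  permanentUrl: string,
): string {
  const alt = (item.revised_prompt ?? "generated image").replace(/[\[\]]/g, "");
  return `![${alt}](${permanentUrl})`;
}

// Download every image before its URL expires, re-upload it, and
// return one markdown line per image.
async function persistAsMarkdown(
  res: ImageResponse,
  store: ImageStore,
): Promise<string[]> {
  return Promise.all(
    res.data.map(async (item, i) => {
      const bytes = await store.download(item.url);
      const permanentUrl = await store.upload(storageKey(res.created, i), bytes);
      return toMarkdown(item, permanentUrl);
    }),
  );
}

// In-memory stand-in for a real storage service (assumption for the demo).
const memory = new Map<string, Uint8Array>();
const demoStore: ImageStore = {
  download: async () => new Uint8Array([1, 2, 3]),
  upload: async (key, bytes) => {
    memory.set(key, bytes);
    return `https://storage.example.com/${key}`;
  },
};

persistAsMarkdown(
  {
    created: 1700000000,
    data: [{ url: "https://example.com/tmp.png", revised_prompt: "a red cat" }],
  },
  demoStore,
).then((lines) => console.log(lines.join("\n")));
```

Keeping the storage behind an interface means the same flow would work whether images end up in an object store, a database, or the app's existing backup integration.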

Nov 27 '23 07:11 H0llyW00dzZ

Similar competing product: OpenCat

Nov 27 '23 08:11 gutenye

Competing products: OpenCat

Nov 27 '23 08:11 Issues-translate-bot

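Hoping the DALL·E model can be added for image generation, and that conversations support multimodality, including images, audio, and documents.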

Nov 27 '23 09:11 lph66152137

It is hoped that the DALL·E model can be added to generate images, and that conversations support multimodality, including images, audio, and documents.

Nov 27 '23 09:11 Issues-translate-bot

Already implemented: dall-e-3 image generation, gpt4-vision-preview image recognition, whisper speech-to-text, and tts text-to-speech. https://github.com/vual/ChatGPT-Next-Web-Pro

Nov 30 '23 08:11 vual

My version supports the gpt4-vision-preview image recognition feature; you can check it out. https://github.com/vual/ChatGPT-Next-Web-Pro

Nov 30 '23 08:11 Issues-translate-bot