
[Feature Request]: Add a Google Colab link so that users can try the model in a few minutes

Open jimuyouyou opened this issue 1 year ago • 3 comments

Feature request / 功能建议

When deploying software, especially AI software, resolving potential errors and technical dependencies can be very time-consuming. It would be great if you could provide a Google Colab link so that everyone can try the model immediately. Your 2B model is promising; we just cannot wait to try it...

jimuyouyou avatar Feb 02 '24 07:02 jimuyouyou

Try https://colab.research.google.com/drive/1tJcfPyWGWA5HezO7GKLeyeIso0HyOc0l#scrollTo=bjqvPUp013uv

soulteary avatar Feb 02 '24 08:02 soulteary

We have made a Colab code sample that uses transformers to test MiniCPM-2B. Just click this link to run it:

https://colab.research.google.com/drive/1Ipfi93mVkKP3OrOdxRH482UKU-5y65W5?usp=sharing
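For reference, below is a minimal sketch of what such a notebook typically does with transformers on a Colab GPU. The repo id `openbmb/MiniCPM-2B-sft-bf16`, the prompt, and the generation settings are assumptions for illustration, not taken from the linked notebook:

```python
# Minimal sketch: load MiniCPM-2B with transformers on a Colab GPU.
# The repo id and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM-2B-sft-bf16"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 fits comfortably on a free Colab T4
    device_map="auto",           # place the model on the available GPU
    trust_remote_code=True,
)

prompt = "Introduce MiniCPM in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_p=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```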

DataLearnerAI avatar Feb 03 '24 06:02 DataLearnerAI

Thanks for your links.

I tried them, and it seems that MiniCPM-2B does not support CPU inference for now, because Google Colab reports the error: '... only support GPU'.

I think CPU support is a crucially important use case for this model. Many people are trying to cut their LLM costs, and GPU hosting is too expensive for them. A model with 2B parameters makes it possible to deploy on a traditional VPS using only a CPU (no GPU), which means the model could be applied widely across all sectors.

Setting up your 2B model locally on a VPS is also in huge demand for commercial use cases. Existing LLMs (including ChatGPT and Bard) take around 2 seconds to respond to end users, and 2 seconds is too long to be acceptable. When you add further data processing (another 1-2 seconds), a real app feature means 3-4 seconds of waiting for end users. That discourages users too much, and many will lose patience after their initial interest. Here your 2B model could help, both through its inference speed and through local deployment (meaning almost no network transfer time).
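For what it's worth, here is a minimal sketch of the CPU-only setup described above, assuming the model's custom code can run without CUDA (which, per the Colab error, it may not yet). The repo id and settings are assumptions for illustration:

```python
# Minimal sketch: attempt CPU-only inference by loading fp32 weights without
# any CUDA device. Assumes the model's remote code does not hard-require a GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM-2B-sft-bf16"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # fp32: most CPUs lack fast fp16/bf16 kernels
    trust_remote_code=True,     # no device_map, so weights stay on the CPU
)

inputs = tokenizer("Hello from a CPU-only VPS!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```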

Please go ahead; I'd like to see more exciting features from your team.

jimuyouyou avatar Feb 03 '24 14:02 jimuyouyou