StableLM issues

More than 4096 context length?

8

Is it possible to support a larger context? That would allow smaller models to do more complicated things. A lot of the negatives of a smaller model can be rectified...

StoyanStAtanasov

Is it feasible to combine diffusion models and language models to mimic divergent thinking?

2

Does using a diffusion model in a language model increase the generality of the language model?

win10ogod

question

GPU support Table & VRAM usage

34

It would be great to get the instructions to run the 3B model locally on a gaming GPU (e.g. 3090/4090 with 24GB VRAM). ### Confirmed GPUs From this thread |...

enricoros

Hi, I want to fine-tune the 7B model. Am I supposed to download the provided checkpoint and fine-tune it as shown in this repo (https://github.com/EleutherAI/gpt-neox#using-custom-data)? Would they be compatible...

berkecanrizai

question

Poor Benchmark Results (Needs Addressed)

6

As seen [in this popular spreadsheet](https://docs.google.com/spreadsheets/d/1kT4or6b0Fedd-W_jMwYpb63e1ZR3aePczz3zlbJW-Y4) by @lhl, StableLM-Alpha-7B currently scores below 5-year-old 1GB models with 700M parameters and well below its architectural cousin GPT-J-6B which is...

MarkSchmidty

Watching and chatting about video with StableLM, and asking anything about the video

2

Thanks for your amazing work! We have simply extended StableLM for video question answering in our project [Ask-Anything](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat_with_StableLM). In our attempts, it can generate longer content than ChatGPT, but without...

yinanhe

Dataset used to pre-train

5

Hi there! First of all, thank you for the amazing work! The README says the models were trained on "the new dataset based on The Pile" which is 3x the...

agademic