Michael Feil issues

Results: 83 issues of Michael Feil

Integrate multiple workflows as tabs or subpaths

![image](https://github.com/Chainlit/chainlit/assets/63565275/c0a783f0-f1d3-4268-a6b5-e5c5fa427297)

Idea:
- have multiple chains / workflows running side-by-side

One tab could then be:
- Q&A
- one could be Google Query
- Image Generation with DallE

This would...

enhancement

stale

Fine-tuning of OpenKoala

Thanks for opening up LLama! Writing this up as an issue is easier than the training itself, so this issue probably goes to you as authors of Koala. As the diff weights...

Adding support for CTranslate2 acceleration

## Feature request

The feature would be to support accelerated inference with the CTranslate2 framework https://github.com/OpenNMT/CTranslate2

## Motivation

Reasons to use CTranslate2

#### faster float16 generation

In my case, outperforms VLLM...

Stale

CodeGen Converter

This PR aims to integrate CodeGen. Work in progress, not ready.

Creating a Docker image or Dockerfile from this repo.

It would be awesome to build a docker image for this repo.

Split converted model.bin into multiple .bin

Something that might come up more often with models such as LLama-2-70b / Bloom etc.: is there any way to split the model.bin into multiple subfiles? Example: e.g in...

enhancement
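
To make the model.bin splitting question above concrete, here is a minimal sketch of a generic byte-level split and re-join using only the Python standard library. It is not CTranslate2's API or file format: the helper names `split_file` and `join_files` and the `.partNNNN` naming scheme are hypothetical, and a loader would still need the parts re-joined (or native support for sharded files) before it could read the model.

```python
from pathlib import Path


def split_file(src, chunk_bytes, out_dir):
    """Split src into numbered parts of at most chunk_bytes each.

    Hypothetical helper: a plain byte-level split that knows nothing
    about the internal layout of the model file.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            # Reads one whole chunk into memory; pick chunk_bytes accordingly.
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            part = out_dir / f"{src.name}.part{index:04d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts, dst):
    """Concatenate the parts, in the given order, back into one file."""
    with Path(dst).open("wb") as out:
        for part in parts:
            out.write(Path(part).read_bytes())
```

Zero-padded part numbers keep the parts in the right order under a plain lexicographic sort, which matters when the list of parts is rebuilt from a directory listing.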

[Feature] support PagedAttention in cuda attention.cc

[VLLM](https://github.com/vllm-project/vllm) implemented a mechanism called "PagedAttention", which helps in fast generation of long sequences. This might be quite a large feature request. Blog: https://vllm.ai/ and maybe this https://github.com/vllm-project/vllm/blob/665c48963be11b2e5cb7209cd25f884129e5c284/vllm/model_executor/layers/attention.py#L16...

enhancement

Integrate Infinity Framework for Enhanced Embedding Inference Speed

### 🚀 The feature

I propose the integration of the [Infinity framework](https://github.com/michaelfeil/infinity) into embedchain to significantly speed up embedding inference. Infinity is a pure Python framework designed to enhance the...

enhancement

medium

Adding Bert-Embeddings server / how to add torch

I am building https://github.com/michaelfeil/infinity and would love to contribute to ollama. It is compatible with cuda, cpu and mps, with the option to run onnx models. Beta also for torch...

feature request

Warning on FastText "load model"

Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar. https://github.com/facebookresearch/fastText/issues/1056