Silver

Results: 10 issues by Silver

It seems that your demo website cannot be accessed. Can you fix it?

You have mentioned in the paper that you will release these filtered single-domain data sets, along with the code to create them from the original SGDD data. However, I do...

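### Is your feature request related to a problem? Please describe.
ChatGLM-6B uses icetk, whose vocabulary reserves the first 20,000 tokens for images. The text model never uses these image tokens, but their embeddings still have to be loaded during inference and fine-tuning, and decoding each token requires computing 20K extra logits, which takes up a fair amount of GPU memory.
### Solutions
I implemented ChatGLM-6B-Slim, which is built from ChatGLM-6B by pruning the vocabulary: the first 20K image tokens are removed. This saves some GPU memory and computation, and the decoding results are exactly identical. The code is available at: https://github.com/silverriver/ChatGLM-6B-Slim Anyone who needs it can use it directly. ChatGLM-6B-Slim can be regarded as a low-memory, equivalent drop-in replacement for ChatGLM-6B.
### Additional context
_No response_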
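The pruning idea can be illustrated with a toy sketch. This is not the code from the ChatGLM-6B-Slim repository; the sizes, the helper names (`prune_rows`, `remap_token_id`, `compute_logits`) and the random weights are all made up for illustration. It only shows that dropping the first rows of an output matrix leaves the logits of the remaining tokens unchanged, provided token ids are shifted by the number of removed rows.

```python
import random

# Toy sizes (assumptions); the issue talks about 20,000 reserved image tokens.
NUM_IMAGE_TOKENS = 4
VOCAB_SIZE = 10
HIDDEN_SIZE = 3


def prune_rows(matrix, num_reserved):
    """Drop the first `num_reserved` rows (one row per token id)."""
    return matrix[num_reserved:]


def remap_token_id(old_id, num_reserved):
    """Map a token id from the full vocabulary to the pruned vocabulary."""
    if old_id < num_reserved:
        raise ValueError("image token ids have no counterpart after pruning")
    return old_id - num_reserved


def compute_logits(hidden_state, output_matrix):
    """One logit per vocabulary row: dot product with the hidden state."""
    return [sum(h * w for h, w in zip(hidden_state, row)) for row in output_matrix]


rng = random.Random(0)
output_matrix = [
    [rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)] for _ in range(VOCAB_SIZE)
]
hidden_state = [rng.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]

full_logits = compute_logits(hidden_state, output_matrix)
slim_logits = compute_logits(hidden_state, prune_rows(output_matrix, NUM_IMAGE_TOKENS))

# The text-token logits are the same; only the image-token logits are gone.
print(slim_logits == full_logits[NUM_IMAGE_TOKENS:])
```

Greedy decoding therefore picks the same text token in both models as long as an image token would never have been the argmax in the full model, which is the situation the issue describes for a text-only model.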

I am wondering why we should set `fan_in_fan_out` based on `len(peft_config.target_modules)` when we are using LoRA. https://github.com/huggingface/peft/blob/64f63a7df2a02cfd144592d9aa9c871b59258c55/src/peft/mapping.py#L120 In my understanding, I can turn any layer into a LoRA layer, and control...
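For context, `fan_in_fan_out` describes how the wrapped layer stores its weight: as `(fan_in, fan_out)`, like GPT-2's `Conv1D`, rather than `(out_features, in_features)`, like `torch.nn.Linear`. The sketch below is a plain-Python illustration with made-up integer matrices and hypothetical helper names; it is not PEFT's implementation. It shows why the low-rank update `B @ A` has to be transposed for a `(fan_in, fan_out)` weight so that both layouts produce the same output.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_delta(lora_B, lora_A, fan_in_fan_out):
    """B @ A has shape (out, in); transpose it for a (fan_in, fan_out) weight."""
    delta = matmul(lora_B, lora_A)
    return transpose(delta) if fan_in_fan_out else delta


weight_linear = [[1, 2, 3], [4, 5, 6]]    # (out=2, in=3), Linear-style layout
weight_conv1d = transpose(weight_linear)  # (in=3, out=2), Conv1D-style layout
lora_A = [[1, 0, -1]]                     # (r=1, in=3)
lora_B = [[2], [3]]                       # (out=2, r=1)
x = [[1, 1, 2]]                           # one input row, (1, in=3)

# Linear-style layer computes x @ W^T.
merged_linear = add(weight_linear, lora_delta(lora_B, lora_A, fan_in_fan_out=False))
y_linear = matmul(x, transpose(merged_linear))

# Conv1D-style layer computes x @ W.
merged_conv1d = add(weight_conv1d, lora_delta(lora_B, lora_A, fan_in_fan_out=True))
y_conv1d = matmul(x, merged_conv1d)

print(y_linear, y_conv1d)
```

In this reading the flag depends on the weight layout of the targeted layer type, not on how many target modules are listed, which is the point the question raises.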

### Describe the bug
When loading a large dataset with the following code:
```python
from datasets import load_dataset
dataset = load_dataset("liwu/MNBVC", 'news_peoples_daily', split='train')
```
we encountered the error: "OverflowError: Python...

arrow

As the title says: there are plenty of dialogue-related datasets too.

Summary I would like to propose the addition of constrained decoding support. This feature would allow the output sequence to be constrained by a Finite State Machine (FSM) or Context-Free...
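A minimal sketch of what FSM-constrained decoding could look like, assuming greedy selection and a toy vocabulary. The function name, the dictionary-based FSM encoding and the per-step logits are all hypothetical and do not correspond to any existing library API. At each step, tokens with no outgoing transition from the current state are masked out, and the best remaining token is chosen.

```python
import math


def constrained_greedy_decode(step_logits, transitions, start_state, accept_states):
    """Greedy decoding restricted to tokens the FSM allows in its current state.

    step_logits: list of {token: score} dicts, one per decoding step.
    transitions: {state: {token: next_state}} describing the FSM.
    Returns the chosen tokens and whether the FSM ended in an accepting state.
    """
    state = start_state
    output = []
    for logits in step_logits:
        allowed = transitions.get(state, {})
        if not allowed:
            break  # no outgoing transitions: stop decoding
        # Tokens outside `allowed` are effectively masked to -inf.
        token = max(allowed, key=lambda t: logits.get(t, -math.inf))
        output.append(token)
        state = allowed[token]
    return output, state in accept_states


# FSM for the pattern: one or more "a", then "b", then "<eos>".
transitions = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},
    2: {"<eos>": 3},
}
step_logits = [
    {"a": 0.1, "b": 2.0, "<eos>": 0.0},  # unconstrained argmax would be "b"
    {"a": 0.2, "b": 1.5, "<eos>": 3.0},  # "<eos>" scores highest but is not allowed
    {"a": 0.9, "b": 0.1, "<eos>": 0.0},  # only "<eos>" is allowed here
]

tokens, accepted = constrained_greedy_decode(step_logits, transitions, 0, {3})
print(tokens, accepted)
```

A context-free grammar would need a stack in addition to the state, but the masking step stays the same.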

triaged
feature request

### System Info
- 8*A800 80G
### Who can help?
@kaiyux
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [X]...

bug

I am reading the script for reproducing fineweb. I have noticed that in the first pipeline you use Trafilatura to extract text from WARC records: ```python main_processing_executor =...

When launching dependent `LocalPipelineExecutor`s, setting the flag `skip_completed=False` on the previous executor causes the following executor to wait forever. For example: ``` executor1 = LocalPipelineExecutor( pipeline=[ ... ], tasks=10, logging_dir=f"logs/tokz",...