yucc-leon comments

Results: 23 comments of yucc-leon

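> > If this is the cause of the error, doesn't that mean the template was not aligned when the data was constructed? > > I haven't read all of the source code, so I may well be wrong. My guess is that it is aligned, because the dummy data format matches GLM2's. Sorry for the ambiguity. The difference between `user_input_ids = tokenizer(user_input, return_tensors="pt", add_special_tokens=False).input_ids` and `user_input_ids = tokenizer(user_input, return_tensors="pt").input_ids` is that they produce different id sequences; print both to see. The second line typically adds special token ids at the beginning and end of the input according to the tokenizer's default configuration, so its effect differs from one tokenizer setup to another. By "not aligned" I meant that this project does not fully cover all the adaptation details of chatglm2-6b. I haven't tried this code yet, though, and I don't know whether anyone else has run into the same thing. If you want to confirm further, I suggest joining the group and asking there; you will probably get feedback faster.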
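The comparison described in the comment above can be sketched as follows. This is a minimal illustration, not the project's code: `compare_special_tokens` is a hypothetical helper, and `ToyTokenizer` is a made-up stand-in (with invented ids 1 and 2 for the start and end markers) so the snippet runs without downloading a model. A real Hugging Face tokenizer could be passed in its place, and which special tokens it adds depends on that tokenizer's own configuration.

```python
from typing import Dict, List, Tuple


def compare_special_tokens(tokenizer, text: str) -> Tuple[List[int], List[int]]:
    """Return (ids without special tokens, ids with the tokenizer's defaults)."""
    plain = list(tokenizer(text, add_special_tokens=False)["input_ids"])
    default = list(tokenizer(text)["input_ids"])
    return plain, default


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer (assumption, for illustration only)."""

    BOS_ID = 1  # invented id for a start-of-sequence marker
    EOS_ID = 2  # invented id for an end-of-sequence marker

    def __call__(self, text: str, add_special_tokens: bool = True) -> Dict[str, List[int]]:
        # Map each character to an id >= 10 so it never collides with BOS/EOS.
        ids = [10 + (ord(ch) % 50) for ch in text]
        if add_special_tokens:
            # Mimic a default configuration that wraps the input in special ids.
            ids = [self.BOS_ID] + ids + [self.EOS_ID]
        return {"input_ids": ids}


if __name__ == "__main__":
    plain_ids, default_ids = compare_special_tokens(ToyTokenizer(), "hi")
    print("add_special_tokens=False:", plain_ids)
    print("tokenizer defaults:      ", default_ids)
```

Printing both sequences side by side makes it easy to see which ids a given tokenizer adds by default, and whether the prompt built for inference matches the format used when the training data was constructed.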

Load codeparrot/apps raising UnicodeDecodeError in datasets-2.18.0

Here is the requirements.txt from a clean virtual environment (managed by conda) where I only install `datasets` by `pip install datasets`. The pip list: ``` aiohttp==3.9.3 aiosignal==1.3.1 attrs==23.2.0 certifi==2024.2.2 charset-normalizer==3.3.2...

SuccessorWithDelete.java doesn't stand logarithmic time requirement

> Hi, > I was looking for a solution for Successor with delete problem and came across this repo. > From a glance it looks like your solution is correct...

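Suspected data leakage in the evaluation?

As I recall, the evaluation-set portion of CEval is original and closed. The example data is not

Falcon-related code

Trying to load Falcon still fails with: `ModuleNotFoundError: No module named 'transformers_modules.OpenBuddy.openbuddy-falcon-7b-v1'` My code: `model = AutoModelForCausalLM.from_pretrained("OpenBuddy/openbuddy-falcon-7b-v1.5-fp16", trust_remote_code=True)` What am I doing wrong?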

Official example output: does it match expectations?

![image](https://github.com/baichuan-inc/baichuan-7B/assets/17897916/99c98262-ba52-4dc9-b208-2f72f33d8aff) ![image](https://github.com/baichuan-inc/baichuan-7B/assets/17897916/50aa4044-1461-4fd6-a0b0-92b09821f5da) It looks like some of the outputs match expectations, but some constraints are needed.

What prevents you from throughly opensourcing?

> Just want to point out that the data processing pipeline is open-source (https://github.com/bigcode-project/the-stack-v2). It is also the case for StarCoder1 (https://github.com/bigcode-project/bigcode-dataset/). To my knowledge, StarCoders are the only code...

What prevents you from throughly opensourcing?

> #4 The reasons for not fully open-sourcing pretraining and data processing code in projects like BigCode/StarCoder(2) may include: > > Intellectual Property: Protecting unique innovations or proprietary techniques. ...

yucc-leon

Hello, the group is full. Is there any other way to join the group and discuss?

ChatGLM2-6B produces gibberish during inference, and possible fixes
