liuyaox issues

The Document encoding in HAN seems wrong?

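https://github.com/ShawnyXiao/TextClassification-Keras/blob/a447bd9b0561a9364482e0e77eee9214d97d9887/model/HAN/main.py#L22

In lines 22-25 linked above, the encoding process appears to be as follows.

Step 1: The whole document (all of its sentences together) is padded once at the end, instead of each sentence being padded individually. For example (`---` denotes a sentence):

`-----------,------,--- ------------,-------- --,000000000000000000 00000000000000000000`

Step 2: The document is then cut into fixed-length pieces of maxlen_sentence tokens (say 20), instead of along its natural sentence boundaries. For example (`|` marks where the vectors are split):

`-----------,------,---|------------,--------|--,000000000000000000|00000000000000000000`

I think each sentence should first be encoded at the word level, with its own padding, and only then should the sentences be encoded at the sentence level. For example:

`----------- 000000 000|------000000 00000000|-- -------------00000|----------0000000000`

What do you all think?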
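The two layouts can be compared with a small plain-Python sketch. The function names, the `sent_len`/`doc_len` parameters, and the end-of-sequence padding are illustrative assumptions chosen to match the diagrams above, not the repository's code.

```python
from typing import List

PAD = 0  # padding token id (assumption: 0, the default fill value of Keras pad_sequences)

def flat_then_chunk(doc: List[List[int]], sent_len: int, doc_len: int) -> List[List[int]]:
    """Layout described in Steps 1-2: concatenate all sentences, pad the whole
    document once at the end, then cut it into rows of sent_len tokens."""
    flat = [tok for sent in doc for tok in sent]
    total = sent_len * doc_len
    flat = (flat + [PAD] * total)[:total]
    return [flat[i:i + sent_len] for i in range(0, total, sent_len)]

def per_sentence(doc: List[List[int]], sent_len: int, doc_len: int) -> List[List[int]]:
    """Proposed layout: pad/truncate each sentence to sent_len, then add all-PAD
    rows until the document has doc_len sentences (one sentence per row)."""
    rows = [(sent + [PAD] * sent_len)[:sent_len] for sent in doc[:doc_len]]
    rows += [[PAD] * sent_len for _ in range(doc_len - len(rows))]
    return rows

doc = [[1, 2, 3], [4, 5], [6]]  # three sentences of token ids
print(flat_then_chunk(doc, sent_len=4, doc_len=3))  # [[1, 2, 3, 4], [5, 6, 0, 0], [0, 0, 0, 0]]
print(per_sentence(doc, sent_len=4, doc_len=3))     # [[1, 2, 3, 0], [4, 5, 0, 0], [6, 0, 0, 0]]
```

In the first layout the second sentence is split across rows 1 and 2, so the word-level encoder never sees it as a unit. In the second layout each row holds exactly one sentence.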

Shouldn't X and Y use the same mask_value?

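https://github.com/stephen-v/zh-NER-keras/blob/78ab9b314b5c77971fa08a2c8edcf194f40567d5/process_data.py#L52 Here Y is padded with mask_value -1, while X above is padded with 0. Isn't that inconsistent? Looking at the source of `crf_accuracy` in keras_contrib, the mask applied to y_pred is `input_masks[0]`, which is in fact the same mask used for X.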
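The reasoning can be illustrated with a plain-Python masked token accuracy whose mask is derived from X alone, in the spirit of the `input_masks[0]` behaviour described above. This is a sketch, not keras_contrib's implementation; `masked_accuracy` is a hypothetical name.

```python
from typing import Sequence

def masked_accuracy(x: Sequence[Sequence[int]],
                    y_true: Sequence[Sequence[int]],
                    y_pred: Sequence[Sequence[int]],
                    x_pad: int = 0) -> float:
    """Token accuracy that skips every position where X holds the padding id.

    The mask comes from X only, so the value Y was padded with never
    affects the result.
    """
    correct = total = 0
    for xs, ts, ps in zip(x, y_true, y_pred):
        for tok, t, p in zip(xs, ts, ps):
            if tok == x_pad:  # padded position in X -> excluded from the metric
                continue
            total += 1
            correct += int(t == p)
    return correct / total if total else 0.0

x = [[5, 7, 0, 0]]        # X padded with 0
y_pred = [[1, 2, 0, 0]]
print(masked_accuracy(x, [[1, 2, -1, -1]], y_pred))  # 1.0 (Y padded with -1)
print(masked_accuracy(x, [[1, 2, 0, 0]], y_pred))    # 1.0 (Y padded with 0)
```

Under this kind of mask the -1 in Y is harmless for the accuracy. Whether it matters elsewhere (for example in the loss, or wherever a mask might be derived from Y itself) depends on code not shown here.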

When main_feature is all, word_embedding is handled twice and char_embedding not at all?

https://github.com/nlpjoe/daguan-classify-2018/blob/e2539e55769b1286631d95e786b20b8344266738/src/model/attention1.py#L41 Great code, I learned a lot! One question: shouldn't this line use char_embedding? Lines 21-24 above already handle word_embedding when main_feature is all, so char_embedding should presumably be handled here. The same question applies to textcnn_model.py, lines 37-40 and 53-55.
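A minimal sketch of the behaviour the question expects, assuming main_feature takes the values "word", "char", or "all". `select_embeddings` is a hypothetical helper and the embeddings are placeholders, not the repository's Keras layers.

```python
from typing import List, TypeVar

E = TypeVar("E")

def select_embeddings(main_feature: str, word_embedding: E, char_embedding: E) -> List[E]:
    """Return the embedding branches a model should build for main_feature.

    'all' combines both feature types, so each embedding appears exactly
    once (rather than word_embedding twice).
    """
    if main_feature == "word":
        return [word_embedding]
    if main_feature == "char":
        return [char_embedding]
    if main_feature == "all":
        return [word_embedding, char_embedding]
    raise ValueError(f"unknown main_feature: {main_feature!r}")

print(select_embeddings("all", "W", "C"))  # ['W', 'C']
```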

chatgpt template: `get_prompt` raises `ValueError: Invalid style: None`

I was using `gpt-4` and found that its template sets `sep_style=None`, so `get_prompt` raises `ValueError: Invalid style: None`. ![img_v3_0298_61fcdc3a-e0ed-4513-91f8-021ad208693g](https://github.com/lm-sys/FastChat/assets/7260977/64270709-4262-431d-9b3f-c17831f2855d)
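One plausible reading is that API-backed templates such as the one used for `gpt-4` are not meant to be rendered into a single prompt string: the conversation is sent to the API as a list of messages, so `sep_style` is left as `None` and a string-building method that dispatches on it has no branch to take. The sketch below illustrates that dispatch with hypothetical names (`MiniConversation`, `to_api_messages`); it is not FastChat's code, and which method FastChat expects callers to use for these models should be checked against its source.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Tuple

class SepStyle(Enum):
    ADD_COLON_SINGLE = auto()  # "role: message<sep>" turns

@dataclass
class MiniConversation:
    """Simplified, hypothetical template; not FastChat's Conversation class."""
    sep_style: Optional[SepStyle] = None  # API-only templates leave this unset
    sep: str = "\n"
    messages: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def get_prompt(self) -> str:
        """Build one prompt string; only meaningful for locally served models."""
        if self.sep_style == SepStyle.ADD_COLON_SINGLE:
            return "".join(f"{role}: {msg}{self.sep}" if msg else f"{role}:"
                           for role, msg in self.messages)
        # Every unhandled style, including None, falls through to this error.
        raise ValueError(f"Invalid style: {self.sep_style}")

    def to_api_messages(self) -> List[dict]:
        """Message list for a chat API; needs no sep_style at all."""
        return [{"role": role, "content": msg}
                for role, msg in self.messages if msg is not None]

conv = MiniConversation(messages=[("user", "Hello"), ("assistant", None)])
print(conv.to_api_messages())  # [{'role': 'user', 'content': 'Hello'}]
# conv.get_prompt() would raise ValueError: Invalid style: None
```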

How can I evaluate outputs against ideal_answer directly, without having to define a completion_fn?

### Describe the feature or improvement you're requesting

I already have the outputs (generated by an LLM) and the ideal_answers in my jsonl file. For example: ``` {'input': 'what is...
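Independently of any eval framework, precomputed outputs can be scored with a short standalone script. This is a sketch under assumptions: the key names `output` and `ideal` are hypothetical (the sample above is truncated), matching is exact after stripping whitespace, and it is not a feature of any evals library. Note that `json.loads` requires double-quoted keys and strings, as in standard JSONL.

```python
import io
import json
from typing import Iterable

def exact_match_accuracy(lines: Iterable[str],
                         output_key: str = "output",
                         ideal_key: str = "ideal") -> float:
    """Score precomputed outputs against ideal answers; no completion_fn is called.

    Each non-empty line must be a JSON object. The key names are hypothetical
    defaults; the ideal value may be one string or a list of accepted strings.
    """
    total = hits = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        ideal = record[ideal_key]
        accepted = {a.strip() for a in (ideal if isinstance(ideal, list) else [ideal])}
        total += 1
        hits += int(record[output_key].strip() in accepted)
    return hits / total if total else 0.0

sample = io.StringIO(
    '{"input": "2+2?", "output": "4", "ideal": "4"}\n'
    '{"input": "Capital of France?", "output": "Lyon", "ideal": ["Paris"]}\n'
)
print(exact_match_accuracy(sample))  # 0.5
```

Passing an open file handle instead of `io.StringIO` works the same way, since iterating a text file yields its lines.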

After lsh.query, remove the text itself?

https://github.com/onesuper/HuggingFace-Datasets-Text-Quality-Analysis/blob/92f66886bf96824ebbe59f55e037413069f8429c/app.py#L486 After lsh.query, and before adding to unique_documents, shouldn't we call `results.remove(str(i))`?
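If every document is inserted into the index before it is queried (an assumption about app.py that should be checked against the linked line), each query also returns the document's own key. The sketch below makes the consequence concrete with a brute-force stand-in for MinHash LSH (exact Jaccard over character trigrams). `query`, `dedup`, and the keep-first rule are hypothetical, not the app's code.

```python
from typing import Dict, List, Set

def shingles(text: str, k: int = 3) -> Set[str]:
    """Character k-grams of the lowercased text."""
    text = text.lower()
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def query(index: Dict[str, Set[str]], sig: Set[str], threshold: float) -> List[str]:
    """Brute-force stand-in for lsh.query: keys of indexed docs with Jaccard >= threshold.
    Because the queried document is itself in the index, its own key is always returned."""
    return [key for key, other in index.items() if jaccard(sig, other) >= threshold]

def dedup(texts: List[str], threshold: float = 0.8, exclude_self: bool = True) -> List[str]:
    index = {str(i): shingles(t) for i, t in enumerate(texts)}  # insert everything first
    unique_documents: List[str] = []
    for i, text in enumerate(texts):
        results = query(index, index[str(i)], threshold)
        if exclude_self:
            results.remove(str(i))  # the document always matches itself
        # Keep a document only if no earlier document is a near-duplicate of it.
        if all(int(r) > i for r in results):
            unique_documents.append(text)
    return unique_documents

texts = ["the quick brown fox", "the quick brown fox!", "lorem ipsum dolor sit"]
print(dedup(texts))                      # ['the quick brown fox', 'lorem ipsum dolor sit']
print(dedup(texts, exclude_self=False))  # [] because each document is blocked by its own key
```

With the self-match left in, this keep-first rule rejects every document, and a rule such as "unique if there are no matches" fails the same way. A rule that only compares against documents already kept would be unaffected, so whether the removal is needed depends on how app.py uses `results`.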