knowledge-graph-learning issues

[Resource] Useful tools & lecture related to data science

1

This issue collects tools and tutorials that I personally find useful, as a backup. ## 2025/02 [What is an Agent?](https://huggingface.co/learn/agents-course/unit1/what-are-agents) - The definition of an agent given here is quite good: Agent is: an AI model capable of reasoning, planning, and interacting with its environment. ``` Think of the Agent as having two...

BrambleXu

Resource

AAAI-2017-Distant Supervision for Relation Extraction with Sentence-Level Attention and Entity Descriptions

2

**One-sentence summary:** Building on #14, this work adds a new input feature, entity descriptions, and then uses an attention model to select valid instances. **Resources:** - [pdf](https://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/viewFile/14491/14078) - [code]( **Keywords:** - dataset: NYT - **Notes:** Based on PCNN, hence the name APCNN; the addition is an attention layer. The input is the same as for PCNN: word + position. **Model figure:** ![image](https://user-images.githubusercontent.com/10768193/56017105-40037c00-5d39-11e9-8bea-d62eec346929.png) **Results:** ![image](https://user-images.githubusercontent.com/10768193/56017158-5e697780-5d39-11e9-9beb-f6c86e5e04fe.png)

BrambleXu

RE(T)

DS(M)

Attention(M)

NA(P)
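
A note on the APCNN entry above: the idea of using attention to pick out valid instances from a bag of sentences can be sketched roughly as follows. This is a minimal illustration only. The plain dot-product scoring, the function names `softmax` and `attend_over_bag`, and the toy vectors are assumptions made here; the paper's own scoring function and its use of entity descriptions may differ.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_bag(sentence_vecs, query_vec):
    """Weight each sentence vector in a bag by its similarity to a query.

    sentence_vecs: list of equal-length float lists (one per sentence).
    query_vec: float list standing in for a relation representation
               (hypothetical; chosen here only for illustration).
    Returns (weights, bag_vec), where bag_vec is the weighted sum.
    """
    scores = [sum(s * q for s, q in zip(vec, query_vec)) for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(query_vec)
    bag_vec = [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
        for d in range(dim)
    ]
    return weights, bag_vec
```

Sentences whose vectors align with the query receive larger weights, so noisy instances in a distantly supervised bag contribute less to the bag representation.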

ICLR-2021-Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks

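**Summary:** The test sets of well-known datasets are full of labeling errors. **Resource:** - [pdf](https://arxiv.org/abs/2103.14749) - [code](https://github.com/cgnorthcutt/cleanlab) - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** A label counts as an error only after passing two checks: confident learning and human review. Among the examples the algorithm flagged as mislabeled, 54% turned out to be genuinely mislabeled. The study finds that lower-capacity models do better than higher-capacity models on the mislabeled data. **Model Graph:** **Result:** **Thoughts:** **Next Reading:**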

BrambleXu

Annotation(T)

NA(P)
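
A note on the label-errors entry above: the first of the two checks, the algorithmic one, can be sketched in a simplified form. The sketch assumes per-class thresholds equal to the average predicted probability of a class among examples carrying that label. It does not use the cleanlab API, and the function names and toy data are hypothetical. Flagged examples would still go to human review afterwards.

```python
def class_thresholds(labels, probs, n_classes):
    """Per-class threshold: mean predicted probability of class k
    over the examples whose given label is k."""
    thresholds = []
    for k in range(n_classes):
        own = [p[k] for y, p in zip(labels, probs) if y == k]
        # A class with no examples gets a threshold that is rarely reached.
        thresholds.append(sum(own) / len(own) if own else 1.0)
    return thresholds


def flag_label_candidates(labels, probs, n_classes):
    """Return (index, given_label, suggested_label) for examples where the
    model is confident in a class that differs from the given label."""
    thresholds = class_thresholds(labels, probs, n_classes)
    flagged = []
    for i, (y, p) in enumerate(zip(labels, probs)):
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if confident:
            guess = max(confident, key=lambda k: p[k])
            if guess != y:
                flagged.append((i, y, guess))
    return flagged
```

With six toy examples where the third one is labeled 0 but predicted as class 1 with probability 0.9, only that example is returned as a candidate for review.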

Slide-2013-Active Learning 入門

**Summary:** Japanese-language introductory material on AL. **Resource:** - [pdf](https://www.slideshare.net/shuyo/introduction-to-active-learning-25787487) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** **Model Graph:** **Result:** **Thoughts:** **Next Reading:**

BrambleXu

AC(T)

arXiv-2020-A Survey of Deep Active Learning

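**Summary:** DL inherently needs large amounts of data, whereas AL inherently aims to get by with little data. With the rise of DL, AL built on traditional ML models looks somewhat out of step. How to apply AL to neural networks is an open problem, which is where DAL comes from. **Resource:** - [pdf](https://arxiv.org/abs/2009.00236) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** # 1 INTRODUCTION AL struggles with high-dimensional data, so DL and AL were combined into DAL. > DAL has been widely utilized in various...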

BrambleXu

AC(T)

Book-2012-Natural Language Annotation for Machine Learning

**Summary:** A book about annotation. **Resource:** - [pdf](https://doc.lagout.org/science/Artificial%20Intelligence/Machine%20learning/Natural%20Language%20Annotation%20for%20Machine%20Learning_%20A%20Guide%20to%20Corpus-...%20%5BPustejovsky%20%26%20Stubbs%202012-11-04%5D.pdf) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** > I realized that my view of AL was wrong. I had always thought AL was a tool for annotation, but it is actually a method for training models, because the goal of AL is not to annotate as much data as possible. Once the model's accuracy is high enough, there is no need to keep annotating. Regarding Chapter 12 below, crowdsourced annotation is a coming trend. For large amounts of data, boosting, active learning, and semi-supervised learning are three options. - Amazon’s Mechanical Turk -...

BrambleXu

Annotation(T)

EBOOK-2018-An Introduction to Active Learning

**Summary:** Introductory material on active learning from Figure Eight. **Resource:** - [pdf](https://github.com/BrambleXu/knowledge-graph-learning/files/6283412/Figure_Eight_Intro_Active_Learning.pdf) - [link](https://www.kdnuggets.com/2018/12/figure8-ebook-introduction-active-learning.html) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** Problem: annotate quickly while choosing the more representative samples to label ![image](https://user-images.githubusercontent.com/10768193/113950475-5a9b5d80-984c-11eb-9a32-b61308274cb4.png) Not all examples carry the same quality...

BrambleXu

AC(T)

LREC-2004-Definition, Dictionaries and Tagger for Extended Named Entity Hierarchy

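**Summary:** This is Japanese-language work. It expands NER data from the original 7 categories to 200 categories. **Resource:** - [pdf](https://www.aclweb.org/anthology/L04-1051/) - [Project home page: 拡張固有表現 (Extended Named Entity)](https://nlp.cs.nyu.edu/ene/ene_j_20160801/start.htm) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** This work builds a hierarchical NE structure, mainly for the news domain, since the goal is general IE/QA. The NE hierarchy is built in the following 3 steps: 1. Extract 3,500 candidate NEs from a news corpus 2. Obtain NEs from existing systems and NER tasks 3. Draw on a thesaurus **Model...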

BrambleXu

Annotation(T)

NER(T)

JP(P)

Dict(M)

COLING-2002-Language Independent NER using a Unified Model of Internal and Contextual Evidence

**Summary:** A language-independent NER model based on iterative learning in a co-training fashion. The novelty is using word-internal and contextual information as independent sources of evidence. The target languages are Spanish and Dutch. **Resource:** - [pdf](https://www.aclweb.org/anthology/W02-2007/) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** 2. Entity-Internal Information The principle is to look at prefixes and suffixes. However, this approach has a major problem:...

BrambleXu

NER(T)

LA(T)
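
A note on the COLING-2002 entry above: looking at prefixes and suffixes as entity-internal evidence can be illustrated with a tiny feature extractor. This is a simplified sketch for intuition only. The function name `affix_features` and the `max_len` parameter are hypothetical, and the paper's actual model of word-internal evidence is not reproduced here.

```python
def affix_features(token, max_len=3):
    """Collect character prefixes and suffixes of a token, up to max_len.

    Returns a list of ("prefix" | "suffix", string) pairs, shortest first.
    """
    feats = []
    for n in range(1, min(max_len, len(token)) + 1):
        feats.append(("prefix", token[:n]))
        feats.append(("suffix", token[-n:]))
    return feats


# Example: affixes of length 1 and 2 for a place name.
print(affix_features("Madrid", max_len=2))
```

Features like these could be fed to any classifier alongside contextual features from the surrounding words.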

ACL-2009-A Web Survey on the Use of Active Learning to Support Annotation of Text Data

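**Summary:** A survey on active learning. **Resource:** - [pdf](https://www.aclweb.org/anthology/W09-1906/) - [code]( - [paper-with-code]( **Paper information:** - Author: - Dataset: - keywords: **Notes:** The main idea of AL: the machine learner controls the data, and the learner asks humans about the labels it is unsure of. The AL workflow: - Input: some labeled samples and a large pool of unlabeled samples - Output: a classifier and a small set of newly labeled samples - Goals: (1) build the best possible classifier without requiring more data; (2) keep human annotation effort to a minimum. **Model...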

BrambleXu

Annotation(T)

AC(T)
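
A note on the ACL-2009 entry above: the workflow in the notes (labeled seed set and unlabeled pool in, classifier and newly labeled samples out) can be sketched as a pool-based loop with uncertainty sampling. Everything concrete here is an assumption for illustration: the toy 1-D nearest-centroid classifier, the margin-based uncertainty score, and the names `fit_centroids`, `margin`, `predict`, and `active_learning`. The seed set is assumed to contain at least one example of each class.

```python
def fit_centroids(labeled):
    """Train a toy nearest-centroid classifier on (x, y) pairs, y in {0, 1}."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in (0, 1)}


def margin(centroids, x):
    """Small margin means x is about equally far from both centroids."""
    return abs(abs(x - centroids[0]) - abs(x - centroids[1]))


def predict(centroids, x):
    """Assign x to the class of the nearer centroid."""
    return 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1


def active_learning(labeled, pool, oracle, budget):
    """Repeatedly ask the oracle (a human, in practice) to label the pool
    item the current classifier is least sure about."""
    labeled, pool = list(labeled), list(pool)
    newly_labeled = []
    for _ in range(min(budget, len(pool))):
        centroids = fit_centroids(labeled)
        query = min(pool, key=lambda x: margin(centroids, x))
        pool.remove(query)
        pair = (query, oracle(query))
        labeled.append(pair)
        newly_labeled.append(pair)
    return fit_centroids(labeled), newly_labeled
```

The `budget` argument caps the number of questions put to the annotator, which reflects the second goal in the notes: keeping human labeling effort small.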
