
BERT

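Bidirectional Encoder Representations from Transformers (BERT) is a pre-training technique for natural language processing (NLP) developed by Google.^[1]^[2] BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google. Google uses BERT to better understand the meaning of users' searches.^[3] The original English-language BERT comes in two pre-trained model sizes:^[1] (1) BERT_BASE, a neural network with 12 layers, a hidden size of 768, 12 attention heads, and 110M parameters; and (2) BERT_LARGE, with 24 layers, a hidden size of 1024, 16 attention heads, and 340M parameters. Both were trained on the BooksCorpus^[4] (800M words) and English Wikipedia (2,500M words).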
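To see how the layer count, hidden size and head count relate to the quoted parameter totals, the following is a rough sketch that counts the weights of a standard Transformer encoder stack. It assumes details not stated in this article: a WordPiece vocabulary of 30,522 tokens, 512 learned position embeddings, 2 segment embeddings, a feed-forward inner size of 4 × H, and a pooler layer on top. All of these are labeled as assumptions in the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    num_layers: int           # L: number of Transformer encoder layers
    hidden_size: int          # H: hidden (embedding) dimension
    num_heads: int            # A: attention heads per layer
    vocab_size: int = 30522   # assumption: WordPiece vocabulary size
    max_positions: int = 512  # assumption: learned position embeddings
    type_vocab_size: int = 2  # assumption: two segment (sentence A/B) embeddings

    @property
    def intermediate_size(self) -> int:
        # Assumption: the feed-forward inner size is 4 * H.
        return 4 * self.hidden_size


def dense(n_in: int, n_out: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(h: int) -> int:
    """LayerNorm has one scale (gamma) and one shift (beta) per feature."""
    return 2 * h


def count_parameters(cfg: BertConfig) -> int:
    h, ff = cfg.hidden_size, cfg.intermediate_size
    # Heads split H into A slices of size H / A, so A does not change the count.
    assert h % cfg.num_heads == 0, "hidden size must be divisible by head count"

    embeddings = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab_size) * h + layer_norm(h)
    per_layer = (
        3 * dense(h, h)  # query, key and value projections
        + dense(h, h)    # attention output projection
        + layer_norm(h)  # post-attention LayerNorm
        + dense(h, ff)   # feed-forward expansion
        + dense(ff, h)   # feed-forward projection back to H
        + layer_norm(h)  # post-feed-forward LayerNorm
    )
    pooler = dense(h, h)  # assumption: [CLS] pooler layer is included
    return embeddings + cfg.num_layers * per_layer + pooler


base = BertConfig(num_layers=12, hidden_size=768, num_heads=12)
large = BertConfig(num_layers=24, hidden_size=1024, num_heads=16)
print(f"BERT_BASE:  {count_parameters(base):,}")
print(f"BERT_LARGE: {count_parameters(large):,}")
```

Under these assumptions the counts come out to 109,482,240 and 335,141,888, close to the rounded 110M and 340M figures above. Note that the number of attention heads only changes how the hidden dimension is partitioned, not how many weights there are.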

Performance

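When BERT was published, it achieved state-of-the-art performance on a number of natural language understanding tasks:^[1]

GLUE (General Language Understanding Evaluation) task set (consisting of 9 tasks)
SQuAD (Stanford Question Answering Dataset) v1.1 and v2.0
SWAG (Situations With Adversarial Generations)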

Analysis

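The reasons for BERT's state-of-the-art performance on these natural language understanding tasks are not yet well understood.^[5]^[6] Current research focuses on investigating the relationship between BERT's outputs and carefully chosen input sequences,^[7]^[8] analyzing internal vector representations with probing classifiers,^[9]^[10] and the relationships represented by attention weights.^[5]^[6]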
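The attention weights studied in this line of research are, for a single head, the standard scaled dot-product weights softmax(q · k / √d_k) computed for each query position. The minimal sketch below computes them for hand-picked, hypothetical 2-dimensional query and key vectors and reports which token each position attends to most. In real BERT the queries and keys come from learned projections of the hidden states; here they are simply given.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights: one softmax(q . k / sqrt(d_k)) row per query."""
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    return [
        softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        for q in queries
    ]


def most_attended(weights, tokens):
    """For each query position, the token receiving the largest attention weight."""
    return [tokens[max(range(len(row)), key=row.__getitem__)] for row in weights]


# Hypothetical toy vectors for one head (not taken from a trained model).
tokens = ["[CLS]", "the", "cat", "[SEP]"]
queries = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

weights = attention_weights(queries, keys)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 3) for w in row])
print(most_attended(weights, tokens))
```

The vectors are chosen so that every position attends most strongly to [SEP]; attention-analysis studies inspect weight matrices of this shape, one per head and layer, taken from a trained model.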

History

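BERT has its origins in pre-training contextual representations, including semi-supervised sequence learning,^[11] generative pre-training, ELMo,^[12] and ULMFiT.^[13] Unlike previous models, BERT is a deeply bidirectional, unsupervised language representation, pre-trained using only a plain-text corpus. Context-free models such as word2vec or GloVe generate a single word embedding for each word in the vocabulary, whereas BERT takes into account the context of each occurrence of a given word. For example, the word "running" has the same word2vec vector in both "He is running a company" and "He is running a marathon", whereas BERT provides a contextualized embedding that differs depending on the sentence.

On October 25, 2019, Google Search announced that it had started applying BERT models to English-language search queries within the United States.^[14] On December 9, 2019, it was reported that Google Search had adopted BERT for more than 70 languages.^[15]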
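The difference between a context-free and a contextual embedding can be illustrated with a deliberately crude toy. The static table below plays the role of a word2vec-style lookup, and the "contextual" function mixes a word's vector with the average of its sentence. The vectors and the mixing rule are hypothetical stand-ins chosen for illustration; they are not BERT's Transformer encoder, which computes context through stacked self-attention layers.

```python
# Hypothetical 2-d static embeddings (a word2vec-style lookup table); values are made up.
STATIC = {
    "he": (0.1, 0.0), "is": (0.0, 0.1), "running": (0.5, 0.5),
    "a": (0.0, 0.0), "company": (1.0, 0.0), "marathon": (0.0, 1.0),
}


def static_embedding(word, sentence):
    """Context-free: the sentence argument is ignored entirely."""
    return STATIC[word]


def toy_contextual_embedding(word, sentence):
    """Crude stand-in for contextualization: average the word's static vector
    with the mean of all static vectors in the sentence."""
    vecs = [STATIC[w] for w in sentence]
    mean = tuple(sum(dim) / len(vecs) for dim in zip(*vecs))
    return tuple((b + m) / 2 for b, m in zip(STATIC[word], mean))


s1 = "he is running a company".split()
s2 = "he is running a marathon".split()

print(static_embedding("running", s1), static_embedding("running", s2))
print(toy_contextual_embedding("running", s1), toy_contextual_embedding("running", s2))
```

The static vector for "running" is identical in both sentences, while the toy contextual vector is pulled toward "company" in the first sentence and toward "marathon" in the second.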

Awards

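BERT won the Best Long Paper Award at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).^[16]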

See also

References

[1] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018-10-11. arXiv:1810.04805v2 [cs.CL].
[2] Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing. Google AI Blog. [2019-11-27] (in English).
[3] Understanding searches better than ever before. Google. 2019-10-25 [2019-11-27] (in English).
[4] Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books. 2015. arXiv:1506.06724 [cs.CV].
[5] Kovaleva, Olga; Romanov, Alexey; Rogers, Anna; Rumshisky, Anna. Revealing the Dark Secrets of BERT. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019-11: 4364–4373. S2CID 201645145. doi:10.18653/v1/D19-1445 (in English).
[6] Clark, Kevin; Khandelwal, Urvashi; Levy, Omer; Manning, Christopher D. What Does BERT Look at? An Analysis of BERT's Attention. Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2019: 276–286. doi:10.18653/v1/w19-4828.
[7] Khandelwal, Urvashi; He, He; Qi, Peng; Jurafsky, Dan. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 284–294. Bibcode:2018arXiv180504623K. S2CID 21700944. arXiv:1805.04623. doi:10.18653/v1/p18-1027.
[8] Gulordava, Kristina; Bojanowski, Piotr; Grave, Edouard; Linzen, Tal; Baroni, Marco. Colorless Green Recurrent Networks Dream Hierarchically. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 1195–1205. Bibcode:2018arXiv180311138G. S2CID 4460159. arXiv:1803.11138. doi:10.18653/v1/n18-1108.
[9] Giulianelli, Mario; Harding, Jack; Mohnert, Florian; Hupkes, Dieuwke; Zuidema, Willem. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 240–248. Bibcode:2018arXiv180808079G. S2CID 52090220. arXiv:1808.08079. doi:10.18653/v1/w18-5426.
[10] Zhang, Kelly; Bowman, Samuel. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis. Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Stroudsburg, PA, USA: Association for Computational Linguistics). 2018: 359–361. doi:10.18653/v1/w18-5448.
[11] Dai, Andrew; Le, Quoc. Semi-supervised Sequence Learning. 2015-11-04. arXiv:1511.01432 [cs.LG].
[12] Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke. Deep contextualized word representations. 2018-02-15. arXiv:1802.05365v2 [cs.CL].
[13] Howard, Jeremy; Ruder, Sebastian. Universal Language Model Fine-tuning for Text Classification. 2018-01-18. arXiv:1801.06146v5 [cs.CL].
[14] Nayak, Pandu. Understanding searches better than ever before. Google Blog. 2019-10-25 [2019-12-10].
[15] Montti, Roger. Google's BERT Rolls Out Worldwide. Search Engine Journal. 2019-12-10 [2019-12-10].
[16] Best Paper Awards. NAACL. 2019 [2020-03-28].

External links

Official GitHub repository

