The Development of BERT
BERT (2018)
Training Tasks
Task 1: Masked Language Model (MLM; masking rule sketched below)
Task 2: Next Sentence Prediction (NSP)
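A minimal sketch of the MLM corruption rule, with a hypothetical toy vocabulary standing in for BERT's real WordPiece vocabulary: 15% of positions are selected for prediction; of those, 80% become [MASK], 10% become a random token, and 10% are left unchanged.

```python
import random

VOCAB = ["[PAD]", "[CLS]", "[SEP]", "[MASK]", "the", "cat", "sat", "on", "mat"]
MASK_ID = VOCAB.index("[MASK]")
SPECIAL = {VOCAB.index("[CLS]"), VOCAB.index("[SEP]")}

def mask_tokens(token_ids, mask_prob=0.15):
    """Apply BERT's 80/10/10 corruption rule to a list of token ids."""
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100: ignored by the loss
    for i, tok in enumerate(token_ids):
        if tok in SPECIAL or random.random() >= mask_prob:
            continue
        labels[i] = tok                          # the model must recover the original token
        r = random.random()
        if r < 0.8:
            inputs[i] = MASK_ID                  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = random.randrange(4, len(VOCAB))  # 10%: random non-special token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

ids = [1, 4, 5, 6, 7, 4, 8, 2]                   # [CLS] the cat sat on the mat [SEP]
print(mask_tokens(ids))
```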
Corpus
Fine Tuning Experimentation and Results
Model Overview
Model Framework
Model Input
WordPiece
Sentence Pairs
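A minimal sketch of how a sentence pair is packed into one input sequence (toy pre-split tokens stand in for WordPiece output): segment ids distinguish sentence A from sentence B, and position ids index each slot.

```python
def pack_pair(tokens_a, tokens_b):
    """Pack two pre-tokenized sentences into one BERT input sequence."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)  # sentence A vs B
    position_ids = list(range(len(tokens)))                              # learned positions
    return tokens, segment_ids, position_ids

tokens, seg, pos = pack_pair(["the", "cat"], ["it", "sat"])
print(tokens)  # ['[CLS]', 'the', 'cat', '[SEP]', 'it', 'sat', '[SEP]']
print(seg)     # [0, 0, 0, 0, 1, 1, 1]
```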
Ablation Experimentation
Influence of Model Size
BERT with Feature-based Paradigms
Influence of Training Tasks
Two Paradigms
Feature-based
Fine-Tuning
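A sketch contrasting the two paradigms, assuming the HuggingFace transformers library (not part of the original map): feature-based keeps BERT frozen as a feature extractor, while fine-tuning updates every weight through the downstream loss.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Feature-based: freeze BERT and treat its hidden states as fixed features
# consumed by a separate task-specific model (e.g. a BiLSTM + classifier).
for p in model.parameters():
    p.requires_grad = False
with torch.no_grad():
    enc = tokenizer("a sample sentence", return_tensors="pt")
    features = model(**enc).last_hidden_state   # shape (1, seq_len, 768)

# Fine-tuning: keep every BERT weight trainable so the downstream loss
# (through a small added head) updates the whole encoder end to end.
for p in model.parameters():
    p.requires_grad = True
```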
ELECTRA (2020)
Parameter sharing, smaller generator, training strategies
Performance comparison
Generator-Discriminator framework (replaced token detection; sketched below)
Learning rate analysis
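A minimal sketch of the replaced-token-detection objective behind the generator-discriminator framework; for brevity a random sampler stands in for the small trained MLM generator, which is an assumption of this sketch.

```python
import random

def make_rtd_example(tokens, vocab, mask_prob=0.15):
    """Corrupt a sequence the way ELECTRA's generator would, and emit
    per-position labels for the discriminator (replaced vs original)."""
    corrupted, is_replaced = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            sample = random.choice(vocab)        # stand-in for the generator's MLM sample
            corrupted.append(sample)
            is_replaced.append(sample != tok)    # a lucky correct guess counts as original
        else:
            corrupted.append(tok)
            is_replaced.append(False)
    return corrupted, is_replaced                # discriminator is trained on every position

vocab = ["the", "cat", "sat", "on", "mat", "dog"]
print(make_rtd_example(["the", "cat", "sat", "on", "the", "mat"], vocab))
```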
ERNIE (2019-2021)
ERNIE 1.0
Modification 1: Knowledge integration (also called knowledge injection; see the sketch after this block)
Modification 2: Dialogue Language Model (DLM)
Experimentation
Dataset: Chinese Wikipedia, Baidu Baike, Baidu News, Baidu Tieba
Tasks: XNLI, LCQMC, MSRA-NER, ChnSentiCorp, NLPCC-DBQA
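A minimal sketch of the knowledge-integration idea: whole entity or phrase spans are masked jointly instead of independent subword positions. The span annotations here are hypothetical; ERNIE derives them from lexical and entity analysis.

```python
import random

def span_mask(tokens, spans, mask_prob=0.15):
    """Mask whole entity/phrase spans jointly instead of single positions."""
    out = list(tokens)
    for start, end in spans:                     # each span is one unit of knowledge
        if random.random() < mask_prob:
            for i in range(start, end):
                out[i] = "[MASK]"
    return out

tokens = ["harry", "potter", "was", "written", "by", "j", "k", "rowling"]
spans = [(0, 2), (5, 8)]                         # "harry potter", "j k rowling"
print(span_mask(tokens, spans, mask_prob=1.0))   # both spans fully masked
```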
ERNIE 2.0
Modification 1: Multi-Task Continual Learning (sketched below)
More unsupervised pre-training tasks
Corpus: encyclopedia, books, news, dialogue, search relevance data, discourse relation data
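A minimal sketch of the multi-task continual learning schedule: tasks are introduced incrementally, and every stage keeps sampling all previously introduced tasks so earlier objectives are not forgotten. train_step and the task names are placeholders, not ERNIE 2.0's exact task set.

```python
import random

def train_step(task):
    print(f"one optimization step on task: {task}")  # placeholder for real training

all_tasks = ["word-aware", "structure-aware", "semantic-aware"]
active = []
for new_task in all_tasks:      # each stage introduces one new task...
    active.append(new_task)
    for _ in range(3):          # ...then trains on a mix of ALL active tasks
        train_step(random.choice(active))
```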
ERNIE 3.0
ALBERT (2019)
Optimization Strategies
Parameter reduction
Matrix decomposition (factorized embedding parameterization)
Parameter sharing
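A minimal sketch of both reductions, with illustrative sizes rather than the paper's exact configurations: factorizing the embedding turns V*H parameters into V*E + E*H with E much smaller than H, and cross-layer sharing reuses one block's weights across the depth.

```python
V, H, E, L = 30000, 4096, 128, 12   # vocab, hidden, embedding, layers (example sizes)

bert_embed   = V * H                # BERT ties embedding width to hidden width
albert_embed = V * E + E * H        # ALBERT factorizes through a small bottleneck E
print(bert_embed, albert_embed)     # 122880000 vs 4364288

# Cross-layer sharing: instead of L distinct Transformer blocks, the same
# block (same weights) is applied L times, so extra depth adds no parameters.
def encode(x, shared_block, num_layers=L):
    for _ in range(num_layers):
        x = shared_block(x)
    return x
```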
SOP (Sentence Order Prediction) task in place of NSP
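A minimal sketch of SOP example construction: positives are two consecutive segments in document order, negatives are the same two segments swapped; unlike NSP, the negative never comes from a different document.

```python
import random

def make_sop_example(seg_a, seg_b):
    """seg_a and seg_b are consecutive segments from the same document."""
    if random.random() < 0.5:
        return (seg_a, seg_b), 1    # label 1: correct document order
    return (seg_b, seg_a), 0        # label 0: the same segments, swapped

print(make_sop_example("He opened the door.", "The room was dark."))
```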
Other Experimental Design
n-gram MASK
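A minimal sketch of n-gram masking as ALBERT describes it: the span length n is sampled with probability proportional to 1/n (capped at n = 3), and the whole span is masked together.

```python
import random

def sample_span_len(max_n=3):
    weights = [1.0 / n for n in range(1, max_n + 1)]     # p(n) proportional to 1/n
    return random.choices(range(1, max_n + 1), weights=weights)[0]

def ngram_mask(tokens):
    n = sample_span_len()
    start = random.randrange(0, len(tokens) - n + 1)     # pick where the span begins
    return [("[MASK]" if start <= i < start + n else t)
            for i, t in enumerate(tokens)]

print(ngram_mask(["the", "cat", "sat", "on", "the", "mat"]))
```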
Comparison between ALBERT-xxlarge and BERT-large
The influence of extra data or dropout
Application to real-world natural language understanding (NLU) tasks
RoBERTa (2019)
Optimization Strategies
Dynamic masking (see the sketch after this list)
Removed the NSP task
More training data
Larger batch size
Byte-level BPE encoding
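A minimal sketch of dynamic masking: rather than fixing one mask pattern per sequence during preprocessing (BERT's static masking), a fresh pattern is sampled every time the sequence is fed to the model.

```python
import random

def dynamic_mask(tokens, mask_prob=0.15):
    """Sample a fresh mask pattern on every call (i.e. every epoch/pass)."""
    return [("[MASK]" if random.random() < mask_prob else t) for t in tokens]

sentence = ["the", "cat", "sat", "on", "the", "mat"]
for epoch in range(3):               # the same sentence gets a different pattern each time
    print(epoch, dynamic_mask(sentence))
```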