![tensorflow - Why does the BERT transformer use the [CLS] token for classification instead of an average over all tokens? - Stack Overflow](https://i.stack.imgur.com/m0jrg.png)
![The BERT pre-training model based on bi-directional transformer encoders - ResearchGate](https://www.researchgate.net/publication/349990403/figure/fig1/AS:1001068146208771@1615684653748/The-BERT-pre-training-model-based-on-bi-direction-transformer-encoders-E-1-E-2-E-n_Q640.jpg)
![BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding - Semantic Scholar](https://d3i71xaburhd42.cloudfront.net/df2b0e26d0599ce3e70df8a9da02e51594e0e992/3-Figure1-1.png)
![Review — BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, by Sik-Ho Tsang - Medium](https://miro.medium.com/max/1838/1*5cQlEV_7WuzUfE1B__jR5Q.png)