Visualization of a 12-layer BERT for sentence encoding/embedding
One major concern when using BERT as a sentence encoder (i.e. mapping a variable-length sentence to a fixed-length vector) is which layer to pool from and how to pool it. I made a visualization on the UCI News Aggregator dataset: I randomly sampled 20K news titles, obtained sentence encodings from different layers with both max-pooling and avg-pooling, and finally reduced them to 2D via PCA. The data has only four classes, shown in red, blue, yellow and green. The BERT model is uncased_L-12_H-768_A-12, released by Google. A minimal sketch of the pipeline is shown below.
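For reference, here is a rough sketch of that pipeline. It assumes a bert-as-service server is already running locally; the tiny title/label lists are placeholders standing in for the 20K sampled titles, not the exact script used for the plots.

```python
# Illustrative sketch of the visualization pipeline. Assumes a server is
# already running, e.g.:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 -num_worker=1
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from bert_serving.client import BertClient

# In the real experiment these are ~20K titles sampled from the UCI News
# Aggregator dataset together with their 4 category labels (loading omitted).
titles = ['Fed raises interest rates', 'New smartphone released',
          'Vaccine trial shows promise', 'Actor wins best-picture award']
labels = [0, 1, 2, 3]  # the four news categories mapped to integers

bc = BertClient()                 # connect to the running server
vecs = bc.encode(titles)          # one fixed-length vector per title

pts = PCA(n_components=2).fit_transform(vecs)  # 768-D -> 2-D for plotting
plt.scatter(pts[:, 0], pts[:, 1], c=labels, cmap='tab10', s=4)
plt.show()
```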

The full thread can be viewed here: https://github.com/hanxiao/bert-as-service#q-so-which-layer-and-which-pooling-strategy-is-the-best
For those interested in using a BERT model as a sentence encoder, feel free to check out my repo bert-as-service: https://github.com/hanxiao/bert-as-service You can get sentence embeddings (or ELMo-like word embeddings) with two lines of code.
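As a quick illustration, basic client usage looks like the snippet below (after starting the server with bert-serving-start; see the repo README for the exact setup). The example sentences are placeholders.

```python
from bert_serving.client import BertClient

bc = BertClient()  # connects to a bert-serving-start server on localhost
vecs = bc.encode(['First do it', 'then do it right', 'then do it better'])
# vecs is a (3, 768) array: one fixed-length sentence vector per input
```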

@hanxiao Is the sentence encoding obtained by averaging the word vectors to represent the whole sentence?
Could you explain in simple words how you got the sentence embeddings from the word embeddings?
@hanxiao Is the sentence encoding obtained by averaging the word vectors to represent the whole sentence?
I don't think it is obtained by averaging the word embeddings.
@hanxiao Is the sentence encoding obtained by averaging the word vectors to represent the whole sentence?
You can check the titles of the two pictures, which indicate the pooling strategies (i.e. REDUCE_MEAN and REDUCE_MAX).
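Conceptually, both strategies collapse the per-token hidden states of the chosen encoder layer into a single sentence vector: REDUCE_MEAN averages over the tokens, while REDUCE_MAX takes the element-wise maximum. A toy sketch with random numbers (not the actual implementation):

```python
import numpy as np

# Hidden states for one sentence of 5 tokens, hidden size 768,
# as produced by the selected BERT encoder layer.
token_states = np.random.rand(5, 768)

sent_vec_mean = token_states.mean(axis=0)  # REDUCE_MEAN: average over tokens
sent_vec_max = token_states.max(axis=0)    # REDUCE_MAX: element-wise max over tokens

print(sent_vec_mean.shape, sent_vec_max.shape)  # (768,) (768,)
```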
Amazing visualization work!