
Visualization of a 12-layer BERT for sentence encoding/embedding

Open hanxiao opened this issue 7 years ago • 6 comments

One major concern when using BERT as a sentence encoder (i.e. mapping a variable-length sentence to a fixed-length vector) is which layer to pool from and how to pool. I made a visualization on the UCI News Aggregator Dataset: I randomly sampled 20K news titles, obtained sentence encodings from different layers using max-pooling and avg-pooling, and finally reduced them to 2D via PCA. The data has only four classes, shown in red, blue, yellow, and green. The BERT model is uncased_L-12_H-768_A-12, released by Google.

[Figures: 2D PCA projections of the pooled sentence encodings, per layer, with max-pooling and avg-pooling]

The full thread can be viewed here: https://github.com/hanxiao/bert-as-service#q-so-which-layer-and-which-pooling-strategy-is-the-best
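For concreteness, the projection step looks roughly like this (a sketch only: the pooled 768-dim sentence vectors and their 4-way category labels are assumed to be computed already, and random placeholders stand in for them here):

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Hypothetical placeholders for the 20K pooled sentence vectors (one per news
# title, 768-dim from a single BERT layer) and their 4-way category labels.
sentence_vecs = np.random.randn(20000, 768)
labels = np.random.randint(0, 4, size=20000)

# Reduce the 768-dim encodings to 2D for visualization.
coords = PCA(n_components=2).fit_transform(sentence_vecs)

# One color per news category (red, blue, yellow, green in the figures).
for cls, color in enumerate(['red', 'blue', 'yellow', 'green']):
    mask = labels == cls
    plt.scatter(coords[mask, 0], coords[mask, 1], s=2, c=color, label=f'class {cls}')
plt.legend()
plt.show()
```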

hanxiao avatar Dec 07 '18 12:12 hanxiao

For those who are interested in using the BERT model as a sentence encoder, you are welcome to check out my repo bert-as-service: https://github.com/hanxiao/bert-as-service. You can get sentence embeddings or ELMo-like word embeddings with 2 lines of code.
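Roughly, the client usage looks like the sketch below (based on the repo's README; the server is started separately against the downloaded uncased_L-12_H-768_A-12 checkpoint, and the exact paths are illustrative):

```python
# Start the server in a separate process first, e.g.:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12/ -num_worker=1
from bert_serving.client import BertClient

bc = BertClient()
vecs = bc.encode(['First do it', 'then do it right', 'then do it better'])
print(vecs.shape)  # (3, 768): one fixed-length sentence vector per input
```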

[Animated demo of bert-as-service]

hanxiao avatar Dec 07 '18 12:12 hanxiao

@hanxiao Is the sentence encoding obtained by averaging the word vectors to represent the whole sentence?

ghost avatar Mar 28 '19 16:03 ghost

Can you explain in simple words how you got the sentence embedding from the word embeddings?

singularity014 avatar Apr 16 '19 12:04 singularity014

@hanxiao Is the sentence encoding obtained by averaging the word vectors to represent the whole sentence?

I don't think it is obtained by taking the average of the word embeddings.

singularity014 avatar Apr 16 '19 12:04 singularity014

@hanxiao Is the sentence encoding obtained by averaging the word vectors to represent the whole sentence?

You can check the titles of the two pictures, which indicate the pooling strategies (i.e., REDUCE_MEAN and REDUCE_MAX).
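In other words, both strategies collapse the per-token hidden states of a layer along the sequence axis; a minimal numpy sketch (the `hidden` array is a hypothetical stand-in for one layer's activations for one sentence, and padding tokens are ignored here):

```python
import numpy as np

# Hypothetical [seq_len, hidden_size] activations of one BERT layer.
hidden = np.random.randn(12, 768)

reduce_mean = hidden.mean(axis=0)  # REDUCE_MEAN: element-wise average over tokens
reduce_max = hidden.max(axis=0)    # REDUCE_MAX: element-wise maximum over tokens

print(reduce_mean.shape, reduce_max.shape)  # (768,) (768,)
```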

Jun-jie-Huang avatar May 13 '19 07:05 Jun-jie-Huang

Amazing work on the visualization!

LMC63 avatar Apr 13 '23 15:04 LMC63