I'm also new to this field, but I think I can answer some of these:

1. They are learned during training. I'm not sure about the second part.

2. There are two parts here. First, context: earlier embeddings (like Word2Vec) are static, meaning each word gets one fixed vector no matter which sentence it appears in. Transformers (like GPT) generate context-aware embeddings, so the vector for a word changes with the words around it, which is how the model captures that a word can have different meanings depending on its context. The second part is whether you can share them on their own, and the answer is not really: the context-aware embeddings are produced by the neural network itself as it processes the input, so you can't really separate the embeddings from the model, because in that sense the embeddings ARE the model. (A static table like Word2Vec's is different: it is just a lookup from word to vector, so it can be saved and shared as a file.)

3. 'Similar' in this case means what they call 'semantic similarity', which is a measure of how close in meaning two inputs are. It's usually calculated using cosine similarity, which measures the angle between two vectors (ignoring their lengths) and works in any number of dimensions.
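
To make point 3 a bit more concrete, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the three 4-dimensional vectors are made up for illustration; real embeddings come from a trained model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumed values, not from any real model).
cat = [0.9, 0.8, 0.1, 0.0]
kitten = [0.8, 0.9, 0.2, 0.1]
car = [0.1, 0.0, 0.9, 0.8]

print(cosine_similarity(cat, kitten))  # close to 1: vectors point the same way
print(cosine_similarity(cat, car))     # much closer to 0: nearly unrelated
```

A score near 1 means the two vectors point in almost the same direction, a score near 0 means they are unrelated, and because only the angle matters, scaling a vector by a positive number does not change the result.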