Sunday, June 30, 2024

LLM Embeddings: What, Why, How

Embeddings are at the core of building LLM applications.

What are Embeddings?

To put it simply, an embedding is a piece of data represented as an array of numbers.

The data could be text (words, sentences) or images.
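For example, a small sentence-embedding model maps a sentence to a fixed-length array of floating-point numbers. A minimal sketch in Python, assuming the sentence-transformers library and its all-MiniLM-L6-v2 model (one choice among many):

    from sentence_transformers import SentenceTransformer

    # Load a small open-source embedding model (assumes sentence-transformers is installed).
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Each sentence becomes a fixed-length vector; this model produces 384 floats.
    embedding = model.encode("Embeddings represent text as numbers.")
    print(embedding.shape)  # (384,)
    print(embedding[:5])    # the first few dimensions of the vector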

https://www.pinecone.io/learn/vector-embeddings/ 

https://www.youtube.com/watch?v=ySus5ZS0b94&t=235s 

Vector embeddings are a way of representing text as a set of numbers in a high-dimensional space, where the numbers capture the meaning of the words in the text.

This is critical for an AI system to gain understanding and maintain long-term memory.

Types of embeddings?

 https://datasciencedojo.com/blog/embeddings-and-llm/ 


What are they used for?

To measure:

  • relatedness between data
    • how closely two pieces of text are related in meaning
    • the distance between two embedding vectors measures their relatedness, which translates to the relatedness between the concepts they represent
  • similarity between data
    • similar embeddings (vectors) represent similar concepts

How?

There are two common approaches to measuring relatedness and similarity between text embeddings: cosine similarity and Euclidean distance.
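A minimal sketch of both measures in Python with NumPy, using two short stand-in vectors in place of real embeddings:

    import numpy as np

    # Stand-in vectors; real embeddings typically have hundreds of dimensions.
    a = np.array([0.1, 0.3, 0.5])
    b = np.array([0.2, 0.1, 0.4])

    # Cosine similarity: compares direction (closer to 1.0 = more related).
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Euclidean distance: straight-line distance (closer to 0.0 = more similar).
    euclidean = np.linalg.norm(a - b)

    print(cosine, euclidean)

Cosine similarity is the usual default for text embeddings because it ignores vector length and compares only direction.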
Some examples of how text embeddings can be used to measure relatedness and similarity:
  • Text classification: the task of assigning a label to a piece of text.
  • Text clustering: the task of grouping together pieces of text that are similar in meaning (sketched after this list).
  • Question answering: the task of answering a question posed in natural language.
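As an illustration of the clustering case, one can embed each text and then run a standard clustering algorithm on the vectors. A sketch assuming scikit-learn's KMeans and the same sentence-transformers model as above:

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    sentences = [
        "The cat sat on the mat.",
        "A kitten is sleeping on the rug.",
        "The stock market fell sharply today.",
        "Shares dropped after the earnings report.",
    ]

    # Embed each sentence into a vector.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    vectors = model.encode(sentences)

    # Group the vectors into two clusters: one about cats, one about markets.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(vectors)
    print(labels)  # e.g. [0 0 1 1]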





