Sunday, June 30, 2024

Splitting & Embedding Text Using LangChain

There are different types of loaders

https://python.langchain.com/v0.2/docs/integrations/document_loaders/ 

including, but not limited to:

  • CSV
  • Facebook Chat
  • File Directory
  • HTML
  • PowerPoint
  • Hugging Face
  • Hacker News

from langchain.text_splitter import RecursiveCharacterTextSplitter

with open('files/churchill_speech.txt') as f:
    churchill_speech = f.read()


# split the text into ~100-character chunks with a 20-character overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len
)

chunks = text_splitter.create_documents([churchill_speech])
# print(chunks[2])
# print(chunks[10].page_content)
print(f'Now you have {len(chunks)} chunks')

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

embeddings = OpenAIEmbeddings()

# embed the chunks and upsert them into the Pinecone index
# (index_name is defined when the index is created -- see the Pinecone section below)
vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)

In a nutshell, this method processes the input documents, generates embeddings using the provided OpenAI embeddings instance, and returns a new Pinecone vector store.

The resulting vector store object can perform similarity searches and retrieve relevant documents based on user queries.
(See: https://www.reddit.com/r/LangChain/comments/18ehcm7/guys_anyone_use_pineconefrom_documents)

With this, we've successfully embedded the text into vectors and inserted them into a Pinecone index.


query = 'Where should we fight?'
result = vector_store.similarity_search(query)
print(result)

The user defines a query. 
The query is embedded into a vector.
A similarity search is performed in the vector database, and the text behind the most similar vectors is the answer to the user's question.
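To print only the matched text rather than the full Document objects, a quick sketch (similarity_search returns a list of Documents, each with a page_content attribute):

for doc in result:
    print(doc.page_content)
    print('-' * 50)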



Vector Databases: What, Why, How

What are vector databases ?

They are a new type of database designed to store and query unstructured data.

Unstructured data is data that does not have a fixed schema, such as text, images, and audio.

Examples of Vector DB: https://lakefs.io/blog/12-vector-databases-2023/

  • Pinecone
  • Milvus
  • Chroma


Three Steps

  • 1) Embedding
  • 2) Indexing 
  • 3) Querying 

https://www.pinecone.io/learn/vector-database/ 


Why do we need them ?

Vector databases are useful for storing and querying complex, unstructured data, such as images, audio, and user preferences, that traditional databases may struggle with. They can help developers retrieve data more quickly and simply by discovering similarities between data points. Vector databases can also support semantic search, which considers the context and semantic meaning of a search query, rather than just matching exact words or phrases. This can lead to more relevant and accurate search results.

How to use ?

https://www.youtube.com/watch?v=AGKY_Q3GjRc&list=PLRLVhGQeJDTLiw-ZJpgUtZW-bseS2gq9- 

Using Pinecone in production...

https://www.youtube.com/watch?v=fo0F-DAum7E&pp=ygUWcGluZWNvbmUgaW4gcHJvZHVjdGlvbg%3D%3D



# authenticating to Pinecone. 
# the API KEY is in .env
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)



from pinecone import Pinecone, ServerlessSpec
# Initializing and authenticating the Pinecone client
pc = Pinecone()
# pc = Pinecone(api_key='YOUR_API_KEY')
# checking authentication
pc.list_indexes()

#################################
# Working with Pinecone indexes
#################################

# listing all indexes
pc.list_indexes()

index_name = 'langchain'
# getting a complete description of a specific index:
pc.describe_index(index_name)

# getting a list with the index names 
pc.list_indexes().names()

# deleting an index
if index_name in pc.list_indexes().names():
    print(f'Deleting index {index_name} ... ')
    pc.delete_index(index_name)
    print('Done')
else:
    print(f'Index {index_name} does not exist!')


# creating a Serverless Pinecone index 
# starter free plan permits 1 project, up to 5 indexes, up to 100 namespaces per index
index_name = 'langchain'

if index_name not in pc.list_indexes().names():
    print(f'Creating index {index_name}')
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric='cosine',
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        ) 
    )
    print('Index created! 😊')
else:
    print(f'Index {index_name} already exists!')

############################
# working with vectors
############################
index = pc.Index(index_name)
index.describe_index_stats()

# inserting vectors
import random
# generate 5 random vectors with 1536 dimensions each
vectors = [[random.random() for _ in range(1536)] for v in range(5)]
# print(vectors)
ids = list('abcde')

index_name = 'langchain'
index = pc.Index(index_name)
index.upsert(vectors=zip(ids, vectors))

# updating vectors
index.upsert(vectors=[('c', [0.5] * 1536)])

# fetching vectors
# index = pc.Index(index_name)
index.fetch(ids=['c', 'd'])

# deleting vectors
index.delete(ids=['b', 'c'])
index.describe_index_stats()

# fetching a non-existing id returns an empty result
index.fetch(ids=['x'])

# querying vectors
query_vector = [random.random() for _ in range(1536)]
index.query(
    vector=query_vector,
    top_k=3,
    include_values=False
)

############################
# Namespaces
############################
# index.describe_index_stats()
index = pc.Index('langchain')

import random
vectors = [[random.random() for _ in range(1536)] for v in range(5)]
ids = list('abcde')
# upserts without a namespace go to the default namespace
index.upsert(vectors=zip(ids, vectors))

# partition the index into namespaces
# creating a new namespace
vectors = [[random.random() for _ in range(1536)] for v in range(3)]
ids = list('xyz')
index.upsert(vectors=zip(ids, vectors), namespace='first-namespace')

vectors = [[random.random() for _ in range(1536)] for v in range(2)]
ids = list('qp')
index.upsert(vectors=zip(ids, vectors), namespace='second-namespace')

index.describe_index_stats()

# fetches look in the default namespace unless one is specified
index.fetch(ids=['x'])
index.fetch(ids=['x'], namespace='first-namespace')

# deleting vectors from a namespace
index.delete(ids=['x'], namespace='first-namespace')
# deleting an entire namespace
index.delete(delete_all=True, namespace='first-namespace')

index.describe_index_stats()


LLM Embeddings: What, Why, How

Embeddings are the core of building LLM applications.

What are Embeddings ?

To put it simply, an embedding is data represented as an array of numbers.

The data could be text (words, sentences) or images.

https://www.pinecone.io/learn/vector-embeddings/ 

https://www.youtube.com/watch?v=ySus5ZS0b94&t=235s 

Vector embeddings are a way of representing text as a set of numbers in a high dimensional space, and the numbers represent the meaning of the words in the text.

This is critical for the AI to gain understanding and maintain long term memory.
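As a quick illustration, here is a minimal sketch that turns a sentence into such a vector, using the same OpenAIEmbeddings wrapper used above (assumes OPENAI_API_KEY is set in the environment):

from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector = embeddings.embed_query('The cat sat on the mat')
print(len(vector))   # 1536 dimensions for the default model (text-embedding-ada-002)
print(vector[:5])    # the first few numbers of the embedding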

Types of embeddings ?

 https://datasciencedojo.com/blog/embeddings-and-llm/ 


What are they used for ?

To measure 

  • relatedness between data
    • how closely two pieces of text are related in meaning
    • the distance between two embeddings (vectors) measures their relatedness, which translates to the relatedness between the text concepts they represent
  • similarity between data
    • similar embeddings or vectors represent similar concepts

How ?

There are two common approaches to measuring relatedness and similarity between text embeddings: cosine similarity and Euclidean distance.
Some examples of how text embeddings can be used to measure relatedness and similarity:
  • Text classification: the task of assigning a label to a piece of text.
  • Text clustering: the task of grouping together pieces of text that are similar in meaning.
  • Question answering: the task of answering a question posed in natural language.
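As a rough sketch of the two distance measures (plain numpy, with made-up 3-dimensional "embeddings"; real ones have e.g. 1536 dimensions):

import numpy as np

def cosine_similarity(a, b):
    # close to 1.0 means the vectors point the same way (closely related)
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # smaller distance means more closely related
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

cat = [0.8, 0.1, 0.1]
kitten = [0.75, 0.15, 0.1]
car = [0.1, 0.9, 0.0]

print(cosine_similarity(cat, kitten))  # close to 1.0 -> related concepts
print(cosine_similarity(cat, car))     # much lower -> unrelated concepts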






Saturday, June 22, 2024

Temperature and Top_p in ChatGPT

https://www.coltsteele.com/tips/understanding-openai-s-temperature-parameter

Temperature is a number between 0 and 2, with a default value of 1 or 0.7 depending on the model you choose.

The temperature is used to control the randomness of the output.

When you set it higher, you'll get more random outputs. When you set it lower, towards 0, the values are more deterministic.

The message we send asks the model to complete the sentence "the key to happiness is" with two words.

res = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  messages=[{"role": "user", "content": "Complete the sentence with two words. The key to happiness is"}]
)

With the default temperature, running this will usually return "contentment and gratitude" or "inner peace".

If we increase the temperature parameter to the maximum of 2 by adding `temperature=2`, it's going to give a much more varied output: "personal fulfillment", "simplicity and gratitude", "contentment and balance", "satisfaction and appreciation", "different for everybody", "gratitude and teamwork", "mindfulness and empathy".

Moving the temperature all the way down to zero, it's going to return "contentment and gratitude" pretty much every single time. The output isn't guaranteed to be identical, but it most likely will be.

Haiku Example

In this example, we ask the model to write a haiku.

res = openai.ChatCompletion.create(
  model="gpt-3.5-turbo",
  temperature=0.9,
  messages=[{"role": "user", "content": "Write a haiku"}]
)

With the temperature at 0.9, it's going to produce nice haikus that are typically nature-themed:

flowers in the field,
dancing in the summer breeze,
nature's symphony.


Solitary bee
over forgotten blossoms,
April cold leaves.


Moving the temperature all the way up to 2, we'll certainly get something different, but it won't always be coherent:

Easy tropical flips-
trade palms winds filter music_,
without flotates spring waves

First of all, this is not a haiku. It also starts introducing some very weird stuff, like underscores and a made-up word.

How Temperature Works

Basically, the temperature value we provide is used to rescale the probability distribution over the candidate next tokens the model can select from.

With a higher temperature, we'll have a softer curve of probabilities. With a lower temperature, we have a much more peaked distribution. If the temperature is almost 0, we're going to have a very sharp peaked distribution.
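A rough numeric sketch of that scaling, using plain numpy and made-up token scores (illustrative only, not OpenAI's actual implementation):

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # dividing the raw scores by the temperature before softmax
    # flattens the distribution (T > 1) or sharpens it (T < 1)
    scaled = np.asarray(logits) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens

print(softmax_with_temperature(logits, 0.2))  # sharp peak: almost deterministic
print(softmax_with_temperature(logits, 1.0))  # the default distribution
print(softmax_with_temperature(logits, 2.0))  # flatter: more random choices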

https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api/172683

Let’s start with temperature:

  • Temperature is a parameter that controls the “creativity” or randomness of the text generated by GPT-3. A higher temperature (e.g., 0.7) results in more diverse and creative output, while a lower temperature (e.g., 0.2) makes the output more deterministic and focused.
  • In practice, temperature affects the probability distribution over the possible tokens at each step of the generation process. A temperature of 0 would make the model completely deterministic, always choosing the most likely token.

Next, let’s discuss top_p sampling (also known as nucleus sampling):

  • Top_p sampling is an alternative to temperature sampling. Instead of considering all possible tokens, GPT-3 considers only a subset of tokens (the nucleus) whose cumulative probability mass adds up to a certain threshold (top_p).
  • For example, if top_p is set to 0.1, GPT-3 will consider only the tokens that make up the top 10% of the probability mass for the next token. This allows for dynamic vocabulary selection based on context.
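A toy sketch of the nucleus idea in numpy (illustrative only, not OpenAI's server-side implementation):

import numpy as np

def top_p_filter(probs, top_p=0.9):
    # keep the smallest set of tokens whose cumulative probability reaches top_p
    order = np.argsort(probs)[::-1]                  # token ids sorted by probability, descending
    cumulative = np.cumsum(np.asarray(probs)[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    kept = order[:cutoff]                            # the "nucleus"
    filtered = np.zeros(len(probs))
    filtered[kept] = np.asarray(probs)[kept]
    return filtered / filtered.sum()                 # renormalize over the nucleus

probs = [0.5, 0.3, 0.1, 0.07, 0.03]    # made-up next-token probabilities
print(top_p_filter(probs, top_p=0.9))  # the unlikely tail is zeroed out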

Both temperature and top_p sampling are powerful tools for controlling the behavior of GPT-3, and they can be used independently or together when making API calls. By adjusting these parameters, you can achieve different levels of creativity and control, making them suitable for a wide range of applications.

To give you an idea of how these parameters can be used in different scenarios, the cheat sheet linked above includes a table with example values.



LangChain

What is LangChain ?

LangChain is an open source framework for building applications based on large language models (LLMs). It allows AI developers to combine LLMs like GPT-4 with external sources of computation and data.

It is currently offered as a Python and a TypeScript package. The framework's popularity has surged since the launch of ChatGPT in late 2022.

Why is it important ?

LLMs excel at responding to prompts in a general context, but struggle in specific domains they were never trained on. Prompts are queries people use to seek responses from an LLM. For example, an LLM can provide a general estimate of how much a computer costs. However, it can't list the price of a specific computer model that your company sells.

To do that, machine learning engineers must integrate the LLM with the organization’s internal data sources and apply prompt engineering—a practice where a data scientist refines inputs to a generative model with a specific structure and context. 

The following sections describe benefits of LangChain.
  1. With LangChain, organizations can repurpose LLMs for domain-specific applications without retraining or fine-tuning.  
  2. LangChain simplifies artificial intelligence (AI) development by abstracting the complexity of data source integrations and prompt refining. 
  3. LangChain provides AI developers with tools to connect language models with external data sources.
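For a feel of what this looks like in code, here is a minimal sketch that injects private product data into a prompt (the product data and question are made up; module paths and chaining syntax vary across LangChain versions -- this follows the LCEL pipe style):

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate

# the template injects domain-specific context the base model was never trained on
prompt = ChatPromptTemplate.from_template(
    'Answer using only this product list:\n{products}\n\nQuestion: {question}'
)
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)

chain = prompt | llm  # pipe the formatted prompt into the model
result = chain.invoke({
    'products': 'Laptop X100: $999\nLaptop X200: $1299',
    'question': 'How much does the X200 cost?'
})
print(result.content)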



What are the core components of LangChain ?

In broad strokes, the main building blocks are:

  • Models: wrappers around LLMs and chat models
  • Prompt templates: reusable, parameterized prompts
  • Chains: sequences that combine models, prompts, and other components into workflows
  • Memory: state that persists across conversation turns
  • Indexes: document loaders, text splitters, and vector stores for working with external data
  • Agents: LLM-driven decision makers that can call external tools

Monday, June 10, 2024

GraphQL Cons

What REST limitations does GraphQL attempt to overcome ?

  • GraphQL emerged in 2012, during the social media boom
  • 1) Fixed-structure data exchange: REST requests and responses follow a fixed structure
  • 2) Over-fetching vs. under-fetching: a REST API always returns the whole data set for an endpoint, even when the client needs only part of it (see the sketch below)
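A small sketch of the difference, using the requests library against hypothetical endpoints:

import requests

# REST: the server decides the response shape -- the client gets every field,
# whether it needs them or not (over-fetching)
user = requests.get('https://api.example.com/users/42').json()

# GraphQL: the client asks for exactly the fields it needs
query = '{ user(id: 42) { name email } }'
resp = requests.post('https://api.example.com/graphql', json={'query': query})
print(resp.json())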

After a few years of using GraphQL in production, people in the community aren't reaching for it as much as they once did.

  • First up, security. GraphQL's self-documenting query API increases attack surface. Every field needs careful authorization based on context, a far bigger task than securing REST endpoints.
  • Performance is another challenge. GraphQL's flexibility allows crafting queries that can consume significant server resources. Rate limiting becomes more complex compared to REST, requiring estimating query complexity.
  • Parsing invalid GraphQL queries can lead to memory issues due to error response generation, requiring additional mitigations not needed in REST.
  • GraphQL's nested queries often lead to N+1 problems in both data fetching and authorization. The Dataloader pattern helps but introduces boilerplate and complexity (see the batching sketch after this list).
  • Solving GraphQL's performance and security issues often couples business logic with the transport layer, making testing and debugging more challenging compared to REST.
  • For many use cases, an OpenAPI 3.0+ compliant JSON REST API might be a better fit. It provides type safety and self-documentation without GraphQL's complexity.
  • Emerging tools like TypeSpec offer promising "specification first" approaches for generating type-safe APIs, similar to GraphQL's "schema first" approach but for REST.
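To make the N+1 point concrete, here is a simplified sketch of the Dataloader idea in Python (the db helpers are hypothetical):

def load_authors_naive(posts, db):
    # N+1: one query fetched the posts, then one extra query per post
    return [db.get_author(p['author_id']) for p in posts]

def load_authors_batched(posts, db):
    # Dataloader idea: collect and deduplicate the keys, then fetch
    # them all in a single batch query (e.g. WHERE id IN (...))
    ids = {p['author_id'] for p in posts}
    authors = db.get_authors_by_ids(list(ids))  # hypothetical helper: ids -> {id: author}
    return [authors[p['author_id']] for p in posts]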

While GraphQL has its strengths, its complexity can be overkill for many projects. As always, choose the right tool for the job.

https://aws.amazon.com/compare/the-difference-between-graphql-and-rest/

When to use GraphQL vs. REST

You can use GraphQL and REST APIs interchangeably. However, there are some use cases where one or the other is a better fit.

For example, GraphQL is likely a better choice if you have these considerations:

  • You have limited bandwidth, and you want to minimize the number of requests and responses
  • You have multiple data sources, and you want to combine them at one endpoint
  • You have client requests that vary significantly, and you expect very different responses

On the other hand, REST is likely a better choice if you have these considerations:

  • You have smaller applications with less complex data
  • You have data and operations that all clients use similarly
  • You have no requirements for complex data querying