What are vector databases?
They are a new type of database designed to store and query unstructured data by representing it as numeric vectors (embeddings).
Unstructured data is data that does not have a fixed schema, such as text, images, and audio.
Examples of vector databases: https://lakefs.io/blog/12-vector-databases-2023/
- Pinecone
- Milvus
- Chroma
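Three steps
- 1) Embedding: convert each item (text, image, audio) into a numeric vector with an embedding model.
- 2) Indexing: store the vectors in a structure built for fast similarity search.
- 3) Querying: embed the query the same way and return the nearest stored vectors.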
https://www.pinecone.io/learn/vector-database/
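The three steps can be sketched in plain Python without any external service. This is a minimal illustration and not how Pinecone or any real vector database is implemented: the `embed` function is a hypothetical stand-in that counts words from a tiny fixed vocabulary (a real system would call an embedding model), the "index" is just a dictionary, and the query is a brute-force cosine similarity scan.

```python
import math

# Hypothetical toy vocabulary; a real embedding model produces dense vectors instead.
VOCAB = ["cat", "dog", "pet", "car", "engine", "road"]


def embed(text):
    """Step 1 (embedding): turn text into a fixed-length vector of word counts."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Step 2 (indexing): store one vector per document ID."""
    return {doc_id: embed(text) for doc_id, text in documents.items()}


def query(index, text, top_k=2):
    """Step 3 (querying): rank stored vectors by similarity to the query vector."""
    query_vector = embed(text)
    scored = [(doc_id, cosine_similarity(query_vector, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


documents = {
    "a": "the cat is a pet",
    "b": "the dog is a pet",
    "c": "the car has an engine",
}
index = build_index(documents)
results = query(index, "my pet dog", top_k=2)
print(results)  # document "b" ranks first, then "a"
```

Production systems typically replace the brute-force scan in `query` with an approximate nearest neighbor index so that search stays fast as the number of vectors grows.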
Why do we need them?
Vector databases are useful for storing and querying complex, unstructured data, such as images, audio, and user preferences, that traditional databases struggle with. They let developers retrieve data quickly by finding similarities between data points. They also support semantic search, which considers the context and meaning of a search query rather than just matching exact words or phrases. This can lead to more relevant and accurate search results.
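A small sketch may make the contrast between exact matching and semantic search concrete. The three-dimensional vectors here are hand-written, hypothetical embeddings chosen so that related phrases point in similar directions; in practice an embedding model would produce them, and they would have many more dimensions.

```python
import math

# Hypothetical, hand-written 3-d embeddings (assumption: real ones come from a model).
EMBEDDINGS = {
    "how to fix a flat tire": [0.9, 0.1, 0.0],
    "repairing a punctured wheel": [0.8, 0.2, 0.1],
    "baking sourdough bread": [0.0, 0.1, 0.9],
}


def keyword_match(query, documents):
    """Exact matching: keep documents that share at least one word with the query."""
    query_words = set(query.lower().split())
    return [doc for doc in documents if query_words & set(doc.lower().split())]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_match(query_vector, embeddings):
    """Semantic matching: return the document whose vector is closest to the query."""
    return max(embeddings, key=lambda doc: cosine_similarity(query_vector, embeddings[doc]))


query = "tyre puncture"
query_vector = [0.75, 0.25, 0.1]  # hypothetical embedding of the query

print(keyword_match(query, EMBEDDINGS))          # no document shares an exact word
print(semantic_match(query_vector, EMBEDDINGS))  # nearest vector wins
```

The keyword search finds nothing because "tyre" and "puncture" never appear verbatim, while the vector comparison still surfaces a related document.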
How to use them?
https://www.youtube.com/watch?v=AGKY_Q3GjRc&list=PLRLVhGQeJDTLiw-ZJpgUtZW-bseS2gq9-
Using Pinecone in production ...
https://www.youtube.com/watch?v=fo0F-DAum7E&pp=ygUWcGluZWNvbmUgaW4gcHJvZHVjdGlvbg%3D%3D
Code: http://hilite.me/
# authenticating to Pinecone.
# the API key is in .env
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv(), override=True)

from pinecone import Pinecone, ServerlessSpec

# initializing and authenticating the Pinecone client
pc = Pinecone()
# pc = Pinecone(api_key='YOUR_API_KEY')

# checking authentication
pc.list_indexes()

##################################
# Working with Pinecone indexes
##################################

# listing all indexes
pc.list_indexes()

index_name = 'langchain'
# getting a complete description of a specific index:
pc.describe_index(index_name)

# getting a list with the index names
pc.list_indexes().names()

# deleting an index
if index_name in pc.list_indexes().names():
    print(f'Deleting index {index_name} ... ')
    pc.delete_index(index_name)
    print('Done')
else:
    print(f'Index {index_name} does not exist!')

# creating a Serverless Pinecone index
# starter free plan permits 1 project, up to 5 indexes, up to 100 namespaces per index
index_name = 'langchain'
if index_name not in pc.list_indexes().names():
    print(f'Creating index {index_name}')
    pc.create_index(
        name=index_name,
        dimension=1536,
        metric='cosine',
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )
    print('Index created! 😊')
else:
    print(f'Index {index_name} already exists!')

############################
# working with vectors
############################
index = pc.Index(index_name)
index.describe_index_stats()

# inserting vectors
import random
vectors = [[random.random() for _ in range(1536)] for v in range(5)]
# print(vectors)
# the code above generates 5 vectors with 1536 dimensions each
ids = list('abcde')

index_name = 'langchain'
index = pc.Index(index_name)
index.upsert(vectors=zip(ids, vectors))

# updating vectors
index.upsert(vectors=[('c', [0.5] * 1536)])

# fetching vectors
# index = pc.Index(index_name)
index.fetch(ids=['c', 'd'])

# deleting vectors
index.delete(ids=['b', 'c'])
index.describe_index_stats()

# fetching a non-existing ID returns an empty result
index.fetch(ids=['x'])

# querying vectors
query_vector = [random.random() for _ in range(1536)]
index.query(
    vector=query_vector,
    top_k=3,
    include_values=False
)

############################
# Namespaces
############################

# index.describe_index_stats()
index = pc.Index('langchain')

import random
vectors = [[random.random() for _ in range(1536)] for v in range(5)]
ids = list('abcde')
index.upsert(vectors=zip(ids, vectors))

# partition the index into namespaces
# creating a new namespace
vectors = [[random.random() for _ in range(1536)] for v in range(3)]
ids = list('xyz')
index.upsert(vectors=zip(ids, vectors), namespace='first-namespace')

vectors = [[random.random() for _ in range(1536)] for v in range(2)]
ids = list('qp')
index.upsert(vectors=zip(ids, vectors), namespace='second-namespace')

index.describe_index_stats()

index.fetch(ids=['x'])
index.fetch(ids=['x'], namespace='first-namespace')

index.delete(ids=['x'], namespace='first-namespace')
index.delete(delete_all=True, namespace='first-namespace')
index.describe_index_stats()
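The namespace calls can be confusing at first: fetching 'x' without a namespace argument finds nothing even though 'x' was upserted, because it was written to 'first-namespace' and the plain fetch looks only in the default namespace. The in-memory sketch below mimics that lookup behavior. `ToyIndex` is a hypothetical class written for illustration and is not part of the Pinecone client; it models ID lookup only, not similarity search, and it assumes the default namespace can be represented by an empty string.

```python
class ToyIndex:
    """Hypothetical in-memory stand-in that partitions vectors by namespace."""

    def __init__(self):
        # Maps namespace name -> {vector ID -> vector values}.
        self._namespaces = {}

    def upsert(self, vectors, namespace=""):
        # Insert new IDs or overwrite existing ones inside a single namespace.
        bucket = self._namespaces.setdefault(namespace, {})
        for vector_id, values in vectors:
            bucket[vector_id] = values

    def fetch(self, ids, namespace=""):
        # Only the requested namespace is searched; missing IDs are skipped.
        bucket = self._namespaces.get(namespace, {})
        return {vector_id: bucket[vector_id] for vector_id in ids if vector_id in bucket}

    def delete(self, ids=None, delete_all=False, namespace=""):
        # Remove specific IDs, or everything, from one namespace only.
        bucket = self._namespaces.get(namespace, {})
        if delete_all:
            bucket.clear()
        else:
            for vector_id in ids or []:
                bucket.pop(vector_id, None)

    def stats(self):
        # Vector count per namespace.
        return {name: len(bucket) for name, bucket in self._namespaces.items()}


toy = ToyIndex()
toy.upsert(zip("abcde", [[0.1, 0.2]] * 5))  # default namespace
toy.upsert(zip("xyz", [[0.3, 0.4]] * 3), namespace="first-namespace")

missing = toy.fetch(ids=["x"])                             # 'x' is not in the default namespace
found = toy.fetch(ids=["x"], namespace="first-namespace")  # 'x' lives here
print(missing, found)

toy.delete(delete_all=True, namespace="first-namespace")
print(toy.stats())
```

Deleting everything in 'first-namespace' leaves the five vectors in the default namespace untouched, which is the main reason to partition an index this way.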