There are different types of document loaders
https://python.langchain.com/v0.2/docs/integrations/document_loaders/
including, but not limited to (a short loader example follows the list):
- CSV
- Facebook Chat
- File Directory
- HTML
- PowerPoint
- Hugging Face
- Hacker News
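For instance, loading the plain-text file used below with the community TextLoader takes just a few lines (a minimal sketch, assuming the langchain-community package is installed and the file path matches yours):

from langchain_community.document_loaders import TextLoader

loader = TextLoader('files/churchill_speech.txt')
documents = loader.load()  # a list of Document objects with page_content and metadata
print(documents[0].page_content[:100])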
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the raw text to be split
with open('files/churchill_speech.txt') as f:
    churchill_speech = f.read()

# Split on paragraph/sentence/word boundaries, falling back to characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,      # maximum characters per chunk
    chunk_overlap=20,    # characters shared between adjacent chunks
    length_function=len
)

chunks = text_splitter.create_documents([churchill_speech])
# print(chunks[2])
# print(chunks[10].page_content)
print(f'Now you have {len(chunks)} chunks')
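As a quick sanity check (not part of the original run), printing two adjacent chunks makes the overlap visible: with chunk_overlap=20, up to the last 20 characters of one chunk reappear at the start of the next.

# Adjacent chunks share up to chunk_overlap characters of context
print(chunks[0].page_content)
print(chunks[1].page_content)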
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

embeddings = OpenAIEmbeddings()
index_name = 'churchill-speech'  # name of an existing Pinecone index (placeholder)
vector_store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
In a nutshell, this method processes the input documents, generates embeddings using the provided OpenAI embeddings instance, and returns a new Pinecone vector store.
The resulting vector store object can perform similarity searches and retrieve relevant documents based on user queries.
(See also: https://www.reddit.com/r/LangChain/comments/18ehcm7/guys_anyone_use_pineconefrom_documents)
With this, we've successfully embedded the text into vectors and inserted them into a Pinecone index.
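One caveat: Pinecone.from_documents writes into an index that must already exist. If you still need to create it, a rough sketch with the Pinecone client (v3+) could look like the following; the API key handling, cloud, and region are assumptions, and the dimension matches OpenAI's text-embedding-ada-002 model:

from pinecone import Pinecone as PineconeClient, ServerlessSpec

pc = PineconeClient(api_key='YOUR_PINECONE_API_KEY')  # placeholder key
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,   # size of text-embedding-ada-002 vectors
        metric='cosine',
        spec=ServerlessSpec(cloud='aws', region='us-east-1'),  # assumed settings
    )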
query = 'Where should we fight?'
result = vector_store.similarity_search(query)  # returns the most similar chunks as Documents
print(result)
What happens under the hood:
- The user defines a query.
- The query is embedded into a vector.
- A similarity search is performed in the vector database, and the text behind the most similar vectors is returned as the most relevant context for the user's question.
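For example, to print just the text of the top matches (the k parameter and the loop are a small illustrative addition, not from the original post):

# Fetch the 3 most similar chunks and print their text
for doc in vector_store.similarity_search(query, k=3):
    print(doc.page_content)
    print('-' * 40)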