Documents
Document vs LlamaDocument
In our codebase we use the term Document to refer to any file provided by the
user. It maps one-to-one to a file on disk or in S3, such as a PDF or DOCX
file.
In the Llama ecosystem (and in machine learning in general) the term has a different meaning: it is simply a collection of text with some optional metadata. See the Llama source code for the exact definition.
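To make the contrast concrete, here is a minimal sketch using plain dataclasses. Both classes, their field names, and the example URI are hypothetical stand-ins chosen for illustration; they are neither our actual models nor the llama_index classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class FileDocument:
    """Codebase sense: exactly one user-provided file (hypothetical fields)."""

    uri: str        # a local path or an S3 URI
    mime_type: str  # e.g. "application/pdf"


@dataclass
class TextDocument:
    """ML sense: a collection of text plus optional metadata."""

    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


# One file on S3 (made-up URI) ...
report = FileDocument(
    uri="s3://example-bucket/report.pdf",
    mime_type="application/pdf",
)

# ... and the text extracted from it, which no longer is a file at all.
extracted = TextDocument(
    text="Quarterly revenue grew.",
    metadata={"source": report.uri},
)
```

The first object only points at a file; the second carries content and can exist without any file behind it.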
To avoid clashes between the two concepts, we use the term LlamaDocument to
refer to the latter. In practice, whenever a Document class is imported from
the llama_index package, we alias it to LlamaDocument:
from llama_index.core.schema import Document as LlamaDocument

A Node is simply the vector embedding of a LlamaDocument. In our vector
database we store Node objects.
Our ingestion pipeline transforms LlamaDocument into Node in a process
called indexing.
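The indexing step can be pictured roughly as follows. This is a conceptual sketch in plain Python, not our pipeline and not the llama_index API: `LlamaDocumentStub`, `NodeStub`, `toy_embed`, and `index` are hypothetical names, the hash-based embedding is a placeholder for a real embedding model, and the fixed-size chunking is an assumption (the text above only says that documents become nodes).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LlamaDocumentStub:
    """Stand-in for a LlamaDocument: text plus optional metadata."""

    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class NodeStub:
    """Stand-in for a Node: a piece of text with its embedding vector."""

    text: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic placeholder for an embedding model (NOT semantic)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def index(documents: list[LlamaDocumentStub], chunk_size: int = 40) -> list[NodeStub]:
    """Turn documents into nodes: split the text, then embed each chunk."""
    nodes = []
    for doc in documents:
        # Assumed chunking strategy: fixed-size character windows.
        for start in range(0, len(doc.text), chunk_size):
            chunk = doc.text[start:start + chunk_size]
            nodes.append(
                NodeStub(
                    text=chunk,
                    embedding=toy_embed(chunk),
                    metadata=dict(doc.metadata),  # copy so nodes stay independent
                )
            )
    return nodes


doc = LlamaDocumentStub(
    text="Indexing turns text into vectors that can be searched.",
    metadata={"source": "example.pdf"},
)
nodes = index([doc])  # these objects are what would go into the vector database
```

Each resulting node keeps the metadata of the document it came from, so a search hit in the vector database can be traced back to the original file.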