Documents

Document vs LlamaDocument

In our codebase, we use the term Document to refer to any file provided by the user. Each Document maps one-to-one to a file on disk or in S3, such as a PDF or DOCX file.

In the LlamaIndex ecosystem (and in machine learning in general), the term has a different meaning: it is simply a collection of text with some optional metadata. See the LlamaIndex source code for the exact definition.

To avoid clashes between the two concepts, we use the term LlamaDocument to refer to the latter. In practice, this means that whenever we import the Document class from the llama_index package, we alias it to LlamaDocument:

from llama_index.core.schema import Document as LlamaDocument

Nodes

A Node is a chunk of a LlamaDocument together with its vector embedding. In our vector database we store Node objects.

Our ingestion pipeline transforms each LlamaDocument into Node objects in a process called indexing.
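
To make the relationship between the two types concrete, here is a minimal, self-contained sketch of what indexing could look like. It does not use llama_index or our real pipeline: LlamaDocumentSketch, NodeSketch, toy_embed and index_document are hypothetical stand-ins invented for this illustration, the word-based chunking is an assumption, and the hash-based embedding is only a placeholder for a real embedding model.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class LlamaDocumentSketch:
    """Hypothetical stand-in for a LlamaDocument: text plus optional metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class NodeSketch:
    """Hypothetical stand-in for a stored Node: chunk text, embedding, metadata."""
    text: str
    embedding: list
    metadata: dict


def toy_embed(text: str, dim: int = 8) -> list:
    """Placeholder embedding (NOT a real model): hash each word into a bucket."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    # Normalize to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def index_document(doc: LlamaDocumentSketch, chunk_size: int = 5) -> list:
    """Split the text into chunks of `chunk_size` words and embed each chunk."""
    words = doc.text.split()
    nodes = []
    for start in range(0, len(words), chunk_size):
        chunk = " ".join(words[start:start + chunk_size])
        # Each node gets its own copy of the source document's metadata.
        nodes.append(NodeSketch(chunk, toy_embed(chunk), dict(doc.metadata)))
    return nodes


doc = LlamaDocumentSketch(
    "one two three four five six seven", {"source": "report.pdf"}
)
nodes = index_document(doc)
```

In this sketch, a single LlamaDocument yields several Node objects, each carrying a copy of the source metadata so that a search hit in the vector database can be traced back to the originating file.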