modelPath: Path to the model on the filesystem. Required.
Optional batchSize: Prompt processing batch size.
Optional cache
Optional callback: Deprecated; use callbacks instead.
Optional callbacks
Optional concurrency: Deprecated; use maxConcurrency instead.
Optional contextSize: Text context size.
Optional embedding: Embedding mode only.
Optional f16Kv: Use fp16 for the KV cache.
Optional gpuLayers: Number of layers to store in VRAM.
Optional logitsAll: The llama_eval() call computes all logits, not just the last one.
Optional maxConcurrency: The maximum number of concurrent calls that can be made. Defaults to Infinity, which means no limit.
Optional maxRetries: The maximum number of retries that can be made for a single call, with an exponential backoff between each attempt. Defaults to 6.
Optional maxTokens
Optional metadata
Optional onFailedAttempt: Custom handler for failed attempts. Takes the originally thrown error object as input, and should itself throw an error if the input error is not retryable.
Optional prependBos: Add the beginning-of-sentence token.
Optional seed: If null, a random seed will be used.
Optional tags
Optional temperature: The randomness of the responses, e.g. 0.1 deterministic, 1.5 creative, 0.8 balanced; 0 disables.
Optional threads: Number of threads to use to evaluate tokens.
Optional topK: Consider the n most likely tokens, where n is 1 to vocabulary size; 0 disables (uses the full vocabulary). Note: only applies when temperature > 0.
Optional topP: Selects the smallest token set whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables. Note: only applies when temperature > 0.
Optional trimWhitespaceSuffix: Trim whitespace from the end of the generated text. Disabled by default.
Optional useMlock: Force the system to keep the model in RAM.
Optional useMmap: Use mmap if possible.
Optional verbose
Optional vocabOnly: Only load the vocabulary, no weights.
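The options above can be sketched as a TypeScript shape. This is a hedged reconstruction, not the library's authoritative type: the interface name LlamaCppOptions and the sample model path are assumptions for illustration; the field names and defaults follow the list above.

```typescript
// Sketch of the constructor options described above. Fields without a
// description in the list are omitted here; consult the generated docs
// for the full, authoritative type.
interface LlamaCppOptions {
  modelPath: string;              // required: path to the model on disk
  batchSize?: number;             // prompt processing batch size
  contextSize?: number;           // text context size
  embedding?: boolean;            // embedding mode only
  f16Kv?: boolean;                // use fp16 for the KV cache
  gpuLayers?: number;             // number of layers to store in VRAM
  logitsAll?: boolean;            // compute all logits, not just the last
  maxConcurrency?: number;        // defaults to Infinity (no limit)
  maxRetries?: number;            // defaults to 6, exponential backoff
  prependBos?: boolean;           // add the beginning-of-sentence token
  seed?: number | null;           // null means a random seed is used
  temperature?: number;           // 0 disables sampling randomness
  threads?: number;               // threads used to evaluate tokens
  topK?: number;                  // 0 disables (full vocabulary)
  topP?: number;                  // 1 disables
  trimWhitespaceSuffix?: boolean; // trim trailing whitespace; off by default
  useMlock?: boolean;             // force the model to stay in RAM
  useMmap?: boolean;              // use mmap if possible
  vocabOnly?: boolean;            // load the vocabulary only, no weights
}

// Example configuration; only modelPath is required.
// The path below is a placeholder, not a real file.
const options: LlamaCppOptions = {
  modelPath: "/models/example.gguf",
  temperature: 0.8,
  topK: 40,
  topP: 0.9,
  contextSize: 2048,
};
```

Note that topK and topP only take effect when temperature is greater than 0, per the notes in the list above.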
Generated using TypeDoc
Note that modelPath is the only required parameter. For testing, you can set it via the
LLAMA_PATH environment variable.
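For example, resolving the model path from the environment can be sketched as follows. The helper name resolveModelPath and the fallback path are hypothetical; only the LLAMA_PATH variable name comes from the note above.

```typescript
// Minimal sketch: prefer the LLAMA_PATH environment variable (useful for
// testing), otherwise fall back to an explicit path. Passing the env
// record in keeps the helper self-contained and easy to test.
function resolveModelPath(env: Record<string, string | undefined>): string {
  // The fallback path is a placeholder, not a real file.
  return env["LLAMA_PATH"] ?? "/models/example.gguf";
}

// Typical call site: resolveModelPath(process.env)
const fromEnv = resolveModelPath({ LLAMA_PATH: "/tmp/model.gguf" });
const fallback = resolveModelPath({});
```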