
LlamaEdge

LlamaEdge is the easiest and fastest way to run customized and fine-tuned LLMs locally or on the edge.

Lightweight inference apps: LlamaEdge is measured in MBs instead of GBs

Native and GPU-accelerated performance

Supports many GPU and hardware accelerators

Supports many optimized inference libraries

Wide selection of AI / LLM models

Installation and Setup

See the installation instructions.

Chat models

See a usage example.

from langchain_community.chat_models.llama_edge import LlamaEdgeChatService

API Reference: LlamaEdgeChatService
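
As a rough illustration of how the import above might be used, the sketch below sends one system message and one user message to a LlamaEdge API server. It assumes `langchain_community` is installed and that a LlamaEdge API server is already running. The `SERVICE_URL` value and the helper names `build_messages` and `ask` are hypothetical placeholders rather than part of the library, so adjust them to your own setup.

```python
from typing import List, Tuple

# Assumption: placeholder address of a LlamaEdge API server you started yourself.
SERVICE_URL = "http://localhost:8080"


def build_messages(system_prompt: str, question: str) -> List[Tuple[str, str]]:
    """Return (role, content) pairs, a message format LangChain chat models accept."""
    return [("system", system_prompt), ("human", question)]


def ask(question: str, service_url: str = SERVICE_URL) -> str:
    """Send a single question to the LlamaEdge service and return the reply text."""
    # Imported lazily so build_messages stays usable without langchain_community.
    from langchain_community.chat_models.llama_edge import LlamaEdgeChatService

    chat = LlamaEdgeChatService(service_url=service_url)
    response = chat.invoke(build_messages("You are an AI assistant", question))
    return response.content


# Example call (needs a reachable server, so it is left commented out):
# print(ask("What is the capital of France?"))
```

Keeping the message construction separate from the network call makes it possible to check the prompt structure without a running server.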
