
LlamaEdge

LlamaEdge is the easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge.

  • Lightweight inference apps: LlamaEdge is measured in MBs instead of GBs
  • Native and GPU-accelerated performance
  • Supports many GPU and hardware accelerators
  • Supports many optimized inference libraries
  • Wide selection of AI / LLM models

Installation and Setup

See the installation instructions.

Chat models

See a usage example.

from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
API Reference: LlamaEdgeChatService
