Awesome local LLM

A curated list of awesome platforms, tools, practices, and resources that help run LLMs locally

Table of Contents

  • Inference platforms
  • Inference engines
  • User Interfaces
  • Large Language Models
  • Tools
  • Hardware
  • Tutorials
  • Communities
  • Contributing

Inference platforms

  • LM Studio - discover, download and run local LLMs
  • jan - an open source alternative to ChatGPT that runs 100% offline on your computer
  • LocalAI - the free, open-source alternative to OpenAI, Claude and others
  • ChatBox - user-friendly desktop client app for AI models/LLMs
  • lemonade - a local LLM server with GPU and NPU acceleration
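
Most of these platforms expose an OpenAI-compatible HTTP API once a model is loaded, so any OpenAI SDK client can talk to them. A minimal sketch, assuming LM Studio's default server on port 1234; the model name is illustrative:

```python
# Minimal sketch: chat with a locally hosted model through an
# OpenAI-compatible endpoint (LM Studio defaults to port 1234).
# The model name below is an assumption -- use whichever model you loaded.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # illustrative; list loaded models via client.models.list()
    messages=[{"role": "user", "content": "Summarize why local inference matters."}],
)
print(response.choices[0].message.content)
```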

Back to Table of Contents

Inference engines

  • ollama - get up and running with LLMs
  • llama.cpp - LLM inference in C/C++
  • vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
  • exo - run your own AI cluster at home with everyday devices
  • BitNet - official inference framework for 1-bit LLMs
  • sglang - a fast serving framework for large language models and vision language models
  • Nano-vLLM - a lightweight vLLM implementation built from scratch
  • koboldcpp - run GGUF models easily with a KoboldAI UI
  • gpustack - simple, scalable AI model deployment on GPU clusters
  • mlx-lm - generate text and fine-tune large language models on Apple silicon with MLX
  • distributed-llama - connect home devices into a powerful cluster to accelerate LLM inference
  • ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
  • FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
  • vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
  • llm-scaler - run LLMs on Intel Arc™ Pro B60 GPUs
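
Most of these engines are servers first: ollama, llama.cpp's llama-server, vLLM, and sglang all expose HTTP APIs, and some ship Python clients. A minimal sketch using the official ollama Python client; the model tag is illustrative and must already be pulled:

```python
# Minimal sketch using the ollama Python client (pip install ollama).
# Assumes the ollama daemon is running and the model tag below has already
# been pulled (e.g. with `ollama pull llama3.1`); the tag is illustrative.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Explain GGUF quantization in one paragraph."}],
)
print(response["message"]["content"])
```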

Back to Table of Contents

User Interfaces

  • Open WebUI - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
  • Lobe Chat - an open-source, modern design AI chat framework
  • Text generation web UI - LLM UI with advanced features, easy setup, and multiple backend support
  • SillyTavern - LLM Frontend for Power Users
  • Page Assist - Use your locally running AI models to assist you in your web browsing

Back to Table of Contents

Large Language Models

Explorers, Benchmarks, Leaderboards

Back to Table of Contents

Model providers

  • Qwen - powered by Alibaba Cloud
  • Mistral AI - a pioneering French artificial intelligence startup
  • Tencent - a Chinese multinational technology conglomerate and holding company
  • Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
  • bartowski - providing GGUF versions of popular LLMs
  • Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
  • Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets

Back to Table of Contents

Specific models

General purpose

  • Qwen3-Next - a collection of the latest generation Qwen LLMs
  • Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
  • gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
  • Ministral 3 - a collection of edge models, with base, instruct and reasoning variants, in 3 different sizes: 3B, 8B and 14B, all with vision capabilities
  • GLM-4.5 - a collection of hybrid reasoning models designed for intelligent agents
  • Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
  • Phi-4-mini-instruct - a lightweight open model built upon synthetic data and filtered publicly available websites
  • NVIDIA Nemotron - a collection of open, production-ready enterprise models trained from scratch by NVIDIA
  • Llama Nemotron - a collection of open, production-ready enterprise models from NVIDIA
  • OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
  • Granite 4.0 - a collection of lightweight, state-of-the-art open foundation models from IBM that natively support multilingual capabilities, a wide range of coding tasks—including fill-in-the-middle (FIM) code completion—retrieval-augmented generation (RAG), tool usage and structured JSON output
  • EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
  • ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
  • Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features
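
Most of the general-purpose models above publish open weights on Hugging Face and can be run directly with transformers when they fit in memory. A minimal sketch; the model id is illustrative, and the smallest variant of a family is usually the safest starting point:

```python
# Minimal sketch: run an open-weight instruct model with Hugging Face
# transformers. The model id is illustrative -- substitute any family above
# that fits your hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-1.5B-Instruct",  # illustrative model id
)
messages = [{"role": "user", "content": "Give me three uses for a local LLM."}]
result = generator(messages, max_new_tokens=200)
print(result[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```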

Back to Table of Contents

Coding

  • Qwen3-Coder - a collection of Qwen's most agentic code models to date
  • Devstral-Small-2507 - an agentic LLM for software engineering tasks fine-tuned from Mistral-Small-3.1
  • Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
  • OlympicCoder-32B - a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics
  • NextCoder - a family of code-editing LLMs developed using the Qwen2.5-Coder Instruct variants as base

Back to Table of Contents

Multimodal

  • Qwen3-Omni - a collection of natively end-to-end multilingual omni-modal foundation models from Qwen

Back to Table of Contents

Image

  • Qwen-Image - an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing
  • Qwen-Image-Edit-2509 - the image editing version of Qwen-Image extending the base model's unique text rendering capabilities to image editing tasks, enabling precise text editing
  • Qwen3-VL - a collection of the most powerful vision-language models in the Qwen series to date
  • GLM-4.5V - a vision-language model based on ZhipuAI's next-generation flagship text foundation model, GLM-4.5-Air
  • HunyuanImage-2.1 - an efficient diffusion model for high-resolution (2K) text-to-image generation
  • FastVLM - a collection of VLMs with efficient vision encoding from Apple
  • MiniCPM-V-4_5 - a GPT-4o Level MLLM for single image, multi image and high-FPS video understanding on your phone
  • LFM2-VL - a collection of vision-language models, designed for on-device deployment
  • ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale
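
When one of these vision-language models is served by a local stack that supports image inputs (LM Studio, vLLM, and others do), it can be queried with the standard OpenAI image-content message format. A minimal sketch; the endpoint, model name, and image URL are assumptions:

```python
# Minimal sketch: ask a locally served vision-language model about an image
# via the OpenAI-compatible vision message format. Endpoint, model name, and
# image URL are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="qwen3-vl-8b-instruct",  # illustrative
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```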

Back to Table of Contents

Audio

  • Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
  • chatterbox - first production-grade open-source TTS model
  • VibeVoice - a collection of frontier text-to-speech models from Microsoft
  • canary-1b-v2 - a multitask speech transcription and translation model from NVIDIA
  • parakeet-tdt-0.6b-v3 - a multilingual speech-to-text model from NVIDIA
  • Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis

Back to Table of Contents

Miscellaneous

  • Jan-v1-4B - the first release in the Jan Family, designed for agentic reasoning and problem-solving within the Jan App
  • Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
  • Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research without the performance degradation typically associated with context-extension methods
  • Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences
  • gpt-oss-safeguard - a collection of safety reasoning models built upon gpt-oss
  • Qwen3Guard - a collection of safety moderation models built upon Qwen3
  • HunyuanWorld-1 - an open-source 3D world generation model
  • Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments

Back to Table of Contents

Tools

Models

  • unsloth - fine-tuning & reinforcement learning for LLMs
  • outlines - structured outputs for LLMs
  • llama-swap - reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc.
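
Structured-output tools like outlines constrain generation to a schema. Even without a dedicated library, the idea can be approximated by asking a local model for JSON and validating it with Pydantic. A library-agnostic sketch (not outlines' API); the endpoint and model name are assumptions:

```python
# Library-agnostic sketch of structured output: prompt for JSON, then validate
# against a Pydantic schema. This illustrates the idea, not the outlines API;
# endpoint and model name are assumptions (Ollama's OpenAI-compatible port here).
import json
from openai import OpenAI
from pydantic import BaseModel

class Ticket(BaseModel):
    title: str
    severity: str
    tags: list[str]

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
raw = client.chat.completions.create(
    model="llama3.1",  # illustrative
    messages=[{
        "role": "user",
        "content": "Return only JSON with keys title, severity, tags "
                   "describing this bug report: 'App crashes on login.'",
    }],
).choices[0].message.content

ticket = Ticket.model_validate(json.loads(raw))  # raises if the model strayed from the schema
print(ticket)
```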

Back to Table of Contents

Agent Frameworks

  • AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
  • langflow - a powerful tool for building and deploying AI-powered agents and workflows
  • langchain - build context-aware reasoning applications
  • autogen - a programming framework for agentic AI
  • anything-llm - the all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more
  • Flowise - build AI agents, visually
  • llama_index - the leading framework for building LLM-powered agents over your data
  • crewAI - a framework for orchestrating role-playing, autonomous AI agents
  • agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
  • sim - open-source platform to build and deploy AI agent workflows
  • openai-agents-python - a lightweight, powerful framework for multi-agent workflows
  • SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
  • camel - the first and the best multi-agent framework
  • pydantic-ai - a Python agent framework designed to help you quickly, confidently, and painlessly build production grade applications and workflows with Generative AI
  • txtai - all-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
  • agent-framework - a framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET
  • archgw - a high-performance proxy server that handles the low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to LLMs
  • ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
  • ragbits - building blocks for rapid development of GenAI applications
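
Most of these frameworks can drive a locally served model by pointing an OpenAI-compatible client at it. A minimal LangChain sketch, assuming Ollama's OpenAI-compatible endpoint on its default port; the model name is illustrative:

```python
# Minimal sketch: use LangChain's ChatOpenAI wrapper against a local
# OpenAI-compatible server (here Ollama's, on its default port).
# The model name is illustrative.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.1",                      # illustrative
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # Ollama ignores the key, but the client requires one
    temperature=0,
)

reply = llm.invoke("Plan three steps to index a folder of PDFs for search.")
print(reply.content)
```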

Back to Table of Contents

Model Context Protocol

  • mindsdb - federated query engine for AI - the only MCP Server you'll ever need
  • github-mcp-server - GitHub's official MCP Server
  • playwright-mcp - Playwright MCP server
  • chrome-devtools-mcp - Chrome DevTools for coding agents
  • n8n-mcp - an MCP server for Claude Desktop / Claude Code / Windsurf / Cursor to build n8n workflows for you
  • awslabs/mcp - AWS MCP Servers — helping you get the most out of AWS, wherever you use MCP
  • mcp-atlassian - MCP server for Atlassian tools (Confluence, Jira)
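
An MCP server is ultimately just a process that advertises tools over the protocol. A minimal sketch using the FastMCP helper from the official MCP Python SDK; the tool itself is illustrative:

```python
# Minimal sketch of an MCP server using FastMCP from the official
# Python SDK (pip install mcp). The tool is illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves over stdio so MCP clients (e.g. Claude Desktop) can attach
```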

Back to Table of Contents

Retrieval-Augmented Generation

  • pathway - Python ETL framework for stream processing, real-time analytics, LLM pipelines and RAG
  • graphrag - a modular graph-based RAG system
  • LightRAG - simple and fast RAG
  • haystack - AI orchestration framework to build customizable, production-ready LLM applications, best suited for building RAG, question answering, semantic search or conversational agent chatbots
  • vanna - an open-source Python RAG framework for SQL generation and related functionality
  • graphiti - build real-time knowledge graphs for AI Agents
  • onyx - the AI platform connected to your company's docs, apps, and people
  • claude-context - make your entire codebase the context for any coding agent
  • pipeshub-ai - a fully extensible and explainable workplace AI platform for enterprise search and workflow automation
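
Underneath all of these frameworks, the retrieval step is the same: embed the documents, embed the query, take the nearest chunks, and stuff them into the prompt. A framework-free sketch against a local OpenAI-compatible server; the endpoint and model names are assumptions:

```python
# Framework-free RAG sketch: embed chunks, retrieve by cosine similarity,
# then answer with the retrieved context. Endpoint and model names are
# illustrative assumptions for a local OpenAI-compatible server (Ollama here).
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

docs = [
    "GGUF is a binary format for quantized llama.cpp models.",
    "vLLM uses PagedAttention to manage the KV cache efficiently.",
]

def embed(texts):
    res = client.embeddings.create(model="nomic-embed-text", input=texts)  # illustrative model
    return np.array([d.embedding for d in res.data])

doc_vecs = embed(docs)
query = "How does vLLM manage memory?"
q_vec = embed([query])[0]

# Cosine similarity, then pick the best-matching chunk as context.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
context = docs[int(scores.argmax())]

answer = client.chat.completions.create(
    model="llama3.1",  # illustrative
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {query}"}],
)
print(answer.choices[0].message.content)
```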

Back to Table of Contents

Coding Agents

  • zed - a next-generation code editor designed for high-performance collaboration with humans and AI
  • OpenHands - a platform for software development agents powered by AI
  • cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
  • aider - AI pair programming in your terminal
  • opencode - an AI coding agent built for the terminal
  • tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
  • continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
  • void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
  • goose - an open-source, extensible AI agent that goes beyond code suggestions
  • Roo-Code - a whole dev team of AI agents in your code editor
  • crush - the glamourous AI coding agent for your favourite terminal
  • kilocode - open source AI coding assistant for planning, building, and fixing code
  • humanlayer - the best way to get AI coding agents to solve hard problems in complex codebases
  • ProxyAI - the leading open-source AI copilot for JetBrains

Back to Table of Contents

Computer Use

  • open-interpreter - a natural language interface for computers
  • OmniParser - a simple screen-parsing tool for pure-vision-based GUI agents
  • cua - the Docker Container for Computer-Use AI Agents
  • self-operating-computer - a framework to enable multimodal models to operate a computer
  • Agent-S - an open agentic framework that uses computers like a human

Back to Table of Contents

Browser Automation

  • puppeteer - a JavaScript API for Chrome and Firefox
  • playwright - a framework for Web Testing and Automation
  • browser-use - make websites accessible for AI agents
  • firecrawl - turn entire websites into LLM-ready markdown or structured data
  • stagehand - the AI Browser Automation Framework
  • nanobrowser - open-source Chrome extension for AI-powered web automation
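
Browser automation in a local-LLM workflow usually means feeding page content to a model. A minimal Playwright sketch that grabs a page's visible text so it can be passed to a local LLM; the URL is illustrative:

```python
# Minimal sketch: fetch a page's visible text with Playwright so it can be
# handed to a local LLM (pip install playwright && playwright install chromium).
# The URL is illustrative.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    text = page.inner_text("body")  # visible text only, ready for an LLM prompt
    browser.close()

print(text[:500])
```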

Back to Table of Contents

Memory Management

  • mem0 - universal memory layer for AI Agents
  • letta - the stateful agents framework with memory, reasoning, and context management
  • supermemory - an extremely fast, scalable memory engine and app
  • cognee - memory for AI Agents in 5 lines of code
  • LMCache - supercharge your LLM with the fastest KV Cache Layer
  • memU - an open-source memory framework for AI companions

Back to Table of Contents

Testing, Evaluation, and Observability

  • langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
  • opik - debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards
  • openllmetry - open-source observability for your LLM application, based on OpenTelemetry
  • garak - the LLM vulnerability scanner from NVIDIA
  • giskard - open-source evaluation & testing for AI & LLM systems
  • agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place

Back to Table of Contents

Research

  • Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
  • gpt-researcher - an LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
  • SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
  • open-notebook - an open-source implementation of Notebook LM with more flexibility and features
  • RD-Agent - automate the most critical and valuable aspects of the industrial R&D process
  • local-deep-researcher - fully local web research and report writing assistant
  • local-deep-research - an AI-powered research assistant for deep, iterative research
  • maestro - an AI-powered research application designed to streamline complex research tasks

Back to Table of Contents

Training and Fine-tuning

  • OpenRLHF - an easy-to-use, high-performance open-source RLHF framework built on Ray, vLLM, ZeRO-3 and HuggingFace Transformers, designed to make RLHF training simple and accessible
  • Kiln - the easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets
  • augmentoolkit - train an open-source LLM on new facts
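
Tools like unsloth and Kiln wrap parameter-efficient fine-tuning; the core move is attaching low-rank adapters so only a small fraction of the weights is trained. A generic LoRA sketch with Hugging Face peft (not any specific tool's API); the base model id and target modules are illustrative:

```python
# Generic LoRA sketch with Hugging Face peft + transformers (not the API of
# any specific tool above). Base model id and target modules are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative small base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # illustrative; depends on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters
# From here, train with transformers' Trainer or trl's SFTTrainer on your dataset.
```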

Back to Table of Contents

Miscellaneous

  • context7 - up-to-date code documentation for LLMs and AI code editors
  • cai - Cybersecurity AI (CAI), the framework for AI Security
  • speakr - a personal, self-hosted web application designed for transcribing audio recordings
  • presenton - an open-source AI presentation generator and API
  • OmniGen2 - an exploration of advanced multimodal generation
  • 4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
  • Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure
  • mobile-use - a powerful, open-source AI agent that controls your Android or iOS device using natural language
  • gabber - build AI applications that can see, hear, and speak using your screens, microphones, and cameras as inputs
  • promptcat - a zero-dependency prompt manager/catalog/library in a single HTML file

Back to Table of Contents

Hardware

Back to Table of Contents

Tutorials

Models

Back to Table of Contents

Prompt Engineering

Back to Table of Contents

Context Engineering

  • Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
  • Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems

Back to Table of Contents

Inference

  • vLLM Production Stack - vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Back to Table of Contents

Agents

Back to Table of Contents

Retrieval-Augmented Generation

  • Pathway AI Pipelines - ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data
  • RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
  • Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm to handle the tasks
  • LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python

Back to Table of Contents

Miscellaneous

Back to Table of Contents

Communities

Back to Table of Contents

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.
