A curated list of awesome platforms, tools, practices and resources that help run LLMs locally
- Inference platforms
- Inference engines
- User Interfaces
- Large Language Models
- Tools
- Hardware
- Tutorials
- Communities
- LM Studio - discover, download and run local LLMs
jan - an open source alternative to ChatGPT that runs 100% offline on your computer
LocalAI - the free, open-source alternative to OpenAI, Claude and others
ChatBox - user-friendly desktop client app for AI models/LLMs
lemonade - a local LLM server with GPU and NPU Acceleration
ollama - get up and running with LLMs
llama.cpp - LLM inference in C/C++
vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
exo - run your own AI cluster at home with everyday devices
BitNet - official inference framework for 1-bit LLMs
sglang - a fast serving framework for large language models and vision language models
Nano-vLLM - a lightweight vLLM implementation built from scratch
koboldcpp - run GGUF models easily with a KoboldAI UI
gpustack - simple, scalable AI model deployment on GPU clusters
mlx-lm - generate text and fine-tune large language models on Apple silicon with MLX
distributed-llama - connect home devices into a powerful cluster to accelerate LLM inference
ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
llm-scaler - run LLMs on Intel Arc™ Pro B60 GPUs
Open WebUI - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
Lobe Chat - an open-source, modern design AI chat framework
Text generation web UI - LLM UI with advanced features, easy setup, and multiple backend support
SillyTavern - LLM Frontend for Power Users
Page Assist - Use your locally running AI models to assist you in your web browsing
- AI Models & API Providers Analysis - understand the AI landscape to choose the best model and provider for your use case
- LLM Explorer - explore a list of open-source LLMs
- Dubesor LLM Benchmark table - small-scale manual performance comparison benchmark
- oobabooga benchmark - a list sorted by size (on disk) for each score
- Qwen - powered by Alibaba Cloud
- Mistral AI - a pioneering French artificial intelligence startup
- Tencent - a profile of a Chinese multinational technology conglomerate and holding company
- Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
- bartowski - providing GGUF versions of popular LLMs
- Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
- Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets
- Qwen3-Next - a collection of the latest generation Qwen LLMs
- Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
- gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
- Ministral 3 - a collection of edge models, with base, instruct and reasoning variants, in 3 different sizes: 3B, 8B and 14B, all with vision capabilities
- GLM-4.5 - a collection of hybrid reasoning models designed for intelligent agents
- Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
- Phi-4-mini-instruct - a lightweight open model built upon synthetic data and filtered publicly available websites
- NVIDIA Nemotron - a collection of open, production-ready enterprise models trained from scratch by NVIDIA
- Llama Nemotron - a collection of open, production-ready enterprise models from NVIDIA
- OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
- Granite 4.0 - a collection of lightweight, state-of-the-art open foundation models from IBM that natively support multilingual capabilities, a wide range of coding tasks—including fill-in-the-middle (FIM) code completion—retrieval-augmented generation (RAG), tool usage and structured JSON output
- EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
- ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
- Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features
- Qwen3-Coder - a collection of Qwen's most agentic code models to date
- Devstral-Small-2507 - an agentic LLM for software engineering tasks fine-tuned from Mistral-Small-3.1
- Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
- OlympicCoder-32B - a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics
- NextCoder - a family of code-editing LLMs developed using the Qwen2.5-Coder Instruct variants as base
- Qwen3-Omni - a collection of natively end-to-end multilingual omni-modal foundation models from Qwen
- Qwen-Image - an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing
- Qwen-Image-Edit-2509 - the image editing version of Qwen-Image extending the base model's unique text rendering capabilities to image editing tasks, enabling precise text editing
- Qwen3-VL - a collection of the most powerful vision-language models in the Qwen series to date
- GLM-4.5V - a VLM based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air
- HunyuanImage-2.1 - an efficient diffusion model for high-resolution (2K) text-to-image generation
- FastVLM - a collection of VLMs with efficient vision encoding from Apple
- MiniCPM-V-4_5 - a GPT-4o Level MLLM for single image, multi image and high-FPS video understanding on your phone
- LFM2-VL - a collection of vision-language models, designed for on-device deployment
- ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale
- Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
- chatterbox - first production-grade open-source TTS model
- VibeVoice - a collection of frontier text-to-speech models from Microsoft
- canary-1b-v2 - a multitask speech transcription and translation model from NVIDIA
- parakeet-tdt-0.6b-v3 - a multilingual speech-to-text model from NVIDIA
- Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis
- Jan-v1-4B - the first release in the Jan Family, designed for agentic reasoning and problem-solving within the Jan App
- Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
- Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research capabilities without the performance degradation typically associated with context extension methods
- Arch-Router-1.5B - the fastest LLM router model that aligns to subjective usage preferences
- gpt-oss-safeguard - a collection of safety reasoning models built upon gpt-oss
- Qwen3Guard - a collection of safety moderation models built upon Qwen3
- HunyuanWorld-1 - an open-source 3D world generation model
- Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments
unsloth - fine-tuning & reinforcement learning for LLMs
outlines - structured outputs for LLMs
llama-swap - reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc.
AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
langflow - a powerful tool for building and deploying AI-powered agents and workflows
langchain - build context-aware reasoning applications
autogen - a programming framework for agentic AI
anything-llm - the all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more
Flowise - build AI agents, visually
llama_index - the leading framework for building LLM-powered agents over your data
crewAI - a framework for orchestrating role-playing, autonomous AI agents
agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
sim - open-source platform to build and deploy AI agent workflows
openai-agents-python - a lightweight, powerful framework for multi-agent workflows
SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
camel - the first and the best multi-agent framework
pydantic-ai - a Python agent framework designed to help you quickly, confidently, and painlessly build production grade applications and workflows with Generative AI
txtai - all-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
agent-framework - a framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET
archgw - a high-performance proxy server that handles the low-level work in building agents: like applying guardrails, routing prompts to the right agent, and unifying access to LLMs, etc.
ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
ragbits - building blocks for rapid development of GenAI applications
mindsdb - federated query engine for AI - the only MCP Server you'll ever need
github-mcp-server - GitHub's official MCP Server
playwright-mcp - Playwright MCP server
chrome-devtools-mcp - Chrome DevTools for coding agents
n8n-mcp - an MCP for Claude Desktop / Claude Code / Windsurf / Cursor to build n8n workflows for you
awslabs/mcp - AWS MCP Servers — helping you get the most out of AWS, wherever you use MCP
mcp-atlassian - MCP server for Atlassian tools (Confluence, Jira)
pathway - Python ETL framework for stream processing, real-time analytics, LLM pipelines and RAG
graphrag - a modular graph-based RAG system
LightRAG - simple and fast RAG
haystack - AI orchestration framework to build customizable, production-ready LLM applications, best suited for building RAG, question answering, semantic search or conversational agent chatbots
vanna - an open-source Python RAG framework for SQL generation and related functionality
graphiti - build real-time knowledge graphs for AI Agents
onyx - the AI platform connected to your company's docs, apps, and people
claude-context - make entire codebase the context for any coding agent
pipeshub-ai - a fully extensible and explainable workplace AI platform for enterprise search and workflow automation
zed - a next-generation code editor designed for high-performance collaboration with humans and AI
OpenHands - a platform for software development agents powered by AI
cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
aider - AI pair programming in your terminal
opencode - an AI coding agent built for the terminal
tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
goose - an open-source, extensible AI agent that goes beyond code suggestions
Roo-Code - a whole dev team of AI agents in your code editor
crush - the glamourous AI coding agent for your favourite terminal
kilocode - open source AI coding assistant for planning, building, and fixing code
humanlayer - the best way to get AI coding agents to solve hard problems in complex codebases
ProxyAI - the leading open-source AI copilot for JetBrains
open-interpreter - a natural language interface for computers
OmniParser - a simple screen parsing tool towards pure vision based GUI agent
cua - the Docker Container for Computer-Use AI Agents
self-operating-computer - a framework to enable multimodal models to operate a computer
Agent-S - an open agentic framework that uses computers like a human
puppeteer - a JavaScript API for Chrome and Firefox
playwright - a framework for Web Testing and Automation
browser-use - make websites accessible for AI agents
firecrawl - turn entire websites into LLM-ready markdown or structured data
stagehand - the AI Browser Automation Framework
nanobrowser - open-source Chrome extension for AI-powered web automation
mem0 - universal memory layer for AI Agents
letta - the stateful agents framework with memory, reasoning, and context management
supermemory - memory engine and app that is extremely fast and scalable
cognee - memory for AI Agents in 5 lines of code
LMCache - supercharge your LLM with the fastest KV Cache Layer
memU - an open-source memory framework for AI companions
langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
opik - debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards
openllmetry - open-source observability for your LLM application, based on OpenTelemetry
garak - the LLM vulnerability scanner from NVIDIA
giskard - open-source evaluation & testing for AI & LLM systems
agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place
Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
gpt-researcher - an LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
open-notebook - an open-source implementation of Notebook LM with more flexibility and features
RD-Agent - automate the most critical and valuable aspects of the industrial R&D process
local-deep-researcher - fully local web research and report writing assistant
local-deep-research - an AI-powered research assistant for deep, iterative research
maestro - an AI-powered research application designed to streamline complex research tasks
OpenRLHF - an easy-to-use, high-performance open-source RLHF framework built on Ray, vLLM, ZeRO-3 and HuggingFace Transformers, designed to make RLHF training simple and accessible
Kiln - the easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets
augmentoolkit - train an open-source LLM on new facts
context7 - up-to-date code documentation for LLMs and AI code editors
cai - Cybersecurity AI (CAI), the framework for AI Security
speakr - a personal, self-hosted web application designed for transcribing audio recordings
presenton - an open-source AI presentation generator and API
OmniGen2 - exploration to advanced multimodal generation
4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure
mobile-use - a powerful, open-source AI agent that controls your Android or iOS device using natural language
gabber - build AI applications that can see, hear, and speak using your screens, microphones, and cameras as inputs
promptcat - a zero-dependency prompt manager/catalog/library in a single HTML file
Alex Ziskind - tests of PCs, laptops, GPUs, etc. capable of running LLMs
Digital Spaceport - reviews of various builds designed for LLM inference
JetsonHacks - information about developing on NVIDIA Jetson Development Kits
Miyconst - tests of various types of hardware capable of running LLMs
- Kolosal - LLM Memory calculator - estimate the RAM requirements of any GGUF model instantly
- LLM Inference VRAM & GPU Requirement Calculator - calculate how many GPUs you need to deploy LLMs
ZLUDA - CUDA on non-NVIDIA GPUs
Let's reproduce GPT-2 (124M)
nanochat - a full-stack implementation of an LLM like ChatGPT in a single, clean, minimal, hackable, dependency-lite codebase, designed to run on a single 8xH100 node via scripts like speedrun.sh that run the entire pipeline from start to end
Knowledge Distillation: How LLMs train each other
gguf-docs - Docs for GGUF quantization (unofficial)
Prompt Engineering Guide - guides, papers, lectures, notebooks and resources for prompt engineering
Prompt Engineering by NirDiamant - a comprehensive collection of tutorials and implementations for Prompt Engineering techniques, ranging from fundamental concepts to advanced strategies
Prompting guide 101 - a quick-start handbook for effective prompts by Google
Prompt Engineering by Google - prompt engineering by Google
Prompt Engineering by Anthropic - prompt engineering by Anthropic
Prompt Engineering Interactive Tutorial - an interactive prompt engineering tutorial by Anthropic
Real world prompting - real world prompting tutorial by Anthropic
Prompt evaluations - prompt evaluations course by Anthropic
system-prompts-and-models-of-ai-tools - a collection of system prompts extracted from AI tools
system_prompts_leaks - a collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
Prompt from Codex - the prompt used to steer the behavior of OpenAI's Codex
Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems
vLLM Production Stack - vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
GenAI Agents - tutorials and implementations for various Generative AI Agent techniques
500+ AI Agent Projects - a curated collection of AI agent use cases across various industries
12-Factor Agents - principles for building reliable LLM applications
Agents towards production - end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for real-world launches
LLM Agents & Ecosystem Handbook - one-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools
601 real-world gen AI use cases - 601 real-world gen AI use cases from the world's leading organizations by Google
A practical guide to building agents - a practical guide to building agents by OpenAI
Pathway AI Pipelines - ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data
RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm to handle the tasks
LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python
LocalLLaMA
LLMDevs
LocalLLM
LocalAIServers
GenAI monitor - monitoring updates & fresh releases related to LLMs, diffusion models and Generative AI
We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.