Running LLMs locally from the CLI, without subscribing to paid plans
How to build a ChatGPT alternative that runs completely locally and is 100% offline.
Running LLMs (open-weights or open-source) on a local or on-premises computer has several benefits:
- Data privacy and security: private data stays on your device, since nothing needs to be sent over the internet to a cloud service
- Offline capability: once downloaded, models run without an internet connection
- No paid subscriptions: compute and inference run on your machine, so there is no per-token cost for your application (important for token-intensive applications)
- Speed increase: depending on your desktop/laptop hardware (inference is slower without a GPU/NPU), you may see a speed increase because there is no HTTP call overhead
- Customization: local deployment allows custom modifications to models
There are many open-source tools for hosting open-weights LLMs locally for inference, ranging from command line interface (CLI) tools to full GUI-based applications.
A few popular ones are GPT4All, Jan.ai, and Ollama.
It is easy to set up local-first AI and run open-source foundation models (FMs) directly on your computer.
On a Windows laptop, I chose Ollama as the LLM runner and Open WebUI as the front end.
What is Ollama?
It’s an open-source tool and one of the most active inference servers in the open-source community. It lets you run LLMs or SLMs on your local machine and is optimized for both CPUs and GPUs. You can use it to download foundation models directly, and it provides a library of quantized models: optimized versions of the models that trade some quality for smaller size and faster inference, so you can choose your trade-off between speed and quality depending on your needs. It also comes with an OpenAI-compatible API, so it works with many existing applications and makes it easy to swap the model you’re using when working locally or deploying to production.
Ollama runs as a native Windows application (WSL not needed) and supports NVIDIA and AMD Radeon GPUs.
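As a quick illustration of the OpenAI-compatible API, here is a minimal sketch using curl (assumptions: Ollama is running on its default port 11434, and a model such as llama3.2 has already been pulled; substitute whatever model you have installed):

```
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\": \"llama3.2\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in one sentence.\"}]}"
```

Because the request shape matches the OpenAI Chat Completions API, an existing client can usually be pointed at http://localhost:11434/v1 by changing only its base URL and model name.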
Set-up instructions are at: Ollama
The list of models available for download is on the Models Library
[Update 1/25/2025]: DeepSeek-R1 is among the most popular models, with 1.4M+ pulls.
Once Ollama is installed, it is very easy to run from either Windows command-line shell: Command Prompt or PowerShell.
Basic Ollama commands and output:
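For example (a minimal sketch; llama3.2 is just an example model name, the trailing # annotations are explanatory notes rather than part of the commands, and actual output will vary by installation):

```
ollama pull llama3.2     # download a model from the library
ollama list              # list models downloaded locally
ollama run llama3.2      # start an interactive chat session with the model
ollama ps                # show models currently loaded in memory
ollama rm llama3.2       # remove a local model
```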
Examples using Command Line Interface (CLI)
Text Summarization 1: summarize a large text file, e.g. monopoly.txt
Note: the ‘<’ redirection operator didn’t work in PowerShell, but works in Command Prompt (see the sketch below).
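A sketch of the Command Prompt version (monopoly.txt and llama3.2 are placeholders; use your own file and model):

```
ollama run llama3.2 "Summarize the following text in five bullet points:" < monopoly.txt
```

In PowerShell, the same can be achieved by piping the file contents instead of using ‘<’:

```
Get-Content .\monopoly.txt | ollama run llama3.2 "Summarize the following text in five bullet points:"
```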
Text Summarization 2: summarize the Git User Manual (about 4K lines long). Download the user manual here: Git User Manual
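A possible command, assuming the manual has been saved locally as user-manual.txt (a hypothetical filename) and again using llama3.2 as the example model:

```
ollama run llama3.2 "Summarize this Git user manual in ten bullet points:" < user-manual.txt
```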
Content Creation: write an article or blog post, or create copy
Answering Questions: ask a question and let Gen AI offer recommendations
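For example (the prompt and model name are illustrative):

```
ollama run llama3.2 "Write a 300-word blog post on the benefits of running LLMs locally, with a catchy title."
```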
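For example (again, an illustrative prompt):

```
ollama run llama3.2 "I have a laptop with 16 GB RAM and no discrete GPU. Which small language models would you recommend I run locally, and why?"
```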