
Running LLMs locally from the CLI, without subscribing to paid plans

How to build a ChatGPT alternative that runs completely locally and is 100% offline.

Running open-weight or open-source LLMs on a local/on-premise computer has several benefits:

  • Data privacy and security: private data stays on your device, since nothing needs to be sent over the internet to the cloud
  • Offline capability: once downloaded, models can run without an internet connection
  • No paid subscriptions: compute and inference run on your machine, so there is no additional cost per request (important for token-intensive applications)
  • Speed: depending on your desktop/laptop hardware (inference is slower without a GPU/NPU), you may see lower latency because there is no HTTP call overhead to a remote API
  • Customization: local deployment allows custom modifications to models

There are many open-source tools for hosting open-weight LLMs locally for inference - some are command-line interface (CLI) based, others offer a full GUI.

A few popular ones are GPT4All, Jan.ai, and Ollama.

It is easy to set up local-first AI, running open-source foundation models (FMs) directly on your computer.

On a Windows laptop, I chose Ollama and Open WebUI as runners for LLMs.

What is Ollama?

It’s an open-source tool and one of the most active inference servers in the open-source community. It lets you run LLMs or SLMs on your local machine and is optimized for both CPUs and GPUs. You can use it to download foundation models directly, and it provides a library of quantized models, which are optimized versions of the originals at the cost of some quality loss. That means you can choose your trade-off between speed and quality depending on your needs. It comes with an OpenAI-compatible API, so it works with many existing applications, and it makes it easy to swap the model you’re using when working locally or deploying to production.
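As a quick, minimal sketch of that OpenAI-compatible API (assuming Ollama is running on its default port 11434 and a model such as llama3.2 has already been pulled; the model name is just an example), a PowerShell call might look like this:

    # Build an OpenAI-style chat request (llama3.2 is an example model name)
    $body = @{
        model    = "llama3.2"
        messages = @(@{ role = "user"; content = "Say hello in one short sentence." })
    } | ConvertTo-Json -Depth 5

    # POST to Ollama's OpenAI-compatible endpoint and print the reply
    $response = Invoke-RestMethod -Uri "http://localhost:11434/v1/chat/completions" -Method Post -ContentType "application/json" -Body $body
    $response.choices[0].message.content

Because the request and response shapes match the OpenAI API, pointing an existing OpenAI client at http://localhost:11434/v1 is usually all that is needed to switch it over to the local model.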

Ollama runs as a native Windows application (WSL is not needed) and supports NVIDIA and AMD Radeon GPUs.

Setup instructions: Ollama

The list of models available for download is on the Models Library.

[1/25/2025 update] DeepSeek-R1 is among the most popular models, with 1.4M+ pulls.

Once Ollama is installed, it is very easy to run from either of the Windows command-line shells: the Command shell or PowerShell.

Basic Ollama commands and output:

Ollama Commands
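The screenshot above shows the actual output; as a plain-text reference, a first session typically looks something like this (deepseek-r1 is just an example model tag, and the # comments are annotations rather than part of the commands):

    ollama --version          # confirm the install
    ollama pull deepseek-r1   # download a model from the library
    ollama list               # show the models downloaded so far
    ollama run deepseek-r1    # start an interactive chat (type /bye to exit)
    ollama ps                 # show models currently loaded in memory
    ollama rm deepseek-r1     # remove a model you no longer need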

Examples using Command Line Interface (CLI)

Text Summarization 1: summarizing a large text file, e.g. monopoly.txt

Summarization

Note: the ‘<’ operator didn’t work in PowerShell, but works in the Command shell.
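A command along these lines does the job; llama3.2 stands in for whichever model you pulled, and monopoly.txt is the input file from the example above:

    REM Command shell: redirect the file into the prompt with '<'
    ollama run llama3.2 "Summarize the following document in a few paragraphs:" < monopoly.txt

    # PowerShell: '<' is not supported for input redirection, so pipe the file contents instead
    Get-Content .\monopoly.txt | ollama run llama3.2 "Summarize the following document in a few paragraphs:"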

Text Summarization 2: summarizing the Git User Manual (about 4K lines). Download the user manual here: Git User Manual

Summarization
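The same pattern works for the longer Git User Manual; the model name and the file name user-manual.txt below are placeholders, and a roughly 4K-line document may not fully fit in a small model's context window, so the summary may only reflect the portion that does:

    # PowerShell: read the whole file as a single string and pipe it to the model
    Get-Content -Raw .\user-manual.txt | ollama run llama3.2 "Summarize this Git User Manual as a bulleted list of its main topics:"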

Content Creation: write an article or blog post, or create marketing copy

content creation

Create Blog
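A one-off prompt from the command line is enough for this kind of task; the model name and topic below are illustrative:

    ollama run llama3.2 "Write a 400-word blog post introducing the benefits of running LLMs locally with Ollama, in a friendly, non-technical tone."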

Answering Questions: ask a question and let Gen AI offer a recommendation

Question & Recommendation
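Question answering works the same way: ask directly at the prompt (again, the model and question are just examples):

    ollama run llama3.2 "I have a Windows laptop with 16 GB of RAM and no discrete GPU. Which quantized model sizes are realistic to run locally, and what trade-offs should I expect?"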

This post is licensed under CC BY 4.0 by the author.