My setup for running open models
Mostly out of curiosity and a desire to learn, I've tried running open models locally with both LM Studio and Ollama, but I quickly ran up against the limits of my hardware (just a high-spec'd laptop).
Curious to try AWS Bedrock, I eventually settled on the following setup:
LiteLLM exposing Bedrock models (Qwen, at the moment) locally over an OpenAI-compatible API (yes, it's a mouthful).
This works great for any tool that can be configured to use an OpenAI-compatible API, like Quill meetings.
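To give a sense of what that looks like from the client side, here's a minimal sketch using the standard OpenAI Python client pointed at the local proxy. It assumes LiteLLM is running on its default port 4000 and that the proxy config maps a "qwen" alias to a Bedrock model; the key and alias are placeholders that depend on your own config.

```python
# Minimal sketch: any OpenAI-compatible client can talk to the local LiteLLM proxy.
# Assumptions: proxy on the default port 4000, a model alias "qwen" defined in the
# proxy config, and a placeholder key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # the local LiteLLM proxy
    api_key="sk-anything",             # placeholder; use a key issued by the proxy
)

response = client.chat.completions.create(
    model="qwen",  # alias mapped to a Bedrock model in the proxy config
    messages=[{"role": "user", "content": "Summarise this meeting transcript."}],
)
print(response.choices[0].message.content)
```

Any tool that lets you override the API base URL and key can be wired up the same way.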
Getting VS Code to work with this setup was more challenging: it required VS Code Insiders (the bleeding-edge build, as far as I understand), and even then VS Code tends to forget settings or apply them inconsistently. For example, it always uses Copilot for inline code actions.
The llm CLI required some tweaking too, in particular the setting suggested in this comment.
I am very impressed with LiteLLM, which provides accurate usage tracking per team or account. The potential for offering LLM access on an internal network, with per-team budgets, seems very interesting.
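As a rough sketch of how that budgeting could work, here's a call to the proxy's key-generation endpoint to mint a virtual key with a spend cap. The master key, budget, team name, and port are all placeholder assumptions; double-check the exact field names against the LiteLLM docs before relying on them.

```python
# Hedged sketch: issuing a budget-capped virtual key through the LiteLLM proxy.
# Assumptions: proxy on localhost:4000, a placeholder master key, and illustrative
# budget/team values.
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key"},  # proxy master key (placeholder)
    json={
        "models": ["qwen"],         # restrict the key to specific model aliases
        "max_budget": 10.0,         # spend cap for this key
        "team_id": "internal-dev",  # usage rolls up to this team
    },
)
print(resp.json()["key"])  # hand this key out on the internal network
```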
More broadly, I'm impressed by how mature and thriving the ecosystem is.