My setup for running open models
Mostly out of curiosity and a desire to learn, I've tried running open models locally with both LM Studio and Ollama, but I quickly ran up against the limits of my hardware (just a high-spec'd laptop).
Curious to try AWS Bedrock, I eventually settled on the following setup:
LiteLLM exposing Bedrock models (Qwen, at the moment) locally over an OpenAI-compatible API (yes, it's a mouthful).
This works great for any tool that can be configured to use an OpenAI-compatible API, like Quill meetings.
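To give a sense of what that looks like from the client side, here's a minimal sketch using the standard OpenAI Python client pointed at the local proxy. It assumes LiteLLM is running on its default port 4000 and that the proxy config maps a "qwen" alias to a Bedrock model; the key and alias are placeholders that depend on your own config.

```python
# Minimal sketch: any OpenAI-compatible client can talk to the local LiteLLM proxy.
# Assumptions: proxy on the default port 4000, a model alias "qwen" defined in the
# proxy config, and a placeholder key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # the local LiteLLM proxy
    api_key="sk-anything",             # placeholder; use a key issued by the proxy
)

response = client.chat.completions.create(
    model="qwen",  # alias mapped to a Bedrock model in the proxy config
    messages=[{"role": "user", "content": "Summarise this meeting transcript."}],
)
print(response.choices[0].message.content)
```

Any tool that lets you override the API base URL and key can be wired up the same way.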
Getting VS Code to work with this setup was more challenging: it required VS Code Insiders (the bleeding-edge build, as far as I understand), and even then VS Code tends to forget settings or apply them inconsistently. For example, it always uses Copilot for inline code actions.
The llm CLI required some tweaking too, in particular the setting suggested in this comment.
I am very impressed with LiteLLM, which provides accurate usage tracking per team or account. The potential for offering LLM access on an internal network, with per-team budgets, seems very interesting.
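As a rough sketch of how that budgeting could work, here's a call to the proxy's key-generation endpoint to mint a virtual key with a spend cap. The master key, budget, team name, and port are all placeholder assumptions; double-check the exact field names against the LiteLLM docs before relying on them.

```python
# Hedged sketch: issuing a budget-capped virtual key through the LiteLLM proxy.
# Assumptions: proxy on localhost:4000, a placeholder master key, and illustrative
# budget/team values.
import requests

resp = requests.post(
    "http://localhost:4000/key/generate",
    headers={"Authorization": "Bearer sk-master-key"},  # proxy master key (placeholder)
    json={
        "models": ["qwen"],         # restrict the key to specific model aliases
        "max_budget": 10.0,         # spend cap for this key
        "team_id": "internal-dev",  # usage rolls up to this team
    },
)
print(resp.json()["key"])  # hand this key out on the internal network
```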
More broadly, I'm impressed by how mature and thriving the ecosystem is.