Hi,
does anyone have experience with a similar setup with the robot, skills, and Obsidian that keeps data local? There are more and more reports of security issues, which is why I would like to do all of this locally only. Thanks, Wout
That would be highly desirable!
On a recent MacBreak Weekly episode, there was some discussion of using Ollama and an app called “Apfel” (I think?); it is made by an Austrian company, so it uses the German word for apple. It was available only for macOS Tahoe.
Whether that does what Cowork does, I don’t know. I haven’t tried it as I don’t have macOS Tahoe on my main computer - although when I launched my home computer today, a dialog flashed up “screen contrast increased” and (hooray!) the screen is readable now.
Thank you, I will dig into that one.
Goose is a free, open source, local alternative to Cowork which can run cloud LLM models or can run local LLM models via Ollama.
It is very well-done and has a pretty strong user base. The software was originally developed for internal use by Block/Square and then they open-sourced it.
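Not Goose-specific, but if you want to confirm Ollama is running and see which local models a frontend like Goose could point at, here is a quick sketch (assuming Ollama's default port 11434 and the Python `requests` library):

```python
import requests

# Ollama's local API normally listens on port 11434.
# /api/tags lists the models you've already pulled locally.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()

for model in resp.json().get("models", []):
    size_gb = model["size"] / 1e9
    print(f"{model['name']}  (~{size_gb:.1f} GB on disk)")
```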
If anyone does try these out, it would be great to see feedback here!
Cheers
I’ve tried Apfel and it’s too underpowered and limited to be a “robot assistant”; it’s the built-in AI in macOS, which you’ve probably discovered the limitations of already. But I’d say it could be great as a step in a script or something like that.
In this post, a patent attorney outlines workstation requirements for building a local model AI for use in drafting patent claims. It does seem to involve a significant amount of prompt engineering.
There seem to be a few options for using Claude Desktop with local models starting to appear.
Most are for Claude Code, but I’m seeing a couple that hint at Cowork as well. I’ll investigate. In the meantime, LM Studio looks interesting, and they claim a Code frontend to local models;
I’m trying it out in play mode…
YMMV
Cheers
Graham
Forgive my ignorance, but if the LM Code is using Claude’s capabilities/functionality, doesn’t that mean it has to send data to Anthropic’s servers?
Claude Code and Claude Cowork are AI agent harnesses. Here’s a concise definition from Salesforce:
An agent harness is the software infrastructure that wraps around an AI model to manage its lifecycle, context, and interactions with the outside world. It is not the “brain” that does the thinking; instead, it is the environment that provides the brain with the tools, memories, and safety limits it needs to function.
I use Claude Code via Anthropic’s own models, but also via Ollama + a local installation of Gemma 4 (Google’s open-source model). In the latter case, I’m using the Claude Code harness, but the horse is Gemma 4, not, say, Opus 4.7. In this case, nothing gets sent to Anthropic’s servers because both the harness (Claude Code) and the horse (Gemma 4) work locally on my Mac without requiring anything to be sent to anyone’s servers.
I’ve installed the second-largest Gemma 4 model (gemma4:26b), and it’s fine for basic tasks like converting PDFs to markdown or summarizing documents (as long as neither is too long or complex), but it’s definitely not as powerful as the paid versions of Gemini or Claude. For instance, Gemini has no trouble converting a scanned PDF of handwritten notes to a clean markdown document; Gemma 4 needs the handwritten notes to be in jpg format and is much more hit-or-miss deciphering handwriting. Nonetheless, I’m happy to use Claude Code + Gemma 4 for basic things and conserve my tokens for big jobs.
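For what it’s worth, the fully local loop looks roughly like this if you talk to Ollama directly rather than through the Claude Code harness. This is just a sketch against Ollama’s local REST API; the model tag matches the one above, and the input file name is a placeholder:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "gemma4:26b"                            # the locally pulled model

# A (not-too-long) document already converted to plain text; path is hypothetical.
with open("notes.txt", "r", encoding="utf-8") as f:
    text = f.read()

payload = {
    "model": MODEL,
    "stream": False,
    "messages": [
        {"role": "user",
         "content": "Summarize the following document as markdown bullet points:\n\n" + text},
    ],
}

# Everything stays on localhost; nothing is sent to Anthropic's or Google's servers.
resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```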
I plan on test-driving LM Studio + Gemma 4 today. I also plan on trying Gemma 4 in DEVONthink.
Much as I want this to work, I think we’re several years from local models that are capable of doing this. I have a 64GB M3 Max and I can’t run a decent coding model on my Mac. Basically, they use way too much RAM. I’ve not seen believable estimates for Claude Code/Cowork. However, DeepSeek v4 Flash (the smaller version) needs ~175GB of RAM to run. V4 Pro could easily consume 1TB of RAM. See: DeepSeek-V4 VRAM Requirements - Million-Token Local Inference Guide | Will It Run AI Blog
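For a rough sense of why the numbers get so big: the weights alone take roughly (parameters × bits per weight ÷ 8) bytes, before you add the KV cache and runtime overhead. A back-of-envelope sketch (the parameter counts and quantization levels here are just illustrative):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only; KV cache and overhead come on top."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params, bits in [(26, 4), (26, 16), (70, 4), (70, 16)]:
    print(f"{params}B @ {bits}-bit ≈ {weight_memory_gb(params, bits):.0f} GB")

# 26B @ 4-bit  ≈ 13 GB   -> fits comfortably on a 64GB machine
# 26B @ 16-bit ≈ 52 GB   -> already tight on 64GB
# 70B @ 4-bit  ≈ 35 GB
# 70B @ 16-bit ≈ 140 GB  -> why frontier-sized models need hundreds of GB
```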
Agreed. The open-source models I can run comfortably on my 64GB M2 Max Mac Studio are fine for basic tasks, but I’m only inclined to use them in cases where privacy is an absolute must, or I need to husband my tokens.
Thanks, that was very helpful. As were the subsequent follow ups esp. re RAM requirements.
In the post I linked above, the patent attorney working with AI to draft patent specifications contended that 32GB of RAM would be sufficient. Given the RAM figures discussed here, it doesn’t sound like that would really be too successful.
I for one would be interested to hear how you get on with your Devonthink experiment.
I think it depends on what you want to do and how fast you want it done, frankly. I’ll be the first to admit that I may not be handing the task to the local model I’ve put in the Claude Code harness in the most efficient or well-structured way.
The model “GLM OCR” with LM Studio also requires jpg for OCR. It’s a small model that runs on my M2 Macbook Air. In my experience, its handwriting recognition is a little better than Gemini 3.
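If it helps anyone trying this, the jpg requirement just means handing the model a base64-encoded image rather than the PDF itself. Here is a sketch against Ollama’s API (the model tag and file path are placeholders; LM Studio exposes an OpenAI-compatible endpoint instead, so the request shape differs there):

```python
import base64
import requests

MODEL = "glm-ocr"        # placeholder tag; use whatever name Ollama lists for the OCR model
IMAGE_PATH = "page.jpg"  # a scanned page exported as JPEG

with open(IMAGE_PATH, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": MODEL,
    "stream": False,
    "prompt": "Transcribe the handwriting in this image as plain text.",
    "images": [image_b64],  # Ollama's generate API accepts base64 images for multimodal models
}

resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["response"])
```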
Not to mention the fans spinning like crazy (I haven’t found a model that doesn’t make my M4 Max Mac Studio want to take off).
Well, it’s certainly doable. Again, it’s fine for simple things. But, as @mlevison has noted above, be prepared to allocate a chunk of system resources to the work. I would absolutely consider using a local model if I needed to work with sensitive documents and the task was straightforward.
There are a couple of other considerations that come into play. One is model “style” for lack of a better term: I’ve built my infrastructure of prompts, skills, projects, and what-not around what it’s like to work with Claude and Gemini, and what each needs from me to produce the best results. To get the most out of the local models, I’m going to have to bestow the appropriate infrastructure on them too, keeping their context windows and capabilities in mind. To be fair to Gemma, I haven’t really done that work yet. I suspect I will get better results once I’ve built the right toolbox.
Thanks for the tip. There’s a version of GLM OCR in Ollama’s stable of models, so I might give it a try.