What kind of latency and tokens/second are you getting? I’m trying the beta with Gemma 3 4B. It’s a little too dumb, at least with the out-of-the-box system prompts.
Not sure how to measure tokens/second through DEVONthink, is there any part of the DT UI that displays it?
No, that kind of information isn’t shown. I don’t even know if it’s accessible to DEVONthink.
The look and feel of the Markdown editor seems more… not sure how to describe it… well, DT3 felt like a text editor with Markdown syntax highlighting, while DT4 feels more like a specialized Markdown editor. On second thought, perhaps the main visible change is the default font. But it feels more polished now and I like it.
Thanks for the nice feedback! I passed this along to development as an encouragement.
Did you notice the Editing > Format settings for line width, margins, and leading (line height), as well as the Markdown-specific default font you mentioned?
The bad part is that local models through ollama are not exactly fast, even on a 128GB M4 Max Mac Studio. I’ll have to investigate more with smaller models and compare against OpenAI or Anthropic via API, but this doesn’t seem to be a limitation of DT4, more the state of the art of running local models on Macs.
And part of what I covered in the Getting Started > AI Explained section of the help.
Start by doing a search on “getting started” on devontechnologies web site. Lots of short-and-sweet to the point material on how to start. Don’t expect to learn it all in a day. Start by learning the difference in indexing vs in-database. Then practice doing searches. Go from there.
Just to keep things in perspective: I’ve got an estimated 40,000 working hours in DEVONthink from 2.x into 4. I have done support, automation, training and documentation for all the releases. And I haven’t “learned it all”. There’s no compulsion to “learn it all”. You learn what you need to know and leave room for exploration, if desired.
(PS: Not a typo, not an exaggeration )
yes, excellent point. You may not ever need some of the features. Learn as you have to. Once you get a grip on basics, then look at the manual to see if any of the more advanced capabilities are useful. Learning is a process.
For individual runs, run ollama in the CLI with `--verbose`. To verbosely log everything you’re serving, which would capture the DEVONthink requests, I think you’d run `ollama serve` with `OLLAMA_DEBUG` set to 1.
It should show something along these lines (this is for 4B on an M3 Max):
```
llama_perf_context_print:        load time =  402.36 ms
llama_perf_context_print: prompt eval time = 1315.78 ms / 1536 tokens ( 0.86 ms per token, 1167.37 tokens per second)
llama_perf_context_print:        eval time = 1411.78 ms /  104 runs   (13.57 ms per token,   73.67 tokens per second)
llama_perf_context_print:       total time = 2907.67 ms / 1640 tokens
```
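As a sanity check, the tokens-per-second figures in that log are just token counts divided by elapsed time. A quick sketch of the arithmetic, using the eval numbers above:

```python
# Sanity-check the llama_perf numbers above: tokens/sec is just
# generated tokens divided by elapsed seconds.
eval_time_ms = 1411.78   # "eval time" from the log
eval_tokens = 104        # "runs" (one generated token per run)

tokens_per_sec = eval_tokens / (eval_time_ms / 1000)
print(f"{tokens_per_sec:.2f} tokens per second")  # matches the 73.67 in the log
```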
So what… this in the Terminal: `ollama run gemma3 OLLAMA_DEBUG=1`?
It’d be `OLLAMA_DEBUG=1 ollama serve` (the environment variable goes before the command), then `ollama run gemma...` in another shell. I can’t test at the moment, unfortunately, but if you run `ollama help serve` you’ll see the variable listed there.
After more than a decade I will not be upgrading. When an app moves to a subscription model you really need to be using it heavily to get an ROI. Subscriptions are only for the serious user, not the casual user (unless money is no obstacle).
Whilst it’s nice to say you don’t need to pay for new features if you don’t want to, I imagine you will have to pay if you want security and bug fixes, which makes the whole thing moot.
This is not a knock on the app. DT is impressive. It’s just that I’m a casual user, and the price increase weeds out the casual user.
I had no luck passing OLLAMA_DEBUG=1 to the serve command and --verbose to the run command.
It seems that `ollama run` with the `--verbose` option will only display tokens/sec for prompts sent through that same terminal, not for prompts arriving on the `ollama serve` TCP port, which is what DT4 uses. So while I can tell that Microsoft’s phi4 model delivers 15.18 tokens/sec and DeepSeek’s R1 (70b quantization) delivers 3.41 tokens/sec on my M4 Max Mac Studio when answering “Why is the sky blue?”, this isn’t a test done through DT4 with its various system prompts. I’d really be testing ollama, not DT4’s integration with it, so I’d be derailing this thread.
Also, considering the phi4 and deepseek-r1:70b responses to this one-shot prompt are basically the same, this goes to show that the economics of running LLM models need to be carefully assessed: bigger is not always better.
Nuts. You could try LM Studio: verbose logging is just a toggle, and you can see all your DEVONthink queries scroll by.
I haven’t had a chance to mess with this (as I’m “a little busy” right now), but that’s what I was thinking as well.
It looks like ollama serve can send the llama.cpp performance data back in the response, but I can’t get the server to log it.
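One way around the logging issue is to read the timing data straight out of the server’s HTTP response: ollama’s `/api/generate` endpoint returns `eval_count` and `eval_duration` (in nanoseconds) in the final JSON object, so you can compute tokens/sec yourself without any server-side logging. A minimal sketch, assuming the default localhost port and a locally pulled model (adjust the model name to your setup):

```python
import json
import urllib.request

def tokens_per_second(resp: dict) -> float:
    """Compute generation speed from an ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is
    reported in nanoseconds.
    """
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

def ask_ollama(prompt: str, model: str = "gemma3",
               url: str = "http://localhost:11434/api/generate") -> dict:
    # stream=False returns one final JSON object including the timing stats
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read())

# Example with the numbers from the llama_perf log earlier in the thread:
# 104 tokens in 1411.78 ms ≈ 73.67 tokens/sec
stats = {"eval_count": 104, "eval_duration": 1_411_780_000}
print(f"{tokens_per_second(stats):.2f} tok/s")
```

This won’t capture DT4’s exact prompts (those still only show in the server log), but it does give per-request throughput for the same model the DT4 requests hit.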
Perhaps, but maybe not. It has been a while since I used Agenda, which had a similar (same?) license model. What I remember is that I continued to get app updates as they were published, even after my paid year had expired. The app remained fully updated, except that new features released after my paid period were disabled. If I decided to renew for another year, the new features were turned on. Hopefully DEVONthink will be close to this.
That’s a different license model that, as far as I know, is largely unique to Agenda.
That being said, I don’t think most desktop apps get what we’d typically consider security updates. That’s typically things like operating systems, software for websites, etc.
Agenda is awesome in how it manages feature flags depending on the actual purchase date of the license, but I think DT’s approach is more similar to BinaryNights’ licensing model for ForkLift.
These nuances apart, I think both approaches are fair compromises between a regular subscription and the “buy the license, get the free updates” model.
The DEVONthink license will not be in the style of Agenda’s, in which you can always use the latest version (including all fixes) but a number of the latest features may remain locked. Within Agenda, you unlock all the features current at the time with an in-app purchase.
For DEVONthink 4 (source: https://www.devontechnologies.com/apps/devonthink/upgrade ):
> With the purchase of a software license you receive the app itself including one year of updates. When the year is up, you can extend your license to continue receiving updates — but you don’t have to.
>
> Should you choose to not extend the license, you can continue to use DEVONthink; the app just no longer downloads and installs newer updates. The license will neither become deactivated nor restricted in any way. You own the app, we will never take it away from you.
In other words, you buy one year of updates (additions, improvements, and fixes), and the license allows lifetime usage of the last version you received after that year, as-is.
In the discussion of hotfixes for critical errors, the developers state the following (source: https://discourse.devontechnologies.com/t/dt4-more-flexible-and-modern-license-model/82624/11 ):
> Our updates all include additions and improvements, not just fixes. In fact, in almost 13 years here I can only recall two hotfixes we’ve released.
>
> That being said, if there were a need for a hotfix for someone out of license, we would handle it professionally and fairly in that situation.
Your forums are not configured correctly for the official Discourse app integration. It requires a very high trust level for a user to be able to connect to the API. I (and many others) can’t connect the official app to regularly browse the forum.