"Power using" AI tools

For the time being, I will treat AI-powered chat bots the way I treat Wikipedia – with an involuntary eye-roll whenever someone provides a Wikipedia link to back up their arguments.

1 Like

OpenAI does not support this (that I’ve seen), but I like the practice of annotating results with confidence intervals and then using those annotations to drive configurable highlighting. I see potential for annotating different types of non-confidence, too.
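To make the idea concrete, here is a minimal sketch of confidence-driven highlighting. It assumes you already have (text, confidence) pairs from somewhere (e.g. derived from token log-probabilities); the thresholds, CSS class names, and colors are all illustrative, not from any real API:

```python
# Sketch: style model output by per-span confidence, assuming we already
# have (text, confidence) pairs. Thresholds and colors are configurable
# and purely illustrative.

DEFAULT_STYLES = [
    (0.9, "high", "inherit"),    # confident: no highlight
    (0.6, "medium", "#fff3cd"),  # medium: yellow background
    (0.0, "low", "#f8d7da"),     # low: red background
]

def highlight(spans, styles=DEFAULT_STYLES):
    """Wrap each (text, confidence) span in an HTML <span> styled by confidence."""
    out = []
    for text, conf in spans:
        for threshold, label, color in styles:
            if conf >= threshold:
                out.append(
                    f'<span class="conf-{label}" style="background:{color}">{text}</span>'
                )
                break
    return "".join(out)
```

Because the style table is just data, “different types of non-confidence” (stale source, missing citation, contested claim) could be handled the same way with extra labels.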

I think one can be a bit more nuanced than that.

If you use ChatGPT, you can add the WebChatGPT extension, which will give you links to the sources.

If you use Bing AI Chat, it routinely gives you references.

As in - references back to accepted primary or secondary sources - not references to anonymous Wikipedia articles.

In either case, you can specify the desired source of references - such as Pubmed for medicine or whatever other domain(s) you consider somewhat authoritative.

Ultimately I view this as simply a tool to help me locate relevant primary sources; that really is no different from a Google search or an old-fashioned paper directory search in the library. None of them guarantee relevance or accuracy of the source; that’s why it helps to use multiple search mechanisms and multiple sources.

That said - there is no question that in both my personal and professional life, either ChatGPT (with the WebChatGPT add-on) or Bing AI Chat is an order-of-magnitude improvement in both speed and thoroughness of locating desired information. Clearly they will improve over time, as will my workflow as I learn how a search using these technologies differs from other types of searches.

That could very well be done - a very basic discussion in that regard is here:

1 Like

Regarding reliability - if you give Bing AI specific direction regarding a source to use, then in my experiments so far it is remarkably accurate in quoting it.

In a way @karlnyhus’s parallel with Wikipedia is remarkably good. Everything in Wikipedia is supposed to be sourced as well, and there’s a process for making sure the sources are good.

In both cases you can follow the links and check to see if they in fact are reliable and support the referenced statement.

I think the point @karlnyhus was making is, in both cases you should probably check those links…

4 Likes

This statement bothers me and it has been said several times.

I expect improvements in a software package: in its features, its user interface, its speed, its reach. That sort of thing. But if an editor comes out of the box and loses text, I can’t use it.

And if an AI comes out of the box and makes dumb or fanciful statements, likewise, I can’t use it. Trust, as with people, has to be earned. It is not enough to say that it will get “better.” Accuracy for something that provides information is a core competency. If that is an area in which it needs time to improve, then its maker released it too soon.

2 Likes

It is eminently usable currently. I would be quite disappointed both personally and professionally if it were to disappear. Indeed it works well enough that I am writing some scripts and a web app for my own use to integrate it more tightly into my daily workflow - this is as big a change for me as the introduction of JavaScript or the initial release of Devonthink.
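For anyone curious what that kind of personal integration looks like, here is a minimal sketch of calling ChatGPT from a script via OpenAI’s chat-completions HTTP endpoint. The endpoint and payload shape follow OpenAI’s documented API; the model name and prompt are placeholders to adjust for your own account:

```python
# Sketch of wiring ChatGPT into a personal script via OpenAI's HTTP API.
# Uses only the standard library; model name and prompt are placeholders.
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt, api_key, model="gpt-3.5-turbo"):
    """Assemble the HTTP request for a single-turn chat completion."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def ask(prompt, api_key):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

From there a web app is mostly a thin UI over `ask()`, plus whatever source-restriction and verification steps you layer on top.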

That said - my main use case is brainstorming ideas and searching for information sources; it works extremely well there as long as I confirm any data point before using it - as I ought to do anyway. Any feedback to OpenAI/Microsoft will no doubt make a great product even better.

I suspect that a majority of the ominous concerns/criticisms we are seeing come from those who want to use it to author text - maybe even without proofreading it and/or while passing it off as original. That’s not a good use of the product (at least not now), so yes, those people will be disappointed.

1 Like

I see more noise than signal in its ability to cite and review medical literature. It may be useful as a very rough initial search. It sometimes gives me references that don’t exist.

In other areas, such as computer programming, it’s very useful.

1 Like

Do you mean ChatGPT out of the box? Yes, it 100% makes up medical references.

However, if you use the WebChatGPT extension and add the suffix site:pubmed.ncbi.nlm.nih.gov to your searches then ChatGPT works very well for medical literature searches.

Alternatively if you use Bing AI Chat and mention the word pubmed somewhere in your question then it will mostly give you articles from Pubmed.
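The `site:` trick above is easy to automate if you search the literature often. This is a tiny hypothetical helper, not part of WebChatGPT or Bing; the domain list is just an example of sources one might trust:

```python
# Minimal helper for the "site:" trick: restrict a web-backed chat search
# to one authoritative domain by appending a site filter to the question.
# The domain map is illustrative -- add whatever sources you trust.

TRUSTED_SITES = {
    "medicine": "pubmed.ncbi.nlm.nih.gov",
    "programming": "stackoverflow.com",
}

def scoped_query(question, domain_key, sites=TRUSTED_SITES):
    """Append a site: filter so the search layer only returns that source."""
    return f"{question} site:{sites[domain_key]}"
```

For example, `scoped_query("Do cervical epidural steroid injections work?", "medicine")` yields a query ending in `site:pubmed.ncbi.nlm.nih.gov`, which keeps the retrieval step on Pubmed.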

2 Likes

Simon Willison is a brilliant and entertaining programmer who has built some really useful tools (including Datasette).

He also is fascinated by the AI tools being rolled out, and has been writing about them. This post is great, but so are the others linked at the end of it.

Note that while he clearly enjoys poking holes in the existing models – the funnier or weirder the better – he also sees great potential for AI-assisted search, and has some thoughtful insights on where we are heading in that direction.

I’d quote a few paragraphs from the link, but there are just too many good ones.

1 Like

Here are a few (I’m just getting started)…

Dinner recommendations:

I have red cabbage, sweet potatoes, beans, typical pantry staples. What can I make for dinner?

I eat everything, but my wife is vegan. Plus I don’t have a lot of time.

Hard to write emails:

Write a short email declining an offer to speak at a conference. Be warm, but direct. Make a point about climate change and air travel. Keep things open for future opportunities.

Write a message that will convince the director of IT to allow us to use Zapier.

Fun things to send to friends:

Write a poem about squid

Write a limerick about a builder named Chris on his birthday

(Edited to clarify these are Chat GPT)

3 Likes

Thanks, that’s better. Some decent responses with WebChatGPT. Some still a bit off. “[Here’s a study, but we caution that it’s yet to be published.]” - then it provides a link to the study, published in a peer-reviewed journal a few years ago.

1 Like

I agree it is not perfect… but then neither is a standard Pubmed search or Google scholar search.

Each has its place. None of them can be relied on without cross-reference and verification.

That said - at this point if I want to look up something in the medical literature my first go-to destination will be either WebChatGPT or Bing AI Chat; I find those to get me to a highly on-point paper quicker than the others.

Keep in mind you can ask either WebChatGPT or BingAI to summarize each article - and the summary can be 1-sentence, 3-sentences, or however you wish. I find that to be particularly helpful.

I have found this format to be helpful in WebChatGPT:

Display a list of 10 pubmed articles on low back pain. Make each item a 1-sentence summary conclusion as an active href hyperlink to the URL with the summary conclusion as the title. site:pubmed.ncbi.nlm.nih.gov

And this format in Bing AI:

Do cervical epidural steroid injections work? pubmed table with title/hyperlink, summary, sentiment

One other comment… WebChatGPT seems notably less reliable than Bing AI Chat. This is largely because WebChatGPT only sends ChatGPT the 10 articles it has retrieved from the DuckDuckGo search engine. And for unclear reasons, DDG sometimes is way off base in its replies.

Thus Bing gives consistent results, but WebChatGPT can be notably off the mark at times.

That said - I have found this Prompt within WebChatGPT to be helpful:

Web search results:

{web_results}
Current date: {current_date}

Instructions: Using the provided web search results, write a comprehensive reply to the given query. Make sure to cite results using [[number](URL)] notation after the reference. If the provided search results refer to multiple subjects with the same name, write separate answers for each subject.

List articles with a 2-sentence summary. Characterize sentiment in 1 word. Output results as a rendered HTML table with columns for Title/Hyperlink, Summary, Sentiment.

Query: {query}
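The WebChatGPT prompt above is really just a template with three placeholders: `{web_results}`, `{current_date}`, and `{query}`. A sketch of how such a template gets filled before the text reaches ChatGPT (the template here is abbreviated and the sample results are made up for illustration):

```python
# Fill a WebChatGPT-style prompt template. The placeholders match the ones
# shown in the thread; the template text is abbreviated for the sketch.
from datetime import date

TEMPLATE = """Web search results:

{web_results}
Current date: {current_date}

Instructions: Using the provided web search results, write a comprehensive \
reply to the given query. Make sure to cite results using [[number](URL)] \
notation after the reference.

Query: {query}"""

def fill_prompt(results, query, today=None):
    """Substitute numbered search results, the date, and the user query."""
    numbered = "\n".join(f"[{i}] {r}" for i, r in enumerate(results, 1))
    return TEMPLATE.format(
        web_results=numbered,
        current_date=(today or date.today()).isoformat(),
        query=query,
    )
```

Seeing the prompt as plain string substitution also explains the failure mode discussed below: if `{web_results}` ends up empty, the model still gets asked to answer, with nothing to cite.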
2 Likes

BTW - with WebChatGPT you can see the DDG output which is sent to ChatGPT. For reasons unclear to me, sometimes when your query specifies only Pubmed articles, DDG returns a blank response. [This happens even though the identical query in DDG itself results in lots of responses.]

In that situation where DDG returns nothing, WebChatGPT is most likely to “hallucinate” - it responds to the absence of Pubmed articles from DDG by inventing such articles out of thin air.
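One defensive pattern that failure mode suggests: if the search layer comes back empty, refuse to query the model at all rather than let it invent citations. This is a hypothetical sketch; `search` and `ask_model` stand in for whatever retrieval and chat calls you actually use:

```python
# Guard against citation hallucination: only consult the model when real
# search results exist. `search` and `ask_model` are caller-supplied stand-ins.

def answer_with_sources(query, search, ask_model):
    """Return a model answer only when the search layer found something."""
    results = search(query)
    if not results:
        return "No search results found - refusing to answer without sources."
    return ask_model(query, results)
```

A stricter variant could also verify that each citation in the model’s reply actually appears in `results` before showing it.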

One area where AI seems to excel (pun intended) is coding/scripting. I can see developers using this either to do a check on existing work or prompt them to think about cleaner ways to code things. But for us non-dev people, I see a bright future for AI just in spreadsheet formulas alone. Here’s one cool example, and you can use it for free (limited to 5 requests/month) to test it out, otherwise it’s just under $5/mo. paid annually:

You can use it to generate a formula or translate a formula into regular English. I love this concept!

Just tested AIExcelBot.com on a financial spreadsheet with a lot of complicated Google Sheets add-ons and extensions - it explained a formula very well. This would have been very handy back when I worked in I.T. Now I just use Sheets to track investments, but hopefully I can find a use for this soon 🙂

4 Likes

That is interesting… even more so because, if I understand the web page correctly, other companion apps are apparently included in the price.

One of them is a Google AppsScript generator - that has lots of potential:

That said - This is what I get from OpenAI for free - it looks pretty similar to the sample from the paid app. I need to try them both out to see if one is more accurate. Or maybe this app is using OpenAI itself - in which case where is the value?

We’re finding that we’re converging on prompt front matter (or back matter, since it follows the request?) that describes some of our practices and syntaxes. A prompt management framework might be in our future, at least as long as these models continue to respond better to plain-English requests than to just passing snippets of code or bits of linter config.

We’re also finding we can get better code by mentioning names of famous programmers, e.g., we often get more idiomatic Vue by asking for the example to come from or to be approved by Evan You. So we’re potentially moving towards a dictionary of authors or popularizers we might automatically include based on the rest of the request? And (eventually) paying more for requests to dig deeper to please more obscure individuals or eventually even fictitious people, depending on how the business models evolve? Strange territory.
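The “dictionary of authors” idea could start as nothing more than keyword-to-hint lookup. Everything here is hypothetical - the names and mappings are just the examples from this thread, not any real product:

```python
# Hypothetical sketch of the "dictionary of authors" idea: append a style
# hint when the request mentions a known framework or language.

STYLE_HINTS = {
    "vue": "Write it as Evan You would approve.",
    "python": "Write it in the style Guido van Rossum would approve.",
}

def decorate_prompt(request, hints=STYLE_HINTS):
    """Append a style hint for the first known keyword found in the request."""
    lowered = request.lower()
    for keyword, hint in hints.items():
        if keyword in lowered:
            return f"{request} {hint}"
    return request
```

Whether name-dropping actually buys more idiomatic code would need testing per model, but a table like this makes the experiment cheap to run.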

Edit: there’s also this whole matter of protecting IP. Right now we are ad hoc obscuring important bits we pass as examples/context/stubs, but eventually we’ll probably be wanting to run prompts through a security/privacy filter if we aren’t able to self-host tools that provide the power we want.

2 Likes

Has anyone been experimenting with CreateML? I’ve been playing around with using it to identify articles of clothing in images (for an app my wife wants to build) and it’s pretty amazing and easy to use.

1 Like