I was originally going to write this post about how bad Apple Visual Intelligence is, but instead I’ve decided to write about how good Perplexity’s visual intelligence is. I’ve been traveling around Europe for the past few weeks in places where I don’t speak the language, so I thought this would be a good time to try out Apple Visual Intelligence. One of the features Apple highlights is translating text: Use visual intelligence on iPhone – Apple Support (UK)
The hardest part about using it is the UI. Apparently Visual Intelligence is a separate mode that you go into. If you’re already in the Camera app, you can’t use it. You have to swipe out of the camera and then long-press the camera button to enter this mode. Why, Apple? Why can’t I smoothly invoke Visual Intelligence from the camera?
The other troublesome part is that it overlays the English translation on the sign, and sometimes the text is really tiny and hard to read. You can tap on it, but sometimes the text is split up and you have to tap several areas. Another issue is that you can’t invoke Visual Intelligence from the Photos app. You have to be standing in front of the sign or building to invoke this special mode. If you want to look up the info a few minutes later, you’re out of luck.
OK, I said I’d be positive, so this is the end of my criticism of Visual Intelligence. Now I’ll discuss Perplexity. I like that you can use a photo you took earlier in the day or take one while you’re standing in front of the building or sign. As an example, while we were walking back to our hotel, I took a few photos of restaurants that looked appealing. Back at the hotel, I uploaded the photos to Perplexity and it told me all about each place, so I could decide from the comfort of my room whether or not to make a reservation.
But the real power is that I could point my camera at an inscription while I was in Rome. I’d ask Perplexity to tell me what the inscription says, and it would go beyond that and tell me the history behind it: when it was inscribed and why, whether it commemorated a war or the death of a famous person.
BTW, I tried to do this with the ChatGPT Ask button in Apple Visual Intelligence, and it would often fail. It would translate the inscription literally, but if I asked questions about it, it wouldn’t know. I find this weird because most of the time in Perplexity I was using the ChatGPT model, which knew a lot about the exact same inscription. What is Apple doing to ChatGPT to make it less useful?
I was also impressed by how Perplexity could say intelligent things about more mundane objects. For example, I pointed my camera at a chandelier in my hotel in Venice. It had this to say: “This is a Murano glass chandelier in the classic Venetian ciocca (floral bouquet) style, likely inspired by the iconic Ca’ Rezzonico design tradition. Given that you’re in Venice, this could very well be a locally crafted piece hanging in a period residence or hotel.”
There were actually several more paragraphs, giving the history of the design and including this gem: “The warm, diffused light pattern cast on the ceiling — visible in your photo — is a hallmark of authentic Murano chandeliers.” Considering that the hotel was near the Ca’ Rezzonico museum, I’d say Perplexity nailed the details of this chandelier!
I was going to visit the Sistine Chapel, so I asked Perplexity to create a document about the ceiling, including pictures, so I could study it before I visited the real thing. It created a 13-page PDF describing the history of the commission and then detailing all the different sections of the painting and what they mean. I saved the PDF on my iPhone and read it a couple of times as I waited in line to get in.
I also like the fact that Perplexity gives sources for everything it says. I can click through, look at the sources, and verify that its summary matches what is in my photo. The Sistine Chapel doc also had a list of references, so I could read more in depth after I had digested the summary.
I also like how it tried to infer more information about a building or sign from its surroundings. It would see that I was in Rome, for example, and could tell from my photo that an inscription appeared to be near the Colosseum. Sometimes it would ask me where I took the photo so it could get more details. I asked it to read the photo metadata, but apparently Perplexity strips that out before the photo gets sent to the server.
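If you’re curious whether a photo actually carries location metadata before you upload it anywhere, you can check the EXIF tags yourself. Here is a minimal sketch using the Pillow imaging library (the file name is just a placeholder, and this is my own quick check, not anything Perplexity provides):

```python
# Sketch: list the EXIF tags embedded in a photo, assuming Pillow is installed.
from PIL import Image, ExifTags

def describe_exif(path_or_file):
    """Return a dict mapping human-readable EXIF tag names to their values."""
    exif = Image.open(path_or_file).getexif()
    return {ExifTags.TAGS.get(tag, tag): value for tag, value in exif.items()}

if __name__ == "__main__":
    # "photo.jpg" is a placeholder for a photo from your camera roll.
    for name, value in describe_exif("photo.jpg").items():
        print(name, "=", value)
```

If the dictionary comes back empty, the image has been stripped of metadata, which is consistent with what Perplexity appears to do on upload.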
Overall, I found it very handy to have this tool in my pocket and be able to identify mundane things like restaurants and chandeliers as well as historical buildings. I sure hope Apple is keeping an eye on the competition and working on giving us something as good as, if not better than, what Perplexity can do today.