Bookmarking and saving web page data

I’ve never had that happen. I use webarchives because they save all the text and images and leave links pointing at the live web (so clicking one sends me to the internet). I just checked four webarchives I created in 2010 and they all displayed fine, including, in one case, sidebar ‘news’ and ads from 2010 that were embedded in a saved discussion-group page and that would have been updated if I had visited the site.

I don’t think I’ve ever had a webarchive fail to open or lack the text and images I’d meant to save.

I couldn’t remember exactly where I read what I did, but I searched on the DEVONthink forum, and one of the employees wrote:

In case of dynamic webpages loading contents on demand a web archive is not really a future-proof format (and limited to Apple’s platforms too). A different format like PDF, formatted notes, Markdown or rich text is recommended for archiving and doesn’t require the original webpages.

AFAICT there’s nothing dynamic in my webarchives - I’m saving articles and forum posts and images, which are all inside the webarchive itself. Never had a problem opening one, ever.

I reached out to Devon Technologies on the subject. Here’s what they had to say about DEVONthink’s use of Web Archives.

Of course there is a relationship as we use Apple’s frameworks. Currently there is no effect on the capture of webarchives but in the future it is possible we will no longer support this due to Apple’s deprecation of the feature.

For many popular sites, webarchives have not been a “future proof” solution for some time due to the way content is dynamically delivered and the proliferation of JavaScripts that affect page loading and appearance.

PDF is (and always has been) the most locked down format.

The problem I have with PDFs is that they lock the page into a specific size that at best you have to choose at creation time or, at worst, is chosen for you arbitrarily.

The nature of PDF (explicit presentation) goes strongly against the nature of HTML (semantics) and CSS (adaptive presentation).

Much of my webarchive hoard consists of pages from which I want to easily be able to extract text (Safari Reader View makes that exceedingly easy) and/or graphics. That’s much tougher to do with PDFs.

I’m not sure what year webarchives were introduced by Apple, but I remember I migrated to them from iCab’s years-older alternative, and I still have some of that browser’s webarchives as well. In fact I just fired up iCab, which I never use any more, and was surprised by how many options there were to save webpages… even the ability to save as a Safari webarchive or a zip-file webarchive…

The problem with archives saved by Firefox and Chromium browsers is that you don’t get a nice, tidy single file, but an html page plus a folder containing all assets, and that makes things far too cumbersome for my filing system.

FYI there are a couple of utilities which can extract assets from webarchives in case of emergency:

http://www.splook.com/Software/WebArchive_Folderizer.html

https://robrohan.github.io/WebArchiveExtractor/
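
For what it’s worth, a .webarchive is (as far as I can tell) just a binary property list, so in a pinch you can peek inside one with a few lines of Python instead of a dedicated utility. This is only a rough sketch: the key names (WebMainResource, WebSubresources, WebSubframeArchives, WebResourceURL, WebResourceMIMEType, WebResourceData) are what I understand Safari to write rather than anything taken from Apple documentation, and the function names are my own invention.

```python
import plistlib


def load_webarchive(path):
    """Parse a .webarchive file (assumed to be a property list) into a dict."""
    with open(path, "rb") as fh:
        return plistlib.load(fh)


def list_webarchive_resources(archive):
    """Return (url, mime_type, size_in_bytes) for each resource in a parsed archive.

    The key names used here are assumptions about the format, not a documented API.
    """
    resources = []
    main = archive.get("WebMainResource")
    if main:
        resources.append(main)
    # Images, stylesheets, scripts and so on saved alongside the main page.
    resources.extend(archive.get("WebSubresources", []))

    summary = []
    for res in resources:
        data = res.get("WebResourceData", b"")
        summary.append(
            (res.get("WebResourceURL", ""), res.get("WebResourceMIMEType", ""), len(data))
        )

    # Framed pages appear to be stored as nested archives with the same layout.
    for sub in archive.get("WebSubframeArchives", []):
        summary.extend(list_webarchive_resources(sub))
    return summary
```

Calling list_webarchive_resources(load_webarchive("saved-page.webarchive")) on one of your own files (the filename is just a placeholder) should list what is stored inside, and the bytes under WebResourceData can be written out to disk if you ever need to rescue an image or the page’s HTML.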

I use DEVONthink, save to rich text format, and clean up as necessary. It seems to work pretty well for me.

Unless you want the whole page, formatted with graphics, which rtf saves won’t help with.

Thanks, but I discovered DEVONthink has a “Convert” menu item which will convert from web archive to any other of their supported formats. Not surprising that they have this given the comprehensiveness of the app, I guess. As I have only just started using DEVONthink for this, I quickly converted all of my saved articles to PDF and had a glance — all very usable.

And yet you yourself quoted DevonTechnologies saying, “in the future it is possible we will no longer support this” :confused: Maybe it will be useful to other people in the thread too.

Oh, certainly. But for my case of having picked web archives as my DEVONthink format of choice, and while it is still supported, it is an easy conversion to make in situ.

Honestly, I had only rarely come across the concept of web archives before DEVONthink and KeepIt and given the rather generic name, I didn’t realise it was a “thing” like PDF. I thought it was just descriptive of a bespoke process to “download the stuff and poke it in an archive file.” So I’m glad @anon41602260 mentioned it!

Yup. As I noted, Apple borrowed the idea from others, as I used an earlier version in the iCab browser for a couple of years, I think, before Apple added it to Safari.

So here’s where it gets interesting. I was doing some further research, and the only place I can find any mention of deprecation is in the WebKit API class documentation on the Apple Developer web site. But that class is a program’s internal representation of an archive, not the file format itself. It is marked as “available” from macOS 10.13 through 10.14, which are the 2017 and 2018 releases, so it must have been deprecated as of 2019.

However, further digging unearthed the fact that the 2019 release of iOS & iPadOS 13 introduced a new ability to save and view .webarchive files. You can see information about this in the MacStories iOS and iPadOS 13 review here.

So I don’t think the .webarchive file format is going away at all, given that this functionality was introduced the year after the API class was deprecated.

I’m sticking with .webarchive for now because there are some pages I tried to convert which do not convert well to PDF.

I use DEVONthink for this and save most pages in markdown (clutter free). When I need images I save as PDF.

Good catch, but I’ve been looking into alternatives just in case. ‘Web ARChive’ (WARC) is a variant of the ARC format used by the Internet Archive for its Wayback Machine. It was designed by the International Internet Preservation Consortium, and I think it is what the Library of Congress uses; it has so far archived almost 24,000 complete websites. So it might have some staying power. I found a free Mac app to crawl websites, and Chrome plugins like this one and this one that implement the standard to make packages viewable “from any browser webpage”.

Oh, cool! I did come across that format but had not looked into what apps could be used.

I deleted my post…

I use both Pocket and Pinboard (full versions), but mostly Pinboard. I like Pocket’s recommendations and caching too, but not its app. I like Pinboard’s notes and being able to use it with LaunchBar. Its API is handy; more on that anon :blush:. And there’s much more to like about Pinboard. Will check out GoodLinks; I use Ngoc Luu’s 1Writer a lot.

All good. I only discovered that fact because I was looking for what would be replacing it. In terms of the API, it looks like nothing… or if there is something, they’re not helping us out by mentioning it anywhere obvious.