(Batch) process annotations from PDFs out to Obsidian

This is my workflow to process annotations from PDFs out to Obsidian. The key batch operations that I will post from my workflow are:

  • Storing annotations from the PDF as markdown textbundles, including asset images
  • Re-exposing tags that may have otherwise been hidden in annotation notes
  • Renaming the folder and files in the textbundle so that they are viewable in Obsidian

I annotate with Bookends on iPadOS, but the processing does not depend on that choice. You must, however, have Highlights on macOS for the first step. As a heads-up, you must (at least for now) also have BBEdit on macOS for the second step.

This post is about the first step. Follow-ups (perhaps not immediately) will outline the other steps. The recommended process for the first step is to

  • Put your annotated PDFs in a folder on macOS. I tag the annotated PDFs with the tag annotated as a way to find them.
  • Select the annotated PDFs at the Finder level.
  • Run the AppleScript below. The script will cycle through all selected PDFs, opening them in Highlights and exporting the annotations as a markdown textbundle folder to a default folder location.
(*
save annotations in PDF to markdown textbundle using Highlights
2021-07-29
jjw

Instructions
--
* select a set of PDF files at the Finder level
* run this script
--> output is a textbundle folder of annotations from all PDFs selected

Caveats
--
The current version of this script will crash if the default folder already
contains a copy of the annotation textbundle folder.
This uses lots of AppleEvents with delays because Highlights is
entirely and frustratingly unscriptable.
*)

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- set the default folder location to save the annotation textbundles
property theDefaultFolder : "/Volumes/Databases/Journal Annotations"
-- quit Highlights at the end?
property quitHighlightsOnEnd : true

on run {}
	set theCount to 1
	-- get the list
	tell application "Finder" to set theSelectedList to the selection as alias list
	if theSelectedList = {} then return
	-- start with Highlights
	tell application "Highlights" to activate
	repeat with theSelection in theSelectedList
		set theFileName to the POSIX path of theSelection as text
		my saveHighlightsAnnotations(theFileName, theCount)
		set theCount to theCount + 1
	end repeat
	if quitHighlightsOnEnd is true then tell application "Highlights" to quit
	return
end run

on saveHighlightsAnnotations(theFileName, theCount)
	tell application "Highlights" to open POSIX file theFileName
	delay 1
	tell application "System Events"
		-- save as textbundle
		keystroke "t" using option down
		delay 1
		-- save to the default location
		if theCount = 1 then
			keystroke "g" using {shift down, command down}
			delay 0.5
			keystroke theDefaultFolder
			delay 0.5
			keystroke return
			delay 0.5
		end if
		-- this next step clicks the SAVE dialog button
		-- the script will crash here if the file already exists
		keystroke return
		delay 1
		-- close the window
		keystroke "w" using command down
	end tell
	return
end saveHighlightsAnnotations

–
JJW


My processing chain is documented below. As a starting point, I …

  • Annotate the PDF in Bookends on iPadOS.
  • Sync the annotated PDF back to Bookends on macOS.

You can use whatever apps you choose for the above steps as long as the annotations are not flattened and the annotated PDF is stored on macOS. I have tagged the annotated PDFs at the Finder level with the tag annotated so that I can select them all at once for the next step.

  • Extract the annotations to a textbundle using the Highlights app on macOS and the script below. The script can be run as a drag + drop applet or it can be invoked from the Scripts menu. You should not create an applet and double-click on it to run, as this approach can play havoc with what is passed as the selection to the on open handler.
(*
extract annotations in PDFs to markdown textbundle folder using Highlights
2021-07-30
jjw

Instructions
--
* select a set of PDFs at the Finder level
* run this script
OR
* drag + drop a set of PDFs onto this script application
--> output is a textbundle folder of annotations from all PDFs selected

Caveats
--
This uses lots of AppleEvents with delays because Highlights is
entirely and frustratingly unscriptable.
*)

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- set the default folder location to save the annotation textbundles
property theDefaultFolder : "/Volumes/Databases/Journal Annotations"
-- remove any existing textbundles (default is to keep the old one, renamed with a _copy suffix)
property removeExisting : false
-- quit Highlights at the end?
property quitHighlightsOnEnd : true
-- display report dialog at end?
property displayReport : true

-- run with preselected set
on run {}
	tell application "Finder" to set theSelectedList to the selection as alias list
	if theSelectedList = {} then return
	my batchProcess(theSelectedList)
end run

-- drag and drop files onto script application
on open FileSets
	my batchProcess(FileSets)
end open

on batchProcess(FileSets)
	tell application "Highlights" to activate
	set theCount to 0
	repeat with theSelection in FileSets
		-- reset for each item so that one failed lookup does not skip all later files
		set isFile to true
		set theFilePathName to the POSIX path of theSelection as text
		try
			tell application "System Events" to set fileExtension to name extension of (theSelection as alias)
		on error
			set isFile to false
		end try
		if isFile is true then
			if ((fileExtension is "pdf") or (fileExtension is "PDF")) then
				my checkforExisting(theFilePathName)
				my extractAnnotationsviaHighlights(theFilePathName, theCount)
				set theCount to theCount + 1
			end if
		end if
	end repeat
	if quitHighlightsOnEnd is true then tell application "Highlights" to quit
	if displayReport then
		tell application "Finder"
			activate
			display alert "Extracted annotations from " & theCount & " PDFs."
		end tell
	end if
	return
end batchProcess

on checkforExisting(theFilePathName)
	set itExists to true
	set theTBFolderPrefix to my extractFileName(theFilePathName)
	set theTBFolderName to theDefaultFolder & "/" & theTBFolderPrefix & ".textbundle"
	try
		POSIX file theTBFolderName as alias
	on error
		set itExists to false
	end try
	if itExists is true then
		if removeExisting is false then
			set theCopyTBFolderName to theDefaultFolder & "/" & theTBFolderPrefix & "_copy.textbundle"
			try
				POSIX file theCopyTBFolderName as alias
				set copyExists to true
			on error
				set copyExists to false
			end try
			if copyExists is true then
				set theCmd to "rm -r " & (the quoted form of theCopyTBFolderName)
				do shell script theCmd
			end if
			set theCmd to "mv " & (the quoted form of theTBFolderName) & " " & (the quoted form of theCopyTBFolderName)
			do shell script theCmd
		else
			set theCmd to "rm -r " & (the quoted form of theTBFolderName)
			do shell script theCmd
		end if
	end if
end checkforExisting

on extractFileName(theFilePathName)
	set cTID to text item delimiters
	set text item delimiters to "/"
	set theFolderName to text item -1 of theFilePathName
	set text item delimiters to "."
	set theFileName to text 1 thru text item -2 of theFolderName
	set text item delimiters to cTID
	return theFileName as text
end extractFileName

on extractAnnotationsviaHighlights(theFileName, theCount)
	tell application "Highlights" to open POSIX file theFileName
	delay 1
	tell application "System Events"
		-- save as textbundle
		keystroke "t" using option down
		delay 1
		-- save to the default location
		if theCount = 0 then
			keystroke "g" using {shift down, command down}
			delay 0.5
			keystroke theDefaultFolder
			delay 0.5
			keystroke return
			delay 0.5
		end if
		-- this next step presses Return to confirm the SAVE dialog button
		-- (checkforExisting has already moved or removed any existing textbundle)
		keystroke return
		delay 1
		-- close the window
		keystroke "w" using command down
	end tell
	return
end extractAnnotationsviaHighlights
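
For anyone who wants to follow the checkforExisting logic outside AppleScript, here is a minimal Python sketch of the same idea. It is not part of the workflow itself. The function names textbundle_for and clear_existing are made up for illustration, and the sketch assumes, as the script does, that Highlights names the textbundle after the PDF file name without its extension.

```python
import shutil
from pathlib import Path


def textbundle_for(pdf_path: str, default_folder: str) -> Path:
    """Return the textbundle path expected for a PDF (hypothetical helper)."""
    # Path.stem drops only the last extension, like extractFileName in the script.
    return Path(default_folder) / (Path(pdf_path).stem + ".textbundle")


def clear_existing(pdf_path: str, default_folder: str,
                   remove_existing: bool = False) -> None:
    """Move an existing textbundle aside, or delete it, before a new export."""
    bundle = textbundle_for(pdf_path, default_folder)
    if not bundle.exists():
        return
    if remove_existing:
        shutil.rmtree(bundle)
        return
    backup = bundle.with_name(bundle.stem + "_copy.textbundle")
    if backup.exists():
        # Only one backup is kept; an older _copy bundle is discarded.
        shutil.rmtree(backup)
    bundle.rename(backup)
```

Calling clear_existing before each export means the Save dialog never meets an existing folder of the same name, which is the failure the first version of the script could not handle.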

The script will extract all forms of annotations, including “picture” annotations. Here is a snapshot example of an annotation that I made using Bookends on iPadOS and that was extracted into the assets folder of the textbundle folder.

example

  • I use #hashtag notations in the note fields of annotations. Highlights extracts the #hashtags inside the image link text, as in ![#result a graph of radius vs time](… url link to figure …), which leaves them hidden from any markdown editor. The AppleScript below exposes these “hidden” #hashtags. It requires BBEdit on macOS. I welcome any help to convert this to use sed at the OS level (I am baffled by the proper escape sequence to capture the required closing brackets). To use it, select the text.markdown files inside the .textbundle folders created in the above step.
(*
expose hidden hashtags
version 2021-07-29
author jjw
*)

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- should BBEdit quit when done?
property quiteBBEditonEnd : true
-- display report dialog at end?
property displayReport : true

-- grep strings for BBEdit
-- DO NOT CHANGE
property BBEgrepString : "!\\[(#\\S*) (.*)\\]"
property BBEreplaceString : "\\1 \\2 !\\[\\2\\]"
(*
property sedgrepString : "s/![\\(#[:alpha:]*\\) \\([:alpha:]*\\)\\]"
property sedreplaceString : "/\\1 \\2 ![\\2\\]"
*)

-- run with preselected set
on run {}
	tell application "Finder" to set theSelectedList to the selection as alias list
	if theSelectedList = {} then return
	my extractTagsinFileswBBE(theSelectedList)
end run

-- drag and drop files onto script application
on open FileSets
	my extractTagsinFileswBBE(FileSets)
end open

on extractTagsinFileswBBE(FileSets)
	tell application "BBEdit" to activate
	set theCount to 0
	repeat with theSelection in FileSets
		set theFilePathName to POSIX path of theSelection as text
		tell application "BBEdit"
			open POSIX file theFilePathName
			delay 0.5
			tell text of front text window to replace BBEgrepString using BBEreplaceString options {starting at top:true, search mode:grep}
			save active document of front window
			close front window
		end tell
	end repeat
	if quiteBBEditonEnd then tell application "BBEdit" to quit
	return
end extractTagsinFileswBBE

(*
on extractTagsinFilewShell(theFileName)
	set theCMD to "sed " & the quoted form of (grepString & replaceString) & " < " & the quoted form of theFileName
	do shell script theCMD
	return
end extractTagsinFilewShell
*)
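
The BBEdit search and replace above can also be tried outside BBEdit. The following is a minimal Python sketch rather than the sed version asked about. It assumes that Python's re module treats the pattern in BBEgrepString the same way BBEdit does for this simple case. The function name expose_hashtags and the asset path in the usage note are made up for illustration.

```python
import re

# Same pattern as BBEgrepString: "![", a #tag, one space, the remaining text, "]".
HIDDEN_TAG = re.compile(r"!\[(#\S*) (.*)\]")


def expose_hashtags(markdown: str) -> str:
    """Copy a leading #hashtag out of the image link text so editors can see it."""
    # \1 is the #tag and \2 is the remaining caption, as in BBEreplaceString.
    return HIDDEN_TAG.sub(r"\1 \2 ![\2]", markdown)
```

With a hypothetical line such as ![#result a graph of radius vs time](assets/figure.png), the function returns #result a graph of radius vs time ![a graph of radius vs time](assets/figure.png). Lines whose link text does not start with a #hashtag are left unchanged. Because (.*) is greedy, a line holding two images would be matched up to the last closing bracket, so this sketch assumes one image per line.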
  • As a final step (this can also be the second step), I convert the .textbundle folders to plain folders. This exposes the internals to both DEVONthink and Obsidian. I use the script below. To use it, select the .textbundle folders (unlike the above script, where you select the text.markdown files).
(*
convert textbundle folder to regular markdown folder
version 2021-07-29
author jjw
*)

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- define a prefix to rename the text.markdown markdown file
property AnnotationFilePrefix : "Annotations_"
-- remove textbundle folder after converting
property removeTBFolder : true

property theMDFileName : "text.markdown"

on run {}
	-- get the list
	tell application "Finder" to set theSelectedFolderList to the selection as alias list
	if theSelectedFolderList = {} then return
	my convertBatchFolders(theSelectedFolderList)
end run

on open theFolderList
	my convertBatchFolders(theFolderList)
end open

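-- DEVONthink smart rule entry point
-- NOTE: the convertBatchFoldersDT handler it calls is not included in this listing
on performSmartRule(theRecords)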
	set theCount to my convertBatchFoldersDT(theRecords)
	display alert "Successfully converted " & theCount & " records from textbundles to regular folders."
end performSmartRule

on convertBatchFolders(theFolderList)
	set theRootPath to ""
	set theName to ""
	repeat with theFolder in theFolderList
		set theFilePath to POSIX path of theFolder
		set {theRootPath, theName} to my getBaseNames(theFilePath)
		
		-- convert the text.markdown file name
		set thecurrentMDFilePathName to theFilePath & theMDFileName
		set thedesiredMDFilePathName to theFilePath & theName & ".md"
		set theCmd to "mv " & (the quoted form of thecurrentMDFilePathName) & " " & (the quoted form of thedesiredMDFilePathName)
		do shell script theCmd
		
		-- convert the .textbundle folder name
		set theNewMDFolderName to theRootPath & "/" & theName
		set theCmd to "rsync -av " & (the quoted form of theFilePath) & " " & (the quoted form of theNewMDFolderName)
		do shell script theCmd
		
		-- remove folder?
		if removeTBFolder is true then
			set theCmd to "rm -r " & (the quoted form of theFilePath)
			do shell script theCmd
		end if
	end repeat
	return
end convertBatchFolders

on getBaseNames(FullPathName)
	set cTID to text item delimiters
	set text item delimiters to {"/"}
	if (FullPathName ends with "/") then
		set baseName to text item -2 of FullPathName
		set rootName to text 1 thru text item -3 of FullPathName
	else
		set baseName to last text item of FullPathName
		set rootName to text 1 thru text item -2 of FullPathName
	end if
	if (baseName contains ".") then
		set text item delimiters to {"."}
		set nameWithoutExtension to text 1 thru text item -2 of baseName as text
	else
		set nameWithoutExtension to baseName as text
	end if
	set text item delimiters to cTID
	return {rootName, nameWithoutExtension}
end getBaseNames
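
As a cross-check on what the conversion does to the files, here is a minimal Python sketch of the same steps. It is an illustration under stated assumptions, not a replacement for the script. The function name convert_textbundle is made up. Unlike the AppleScript, which renames text.markdown inside the textbundle before copying, the sketch copies first and renames inside the new folder, so the original textbundle stays untouched when it is kept.

```python
import shutil
from pathlib import Path


def convert_textbundle(bundle_path: str, remove_bundle: bool = True) -> Path:
    """Copy Name.textbundle to a plain folder Name and rename its markdown file."""
    bundle = Path(bundle_path)      # a trailing "/" is ignored by Path
    name = bundle.stem              # "Name.textbundle" gives "Name"
    target = bundle.with_name(name)
    # Copy everything, including the assets folder (dirs_exist_ok needs Python 3.8+).
    shutil.copytree(bundle, target, dirs_exist_ok=True)
    # Rename text.markdown to Name.md inside the new folder.
    (target / "text.markdown").replace(target / f"{name}.md")
    if remove_bundle:
        shutil.rmtree(bundle)
    return target
```

The plain folder and the Name.md file are what Obsidian and DEVONthink then see as ordinary content, with the assets folder carried along unchanged.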

I apologize that this is a long post with a lot of code. I’d consider a more formal posting approach (e.g. GitHub or a public Dropbox) if interest dictates. I am not as proficient in such methods at the moment.

I hope this information is of some benefit to some folks.

Enjoy!

–
JJW


Thank you for this workflow. Very helpful and a needed one for Bookends!
