Oogie's (Black Sheep Shepherdess) PKM Dissertation (LONG!)

OogieM · December 18, 2020, 1:17am

I’ve promised that I’d start to explain my entire path toward my own Personal Knowledge Management system.

Part 1: What is the Problem?

So my whole take on the Personal Knowledge Management (PKM) craze is that I need to build a set of applications, procedures and tools that allow me to reach my end goal:

"I use and maintain an ongoing archive and database of useful information based on my interests that is resilient in the face of technology changes. My data are linked in ways that both allow me to uncover hidden relationships and also allow for greater understanding of the problems I am using the system to work on. My workflow is easy to implement with distinct steps for each class of information that it contains. I can add new items to it easily without significant extra investments in time. I accomplish this by using workflows for different types of information with clear boundaries for what goes where. This well defined system allows me to quickly and easily enter, edit and link information within my archive. This system will be implemented so that I have incorporated my current archive of information and it will allow for a new information to be added and the entire archive to grow as my needs and interest areas change.

The resulting database of ideas, inspiration, reference and my own thoughts will help me do more creative work, improve my ability to think in a critical fashion and help me achieve my long term goals especially the ones related to AnimalTrakker/LambTracker, Regenerative Agriculture, Genetic Diversity of the North American Welsh Mountain Sheep population, History of the North Fork, and my personal interests including textiles both ancient and modern, family genealogy and the large photo archive and catalog."

I recognize that building this archive will take time and that upgrading my current reference system to use new tools will also take time but the goal is a clean and concise system.

One of the first tasks I had was to identify the major areas or pain points with my current system.

My archive consists of a number of source types:

Personal digital notes in plain text or rich text format (and perhaps moving to markdown in future). These can be located in DEVONThink or as plain and rich text files on one or more internal or external hard drives and our NAS server.
Personal notes handwritten on paper. Often filed in my paper filing cabinet. Occasionally scanned into PDF files.
Personal handwritten notes on my iPad. In GoodNotes format.
Larger files, blog posts and other medium and long form writing I have created. Mostly in Scrivener but some in Libre Office.
Libre Office files, including write, calculate, presentation and database formats. I do have a few powerpoint presentations as references from other people but LibreOffice can read them and I treat those sources as the same. Ditto for any other MS Office files.
Scientific research papers and books that are in PDF format. These typically have a DOI number but not always.
Various Thesis papers from researchers. Usually in PDF format but some in Word. These typically do not have a DOI number.
PDF Files that are more generic and not strictly scientific papers or references. Most of them do not have a DOI number.
Emails either to or from me. I use standard Apple Mail. These are currently archived into a separate DEVONThink database on a rolling calendar system. Newest emails are kept in Apple Mail and as they age out they get moved to the DT Archive.
Web pages, sometimes just the links, sometimes captured data. These may include images either embedded or as separate image files.
Kindle books purchased from Amazon with both highlights and notes in those books. I have over 1000 Kindle books in my Amazon library.
Kindle books from either open source sources like Gutenberg or other publishers like Manning or Take Control Books or even individual authors like Kourosh Dini and others. I have over 800 of these types of Kindle books.
Kindle books that I have borrowed, read, sometimes made notes on, and returned. I have about 200 of these.
Paper books. Many of these are old and do not have ISBN numbers. Some have annotations written in them not by me but by other people. Some have notes about them that I created in one of my above formats, handwritten, scanned paper, on paper, in short computer files etc. There are over 3000 paper books in my personal library.
Other images like digital format pictures and digitized analog pictures are being handled separately and will not be discussed here except where such a file has as a part of the information one or more notes or comments about it or it is an illustration or figure for a reference item described above.

My current system of archiving consists of both paper and digital formats. My paper system consists of 5 large paper filing cabinets and 15+ bankers boxes full of paper files and notes. My digital file cabinet is things sorted in a shallow layer of folders by subject in one of two locations. The digital filing system is approximately 400GB on my main machine and approximately 750GB on another machine.

As my archive has grown it has become harder and harder to locate ancillary sources. While my filing systems, both paper and digital, are good for finding things by major subject, they fall down when searching for things with numerous minor subjects or multiple major subjects when I can’t remember what I originally considered the primary subject. The scientific papers specifically are a jumbled mess because I was not consistent in renaming them when I pulled PDFs down and the names are not human readable. I have no way to correlate papers written by the same author unless I happened to create a note that mentioned both papers and the authors in it. There is no linking of related papers, and in spite of DEVONThink’s See Also AI tools, they are not really sufficient to cover the variety of sources I have.

So I have a complex set of inputs and a need for a number of different systems to handle them. Up next, what I decided to tackle first.

OogieM · December 19, 2020, 11:12pm

Part 2: Scientific Paper App stack and Workflow

The first major area I decided to work on is the whole scientific paper stack. I had originally estimated that I had about 100 curated reference papers scattered on my hard drives. That was a total miscalculation. It’s more like 900 of them.

No wonder I couldn’t seem to find specific ones I remember reading when I looked for information!

I also discovered that I had duplicates filed in different top level folders but with different filenames. This gave me an indication that the linking of papers by subject was going to prove helpful. I do not yet know how many duplicates I will end up with but so far in the first 150 papers I found about 6 duplicates. Duplicate removal is not likely to change the overall numbers of papers I want to reference significantly.
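For anyone facing the same problem, here is a minimal Python sketch of one way to spot copies filed under different names. It is not what I actually used. It groups PDFs by a hash of their contents, so it only catches byte-identical copies; the same paper downloaded from two different sites would still slip through. The function names and the folder you point it at are assumptions for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_pdfs(root: Path) -> list[list[Path]]:
    """Group PDFs under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    # Note: the "*.pdf" pattern is case-sensitive on some file systems.
    for path in sorted(root.rglob("*.pdf")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only the groups that contain more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicate_pdfs(Path("/some/folder"))` returns one list per set of identical files, which can then be checked by hand before anything goes in the trash.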

My needs for this part of my PKM are: (Hopefully no longer a code block)

Easy intake of both existing and new scientific papers, theses, articles and books. General informational references are not being included in this first pass. What goes into this system is something that can be used as a formal citation in a scientific paper or a book.
Capture of any existing notes or annotations about those items in a digital format. That means eliminating any paper notes by finding ways to digitize them, ideally with OCR for easy search.
Easy searching of the archived originals
Easy searching of my notes and quotes from these sources
Linking between both major and minor subjects and all authors and dates
System must be suitable for creating proper scientific citations for publication
Maintain both an unedited copy of all materials and my annotated copy.

I had never had any bibliographic system in place so that was the first piece to solve. I ended up choosing Zotero. I like that it is open source and that there are a number of tools and add-ons that can expand its capabilities. It works on my systems without need for cloud services except for how to move files to my reading devices and back. I added both Zutilo and Zotfile extensions. Zotfile handles the back and forth to my iPad. I am just now learning how I can use Zutilo for shortcuts and other actions.
Next task was to standardize on a good PDF reader that could export highlights and annotations or notes in a format I could use elsewhere. It needed to be available on both Mac and iPad. It needed to support annotations and highlights on iPad using the Apple Pencil. It needed to easily export those annotations and quotes from the iPad onto my Mac for further integration into my system in a variety of formats as I was not sure what I would end up using in my reference system.

I use PDF Expert as my general purpose PDF reader on the Mac and I used GoodReader as my general purpose PDF reader on iPad. However both of those packages failed in this specific application. PDF Expert can only export notes and highlights as html from the iPad and uses email. That makes it much more difficult to include those annotations into anything else. It takes too many steps and results in a format that is not what I can use later easily. GoodReader also only uses email for output of annotations and suffers from the same problems as PDF Expert. I started looking at many other tools. Short form of what I found is here:

Hmm. My first attempt at entering my spreadsheet failed miserably. The apps I looked at included PDF Expert, GoodReader, Highlights, LiquidText, Adobe Reader, PDF Reader, iAnnotate, PDF View and Markup.
From @ChrisUpchurch this nifty table (I Hope)

App	Save highlights	Save notes	Export annotations	Export formats	Cost	Notes
PDF Expert	yes	yes	yes: file transfer, email	html	$49.99/yr
GoodReader	yes	yes	yes: email	mail message	$79.99/yr Pro
Highlights	yes	yes	yes: email, AirDrop, or direct to some apps	html, webarchive, markdown, rtf, pdf	$2.99/mo or $24.99/yr	covers iOS and Mac
LiquidText	yes	yes	?	MS Word ?		I gave up on it almost immediately, too confusing
Adobe Reader	yes	yes	yes	Adobe FDF & FDX	$15/mo
PDF Reader	yes	yes	yes: email, AirDrop	jpg	$89.95
iAnnotate	yes	yes	yes: email, AirDrop	txt	$9.99
PDF View	yes	yes	no			Eliminated
Markup	yes	yes	email	txt, annotated PDF	$3.33/mo

I’d love corrections and additions from others as I didn’t always buy the app to test it out and some features are crippled on the trial versions.

My final selection was Highlights. It’s a bit quirky to get comfortable with using the pen but it offers the critical choices for export. A big plus is you can set different highlight colors that can be interpreted as specific types of markdown in the export. I’m still exploring this but it’s a very powerful feature.
As everyone knows I am very cloud averse. I needed a sync solution of some sort to move PDF files from Zotero onto the iPad for annotation and back again. My preference would be a WebDAV implementation and that is something that Zotero would support. Their only requirement for transfer to and from the iPad is a folder that the iPad has access to. The gotcha seems to be the iPad side. For now I have set up a Dropbox folder for PDFs in transit but I am looking at replacing it with a more private solution over time.

Next up is what to do with those annotations. They consist of quotes or highlighted sections and personal notes. Initially I thought I’d just save them out in DEVONThink as text files and then set up linking within DT. It’s doable but takes a lot of extra steps and DT isn’t the easiest package to use for that sort of linking yet. Obsidian with its really useful graphical view seemed to offer more than DT could although missing in some areas. I decided to combine the two. All my annotations are saved in markdown format and get put into an Obsidian vault. That vault is indexed in DEVONThink. So I get both the graphical view of Obsidian and the additional searching of DT.

I also wanted to automate as much as I could in the whole system so I set up a Hazel rule to take .md files from my downloads folder and automatically move them into my Obsidian vault for further processing. I also use DEVONThink to index the Obsidian vault to take advantage of the superior searching in DT.
So here is how I use this stack; I have 2 basic scenarios:

New Scientific Paper Workflow

I find or hear about a potential paper that might be of interest to me. A recent example is I got an email about a paper that looks interesting. When I process my email I know that it’s a less than 2 minute action to decide whether to at least take a deeper look at this paper. So I click on the link in email, and then go to the paper and view it. If I decide it’s something that I might want to keep as reference I open Zotero and using add items by identifier I look for the doi number of the paper and enter it. If all goes well the file is located, downloaded, automatically renamed according to my Zotfile rules and moved onto our NAS server in the folder Oogie_Research_PDF_Files. In Zotero it starts out in the unfiled section. I can quickly enter in a bunch of papers here and move on.

Here are my Zotfile preferences.

When I am ready to actually work on annotating these papers I again go into Zotero and use Zotfile to send selected papers to tablet. They get moved into the Dropbox folder Research_Share_PDF for me to pick up there. On my iPad I open up the Highlights app and navigate to that folder in Dropbox to pull up a paper to read and review. As I am reading I use both highlights and notes to document the important parts of the paper or things I want to link. So far I am still deciding what the various highlight colors will mean but I use Yellow for my normal highlights and have been experimenting with tags for some of the other colors. When I am done and ready to return the paper to Zotero I first use the share option from within Highlights to share the notes as markdown files via AirDrop to my iMac. Hazel watches the downloads folder and moves any file with an extension of .md into the Obsidian vault for me into a special location called In Process.
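Hazel is a point-and-click tool, so there is no code to show for that rule. For readers without Hazel, the same idea could be sketched in Python roughly as below. This is only an approximation under assumptions: the function name is made up, the two folders are passed in by the caller, and the choice to add a numeric suffix rather than overwrite an existing note is mine, not something Hazel necessarily does.

```python
import shutil
from pathlib import Path


def move_markdown_notes(downloads: Path, in_process: Path) -> list[Path]:
    """Move every .md file in downloads into in_process; return the new paths."""
    in_process.mkdir(parents=True, exist_ok=True)
    moved = []
    for source in sorted(downloads.glob("*.md")):
        if not source.is_file():
            continue
        target = in_process / source.name
        counter = 1
        # Never overwrite an existing note: add a numeric suffix instead.
        while target.exists():
            target = in_process / f"{source.stem}-{counter}{source.suffix}"
            counter += 1
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved
```

Run on a schedule or by hand, something like `move_markdown_notes(Path.home() / "Downloads", vault / "In Process")` would do the job, where `vault` is wherever the Obsidian vault lives.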

Back in Zotero I do a get from tablet and I get back the annotated copy of the document. I then move the top level Zotero item into a group (folder) called Research Papers. That moves the entire thing out of the unfiled items smart search.

In Obsidian I can move the .md notes into my reference folder after doing any editing needed to add or create additional links between papers.
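To give a feel for what those links might look like, here is a small Python sketch that builds a header block for a note using Obsidian's double-bracket link style, with one link per author, subject and year. The layout of the header and the function names are assumptions for illustration only; my real notes are edited by hand.

```python
def wikilink(name: str) -> str:
    """Format a note name as an Obsidian-style [[wikilink]]."""
    return f"[[{name.strip()}]]"


def link_header(title: str, authors: list[str], subjects: list[str], year: int) -> str:
    """Build a markdown header that links a paper to author, subject and year notes."""
    lines = [
        f"# {title}",
        "",
        "Authors: " + ", ".join(wikilink(a) for a in authors),
        "Subjects: " + ", ".join(wikilink(s) for s in subjects),
        f"Year: {wikilink(str(year))}",
        "",
    ]
    return "\n".join(lines)


# Placeholder values, not a real citation.
print(link_header("Example paper title", ["Author A", "Author B"], ["Scrapie"], 2020))
```

Because every paper by the same author points at the same author note, the graph view can then show papers that share an author or a minor subject, which is the correlation the old folder system could not give me.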

The holding place at each step along the way is to be sure that what finally makes it into my reference system is actually something worth the effort to keep. If I’m not willing to do all the steps then I don’t need it at all.

If I think the paper is worth saving but after I get it to my iPad and really read it I decide not to keep it I don’t bother to save any notes. Back in Zotero I get from tablet and then delete the entire thing. For me I prefer reading on my iPad so much that I am willing to do the extra steps of moving files to the iPad and back rather than preview them on my Mac. YMMV

End result is a nicely annotated file, a pristine original unannotated file and a set of markdown notes in Zotero and a set of those notes in Obsidian that I can also see in DEVONThink.

Existing Scientific Paper Workflow

I am cleaning out a folder on my hard drive labeled Sheep-Disease_Scrapie and I find a file that is a PDF that looks like it might be a scientific paper. So I open the file and lo and behold it’s a paper titled “Association between PrP genotypes and performance traits in a Welsh Mountain flock.” I am reading it in PDF Expert on my Mac as that is my set app for PDFs on the Mac.

It happens to have a doi number in the paper. First I check to see if I have made any notes or annotations on this copy. The answer is no. This is a paper I want to keep and need to annotate so I know what’s in it. I select the doi number and go to Zotero where I then follow the same steps as if it is a new paper. The reason is when Zotero gets the paper by doi it does a much better job of extracting out the citation info compared to entering it all in by hand.

I do not edit or annotate PDFs on my Mac. I only do that on my iPad. I am more efficient doing it there and I can easily do it when I have a bit of time. I can take my tablet with me and in previous years I’d bring it to the Pub and do annotating and reading scientific papers and books and other stuff while enjoying a beer. Now that’s not happening but I still move to the couch to read so I want it on a portable device for annotation.

At this point I can delete the poorly named original file and move on. I tend to do this sort of task in batches for efficiency.

If the file does not have a doi number then I go ahead and do an import in Zotero using the New Item option and fill out as much of the reference info as I can. Usually I also have to do a rename attachments but not always and I’ve not figured out why it does the automatic rename most of the time but sometimes does not. Once in Zotero it joins the new paper workflow.

So far I’ve been successful at finding duplicates before I pull them into Zotero and the item just gets checked for annotations before going into the trash. So far I have not found any annotations in the PDF files themselves. I have not verified that I do not have any paper notes though. That will be handled when I clean out the paper filing cabinet.

Each thing I decide to keep has a cost associated with it. Not just the obvious costs of the time to properly categorize it, annotate it and link those annotations but the hidden costs of more storage space either physical or digital needed and the cost to convert the item over time as new data formats appear or my system changes. So I want to be sure that what makes it past the final wicket is worth the ongoing effort to maintain it in a useful format. This staged intake of items allows me to work on things a bit at a time and always know what is my next step for any individual item.
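The staged intake can be written down as a simple ordered list of stages. The Python sketch below is just a way of making the "what is my next step" idea concrete. The stage names are my own labels for the steps described in the new paper workflow, not anything built into Zotero, Highlights or Obsidian.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    UNFILED = "added to Zotero, sitting in unfiled items"
    ON_TABLET = "sent to tablet for reading and annotation"
    NOTES_IN_PROCESS = "notes exported to the In Process folder"
    FILED = "annotated PDF back in Zotero, filed under Research Papers"
    REFERENCE = "notes edited, linked and moved to the reference folder"


# Enum members iterate in definition order, which is the workflow order.
ORDER = list(Stage)


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows, or None when the item is fully filed."""
    index = ORDER.index(stage)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


def remaining_steps(stage: Stage) -> list[Stage]:
    """List every stage still ahead of an item at the given stage."""
    return ORDER[ORDER.index(stage) + 1:]
```

An item can also drop out at any stage by being deleted, which is the whole point of the wickets; that exit is left out of the sketch to keep it short.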

In practice this works smoothly and is easy to do but it sure took a long time to write down the steps!

Wrothnie · December 22, 2020, 6:17am

Thanks, Oogie. That was a very interesting explanation.

I am glad I am not the only person in the world who finds LiquidText too confusing to stick with.

As you know, I had also nixed Highlights as it does not have a bookmark function.

I shall have to check out Zotero.

OogieM · December 22, 2020, 1:40pm

I got around that by making quick notes any place I would have put a bookmark. That allows me to quickly page through a PDF like a table of contents in the split view in Highlights.

SuperTachyon · December 22, 2020, 1:50pm

Just some corrections to the table:

GoodReader is $20/year or $80 perpetual license.
LiquidText exports to PDF in addition to Word. It’s likely to be life changing if you dig into it for an hour and watch its help section.

nlippman · December 23, 2020, 2:30am

@OogieM

Thanks for providing this excellent in-depth view of your workflow. I hope you are continuing this series!

I’m going to have a look at Zotero; if it can automatically feed journals for me it will be a big help!

JohnAtl · December 23, 2020, 2:45am

You might be interested in research rabbit, which makes suggestions based on references you upload to it.

nlippman · December 23, 2020, 2:51am

@JohnAtl

Thanks, I’ll have a look at that.

My biggest need right now is to get an easy solution to downloading journals that I do not personally subscribe to but for which one of the hospitals at which I have library privileges has an institutional subscription. Through those subscriptions I have access to the journals, but the process of downloading is tedious as I have to manually click through and save each article I want to read. I am hoping Zotero will allow me to easily scan the TOC for each issue and download the articles I want, and even better automate the feed (right now I just have the TOC emailed to me and download when I have time).

OogieM · December 23, 2020, 1:49pm

I don’t think you can do that in Zotero. If the TOC file you get includes all the DOI references you can at least do them in batches in Zotero easily.
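If the TOC arrives as plain text, pulling the DOIs out for a batch could be sketched in Python like this. The pattern is modeled on the one Crossref has published as matching most modern DOIs, so some older or unusual DOIs may be missed. The function name and the sample text are made up for illustration.

```python
import re

# Modeled on Crossref's suggested pattern for modern DOIs; not exhaustive.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Return the unique DOIs found in text, in order of first appearance."""
    seen = set()
    dois = []
    for match in DOI_PATTERN.finditer(text):
        # Drop sentence punctuation that the pattern swallowed.
        doi = match.group(0).rstrip(".,;:")
        # Drop a closing parenthesis that belongs to the surrounding prose.
        while doi.endswith(")") and doi.count(")") > doi.count("("):
            doi = doi[:-1].rstrip(".,;:")
        key = doi.lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois


# Made-up DOIs for illustration only.
sample = "See https://doi.org/10.1234/example.001. Also (doi:10.5555/ABC-2020(01)7)."
print("\n".join(extract_dois(sample)))
```

The resulting list, one DOI per line, is the sort of thing that can be pasted into the identifier field in one go.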

nlippman · December 23, 2020, 2:27pm

@OogieM

Yes, I learned that when I installed Zotero last night and connected it to an RSS feed for a journal. I can click through to get the journal’s website (actually ClinicalKey, an aggregator) but I cannot find a way to enter authorization data and have direct downloads done.

Is it the case that you can have Zotero retrieve a PDF based on its DOI? I did not find a mechanism for that either.

It seems a bit confusing since the source code for Zotero on a quick scan through (it’s a complex app!) suggests it supports direct journal downloading, but I don’t see any way in the UI to make that happen.

Looks like I have to keep searching.

OogieM · December 23, 2020, 2:33pm

Yes, that’s how I do all of my imports where I have a DOI number. On the Zotero top bar it’s the second little button to the right of the plus one.

Click on it and you get a field where you can type in a DOI, ISBN or other identifier. Then if you have Zotfile set up, all the rest (renaming, moving to your final reference file location etc.) happens automatically.

Hope that helps

JohnAtl · December 23, 2020, 7:15pm

Do you have VPN or proxy access through the hospital? That might help with having download privileges through Zotero.

Katie · January 26, 2021, 5:37am

Oogie, Do you still want the charger? I found some bubble wrap and, I think, a box.

I enjoyed your postcards!

I’ve been in and out of the hospital so I don’t come here very often.

KaTie

OogieM · January 26, 2021, 4:10pm

I’d love it if possible. I’m swapping mine around to the places I need them.

Katie · February 7, 2021, 6:49pm

Ok. I haven’t forgotten about you! Honest!