Musings on storing information

svsmailus · February 16, 2022, 10:27am

I have spent many years using technology and seen quite a lot of innovation over those years. I suppose an advantage of years of use is that it helps define what you want and trends tend to affect you less and less.

I used to store everything and have databases full of stuff I hardly ever look at. The irony of a number of attempts at creating a PKM system is that they ended up being a mini Internet and when I needed information out of them, it was often still quicker to find it online rather than in my own PKM. I’ve used Zettelkasten and concluded it’s not really my cup of tea. I’m not really the linking type. I’m not interested in “my future self”. Everyone is telling me to store stuff for my future self. Well, I like to be present with my current self and in the moment now. I’m not interested in journaling all activities and events in my life, especially on a computer. It just amounts to more textual noise. In 200 years’ time when everyone has Facebooked, Instagrammed and micro-blogged their every waking moment, very few will really be interested. It won’t really be a record of history, because that’s already being stored by Google. Too much time is wasted storing up for tomorrow and wasting today.

I have a keen interest in using my current brain rather than a second brain. Lynne Kelly’s book, “Memory Craft”, is well worth a read. The point is that writing everything down is actually not healthy for your brain. You need to exercise it. I’ve been amazed how by creating a visual alphabet I can remember 26 things with ease without needing to write them down. You’d be surprised how useful that actually is throughout the day. I rarely need a notebook to write things down that I need to remember. I’m also beginning to store info in my brain with memory palaces and am surprised how well it works.

So, I’ve moved from storing everything to being extremely selective. The closest to how I store information is Tiago Forte’s “Progressive Summarization”. I only store what is needful for projects or what stands out to me. There are features in different apps that I’ve used over the years that I have found extremely helpful. This leads me to what would be an ideal app for storing information.

Two things are required.

Search. Now by search I am not referring to incremental search, but filtered search. Filtered search only shows the lines in the data being searched that meet your search query. Something like the veritable Silver Searcher which is just awesome. I cannot understand why so many apps only do incremental search, this is a real annoyance. There also needs to be a facility to click on the relevant search entry and be taken straight to that line.
Meta Data. Every bit of information I possess has meta data. Things like date, phone numbers, addresses, topic, types, etc. I need something that stores this separately and can be searched separately. It must allow me to create my own meta data types. The best app that I’ve used that does this is Tinderbox.
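
To make the "filtered search" requirement concrete, here is a minimal sketch of the idea in Python. It assumes the notes are plain-text files under one folder; the function name `filtered_search` and the `*.txt` default are illustrative assumptions, not taken from any particular app.

```python
from pathlib import Path


def filtered_search(root, query, pattern="*.txt"):
    """Return (path, line_number, line) for each line containing every query word.

    Matching is case-insensitive. Only matching lines are returned, which is
    the "filtered" behaviour, as opposed to jumping from hit to hit in place.
    """
    words = [w.lower() for w in query.split()]
    if not words:
        return []  # an empty query matches nothing rather than everything
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # skip files that cannot be read
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if all(w in lowered for w in words):
                hits.append((path, number, line.strip()))
    return hits
```

Printing each hit as `path:line_number: text` gives a form that many editors can use to jump straight to the matching line.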

If I could find an app on macOS and iOS that does these two things well, I’d be a happy bunny. A database is pretty close to this. I’ve been thinking about setting up an SQL database to try this, but iOS is still currently a problem. Another solution is TiddlyWiki with Quine on iOS, but that requires some maintenance and setup.
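
As a rough illustration of the SQL database idea, the sketch below uses Python's built-in `sqlite3` module to keep note text and user-defined metadata in separate tables, so each can be searched on its own. The table names, column names and helper functions (`open_store`, `add_note`, `find`) are assumptions made up for this example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with notes and free-form metadata kept apart."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS notes (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS meta (
            note_id INTEGER NOT NULL REFERENCES notes(id),
            name    TEXT NOT NULL,
            value   TEXT NOT NULL
        );
        """
    )
    return db


def add_note(db, body, **meta):
    """Store a note; any keyword argument becomes a metadata field."""
    note_id = db.execute("INSERT INTO notes (body) VALUES (?)", (body,)).lastrowid
    db.executemany(
        "INSERT INTO meta (note_id, name, value) VALUES (?, ?, ?)",
        [(note_id, name, str(value)) for name, value in meta.items()],
    )
    db.commit()
    return note_id


def find(db, text="", **meta):
    """Return (id, body) rows whose body contains text and whose metadata matches."""
    sql = "SELECT id, body FROM notes WHERE body LIKE ?"
    params = [f"%{text}%"]
    for name, value in meta.items():
        sql += " AND id IN (SELECT note_id FROM meta WHERE name = ? AND value = ?)"
        params += [name, str(value)]
    return db.execute(sql + " ORDER BY id", params).fetchall()
```

Because metadata fields are just rows, a new type (say `phone` or `topic`) needs no schema change: `add_note(db, "Call the plumber", type="task", date="2022-02-16")` followed by `find(db, "plumber", type="task")` narrows by both content and metadata.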

anon41602260 · February 16, 2022, 1:12pm

I think “storing only what is needful” (perhaps “needed”?) is a useful admonition, but not always practical. Some things are obvious: a genealogical researcher would want to store references and records related to known ancestral names. But the researcher might ignore neighbors in a census record and miss out on a formerly-unknown set of relatives. Thus, sometimes what is “needed” is not obvious. I think this leads to scrapclipping articles and links “just in case” – which is related to the second half of the admonition to store “what stands out to me”. How to throttle the impulses. Slippery slopes are always just around the corner.

I’m not sure I understand the requirement for software that both searches well and stores metadata - unless the app sits on top of a database (like DEVONthink – which supports custom metadata, by the way). Just what is being searched? The web in total? A group of sites? Your computer and attached drives? Online databases?

svsmailus · February 16, 2022, 1:20pm

Only the notes I write or data I save.

Thanks for that, forgot it did that!

Katie · February 19, 2022, 2:33am

I am struck by how little information I used to need, pre computer fascination.

dealtek · February 19, 2022, 4:24pm

Your points are a good wake up call for me to review the stuff I have saved - much of it also low priority and not needed.

tomalmy · February 19, 2022, 5:20pm

Well I went for the save everything route. While 99.9% or more of the stuff I’ll never need or look for, occasionally something from the distant past is useful. And disk space is so cheap these days! It also humbles oneself to realize that an entire life’s work will fit such a small space.

webwalrus · February 19, 2022, 6:17pm

Is that the sort of book that uses visuals and such, so that it’s best in print? Or do you think the audio version would work just as well?

neonate · February 19, 2022, 8:06pm

A few years back when I was getting ready to retire from my primary career, I decided to go through all of my digital work files and home files and put everything in order. I started doing this, then realized it was an overwhelming task and 99.9% would no longer be useful to me (as you mentioned). I made 2 secured sparsebundles on my Mac, “Home Archives” and “Work Archives”. I kept a few important files on my Mac, then filed the others into the appropriate sparsebundles. I saved each sparsebundle onto a drive, “Archives”, and deleted them from my Mac. I filed everything “as is”; I didn’t bother about sorting and creating subfolders, etc. In the nine years since I’ve done this, I’ve gone into “Work Archives” to retrieve a few files about 3-4 times (I do some consulting). I don’t remember ever looking at the “Home Archives” other than when adding more stuff. I do keep a backup of the “Archives” disk, but both the original and the backup are kept in my home. Neither is stored in the cloud because I may have some sensitive things in there (sensitive but unimportant). My thoughts are: If I need something it may be there, but if “Archives” were to suddenly disappear, no big loss. When I quit consulting, I will delete “Work Archives” on day one. My instructions to family in the case of my untimely demise are to delete “Work Archives”.

fuzzygel · February 19, 2022, 10:50pm

@svsmailus, there are a lot of good points you raised. I enjoyed reading your reference to Memory Craft, but I am having a bit of difficulty connecting your thoughts on memory to search to meta data, etc. I may have to re-read the post as it contains a lot of information.

@neonate, I am going through almost the same process.

I retired from full time work 2 years ago. I have started the journey of creating a digital asset or legacy manual for myself and for my family. For my family, I am creating step by step instructions on how to access, manage and use my digital assets, from 1Password vault to my collection of databases (DEVONthink, EagleFiler, etc), photos, financial stuff, email accounts, software, etc.

I guess my target audience of my documentation may be different from what is mentioned on this post. They are for my family, so I need to write and present in a way that they can understand. They do not have the level of IT and computing knowledge I have. This is an important point for me. If the documentation is just for myself, I can choose any app or any format I like that suits me.

I am still trying to find the happy medium that can do both, but I might have to do this twice, once for myself, once for my family.

To me the key issue is not just storing information, but making it retrievable and usable (to the target audience); otherwise, what is the point?

svsmailus · February 20, 2022, 10:32am

There are some visuals that are helpful (I’m using the Kindle version). However, it does focus on helping you understand the principles and applying them yourself. Ultimately you need mental images that work for you. What amazes me is how well it works. People keep telling me you’ll forget stuff unless you write it down and this is simply untrue. There is a danger that writing causes us to dumb down the potential of our brains. Building a second brain could reduce the effectiveness of your actual brain.

I’m talking about two different things. When I do store information I want meta as a separate criterion. This is because meta can help narrow down content. The second thing was to do with memory. I’m learning just how powerful our minds are and how well they can remember things.

Therein lies the rub. Personally I believe the future will have search engines that will retrieve data easily and effectively across file formats. In many ways Google’s search ability is a phenomenon. I use Foxtrot Pro on my Mac and it is very good. The barrier is usually different file formats. What if file formats are no longer an issue in ten years’ time? In fact if all my data was in plain text there would be no barrier now.

In focusing on the future user finding what they need from my data I need to collect it intelligently. (This is of course different from building a second brain which has a different purpose.) Helping people find useful information in my data if I have been collecting for fifty years requires meta data. That is data such as date, when was something recorded? Type, what kind of information is recorded? The meta can of course be embedded in plain text, but needs something to make it distinguishable from the main content. The reason I believe meta is vital is that if a search engine can actually find everything and you are searching through fifty years of data you may well end up with what you want alongside thousands of entries you don’t. You need some way to narrow down the search and meta makes that easier. You could select the year or a date range that you know it was written, or that it was an event or an email. The whole point is to narrow down until you can see what you need.

I have for the past 10+ years used a format that adds this data to every filename of every file I create. Something like “20220220-major category-minor category-description.extension”. So if I create a logo for companyX today, the filename would look something like this: “20220220-CompanyX-logo-rebranded logo.png”. Finding this later would be a matter of first searching for all images that have “2022”, “CompanyX” and “logo” in the title. This would narrow down the search result significantly to find the file I’m looking for. It pretty much works for me. The challenge is being able to add this meta data to all information I store. In plaintext a YAML header would work or symbols such as #, @, § and the like to designate meta data. You would then couple that with a filtered search engine and should be good to go.
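
The filename convention above can be narrowed mechanically. The following is a minimal sketch in Python, assuming names follow “YYYYMMDD-major-minor-description.extension” exactly and that the two category fields contain no hyphens (the description may). `FileMeta`, `parse_name` and `narrow` are hypothetical names invented for this illustration.

```python
from datetime import date, datetime
from pathlib import PurePath
from typing import NamedTuple


class FileMeta(NamedTuple):
    day: date
    major: str
    minor: str
    description: str
    extension: str


def parse_name(filename):
    """Split a filename into its metadata fields, or return None if it does not fit."""
    path = PurePath(filename)
    parts = path.stem.split("-", 3)  # at most 4 fields; extra hyphens stay in the description
    if len(parts) != 4:
        return None
    stamp, major, minor, description = parts
    if len(stamp) != 8 or not stamp.isdigit():
        return None
    try:
        day = datetime.strptime(stamp, "%Y%m%d").date()
    except ValueError:
        return None  # eight digits, but not a real calendar date
    return FileMeta(day, major, minor, description, path.suffix.lstrip("."))


def narrow(filenames, year=None, major=None, minor=None, extension=None):
    """Keep only the filenames whose metadata matches every criterion given."""
    hits = []
    for name in filenames:
        meta = parse_name(name)
        if meta is None:
            continue
        if year is not None and meta.day.year != year:
            continue
        if major is not None and meta.major.lower() != major.lower():
            continue
        if minor is not None and meta.minor.lower() != minor.lower():
            continue
        if extension is not None and meta.extension.lower() != extension.lower():
            continue
        hits.append(name)
    return hits
```

Calling `narrow(names, year=2022, major="CompanyX", minor="logo")` performs the three-step narrowing described above in one pass; files that do not follow the convention are simply skipped.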

OogieM · February 20, 2022, 1:20pm

As a strong proponent of rich and varied personal data storage systems I would say that if you can find something faster in the internet than you can in your own curated set of info you didn’t properly organize it for how you think.

Those are not mutually exclusive needs. As a veteran of many computer and data storage changes (over 45 years as a programmer) I’ve seen the need for things now that I am very glad I saved and organized decades ago. Just this week I was able to find a reference regarding a genetic trait and the inheritance of that trait that was originally documented in the 1920’s. I have that original paper. I saved the reference to that from a later paper published in the late 1970’s. That combined with newer research is going to modify the structure of a research project I am designing now. Because of this post I decided to see if I could find the same info on the internet. The answer was that the original paper from the 1920’s is not available and neither is the one from the 1970’s in an internet form. I can purchase a hard copy version of the more recent paper and I found one library that has the original paper copies of the journal that published the earlier paper but they will not lend those items out and they are several states away. But I have both a copy of the original old paper and one of the newer paper in my reference system. I had no clue I might need to use that stuff now, it just was something I was interested in so I saved the info once I had it. So I see great value in saving and organizing stuff from the past because that is how I move forward into the future.

Following on to the search requirement about what is searched. If you’ve filed it appropriately you never need to search.

Take my recent case. I know that I save and store information on various genetic traits. And I remembered that I used that info I collected in the 1970’s. So I went to the place on my file system where that info was stored and found the 1970’s paper and my notes. That had the link back to the 1920’s paper.

Now in my case the data and links were not yet in my overall tool so I took an extra few minutes to add those links into Obsidian and also placed links into more areas that related to that subject matter thus creating my own web of ways to get back to the original info no matter how I remember it next time I need it.

I’ve been redoing my data storage system. What I’ve found is that I did have duplicated files, typically research papers or notes located in different places, and my biggest gain has been collating and consolidating those notes. But I do reference many of the items on a regular basis. Especially the much older ones. I am less likely to need the stuff from the middle timeframe due to the subject matter. At least that’s how I think now but I am not deleting it because who knows what I’ll need again in another few years.

This is so important, and an ongoing task. Mostly to keep it updated.

Which brings up the point of who is your target audience?

In my case I have several target audiences, myself in the future (stuff I will be working on at a later date), my heirs and family (personal and genealogical info), the local community (the historical archive about our town and farming here), the future breeders of CMK horses and Black Welsh Mountain sheep (which could be worldwide) and the future historians wanting to know what life was like here and now (think British Library of 3022). So I need to make sure that all those bases are covered with my system.

anon41602260 · February 20, 2022, 4:38pm

Speaking just for my current self – I’m pretty sure my future self will be dead, sooner than later. So, yeah, keeping things for that fellow is a waste of time.

svsmailus · February 20, 2022, 4:47pm

I would agree with you to a degree, but ultimately when the same information is stored on a desktop computer and on the internet, Google’s search is going to find it faster. It’s not that I could not find it, but that Google is faster.

Now this is of real interest to me. Google and other search engines and the advances of search on desktops are a testimony that for many this is not the case. In fact, search engines are the entry point for 99% of the people to accessing data on the internet.

I would love to hear your approach to filing and how you store your data.

Can you explain your system?

I’d probably argue that the British Library of 3022 would have so much data stored that just creating an index would denude the world of more resources than bitcoin mining, let alone living long enough to read and search it all. Plus determining what life was really like, rather than people’s online lives and opinions might make the whole thing a moot point.

webwalrus · February 20, 2022, 5:52pm

I would generally agree with @OogieM, and note that part of “being able to be present … in the moment now” is the confidence that Future Self is appropriately managed.

I would submit that everybody who exists in society is, by virtue of participating in a society, interested in their future self. It’s not a matter of whether, but the degree.

For example, I just about guarantee you don’t throw out your notices of doctor appointments because that only applies to Future Self - you do, in the present moment, what you’ve decided is appropriate to set Future Self up for success. You record the appointment in a calendar. Maybe you keep a file of paperwork that Future Self is going to need to discuss treatment options with the doctor. If you skip this stuff, when Future Self becomes Present Self, Present Self is going to be miserable.

The questions, to me, mostly revolve around how much it’s worth sending forward to Future Self.

For me, if my girlfriend mentions that she’d really like something, there’s probably high value in noting that for a future gift idea - even if that idea is for Future Self 9 months from now.

But recording every last scrap of research that led me to a decision? Probably not. If it’s an important decision, I might want some light notes about the decision and maybe important stuff to keep me from falling into a trap in the future.

I agree in principle, but not in practice. Google can find things very quickly, if they exist to be found. But there’s this adage, “what we post online is forever”, and NO, no, most emphatically no, it’s not.

Information - important information - disappears from the Internet every day. I had a server procedure that I found detailed very nicely in a helpful article one day, so I bookmarked it. A year later, that site was gone. The Internet can’t be trusted as a long-term storage mechanism.

If the information is important to you - particularly if you’re doing something esoteric like breeding a particular variety of sheep - you had better have the information on hand locally if it’s important. That can be in a digital form, but in your digital archives rather than out on the ephemeral Internet.

That, of course, assumes that we don’t have massive advances in data technology between now and then. Go back 30 years and tell Past Self that in 30 years you’ll be able to fit entire seasons of high-quality TV shows on a little square plastic thingy that goes in your pocket, and see what they say.

The thing about what Oogie is doing is that if her particular variety of sheep still exists in 1,000 years, what she’s doing now might actually be relevant. Right now she’s building on the research of people that are probably long dead, and in another 50 years you never know - somebody may be reading her papers.

It’s not about information hoarding, but rather about intentional curation - both for Future Self and for future generations.

OogieM · February 20, 2022, 8:42pm

No, No, No! To find what I needed about the sheep took less than 2 minutes. (Yes, I’ve been timing things for my own amusement and education. Deciding what workflows need optimization vs what is good enough)

Just searching in Google to discover that I could NOT find it or could not get a copy from the one place that had 1 piece took me 20 minutes this morning.

Correct but that’s irrelevant. On the internet you have no control over how anyone files anything so of course search is the only way to get to the data. But there is no reason to search in your own curated file system if you’ve done it right.

Flat folder structure with long descriptive folder and file names that include dates where possible and use system agnostic characters for the names. Specific “buckets” for specific types of data. Minimize file types and stick to standards where possible. Multiple backup copies on multiple storage media in multiple geographically separated locations. Hard copy backup of top level access and structure info in several places and with several trusted people. Read and verify all storage media/locations/files on an appropriate schedule based on storage medium and file type. Plan for upgrading file types and structures with each new technology improvement, operating system or hardware upgrade.

Example, just my own archives have gone from 8 inch floppy drives, to 5.25 to 3.5 to Bernoulli, to WORM, to CD-ROM to Blu-ray. BTW, at this year’s verification of the CD-ROM and Blu-ray disks I had MANY unreadable. These were stored properly with environmental controls and were archive rated media but the data are gone. Fortunately that was only 1 of my several places the same data are stored.
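
The "read and verify" step in that list can be partly automated. What follows is only a sketch of one common approach (recording a checksum per file, then re-reading every file later and comparing), not a description of the workflow actually used here. The function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Re-read every file listed in the manifest and report the ones with problems."""
    root = Path(root)
    problems = {}
    for relative, expected in manifest.items():
        try:
            actual = sha256_of(root / relative)
        except OSError:
            problems[relative] = "unreadable or missing"
            continue
        if actual != expected:
            problems[relative] = "checksum mismatch"
    return problems
```

Building the manifest once when a copy is made, storing it alongside each copy, and running `verify` on the chosen schedule turns "the disk mounts" into "every file still reads back with the same bytes". An empty result means nothing was flagged.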

And back in the day there was no one who thought the world needed more than 10 computers total.

Have you looked at the DNA data storage research lately? After all nature can store instructions to create life in a very small physical space and it covers replication too (albeit with errors but that’s why some species have such long repeats of critical sequences). I believe the largest genome known right now belongs to a plant. I don’t remember which one but it’s in Japan and has approximately 150 billion base pairs. Lungfish I think are still the largest animal genome with about 140 billion base pairs. And that amount of data is stored in something that is measured in picograms. Imagine a kilogram of stable DNA storage!

Most big collections (Smithsonian, British Library, Library of Congress, etc) have plans that cover several hundred years at a minimum.

YES!

Heck even archival data storage devices can’t be trusted either.

And we circle back to the British Library. I’m going back in the wonderful digitized manuscript records to locate Abbey records from the 1200’s that talk about sheep that are the precursors to mine. And with some DNA analysis comparing modern sheep to archeological digs at the dawn of domestication we can trace lineages back for thousands of years. It used to be that everyone thought dogs were domesticated about 10-12K years ago. Now the best guess is somewhere around 13-14K years ago, and sheep, originally taught as domesticated about 9K years ago, now have been traced back to around 10-12K years.

I hope so. My job now is to make sure the data are accessible, readable and protected from harm.

And that quote is now going into my PKM as part of my set of notes on Personal Inspiration. The place I go to read stuff when I am discouraged about what I am doing.

Thank You!

ThatNerd · February 20, 2022, 9:17pm

You can find what you need on the internet for the most part, if you remember having it in the first place. I’ve started many a project after rediscovering a saved article with a few bullet-point thoughts.

webwalrus · February 21, 2022, 4:42pm

Also…DNA is quaternary rather than binary, so each strand could theoretically encode way more data per “bit” than a purely digital storage medium.
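
To put a number on the quaternary point: each base takes one of four values, so it can carry at most two bits, and one byte fits in four bases. The sketch below shows that arithmetic in Python. The A/C/G/T to 00/01/10/11 mapping is an arbitrary assumption for illustration, and the function names are made up; this is not a description of how actual DNA storage research encodes data.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_bases(data):
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(strand):
    """Decode a strand produced by bytes_to_bases back into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(strand), 4):
        byte = 0
        for base in strand[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)  # raises ValueError on a non-ACGT letter
        out.append(byte)
    return bytes(out)
```

Under this mapping the single byte for the letter “A” (binary 01000001) becomes the four bases `CAAC`, and decoding returns the original byte.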

The more I think about this, the more I tend to think that the type of data storage that’s valuable can be basically described as “what’s necessary to pick up one’s line of thinking again”, or the knowledge to allow somebody else to do that.

If, for example, you were studying something like the effects of overfeeding / underfeeding on sheep breeding rates, that could logically generate a ton of data, both pre- and post-study. But all that might strictly need to be communicated to Future Self or Future Generation is “varied feeding between X and Y grams of Z feed, and discovered that as feed volume goes up, breeding chances increase and offspring size & health increase as well”.

For me, I’ve found that future notes need to communicate a bit more than I think is necessary, but not nearly as much as Data Hoarder Self might otherwise be tempted to store.

“Pick up sour cream at store” is a context-free task. Not in a GTD sense (“at store” is obviously a context), but in a mental sense. Even for something as simple as sour cream, there’s a mental context. Taco night might require actual sour cream. Something like a dip recipe for a party might be able to use cream cheese instead. If somebody else is handing it off to me - or if I’ve forgotten why I added it to the list - I’d need more info to decide what to do if the store is out of sour cream. Just a little bit of extra context can be very helpful.

Future Self (and by extension and amplification, Future Generations) is not so bright sometimes. I can count on my general base of knowledge (I don’t have to describe how to boot up my Mac), but disentangling the bits of thought from one another and figuring out which are effectively “stored procedures” at this point and which are specialized knowledge relevant to this project is a continual challenge.

That’s what needs to be stored - the minimum additive knowledge that will restore Future Self or Future Generation to the point of being able to comprehend your current line of thinking. And finding the balance is a continual challenge.

nationalinterest · February 21, 2022, 5:18pm

Brilliant. Exactly right. The balance is difficult, although given the inexpensive nature of storage, keeping our “workings” is possible too.

So I will write a note on a topic, no more than a page of A5. But that can include references to copies of original books and articles (or spreadsheets) so future self can expand, refresh and reassess, perhaps with new insight. I just need a good referencing system and a searchable/indexed repository. Ideally that would be a two-way process - so if I were to look in the article/book/spreadsheet it would also point me to my note(s), reminding me how I used the information therein.

(Thinking aloud here… my PKM is work in progress; hoping to have it figured out before I meet my Maker, although it could be my eternity project.)

webwalrus · February 21, 2022, 11:51pm

Definitely. The thing I’ve seen with many people though is something along the lines of “there’s too much stuff! I’m just going to throw out everything!” Which is obviously not useful.

This might be why David Allen recommends a yearly sift through everything, with a purge of stuff that’s no longer necessary or useful. Sure, you can store everything - but it’s worth it to be mindful of that as well.

ryanjamurphy · February 22, 2022, 5:12pm

On the topic of DNA-as-storage-medium: