Tuesday, August 20, 2024

For Almost Any Need There is Something Open Source

The problem. I have multiple copies of some very big PDFs (usually primary sources of 80-100 MB). Yes, vast quantities of antique laws. Even on a 2 TB drive they gobble up space. Backing up to OneDrive makes it worse: the duplicates slow down every backup, and I am nearing the 1 TB limit on OneDrive. There are also many files on my OneDrive that were copied several times and have slightly different file names. Cleaning those up will also be useful.

I was thinking it would be so useful to have a program that identifies files that are duplicated even if the names are different. Even better if it would delete the older copy and put a link to the newer copy in the older copy's directory. This frees the extra space without losing the association with existing projects.
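For the curious, the idea can be sketched in a few lines of Python. This is only a rough sketch of the general technique (compare sizes first, then hash the contents), not how dupeGuru or any other tool works internally. The function names `file_digest`, `find_duplicates`, and `link_to_newest` are made up for illustration, and the hard-link step assumes all copies live on the same drive. I have not checked how OneDrive treats hard links, so treat that part as an assumption.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of files under root whose contents are identical."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


def link_to_newest(group, dry_run=True):
    """Keep the newest copy and replace the others with hard links to it.

    Returns the paths that were (or, in a dry run, would be) replaced.
    """
    newest = max(group, key=lambda p: p.stat().st_mtime)
    replaced = []
    for path in group:
        if path == newest or os.path.samefile(path, newest):
            continue  # already the same file on disk
        replaced.append(path)
        if not dry_run:
            # Create the link under a temporary name, then swap it in,
            # so the old copy is never missing if something fails.
            tmp = path.with_name(path.name + ".linktmp")
            os.link(newest, tmp)
            os.replace(tmp, path)
    return replaced
```

Calling `find_duplicates` on a folder and passing each group to `link_to_newest` with the default `dry_run=True` only reports what would change. Nothing is touched until `dry_run=False`, and even then only after that full backup.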

https://dupeguru.voltaicideas.net/ seems like it may do that job, at least in part. If it lets me delete the duplicates that are not in my primary sources directory, that will simplify finding the files.

Full backup first, of course. 


  1. My problem with such programs (and I've tried a few, notably doublekiller and duplicatefilefinder) is that they will match files based on size and extension without much issue, but I still have to go through each one and decide which to keep: I've not found a way to delete in bulk based on the path. So, where I have multiple 'backup' drives, it gets quite tedious.

    I suppose I should get a wireless NAS or something, dump EVERYTHING there and sort from there. Or I can hire a bright teenager to do it for me.

    1. Figuring out which one to delete is annoying but many were duplicated in the same two directories. This was easy. OneDrive will be a challenge.
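
The "delete based on the path" wish in this thread can be sketched roughly as follows, assuming the duplicates have already been grouped by some other means. The function name `plan_deletions` and the `keep_root` parameter are hypothetical, not features of any of the tools named above. The sketch only builds a list of candidates and deletes nothing itself.

```python
from pathlib import Path


def plan_deletions(groups, keep_root):
    """Pick deletion candidates by path.

    groups is a list of lists, each holding paths to identical files.
    Copies under keep_root are kept; all other copies in the same group
    are returned as candidates. A group with no copy under keep_root is
    skipped entirely, so the only copies of a file are never listed.
    """
    keep_root = Path(keep_root).resolve()
    to_delete = []
    for group in groups:
        paths = [Path(p).resolve() for p in group]
        kept = [p for p in paths if keep_root in p.parents]
        if not kept:
            continue  # no copy in the preferred directory: leave alone
        to_delete.extend(p for p in paths if keep_root not in p.parents)
    return to_delete
```

The returned list is meant to be read over first. Deleting is then one `unlink()` call per path, which is exactly the step a person should stay in charge of.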
