For the ~2 people still following this thread, I had a productive weekend writing some Perl scripts to automate the magic across the three hard drives.
The package of scripts is built around a 'ticket' concept. In a directory on my Mac, I create a simple text file per memory card that records the card's working directory, the name of the folder its images get dumped into, and the disk space those images consume. The file name identifies the card's current stage (1=being ingested to SSD, 2=integrity checkpoint, 3=copy from SSD to HDD1, 4=copy from HDD1 to HDD2, 5=integrity verification on HDD2, 6=whole-drive sync from SSD to HDD1, 7=whole-drive sync from HDD1 to HDD2, 8=whole-drive integrity verify on HDD2) or the gap between two stages (A=between 1 and 2, B=between 2 and 3, etc.). As the job works its way through my process, the file name is updated to reflect the current stage.
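To make the ticket idea concrete, here is a minimal Python sketch (the real scripts are Perl and their file layout isn't shown, so the layout is an assumption). It assumes a ticket file named `<stage>-<card>`, modeled on the dummy ticket `F-99` mentioned later, whose body holds three lines: working directory, folder name, and byte count. `read_ticket` and `move_ticket` are hypothetical helpers.

```python
import os
from dataclasses import dataclass

# Hypothetical ticket layout (an assumption, not the author's format):
#   file name: "<stage>-<card>", e.g. "A-101" or "F-99"
#   contents:  working directory, folder name, byte count (one per line)


@dataclass
class Ticket:
    path: str
    stage: str
    card: str
    workdir: str
    folder: str
    size_bytes: int


def read_ticket(path: str) -> Ticket:
    """Parse the stage and card from the file name, the rest from the body."""
    stage, card = os.path.basename(path).split("-", 1)
    with open(path) as fh:
        workdir, folder, size = (line.strip() for line in fh.readlines()[:3])
    return Ticket(path, stage, card, workdir, folder, int(size))


def move_ticket(ticket: Ticket, new_stage: str) -> Ticket:
    """Advance a ticket; renaming the file *is* the state change."""
    new_path = os.path.join(os.path.dirname(ticket.path), f"{new_stage}-{ticket.card}")
    os.rename(ticket.path, new_path)
    ticket.path, ticket.stage = new_path, new_stage
    return ticket
```

Keeping the stage in the file name means each transition is a single rename, which is atomic within one filesystem on POSIX systems like macOS. The other scripts therefore never see a ticket in two stages at once.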
The first script is 'checkpointer'. Once a minute, checkpointer looks for a ticket in stage A, moves the ticket to stage 2, does the integrity checkpoint on the folder, and then moves the ticket to stage B. If there happen to be several tickets in A, oh well, only one gets launched per minute (there's lots of other stuff that has to happen anyway, so why rush?).
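One checkpointer run might look roughly like this in Python. The post doesn't say what the integrity checkpoint computes, so a SHA-256 manifest written next to the card folder is an assumption. `checkpointer_tick` and `checkpoint_folder` are hypothetical names, the ticket layout is the same assumed one (`<stage>-<card>`, with the working directory and folder on the first two lines), and the once-a-minute trigger is assumed to come from something like cron or launchd.

```python
import glob
import hashlib
import os


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large RAW files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def checkpoint_folder(folder: str) -> str:
    """Write '<digest>  <relative path>' lines to <folder>.sha256 (assumed format)."""
    manifest = folder.rstrip(os.sep) + ".sha256"
    with open(manifest, "w") as out:
        for root, dirs, files in os.walk(folder):
            dirs.sort()  # walk subfolders in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                out.write(f"{sha256_of(full)}  {os.path.relpath(full, folder)}\n")
    return manifest


def checkpointer_tick(ticket_dir: str) -> bool:
    """One scheduled run: advance at most one ticket from A through 2 to B."""
    waiting = sorted(glob.glob(os.path.join(ticket_dir, "A-*")))
    if not waiting:
        return False
    card = os.path.basename(waiting[0]).split("-", 1)[1]
    active = os.path.join(ticket_dir, f"2-{card}")
    os.rename(waiting[0], active)  # A -> 2: checkpoint in progress
    with open(active) as fh:
        workdir, folder = fh.readline().strip(), fh.readline().strip()
    checkpoint_folder(os.path.join(workdir, folder))
    os.rename(active, os.path.join(ticket_dir, f"B-{card}"))  # 2 -> B
    return True
```

Handling only `waiting[0]` gives the one-launch-per-minute behavior described above. Any extra A tickets simply wait for the next run.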
The second script is 'syncer'. Once a minute, syncer looks at all of the waiting tickets. It tallies up how many jobs are active on HDD1 and how many are active on HDD2, and gathers a synopsis of which tickets are in which state. It then works down a predetermined priority list to find a job it can start (it only starts one job per run; a minute later, the next syncer run will look for another):
if any 'B' tickets and HDD1 is not busy, it'll move a B to 3 and work it, bumping it to a C.
if any 'C' tickets and HDD1/HDD2 are not busy, it'll move a C to 4 and work it, bumping it to a D.
if any 'D' tickets and HDD2 is not busy, it'll move a D to 5 and work it, bumping it to an E.
if any 'E' tickets and HDD1 is not busy, it'll move all Es to 6 and re-sync SSD->HDD1, then bump all 6s to Fs.
if any 'F' tickets and HDD1/HDD2 are not busy, it'll move all Fs to 7 and re-sync HDD1->HDD2, then bump all 7s to Gs.
if any 'G' tickets and HDD2 is not busy, it'll move all Gs to 8 and re-verify all of HDD2, then bump all 8s to Hs.
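The priority list above can be written as a rule table plus a pure "pick one job" function. This is a Python sketch, not the author's Perl. The table mirrors the six rules, but `choose_job`, `busy_drives`, and the card-to-stage dictionary are hypothetical, and the code only decides what to start. Actually running the copy or verify is left out.

```python
from typing import Dict, List, Optional, Set, Tuple

# (waiting stage, drives that must be idle, active stage, next stage, move all?)
RULES = [
    ("B", {"HDD1"}, "3", "C", False),          # copy card folder SSD -> HDD1
    ("C", {"HDD1", "HDD2"}, "4", "D", False),  # copy card folder HDD1 -> HDD2
    ("D", {"HDD2"}, "5", "E", False),          # verify card folder on HDD2
    ("E", {"HDD1"}, "6", "F", True),           # whole-drive sync SSD -> HDD1
    ("F", {"HDD1", "HDD2"}, "7", "G", True),   # whole-drive sync HDD1 -> HDD2
    ("G", {"HDD2"}, "8", "H", True),           # whole-drive verify of HDD2
]

# Drives occupied by a ticket sitting in each active stage.
BUSY: Dict[str, Set[str]] = {
    "3": {"HDD1"}, "4": {"HDD1", "HDD2"}, "5": {"HDD2"},
    "6": {"HDD1"}, "7": {"HDD1", "HDD2"}, "8": {"HDD2"},
}


def busy_drives(stages: List[str]) -> Set[str]:
    drives: Set[str] = set()
    for stage in stages:
        drives |= BUSY.get(stage, set())
    return drives


def choose_job(tickets: Dict[str, str]) -> Optional[Tuple[List[str], str, str]]:
    """tickets maps card -> stage. Returns (cards, active stage, next stage) or None."""
    busy = busy_drives(list(tickets.values()))
    for waiting, needs, active, done, move_all in RULES:
        cards = sorted(c for c, s in tickets.items() if s == waiting)
        if cards and not (needs & busy):
            return (cards if move_all else cards[:1], active, done)
    return None  # nothing startable this minute
```

Because the first matching rule wins, a single waiting B always takes precedence over any C, D, or later tickets.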
With this priority scheme, if multiple cards get ingested in semi-rapid succession, syncer prioritizes getting all of the cards through stage 3 before moving any to stage 4 (and so on), as long as the relevant drives aren't busy. I don't want to ask one of these drives to read one file while writing another, or I suspect performance will drop like a rock.
The third script is 'dashboard'. Once per second, dashboard re-scans the directory of tickets, counts how many tickets are in each state, and sums the bytes involved in each state. As tickets make their way through the stages, I can see how many GB are 'in flight' and get a sense of how long a particular task will take. As tickets arrive at stage 'E' (meaning they've completed their individual checkpoint/sync/sync/verify and now exist in 3 places), the dashboard adds a new line showing which tickets are safely archived.
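Here is a rough Python take on the dashboard's scan, under the same assumed ticket layout (`<stage>-<card>` file names, with the byte count on line 3). It treats stage E and everything after it as "safely archived," as described above. `scan`, `render`, and `run` are hypothetical names, and the screen clear uses standard ANSI escape codes.

```python
import os
import time
from typing import Dict, List, Tuple

ORDER = "1A2B3C4D5E6F7G8H"    # active stages interleaved with waiting stages
VALID = set(ORDER)
ARCHIVED = set("E6F7G8H")     # three copies exist from stage E onward


def scan(ticket_dir: str) -> Tuple[Dict[str, Tuple[int, int]], List[str]]:
    """Return {stage: (ticket count, total bytes)} and the archived card list."""
    totals: Dict[str, Tuple[int, int]] = {}
    archived: List[str] = []
    for name in os.listdir(ticket_dir):
        stage, sep, card = name.partition("-")
        if not sep or stage not in VALID:
            continue  # not a ticket file
        try:
            with open(os.path.join(ticket_dir, name)) as fh:
                lines = fh.read().splitlines()
        except FileNotFoundError:
            continue  # renamed by checkpointer/syncer mid-scan; caught next second
        size = int(lines[2]) if len(lines) > 2 else 0
        count, total = totals.get(stage, (0, 0))
        totals[stage] = (count + 1, total + size)
        if stage in ARCHIVED:
            archived.append(card)
    return totals, sorted(archived)


def render(totals: Dict[str, Tuple[int, int]], archived: List[str]) -> str:
    rows = [f"{s}: {n:3d} ticket(s) {b / 1e9:8.2f} GB"
            for s, (n, b) in sorted(totals.items(), key=lambda kv: ORDER.index(kv[0]))]
    rows.append("archived: " + (", ".join(archived) or "none"))
    return "\n".join(rows)


def run(ticket_dir: str) -> None:
    """Redraw once per second until interrupted."""
    while True:
        print("\033[2J\033[H" + render(*scan(ticket_dir)), flush=True)
        time.sleep(1)
```

The `FileNotFoundError` guard matters because the other two scripts rename tickets while the dashboard is reading the directory.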
I recognize that this is all massively overkill, but it suits my objective: do the dirty work now so I don't have to worry about the details while on my cruise. I'll essentially just start copying the card into a folder on the SSD manually, then create a new ticket in stage 1 (someday I'll automate that...not this time, though). I'll then pull out a sheet of paper and scribble out 101, 102, 103, etc. As cards finish copying, I'll remove them from the card reader, put each one next to its ticket number on the sheet, and bump its ticket from 1 to A. We can go to dinner etc., and when we return I'll see exactly which cards are safe to reformat and reuse, KNOWING that I have three copies of the data.

As the bulk sync/verify jobs finish (either before bed or the following morning), I'll eject HDD2 (A or B) and put it into our cabin safe, then connect HDD2 (B or A) and run a dummy ticket F-99 through the process to ensure that the other HDD2 is also brought up to date. At that point I'll temporarily have four copies of my data, but files will likely start being deleted from the SSD to make space (depending on how aggressively we shoot).
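Ticket creation is still manual here. If it ever gets automated, a sketch like the following could work (Python, hypothetical helper names, same assumed three-line ticket layout). Because the card is still copying while its ticket sits in stage 1, this version recomputes the byte count when the ticket is bumped from 1 to A, then does the bump with a rename.

```python
import os


def folder_bytes(path: str) -> int:
    """Total size of the regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def _write_ticket(ticket_dir: str, name: str, workdir: str, folder: str) -> str:
    """Write the body under a temporary name, then swap it into place atomically."""
    size = folder_bytes(os.path.join(workdir, folder))
    tmp = os.path.join(ticket_dir, f".pending{name}")  # no '-' at the front, so not a ticket name
    with open(tmp, "w") as fh:
        fh.write(f"{workdir}\n{folder}\n{size}\n")
    final = os.path.join(ticket_dir, name)
    os.replace(tmp, final)
    return final


def create_ticket(ticket_dir: str, card: str, workdir: str, folder: str) -> str:
    """Stage 1: the card is still being copied to the SSD."""
    return _write_ticket(ticket_dir, f"1-{card}", workdir, folder)


def release_card(ticket_dir: str, card: str) -> str:
    """Copy finished: refresh the byte count, then bump the ticket from 1 to A."""
    with open(os.path.join(ticket_dir, f"1-{card}")) as fh:
        workdir, folder = fh.readline().strip(), fh.readline().strip()
    current = _write_ticket(ticket_dir, f"1-{card}", workdir, folder)
    released = os.path.join(ticket_dir, f"A-{card}")
    os.rename(current, released)
    return released
```

Writing to a temporary file and then calling `os.replace` means a once-per-second dashboard scan never reads a half-written ticket.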