6 Replies - 283 Views - Last Post: 14 February 2019 - 12:37 PM

#1 atraub   User is offline


Isolating create event only for fully formed files

Posted 13 February 2019 - 12:04 PM

Hey all,

Background
I'm working on a little project with a Raspberry Pi. My office has one of those giant scanner/printer/fax/copier beasts. Currently, scanned documents go directly into a Dropbox folder on a coworker's computer - not ideal. I thought it'd be a simple project to put a little headless Pi out there that files would be scanned to and could then be synced to Dropbox. That way, upgrading that PC or her turning her PC off won't be an irritation in the future. Plus, it gave me an excuse to use my Pi Zero W. So, new scans can be sent to the Pi via Samba.

Haha, did you know Dropbox doesn't work on ARM processors? I sure didn't. My initial solution was to write a Python script that would check the folder every 5 seconds and then run a bash script to upload new files to Dropbox and delete the originals off my Pi. Not ideal - I'd rather it be event driven. I instead opted to play with Node, as I've been using it more lately. First I tried inotify, but that was a pain. fs can watch a directory, but that fired off a lot of extra events, so I landed on chokidar and it works great. Now, when a file is created, an event fires and the file uploads to Dropbox using dropbox-uploader from npm.


Problem
When a big file is created, the "created" event sometimes fires before the file has actually finished being written. This seems to be happening at the file system level when I test locally, but it wouldn't surprise me if Samba is sending the file in chunks as well. Naturally, sending an incomplete file to Dropbox and deleting it locally is a pretty nasty mistake. Any thoughts on what I can do about this?

This post has been edited by atraub: 13 February 2019 - 12:05 PM



Replies To: Isolating create event only for fully formed files

#2 modi123_1   User is online


Re: Isolating create event only for fully formed files

Posted 13 February 2019 - 12:10 PM

An easy fix would be to have it poll for new files, keep that info in some stash, and wait a minute. If the file sizes change, wait another minute. When the file sizes haven't changed, send the files off to wherever.
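The polling idea above could be sketched roughly like this in Node. This is only a minimal sketch: `createSizeTracker` and `scan` are hypothetical names, the "stash" is just an in-memory Map, and a file counts as settled once its size is the same on two consecutive scans.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: remembers the size seen for each file on the previous
// scan and reports the files whose size has not changed since then.
function createSizeTracker() {
    const lastSizes = new Map(); // file path -> size seen on the previous scan

    return function scan(dir) {
        const stable = [];
        const seen = new Set();

        for (const name of fs.readdirSync(dir)) {
            const file = path.join(dir, name);
            const stats = fs.statSync(file);
            if (!stats.isFile()) continue; // skip subdirectories

            seen.add(file);
            if (lastSizes.get(file) === stats.size) {
                stable.push(file); // same size as last scan, treat as finished
            }
            lastSizes.set(file, stats.size);
        }

        // Forget files that have disappeared (for example, uploaded and deleted).
        for (const file of lastSizes.keys()) {
            if (!seen.has(file)) lastSizes.delete(file);
        }
        return stable;
    };
}
```

Calling `scan` from a `setInterval` once a minute and uploading whatever it returns would give the "wait a minute, check again" behaviour. A file that stalls for longer than one interval would still be picked up early, so the interval needs to be generous.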

#3 atraub   User is offline


Re: Isolating create event only for fully formed files

Posted 13 February 2019 - 12:16 PM

Yeah, I thought about that option. I was hoping I could catch the file as soon as it was fully formed - but this may be the best solution.

Yikes! The really big file I used for testing keeps growing to 3x its size after being uploaded.

For the curious:
require('dotenv').config();
const fs = require('fs');
const path = require('path');
const chokidar = require('chokidar');
const uploadFile = require('dropbox-upload');

// Watch the scan folder for new files.
const watcher = chokidar.watch(process.env.LOCAL_PATH, {ignored: /^\./, persistent: true});

// 'add' fires as soon as a file appears, which can be before it is fully written.
watcher.on('add', (filePath) => {
    const fileName = path.basename(filePath);
    console.log(`New file detected: ${fileName}`);
    uploadFile(filePath, process.env.DROP_PATH, process.env.TOKEN, (data) => {
        if (data.err === null) { // upload succeeded, so remove the local copy
            fs.unlink(filePath, (err) => {
                if (err) {
                    console.error(err);
                } else {
                    console.log(`${fileName} uploaded to Dropbox and deleted successfully!`);
                }
            });
        } else { // upload failed, so keep the local copy
            console.error(data.err);
        }
    });
});

console.log('Dropbox Watcher: Online');
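
For what it's worth, chokidar has an `awaitWriteFinish` option that holds back the 'add' event until the file size has stopped changing, which looks like a close fit for this problem. A minimal sketch of the options object follows; the name `watcherOptions` and the threshold values are assumptions and would need tuning for large scans arriving over Samba.

```javascript
// Options for chokidar.watch(). The two numbers below are assumed values,
// not recommendations.
const watcherOptions = {
    ignored: /^\./,
    persistent: true,
    awaitWriteFinish: {
        stabilityThreshold: 2000, // ms the file size must stay unchanged before 'add' fires
        pollInterval: 100         // ms between file size checks
    }
};
```

It would be passed in place of the inline options, as in `chokidar.watch(process.env.LOCAL_PATH, watcherOptions)`. This is still a size-stability check under the hood, so an uploader that stalls for longer than the threshold could trigger the event early.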


This post has been edited by atraub: 13 February 2019 - 12:25 PM


#4 Skydiver   User is online


Re: Isolating create event only for fully formed files

Posted 13 February 2019 - 03:36 PM

Interestingly, I'm also in the same situation of having to watch for a file as it is being uploaded. I'm using the same strategy of waiting for the filesize as well as last write time attribute to stabilize for 1 minute. (Obviously, I'll make that "stabilization time" a configuration setting.)

If the files were being uploaded directly to my Windows box, I would have just used the .NET Framework's FileSystemWatcher and listened for its events. Alas, I have to watch a network drive, and given the vagaries of unreliable networks, I'm forced to do the stabilization check above.

The reason for checking both the last write time as well as the file size is that some of the uploaders I have to deal with allocate the entire file size on the share first, and then go back and fill in the contents. :(

#5 astonecipher   User is offline


Re: Isolating create event only for fully formed files

Posted 14 February 2019 - 10:11 AM

Are you polling the Samba server?

#6 astonecipher   User is offline


Re: Isolating create event only for fully formed files

Posted 14 February 2019 - 10:17 AM

https://en.wikipedia.org/wiki/Inotify

#7 Skydiver   User is online


Re: Isolating create event only for fully formed files

Posted 14 February 2019 - 12:37 PM

My code is running in PowerShell on Windows, so as cool as inotify would be, it's out of reach for me. The next closest thing is the FileSystemWatcher, but from past experience, it is unreliable for watching network drives.

So even with using the watcher, I still have a fallback poller that runs every few minutes if there have been no notifications in a while. I am very tempted to KISS and just use the polling exclusively, since I do not have a tight delivery time requirement.
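
The "fall back to polling when the watcher has been quiet" logic can be kept quite small. Here is a rough sketch, again in Node rather than PowerShell; `createFallbackGate`, `notified`, and `shouldPoll` are hypothetical names, and timestamps are passed in as milliseconds so the logic is easy to test.

```javascript
// Tracks when the watcher last reported anything and says whether the
// fallback poller should run.
function createFallbackGate(quietMs, startedAt) {
    let lastNotification = startedAt; // treat startup as the last activity

    return {
        // Call from every watcher event handler.
        notified(now) {
            lastNotification = now;
        },
        // Call from a timer; true once the watcher has been silent for quietMs.
        shouldPoll(now) {
            return now - lastNotification >= quietMs;
        }
    };
}
```

A timer would call `shouldPoll(Date.now())` every so often and run a full directory scan when it returns true. Going polling-only amounts to dropping the gate and always scanning.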
