Keeping two file servers in sync (separate location, not via the internet)

witnessmenow · 06-04-2012 11:24AM #1

Hi Guys,

I have a problem and I was wondering would anyone be able to help me out with it.

I have two file servers, one in Athlone (Linux) and one in Galway (currently Windows, but I can change it).

The Galway one is a branch (a copy) of the Athlone one from a few months ago. The problem is that I'm having massive issues keeping them in sync manually.

The internet in Athlone is not good enough to simply rsync the two servers. I have a 320GB portable HDD for a "sneaker net" connection (carrying it between the two sites).

I have a few ideas on how you might approach this. I am willing to write an application to do this myself, but I would prefer it if there was something out there that would do it for me.

My ideal scenario is that I can run an application or a command that would check the other server's files (via SSH or HTTP or something) and copy the differences to an external hard drive.

Another thing that would work fine for me is if each machine mirrored its folder structure on the portable HDD as zero-byte files with the correct names, and then I used some tool on each server to add any files that didn't find their namesake on the hard drive.
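The zero-byte placeholder idea could be sketched roughly like this, assuming bash and GNU find (for -printf) on each server. mirror_as_placeholders and copy_missing are hypothetical names for illustration, not an existing tool:

```shell
#!/bin/bash
# Sketch of the "zero-byte placeholder" approach (hypothetical function names).
# Assumes bash and GNU find (for -printf '%P').

# mirror_as_placeholders SRC HDD: recreate every file under SRC on HDD as an
# empty file with the same relative path.
mirror_as_placeholders() {
    local src="$1" hdd="$2" rel
    (cd "$src" && find . -type f -printf '%P\n') |
    while IFS= read -r rel; do
        mkdir -p "$hdd/$(dirname "$rel")"
        touch "$hdd/$rel"
    done
}

# copy_missing SRC HDD: copy each file under SRC that has no namesake on HDD
# (neither a placeholder nor a real copy), keeping its relative path.
copy_missing() {
    local src="$1" hdd="$2" rel
    (cd "$src" && find . -type f -printf '%P\n') |
    while IFS= read -r rel; do
        if [ ! -e "$hdd/$rel" ]; then
            mkdir -p "$hdd/$(dirname "$rel")"
            cp -p "$src/$rel" "$hdd/$rel"
        fi
    done
}
```

Run mirror_as_placeholders on one server, then copy_missing on the other with the HDD mounted. Back at the first server, find /path/to/hdd -type f -size +0c lists only the real files that were added (a genuinely empty source file would look the same as a placeholder).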

Another approach might be using version control to detect the differences, again I would need to use the portable HDD to do the transfer.

The last approach I can think of is a seriously throttled rsync. As I mentioned, the Athlone internet is the weak point. We don't have much TV in the house (an old Sky box, no subscription), so we use the RTÉ Player and the like to watch stuff a lot. So basically I would have to make it run only at night, or use some device to do bandwidth management on the connection and give the rsync very low priority. That's easier said than done, because the same machine is also a web server, so I can't just prioritise by IP address.
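The throttled, night-only rsync could be sketched like this, assuming rsync and ssh on both ends and GNU coreutils (for timeout). The hours, the rate cap and the function names are hypothetical placeholders:

```shell
#!/bin/bash
# Sketch of a throttled rsync that only runs overnight.
# Hypothetical values: adjust the window and the cap to suit the line.

NIGHT_START=1      # hour (0-23) when syncing may begin
NIGHT_END=7        # hour (0-23) when syncing must stop
LIMIT_KBPS=8       # rsync --bwlimit value, in KiB per second

# in_night_window HOUR: succeed if HOUR is in [NIGHT_START, NIGHT_END).
# 10# forces base 10 so "08" and "09" from date +%H aren't read as octal.
# Assumes the window does not wrap past midnight.
in_night_window() {
    local hour=$((10#$1))
    [ "$hour" -ge "$NIGHT_START" ] && [ "$hour" -lt "$NIGHT_END" ]
}

# throttled_sync SRC DEST: push SRC to DEST (e.g. user@host:/path) at a capped
# rate. --partial keeps interrupted files so the next night's run can reuse them.
throttled_sync() {
    local hour=$((10#$(date +%H)))
    in_night_window "$hour" || return 0
    # Kill rsync at roughly NIGHT_END (whole hours only, minutes are ignored).
    timeout "$(( (NIGHT_END - hour) * 3600 ))" \
        rsync -a --partial --bwlimit="$LIMIT_KBPS" "$1" "$2"
}
```

A script ending with something like throttled_sync /srv/media/ user@galway:/srv/media/ (hypothetical paths) could be started from cron at the beginning of the window, e.g. with a crontab schedule of 0 1 * * *. Because --bwlimit caps rsync's own traffic, nothing has to be prioritised by IP address, so the web server on the same machine is left alone.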

Any suggestions on how to achieve any of the above or any other way of doing this would be greatly appreciated.

Thanks

niallb · 07-04-2012 11:33PM

You could create a VPN tunnel with limited bandwidth and run rsync across that. That would require very little tweaking.

Some numbers would be really useful:
What speed link do you actually have at both ends?
How much data is involved?
How much changes on a daily basis?

Which is more important - TV or server sync?
Are you living at the faster or slower location?
Bear in mind that the download speed of a transfer is at best the upload speed at the other end, so if you're on ADSL (with its much slower upload) you'll not interfere with incoming TV as much as you might think.

freelancerTax · 08-04-2012 11:44AM

Have a look at duplicity.
You can create diffs against a local backup to transfer later.

ft

witnessmenow · 08-04-2012 12:41PM

Hey guys, thanks for the replies.

Ok the numbers:

There is about 2TB in each location, about 90% of which exists in both locations. (At a guess I would say there is 100GB or so in Galway that isn't in Athlone, and maybe 20GB in Athlone that's not in Galway.)

Athlone is eircom 7Mb with the "unlimited" add-on. In the real world I get about half of this: I have never seen downloads go any faster than 450kB/s, which translates to about 3.6Mb/s. According to speedtest my upload is 0.15Mb/s.

Galway is UPC 20Mb or 25Mb (not sure which). I seem to get the majority of this. Not sure what the upload is, 1Mb/s maybe?
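Some rough arithmetic on these figures shows why a straight rsync over the Athlone line is painful. A sketch, assuming decimal units, a link running flat out with no protocol overhead or other traffic, and the speeds quoted in this post (Galway's 1Mb/s upload is only a guess); the function names are hypothetical:

```shell
#!/bin/bash
# Back-of-envelope transfer times for the link speeds quoted above.
# Assumptions: 1 GB = 10^9 bytes, 1 Mb/s = 10^6 bit/s, no protocol overhead.

# kbytes_to_mbits RATE: kilobytes per second -> megabits per second.
kbytes_to_mbits() { awk -v r="$1" 'BEGIN { printf "%.1f\n", r * 8 / 1000 }'; }

# effective_mbits SENDER_UP RECEIVER_DOWN: a transfer runs at the slower of the two.
effective_mbits() { awk -v u="$1" -v d="$2" 'BEGIN { print (u < d ? u : d) }'; }

# transfer_days GB MBITS: days needed to move GB gigabytes at MBITS megabits/s.
transfer_days() { awk -v gb="$1" -v r="$2" 'BEGIN { printf "%.1f\n", gb * 8e9 / (r * 1e6) / 86400 }'; }

kbytes_to_mbits 450                            # 3.6 (the 450kB/s download figure)
transfer_days 20 "$(effective_mbits 0.15 20)"  # 12.3 days: Athlone's 20GB at 0.15Mb/s
transfer_days 100 "$(effective_mbits 1 3.6)"   # 9.3 days: Galway's 100GB at a guessed 1Mb/s
```

Even with the line to itself around the clock, Athlone's 20GB alone would take well over a week to upload.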

Monday to Friday I live in Galway, and i stay in Athlone at the weekends. Athlone would have active internet users all week. Galway would have less internet activity at the weekend than during the week, but still might have some.

Most of the downloading would happen in Galway, so Athlone's upstream would be used less.

The stuff might change daily in Galway, but it does not need to be synced daily (a lag of a few days or weeks would be acceptable).

The sync is very much a background activity and should not interfere with anything else going on.

If you need any more info, just shout.

I'll have a look at duplicity, thanks.

Khannie · 10-04-2012 12:28PM

What kind of files are you looking at? Are they mostly text type files? (e.g. source code) These tend to compress incredibly well so you'd only be looking at transferring a fraction of the total payload using a compressed rsync.

witnessmenow · 11-04-2012 10:54PM

Videos. They are both media servers for various devices and computers throughout each house. The sync is not for backup purposes, just for convenience.
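Khannie's compression point is easy to check against real files. A rough sketch, assuming gzip is installed; gzip is only a stand-in for the compression rsync -z applies, and gzip_percent is a hypothetical name:

```shell
#!/bin/bash
# Estimate how much compression could save on a file (gzip as a stand-in
# for rsync -z). Prints compressed size as a whole-number percentage.

# gzip_percent FILE
gzip_percent() {
    local orig comp
    orig=$(wc -c < "$1")
    (( orig > 0 )) || return 1          # avoid dividing by zero on empty files
    comp=$(gzip -c "$1" | wc -c)
    echo $(( comp * 100 / orig ))
}
```

On source code or other text this prints a small number; on video that is already compressed by its codec it typically prints something close to 100, i.e. a compressed rsync would save almost nothing here.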

timbyr · 12-04-2012 10:37PM

I came up with this as a possible solution, assuming bash, find and OpenSSH are available on both servers (possibly through Cygwin on Windows).

It copies all files on your local server that aren't on the remote server to your portable HDD.

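#!/bin/bash
# Usage: ./script LOCAL_DIR USER@HOST REMOTE_DIR HDD_DIR (absolute paths)
if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ]
then
        exit 1
fi
export LC_ALL=C    # sort and join must use the same (byte) ordering
# Save ssh's status straight away: the [ -d ] tests below would overwrite $?
ssh "$2" test -d "'$3'"
remote_ok=$?
if [ -d "$1" ] && [ -d "$4" ] && [ "$remote_ok" -eq 0 ]
then
        VAR1="$(echo "$1" | sed 's/\(.*[^\/]\)$/\1\//')"
        cd "$VAR1" || exit 1
        # Paths under LOCAL_DIR missing from REMOTE_DIR (join -t '' compares whole
        # lines, so names with spaces work), copied to the HDD with tar -T -
        join -t '' --nocheck-order -v 1 \
                <(find . -type f -printf "%P\n" | sort) \
                <(ssh "$2" "find '$3' -type f -printf '%P\n'" | sort) |
                tar -cf - -T - | tar -xf - -C "$4"
fi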

Usage is

./script /path/to/local/files user@server.example.com /path/to/files/on/server /path/to/portable/hdd

Kind of hacky but maybe someone else can build on it.

witnessmenow · 14-04-2012 10:08AM

Thanks Timbyr! I'll def give that a go!

Could someone give me a quick breakdown on what it does?

So it checks the param inputs

Then logs into the server and moves to the correct folder

checks if the HDD and local path are folders (not sure what $? -eq 0 does)

Not sure what the regex is doing

No idea what the last line is doing other than taring up whatever it finds to the HDD

Is it recursive?

Thanks

timbyr · 14-04-2012 02:52PM

Checks if all the parameters are there.

if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ]
then
        exit 1
fi

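Checks that the local directories exist and that the directory exists on the remote server. The $? -eq 0 part is meant to check the return value of ssh "$2" test -d "$3", but $? always holds the status of the most recent command, which by that point is [ -d "$4" ]. Saving it straight after the ssh call fixes that:

ssh "$2" test -d "$3"
remote_ok=$?
if [ -d "$1" ] && [ -d "$4" ] && [ "$remote_ok" -eq 0 ]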
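A quick way to see the $? pitfall locally, with false standing in for a failing ssh ... test -d call (an illustration only; the function names are hypothetical):

```shell
#!/bin/bash
# $? holds the exit status of the most recent command only.
# 'false' stands in for a failing "ssh host test -d dir".

check_naive() {
    false
    # By now $? is the status of [ -d / ], which succeeded, so this passes.
    if [ -d / ] && [[ $? -eq 0 ]]; then echo "passed"; else echo "failed"; fi
}

check_fixed() {
    false
    local status=$?    # captured immediately, before any other command runs
    if [ -d / ] && [ "$status" -eq 0 ]; then echo "passed"; else echo "failed"; fi
}

echo "naive check: $(check_naive)"   # naive check: passed (wrong, the stand-in failed)
echo "fixed check: $(check_fixed)"   # fixed check: failed
```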

This bit is unnecessary. I left it in by mistake :P All the sed does is add a trailing slash to $1 if there isn't one already.

        VAR1="$(echo $1 | sed 's/\(.*[^\/]\)$/\1\//')"
        cd $VAR1

You can replace it with just the following if you want.

        cd "$1"

Lists all files in the local directory, with their paths relative to $1, in sorted order. find descends into every subdirectory, so yes, it's recursive.

(find "$1" -type f -printf "%P\n" | sort)

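Lists all files in the remote directory the same way. The directory has to be expanded locally, hence the double quotes around the remote command (inside single quotes, $2 or $3 is expanded by the remote shell, where it's empty), and the output must use the same %P format as the local list, or no line will ever match:

(ssh "$2" "find '$3' -type f -printf '%P\n'" | sort)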
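ssh hands its command to a shell on the remote host as a single string, so the quoting decides where variables get expanded. The difference can be seen locally, using bash -c as a stand-in for that remote shell (an assumption made only for illustration; show_quoting is a hypothetical name):

```shell
#!/bin/bash
# bash -c plays the role of the shell ssh starts on the remote machine.

show_quoting() {
    local dir="$1"
    # Single quotes: $1 reaches the inner shell unexpanded, and it has no $1.
    bash -c 'echo "single: [$1]"'
    # Double quotes: $dir is expanded here, before the string is handed over.
    bash -c "echo \"double: [$dir]\""
}

show_quoting /srv/media
# single: []
# double: [/srv/media]
```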

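This takes two sorted lists and outputs the lines that are in list1 but not in list2, i.e. files that are on the local system but not on the remote one. By default join only compares the first whitespace-separated field, so "My Film.avi" would count as a match for "My Show.avi"; -t '' makes it compare whole lines:

join -t '' --nocheck-order -v 1 <list1 <list2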
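If you'd rather not depend on join's field handling, comm -23 gives the same "in list1 but not list2" result and always compares whole lines. This is a swapped-in alternative to join, sketched with a hypothetical function name; both inputs must be sorted with the same collation, so LC_ALL=C is pinned for sort and comm alike:

```shell
#!/bin/bash
# Set difference of two line-based file lists using comm.

# only_in_first LIST1 LIST2: print lines present in LIST1 but missing from LIST2.
only_in_first() {
    LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}
```

Process substitutions can be passed as the two lists, e.g. only_in_first <(cd /local/dir && find . -type f -printf '%P\n') <(ssh user@host "find '/remote/dir' -type f -printf '%P\n'") with hypothetical paths.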

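All this does is take each file with its relative path and copy it, relative path included, into the directory $4.
I couldn't find any obvious tool to copy a file ./dir/file to /somewhere/dir/file, only ./dir/file to /somewhere/file.
The effect here is that it tars up the file and its path, then extracts the file and creates the directory structure if it doesn't exist. Feeding the list to tar with -T - rather than through xargs keeps names with spaces intact and runs a single tar (xargs splits names on whitespace and may start several tar processes, and the extracting tar stops after the first archive):

tar -cf - -T - | tar -xf - -C "$4"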
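For the "copy ./dir/file to /somewhere/dir/file" problem, GNU cp's --parents option does exactly that. A sketch of both approaches as reusable functions that read relative paths from stdin, one per line, assuming GNU tar and GNU cp (the function names are hypothetical):

```shell
#!/bin/bash
# Copy relative paths (one per line on stdin) from SRC to DEST, keeping
# the directory structure.

# copy_with_paths SRC DEST: one tar process reads the whole list (-T -).
copy_with_paths() {
    (cd "$1" && tar -cf - -T -) | tar -xf - -C "$2"
}

# copy_with_parents SRC DEST: GNU cp --parents recreates dir/ under DEST.
copy_with_parents() {
    local src="$1" dest
    dest=$(cd "$2" && pwd) || return 1    # absolute, since we cd into SRC below
    (cd "$src" && while IFS= read -r rel; do cp -p --parents "$rel" "$dest"; done)
}
```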
