Using Amazon's Elastic Compute Cloud (EC2) and Rsync to back up data files

James Kramer kramerjm-Re5JQEeQqe8AvxtiuMwx3w at public.gmane.org
Wed Jan 28 13:55:56 EST 2009


In the past I used s3sync or s3fs to back up my data files
to Amazon's S3 storage.  Recently I switched to mirroring my data
files with rsync on Amazon's Elastic Compute Cloud (EC2).  It works
very well and is much faster than backing up to S3 with s3sync or
s3fs: what normally took all night was accomplished in a few hours
with EC2, using the method described in this HowTo:

http://www.freewisdom.org/en/all/entries/2008/09/17/backup_with_rsync/

Some comments:
1.  You need to use Sun's version of Java.  On Ubuntu I did the following:

	sudo apt-get install -y sun-java6-bin unzip
	sudo update-java-alternatives -s java-6-sun

2.  I had to provide the full path to the key file 'id_rsa-keypair'
and to run 'chmod 600 id_rsa-keypair' on it.
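
A minimal sketch of comment 2, assuming the key file sits in the current directory. The helper names abs_path and secure_key and the user@host placeholder are illustrative and do not come from the HowTo:

```shell
#!/bin/sh
# Sketch only: helper names and the usage paths are hypothetical.

# Print the absolute path of a file given a relative or absolute path.
abs_path() {
    ( cd "$(dirname "$1")" && printf '%s/%s\n' "$(pwd)" "$(basename "$1")" )
}

# Restrict the private key to its owner; ssh refuses to use a private
# key file that other users can read.
secure_key() {
    chmod 600 "$1"
}

# Usage (commented out so the sketch has no side effects):
# KEY_FILE=$(abs_path ./id_rsa-keypair)
# secure_key "$KEY_FILE"
# ssh -i "$KEY_FILE" user@host    # host = the instance's public DNS name
```

Resolving the path once at the top of a script also keeps it working under cron, which does not start in the directory where the key was saved.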

The only problem that I ran into is how to use the ssh commands in
scripts and cron.  Each time I run the script, I have to respond
interactively to these prompts:

	RSA key fingerprint is cb:79:eb:b5:40:2d:9a:2b:20:47:53:c8:09:4c:54:57.

	Are you sure you want to continue connecting (yes/no)?

	What is the password:

The RSA fingerprint and IP address change each time that I run the
script because it creates and terminates an EC2 instance, each of which
has its own unique DNS name.

The only way that I could get the script to run without responding
interactively to the ssh prompts was to set up the key pair without a
passphrase, which is the way it is set up in the HowTo.  This
reduces the security of ssh and makes it easier for man-in-the-middle
attacks.  I also had to modify the ssh commands described in the
HowTo by adding an option to each of them:
	'ssh -o StrictHostKeyChecking=no .....'
This further reduces the security of the system, but I can see no
other way to run the scripts.
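
For illustration, the modified ssh command could be wrapped for use with rsync roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and BatchMode=yes is an extra OpenSSH option not mentioned in the HowTo that makes ssh fail instead of stopping at a password prompt:

```shell
#!/bin/sh
# Sketch only: function names, paths and host are hypothetical.

# Build the ssh command string that rsync will use as its remote shell.
ssh_cmd() {
    key_file=$1
    # StrictHostKeyChecking=no: accept an unknown host key without asking.
    # BatchMode=yes: never prompt for a password or passphrase; fail instead.
    printf 'ssh -i %s -o StrictHostKeyChecking=no -o BatchMode=yes' "$key_file"
}

# Mirror a local directory to the instance over that ssh command.
# Note: a key path containing spaces would not survive the -e string.
mirror() {
    key_file=$1; src=$2; dest=$3
    rsync -az -e "$(ssh_cmd "$key_file")" "$src" "$dest"
}

# Usage (commented out; needs a running instance):
# mirror /full/path/to/id_rsa-keypair /data/ user@host:/backup/
```

With BatchMode=yes a cron run that cannot authenticate exits with an error rather than hanging on a prompt nobody will answer.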

Another concern that I have is that the 'known_hosts' file, which
stores the host fingerprints, will grow larger with each
run of the script.
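
Two possible ways to keep 'known_hosts' from growing, sketched under assumptions (the function names are hypothetical; UserKnownHostsFile and 'ssh-keygen -R' are standard OpenSSH features, but neither appears in the HowTo):

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Option 1: extra ssh options that send each throwaway host key to
# /dev/null instead of appending it to ~/.ssh/known_hosts.
ephemeral_host_opts() {
    printf '%s' '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
}

# Option 2: after the instance is terminated, remove its entry from a
# known_hosts file (defaults to the user's own file).
forget_host() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}
    ssh-keygen -f "$file" -R "$host"
}

# Usage (commented out; needs a real host name):
# ssh $(ephemeral_host_opts) -i /full/path/to/id_rsa-keypair user@host
# forget_host host
```

Neither option restores host verification; they only stop one-off fingerprints from accumulating.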

I would appreciate any suggestions.

Jay





