Amazon S3 and rsync snapshots?

Sat Jan 17 04:29:12 EST 2009

On Fri, Jan 2, 2009 at 3:05 PM, John Abreau <jabr-iwcNaMm7aMIiq3RsQ1AnAw at public.gmane.org> wrote:
> Hi, Jay.
>
> That's pretty much what I assumed the process would be. The description
> doesn't address my two concerns, though:
>
> 1. By mounting it as a filesystem and then running rsync on top of that,
> rsync sees the s3 filesystem as a "local" filesystem, and therefore as part
> of the process of checking if a file needs to be updated, it copies the
> entire file from s3 to generate its hash for comparison. Rsync to a remote
> system invokes rsync on the remote end to compute the hash, and avoids
> the bandwidth that the "local" rsync uses.
>
> 2. The rsync snapshots process uses hard links to make each daily backup
> directory look like a complete filesystem -- daily.0, daily.1, daily.2, etc.
> are all complete filesystems from different days, but files that are the same in
> all of these are hard-linked to a single instance, so it doesn't waste storage
> space with multiple copies of the same file. Is it possible to do the same
> with an s3-based solution?
>
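To make concern 1 concrete: when rsync talks to a real remote rsync, the side that already holds the old copy hashes its own blocks, so only the short per-block digests and any changed blocks cross the network. The Python sketch below is a simplified fixed-offset version of that idea, not rsync's actual algorithm (rsync uses a rolling weak checksum plus a strong hash so it can also match data that has shifted position). The function names and the 1024-byte block size are illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 1024  # illustrative assumption; rsync chooses its block size per file


def block_signatures(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Run on the side holding the old copy: one 16-byte MD5 digest per block."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(new: bytes, old_sigs: list[bytes],
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Run on the sending side: keep only blocks whose digest differs."""
    delta = {}
    for index, offset in enumerate(range(0, len(new), block_size)):
        block = new[offset:offset + block_size]
        if index >= len(old_sigs) or hashlib.md5(block).digest() != old_sigs[index]:
            delta[index] = block
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Run on the receiving side: rebuild the new file from old blocks plus the delta."""
    n_blocks = -(-new_length // block_size)  # ceiling division
    parts = [delta.get(i, old[i * block_size:(i + 1) * block_size])
             for i in range(n_blocks)]
    return b"".join(parts)[:new_length]


if __name__ == "__main__":
    old = bytes(range(256)) * 40            # 10240-byte copy already on the far end
    new = bytearray(old)
    new[5000] ^= 0xFF                       # change one byte, inside block 4
    new = bytes(new)

    sigs = block_signatures(old)            # 10 digests x 16 bytes = 160 bytes
    delta = changed_blocks(new, sigs)       # only block 4 needs to be sent
    sent = 16 * len(sigs) + sum(len(b) for b in delta.values())
    print(sorted(delta), sent, "bytes instead of", len(new))
    # prints: [4] 1184 bytes instead of 10240
    assert apply_delta(old, delta, len(new)) == new
```

With an S3 filesystem mounted locally there is no remote process to compute `block_signatures`, so the old data itself would have to be read back over the network to compare against.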
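For concern 2, the hard-link rotation can be sketched as follows, assuming a local filesystem that supports hard links (whether an S3-backed mount supports them is exactly the open question). The helpers `rotate` and `snapshot` are hypothetical names; in practice the same effect is usually obtained with `cp -al` followed by rsync, or with rsync's `--link-dest` option.

```python
import filecmp
import os
import shutil
import tempfile
from pathlib import Path


def rotate(backup_root: Path, keep: int) -> None:
    """Hypothetical helper: delete the oldest snapshot, then shift daily.N -> daily.N+1."""
    oldest = backup_root / f"daily.{keep - 1}"
    if oldest.exists():
        shutil.rmtree(oldest)
    for n in range(keep - 2, -1, -1):
        src = backup_root / f"daily.{n}"
        if src.exists():
            src.rename(backup_root / f"daily.{n + 1}")


def snapshot(source: Path, backup_root: Path, keep: int = 7) -> Path:
    """Hypothetical helper: build a new daily.0; unchanged files are hard links into daily.1."""
    rotate(backup_root, keep)
    previous = backup_root / "daily.1"
    current = backup_root / "daily.0"
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (current / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src_file = Path(dirpath) / name
            dest = current / rel / name
            old = previous / rel / name
            if old.is_file() and filecmp.cmp(src_file, old, shallow=False):
                os.link(old, dest)            # same content: share the existing inode
            else:
                shutil.copy2(src_file, dest)  # new or changed: store a fresh copy
    return current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, root = Path(tmp, "src"), Path(tmp, "backups")
        src.mkdir()
        (src / "a.txt").write_text("unchanged")
        (src / "b.txt").write_text("version 1")
        snapshot(src, root)
        (src / "b.txt").write_text("version 2")
        snapshot(src, root)
        a0, a1 = root / "daily.0" / "a.txt", root / "daily.1" / "a.txt"
        b0, b1 = root / "daily.0" / "b.txt", root / "daily.1" / "b.txt"
        print(a0.stat().st_ino == a1.stat().st_ino)  # True: one copy on disk
        print(b0.read_text(), b1.read_text())        # version 2 version 1
```

Comparing contents with `filecmp.cmp(..., shallow=False)` reads every file in full; rsync's default quick check on size and modification time is cheaper, at the cost of trusting the timestamps.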

John,

You may be able to use a combination of Amazon S3 for storage and
Amazon EC2 to handle the rsync snapshots. I came across this HOWTO
on accessing S3 data from an EC2 instance:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=931&categoryID=100