BLU Discuss list archive


Backup software

Subject: Backup software
From: markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org (Mark Woodward)
Date: Sun, 19 Dec 2010 11:16:52 -0500
In-reply-to: <000001cb9f38$ff577de0$fe0679a0$@nedharvey.com>
References: <4D0B6131.9030303@mohawksoft.com> <000001cb9df9$35146e20$9f3d4a60$@nedharvey.com> <4D0CF015.6030905@mohawksoft.com> <000001cb9f38$ff577de0$fe0679a0$@nedharvey.com>

On 12/18/2010 11:55 PM, Edward Ned Harvey wrote:
>> From: Mark Woodward [mailto:markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org]
>>
>> The question of permissions to keep Jane from reading Tarzan's files is an
>> interesting one. It's obvious once said, but it didn't occur to me. It's a
>> potentially difficult problem. Would merely matching the user name to
>> the owner of the files be enough, or would you also require full group
>> access?
>>      
> Well, I have no idea what you have in mind, but ... I usually solve this
> problem by backing up onto standard file shares of some kind.  If it's a
> cifs share on a windows server, I simply set the ACL's to allow only Jane to
> access Jane's backup directory.  If it's Apple, I create separate shares
> which are only accessible by their owner.  And so on.
>    
Well, that's more or less what I intend to do as well. This is basically 
a file system to file system backup. I hesitate to call it a "backup," 
because there are a lot of preconceived notions about what a backup is.  
This is more of a targeted information management system. It is intended 
to handle a number of "businessy" requirements, but at its core, it 
really is just a type of backup program.

Yeah, ACLs and so on would probably be the way to go if this were a typical
backup, but it needs to operate as a stand-alone product as well.

>> So, to get access to a backup set, you would need a user to be created
>> for you by an admin (or some automated tool, I'm not sure), and then you
>> would only be able to see the files that you own. Would that be OK?
>>
> Well, yes. But of course, it's desirable if it's based on a pre-existing
> credentials system such as AD.
>
Yes, that's a good idea, and I could enumerate AD and get the whole
hierarchy and so on, but ACLs are a PITA to parse and get right. For
instance, built-in accounts as seen by AD are the built-ins on the AD
server. So, if the AD server is more or less permissive with its built-ins
than the targeted systems, then the rights will be incorrect, giving too
much or too little access: users will either be able to access data they
shouldn't or be unable to access data they should.

I think an explicit grant of user rights by an admin is less problematic to
get right the first time; AD matching and enumeration can come later.
Besides, it would need to connect to the AD server to validate the
credentials, and there is a strong likelihood that this will operate
outside a domain.
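A minimal sketch of the explicit-grant approach, in Python. It assumes the archive catalog records each file's original owner name at backup time and that an admin maintains a table of grants; CatalogEntry, GRANTS and visible_entries are hypothetical names, not part of any existing product. Matching on the recorded owner name is the simplest answer to "is the owner enough?"; group access would need an extra field per entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    path: str   # path as it existed on the source system
    owner: str  # owner name recorded when the file was backed up


# Hypothetical admin-maintained grants: backup-system user -> source owners
# whose files that user may see. Nothing is visible without an explicit grant.
GRANTS: dict[str, set[str]] = {
    "jane": {"jane"},
    "tarzan": {"tarzan"},
}


def visible_entries(user: str, catalog: list[CatalogEntry]) -> list[CatalogEntry]:
    """Return only the catalog entries whose recorded owner the user was granted."""
    allowed = GRANTS.get(user, set())  # unknown user -> empty set -> sees nothing
    return [entry for entry in catalog if entry.owner in allowed]
```

Defaulting to an empty grant set means a misconfigured or unknown account fails closed rather than seeing everything.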


>> For compression, a user can specify a level from 1 to 10, where 1 is no
>> compression and 10 is maximum.
>>      
> This is a low-priority request, basically irrelevant to anything, but I'm an
> idealist, so I like to promote it. I think it's desirable to allow a choice
> of compression algorithm. LZO is so fast that it effectively removes large
> sequential repeated patterns (such as zero-filled files) without ever
> becoming processor bound. Its compression is generally pretty wimpy, but
> it's good whenever you have really fast IO channels, because it will never
> slow you down and sometimes speeds you up. Gzip/Zip seem to be the industry
> standard, but as far as I can tell, they have no discernible advantage and
> should be considered antiquated. Bzip2 is also often used, and it always
> loses to LZMA, so I think bzip2 should be retired. LZMA (7-zip, XZ) at
> level-1 compression is both faster and stronger than any level of gzip or
> bzip2. End result: the only compressors I ever use are lzop and xz -1 (and
> 7-zip).
>    
If you can recommend a BSD-licensed compression library that meets your
wish list, I'll add it.
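Combining the choice of algorithm with the 1-10 scale, one option is a small registry of compressors, sketched here with Python's standard zlib, bz2 and lzma modules. LZO is not in Python's standard library, so it is left out; the registry, the function names and the mapping of user levels 2-10 onto native levels 1-9 are assumptions for illustration.

```python
import bz2
import lzma
import zlib
from typing import Callable

# Hypothetical registry: algorithm name -> function(data, native_level) -> bytes.
# zlib levels, bz2 compresslevel and lzma presets all accept 1-9.
COMPRESSORS: dict[str, Callable[[bytes, int], bytes]] = {
    "zlib": lambda data, lvl: zlib.compress(data, lvl),
    "bz2": lambda data, lvl: bz2.compress(data, compresslevel=lvl),
    "xz": lambda data, lvl: lzma.compress(data, preset=lvl),
}

DECOMPRESSORS: dict[str, Callable[[bytes], bytes]] = {
    "none": lambda data: data,
    "zlib": zlib.decompress,
    "bz2": bz2.decompress,
    "xz": lzma.decompress,
}


def compress(data: bytes, algorithm: str, user_level: int) -> tuple[str, bytes]:
    """Compress using a user-facing level 1-10 (1 = none, 10 = maximum).

    Returns the algorithm actually used so the archive can record it.
    """
    if not 1 <= user_level <= 10:
        raise ValueError("level must be between 1 and 10")
    if user_level == 1:
        return "none", data  # level 1 means store uncompressed
    return algorithm, COMPRESSORS[algorithm](data, user_level - 1)


def decompress(algorithm: str, payload: bytes) -> bytes:
    return DECOMPRESSORS[algorithm](payload)
```

Recording the algorithm name alongside each payload lets a later version add another compressor without breaking old archives.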
>    
>> Sparse files are interesting. I hadn't thought of those. Not sure how to
>> handle them. Got a suggestion?
>>      
> Not really. It's something a filesystem either supports or doesn't. For
> example, if you have a sparse file inside a ZFS filesystem and you do an
> incremental ZFS send, ZFS is intelligent enough to instantly identify and
> send only the changed blocks, with no scanning or anything. However, there
> is no such functionality in NTFS, EXT3/4, HFS+, or most other
> filesystems...
>
> In most filesystems... Let's suppose you have a disk that reads
> 500Mbit/s (typical for a 7200rpm SATA drive). If you simply read a
> non-sparse file from start to end, then of course it will take some amount
> of time, based on the speed of the disk. But if you read a sparse file from
> start to end, then the filesystem will generate zero-fill for all the sparse
> sections, and it does this faster than the disk could have read real data.
> I'd estimate about 10x faster.
>
> So unless your filesystem has a way of providing an index of the sparse
> sections of a file (which no filesystem does, AFAIK)...  And unless you're
> using ZFS Send...  The best alternative is to simply read the whole sparse
> file from start to end, as fast as possible.  And this is NOT fast, even in
> a sparse file.  Oh well.  The world stinks!!!
>    
Thanks for that, by the way. On a product I worked on previously, I got a
bunch of crap from marketing and QA over discrepancies between "file size"
and "size on disk." I had to endure even more when I tried to explain that
a sparse file is not compressed, it is "sparse," i.e. it doesn't actually
take up the space its size suggests. They had never heard of such a thing.
(Don't even get me started on a VP of engineering who didn't know what a
symbolic link was, and having to defend testing for such a thing. That's
when I should have known to quit ASAP.)
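For the "file size" versus "size on disk" argument, a POSIX system exposes both numbers through stat(): st_size is the apparent length, and st_blocks counts allocated 512-byte units (on Linux, the BSDs and macOS; it is not available on Windows). A minimal Python sketch with hypothetical function names; treat it as a heuristic, because filesystem-level compression (for example on ZFS) can also make the allocated size smaller than the apparent size.

```python
import os


def allocation_report(path: str) -> tuple[int, int]:
    """Return (apparent size, bytes actually allocated on disk) for path."""
    st = os.stat(path)
    # st_blocks is in 512-byte units on Linux, the BSDs and macOS.
    return st.st_size, st.st_blocks * 512


def looks_sparse(path: str) -> bool:
    """Heuristic: fewer bytes allocated than the file claims to contain.

    Transparent compression can also trigger this, so it is a hint, not proof.
    """
    size, allocated = allocation_report(path)
    return allocated < size
```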

I am hoping that sparse files aren't too much of an issue, because 
without specific platform knowledge (and even then it may be difficult) 
it would be impossible to archive these suckers correctly.
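Following the suggestion to simply read the whole file, an archiver can at least restore a sparse file as sparse: read it front to back, and when a chunk is entirely zeros, seek past it in the output instead of writing it. A minimal Python sketch; the 64 KiB chunk size and the name sparse_copy are assumptions. Because no hole map is consulted, it recreates holes wherever zero runs cover whole chunks, which may be more or fewer holes than the original had.

```python
import os

CHUNK = 64 * 1024  # assumed granularity; zero runs shorter than this are written as data


def sparse_copy(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst, leaving all-zero chunks as holes in dst.

    Returns the number of bytes skipped (left as holes) rather than written.
    """
    zero = bytes(chunk)
    skipped = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break
            if block == zero[:len(block)]:
                # Move the write position forward without writing zeros.
                fout.seek(len(block), os.SEEK_CUR)
                skipped += len(block)
            else:
                fout.write(block)
        # Set the final length explicitly so a trailing hole is preserved.
        fout.truncate(fin.tell())
    return skipped
```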

>> In version 1.0 I'm stuck; it's a file walker. In the next version I'm
>> thinking of using OS-specific file access monitoring for incremental
>> backup and access logging.
>>      
> FYI, easier said than done.  Just so you know.  Yes, it's possible (just
> look at dropbox, and ifolder, and sugarsync...)  but I have to assume there
> are tricks and unreliability issues and booby traps, or else there would be
> more products out there capable of doing that.
>    
Well, like everything, it's imperfect. I'm thinking more of the MPEG sort
of model, where you have anchor blocks of full backup followed by
incrementals. With access monitoring, you may miss a change or two, but the
anchor block will keep you from getting too out of date.
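The anchor idea can be sketched on top of the version 1.0 file walker: every Nth run is a full backup, and the runs in between pick up files whose modification time is newer than the previous run. A minimal Python sketch; full_every=7 and the function names are illustrative assumptions. Modification times can be wrong (some tools set them back), which is exactly the kind of missed change the periodic full anchor covers.

```python
import os
from typing import Iterator


def changed_since(root: str, since: float) -> Iterator[str]:
    """File-walker incremental: yield files under root modified after `since`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # removed between listing and stat; skip it
            if st.st_mtime > since:
                yield path


def plan_run(run_number: int, full_every: int = 7) -> str:
    """Anchor model: every `full_every`-th run is full, the others incremental."""
    return "full" if run_number % full_every == 0 else "incremental"
```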

References:
- Backup software
  - From: markw-FJ05HQ0HCKaWd6l5hS35sQ at public.gmane.org (Mark Woodward)


BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.

Boston Linux & Unix / webmaster@blu.org