[Discuss] Any Subversion geniuses out there?

John Abreau abreauj at gmail.com
Fri Dec 2 10:01:05 EST 2011


I've seen this before with "text" files on Windows. Just changing the
MIME type will not work, because the files are encoded in UTF-16
(note *NOT* UTF-8): 16-bit code units, not 8-bit ones. If you
change the MIME type to force it to be interpreted as normal text,
a file of plain ASCII text will show a null byte between each and every character.
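
To see those null bytes directly, the short illustration below (assuming a GNU or BSD iconv and od are installed) encodes the two-character string GO as UTF-16LE, the byte order that "file" reports for the SSMS output later in this thread, and dumps the raw bytes in hex. Each ASCII letter comes out followed by a 00 byte.

```shell
#!/bin/sh
# Encode "GO" as little-endian UTF-16 and print the raw bytes in hex.
# Each ASCII character becomes two bytes: the character itself, then 0x00.
printf 'GO' | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
```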

When I had to deal with those issues at a previous job, I used iconv(1)
in my shell scripts to convert the MS "text" to UTF-8.

    iconv --from-code=UTF-16 --to-code=UTF-8 ms-text-file.txt > plain-text-file.txt

I also ran it through "tr -d '\r'" to scrape off the ^M at the end of
each line before dropping it into the output file, but that's a separate issue.
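
The two steps above (iconv, then tr) can be folded into one small function. This is a sketch, not John's original script: the name ms_to_unix is hypothetical, and the check that the input starts with a UTF-16 byte-order mark (FF FE or FE FF) is an added assumption about how SSMS saves its files. iconv's UTF-16 decoder uses that mark to choose the byte order and does not copy it into the UTF-8 output.

```shell
#!/bin/sh
# Hypothetical helper: convert one UTF-16 "text" file (as saved by SSMS)
# into UTF-8 with Unix (LF) line endings.
ms_to_unix() {
    in=$1   # UTF-16 input file
    out=$2  # UTF-8 output file
    # Read the first two bytes as hex: fffe = UTF-16LE BOM, feff = UTF-16BE BOM.
    bom=$(head -c 2 "$in" | od -An -tx1 | tr -d ' \n')
    case $bom in
        fffe|feff) ;;
        *) echo "$in: no UTF-16 byte-order mark, skipping" >&2; return 1 ;;
    esac
    # iconv picks the byte order from the BOM; tr then deletes every
    # carriage return (the ^M at the end of each CRLF line).
    iconv --from-code=UTF-16 --to-code=UTF-8 "$in" | tr -d '\r' > "$out"
}
```

Usage: ms_to_unix dbo.Proc_xxxx.sql dbo.Proc_xxxx.utf8.sql. Because tr -d '\r' removes every carriage return, a file that also contains bare CR line terminators (as one of the "file" outputs in this thread does) would have those lines joined rather than split; that case needs a CR-to-LF translation instead.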


On Fri, Dec 2, 2011 at 9:40 AM, Matt Shields <matt at mattshields.org> wrote:
> On Fri, Dec 2, 2011 at 8:11 AM, Edward Ned Harvey <blu at nedharvey.com> wrote:
>
>> > From: discuss-bounces+blu=nedharvey.com at blu.org [mailto:discuss-
>> > bounces+blu=nedharvey.com at blu.org] On Behalf Of Matt Shields
>> >
>> >  What I was wondering is whether it's possible in Subversion, when a
>> > changeset is being committed, for a hook to change the mime-type. So if
>> > the file being committed is a *.sql, the hook would override whatever
>> > mime-type the client is reporting and apply text/x-sql.
>>
>> This question will be best answered by the subversion-users mailing list,
>> http://subversion.apache.org/mailing-lists.html
>> but let's see what we can say about it here.
>>
>> The mime type, I believe, is determined by the svn client, and it's
>> determined by file contents. What do you get if you run Linux "file" on
>> the file? What do you see if you try to open the file in vim or emacs?
>>
>> I'm sure you can change the mime-type in a pre-commit or post-commit hook
>> (probably best pre-commit), but I'm almost equally sure that it's not what
>> you want to do. When the clients detect the contents and select a mime
>> type, they do it because svn internally employs all sorts of diff and
>> compression algorithms to optimize both the network traffic and disk
>> storage. If you go overriding the mime types against its natural wishes,
>> you run the risk of... suboptimizing performance, which is probably the
>> diplomatic way of saying effing everything up.
>>
>> Another option you might consider: I believe Subversion has a mechanism
>> of some kind that lets you plug in a custom client-side diff utility for
>> certain files or mime types. You could configure it so that your client
>> runs something like the SQL equivalent of "dos2unix" to convert the file
>> format and then diffs it. Of course, the odds of success with this are
>> diminished by Trac, which renders its own diffs on the server rather than
>> using your client's diff tool. You might just have to use something like
>> TortoiseSVN to perform these diffs.
>>
>> In fact, TortoiseSVN does some pretty excellent diffing. What happens if
>> you try diffing with Tortoise?
>>
>>
> Yes, I'm aware of that, and I can put something in each client's Subversion
> config to override this behavior for specific file types. I don't want to
> have to do that, since every time we get a new developer it's one more step
> I have to remember to do on their dev machine.
>
> The issue is that SQL Server Management Studio (SSMS) encodes the file
> oddly, and TortoiseSVN then treats it as a binary file rather than a text
> file. See the two outputs of "file" below. The first was created in SSMS
> and committed. The second has been fixed by me, forcing it to the proper
> encoding and the proper mime-type.
>
> dbo.Proc_xxxx.sql: Little-endian UTF-16 Unicode c program text, with CRLF, CR line terminators
> dbo.Proc_yyyy.sql: ASCII c program text, with CRLF line terminators
>
> Yes, diffs in TortoiseSVN are great, same with the Unix command line. The
> issue is that the Dir of Tech prefers to use Trac to review all changes,
> and because the file is encoded wrong, svn applies the wrong mime-type,
> which keeps Trac's diff feature from working.
>
> In this case I don't believe there is any harm in forcing svn to use a
> specific mime-type, since both files are text. I'll check out the
> check-mime-type.pl script that Greg mentioned.
>
> Matthew Shields
> Owner
> BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
> Managed Services
> www.beantownhost.com
> www.sysadminvalley.com
> www.jeeprally.com
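
A minimal sketch of the server-side approach Matt asked about, under stated assumptions. Subversion's documentation advises against modifying a commit transaction from a hook, so instead of rewriting svn:mime-type, this hypothetical pre-commit hook only validates it, in the same check-and-reject style as the check-mime-type.pl script mentioned in the thread: it rejects any added or modified *.sql file whose svn:mime-type is set to something other than text/*. The function names sql_mime_ok and check_txn and the svnlook location are assumptions for illustration; "svnlook changed" and "svnlook propget" are the real subcommands it relies on, and the parsing assumes svnlook's usual 4-character status prefix before each path.

```shell
#!/bin/sh
# Hypothetical pre-commit hook (install as hooks/pre-commit in the repository).
# It only checks the transaction; it never modifies it.
# Assumption: svnlook lives here. Hooks run with an empty environment,
# so an absolute path is safer than relying on PATH.
SVNLOOK=/usr/bin/svnlook

# Return 0 if a changed path is acceptable, 1 if it is a .sql file with a
# non-text MIME type. An unset svn:mime-type ('') is treated as text by svn.
sql_mime_ok() {
    case $1 in
        *.sql|*.SQL) ;;
        *) return 0 ;;
    esac
    case $2 in
        ''|text/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check every path changed in transaction $2 of repository $1.
check_txn() {
    repos=$1
    txn=$2
    "$SVNLOOK" changed -t "$txn" "$repos" | (
        rc=0
        while IFS= read -r line; do
            # Skip deletions; they carry no properties to check.
            case $line in D*) continue ;; esac
            # Strip the 4-character status field to get the repository path.
            path=${line#????}
            # propget fails when the property is unset; treat that as ''.
            mime=$("$SVNLOOK" propget -t "$txn" "$repos" svn:mime-type "$path" 2>/dev/null)
            if ! sql_mime_ok "$path" "$mime"; then
                echo "$path: svn:mime-type is '$mime'; please set it to text/x-sql" >&2
                rc=1
            fi
        done
        exit $rc
    )
}

# Subversion invokes the hook as: pre-commit REPOS-PATH TXN-NAME.
# The argument check lets the functions be sourced on their own for testing.
if [ $# -eq 2 ]; then
    check_txn "$1" "$2"
    exit $?
fi
```

Make the hook executable. A non-zero exit makes Subversion reject the commit and pass the stderr message back to the client, so the committer (for example in TortoiseSVN) sees which file to fix. Note that this only polices the property; a file that is still UTF-16 on disk would need converting as well.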



-- 
John Abreau / Executive Director, Boston Linux & Unix
OLD GnuPG KeyID: D5C7B5D9 / Email: abreauj at gmail.com
OLD GnuPG FP: 72 FB 39 4F 3C 3B D6 5B E0 C8 5A 6E F1 2C BE 99
2011 PGP KeyID: 32A492D8 / Email: abreauj at gmail.com
2011 PGP FP: 7834 AEC2 EFA3 565C A4B6  9BA4 0ACB AD85 32A4 92D8


