simple regex question

Danny daniel.robert at acm.org
Thu Jan 11 16:06:42 EST 2007


Quoting Dwight E Chadbourne <dwighte.chadbourne at stopandshop.com>:

> Hi all.  I want the 20 digit hash in this text.
>
> d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e
> 4:name12:xxxxxxxxxxxxee5:flagsd20:min_request_intervali3600eee
>
> How do I get only the xxxxxxxxxxxxxxxxxxxx and not the preceding
> identifier?

I can't give a definitive answer without knowing whether there are
always exactly 20 "x"s, or whether you just want the full text between
the second and third ':'.

So, using regex:

1) Assuming 20 characters immediately following the second ':'
     ^([^:]*:){2}(.{20}).*$
This puts your desired value in backreference #2, so if you were
using Perl (assuming your original content was in '$string'):
$string =~ s/^([^:]*:){2}(.{20}).*$/$2/;
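
For anyone who would rather stay in the shell, the same substitution
can be sketched with sed. This assumes a sed that accepts -E for
extended regular expressions (GNU and BSD sed both do) and that the
whole record sits on a single line; the variable names are only for
illustration.

```shell
# Sample record from the original question, assumed to be one line.
line='d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e'

# Skip two ':'-terminated fields, capture the next 20 characters as
# group 2, and replace the whole line with that group.
hash=$(printf '%s\n' "$line" | sed -E 's/^([^:]*:){2}(.{20}).*$/\2/')

printf '%s\n' "$hash"
```

If the line does not contain two ':' followed by at least 20
characters, sed leaves it unchanged, so it is worth checking the length
of the result before trusting it.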

2) The full text between the second and third ':'
    ^([^:]*:){2}([^:]*):.*$
Again, this will put everything between the second and third ':' into
backreference #2, to be used in the same fashion as the previous example.
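
As a rough check of this second pattern, here is the same idea run
through sed (again assuming sed -E is available and the record is on
one line; 'field' is just an illustrative variable name). On the sample
line the captured text turns out to be longer than 20 characters.

```shell
# Sample record from the original question, assumed to be one line.
line='d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e'

# Capture everything between the second and third ':' as group 2.
field=$(printf '%s\n' "$line" | sed -E 's/^([^:]*:){2}([^:]*):.*$/\2/')

# For this sample the result is the 20 x's followed by "d8".
printf '%s\n' "$field"
```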

One of the other responders mentioned using 'awk' via the command-line  
to isolate the content between the second and third ':'.  You could  
use 'cut' to accomplish the same thing.

echo "d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e" |
  cut -d':' -f3

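This specifies that the field delimiter is ':' and that you want the
third field isolated.  For the sample line above that field comes out
as "xxxxxxxxxxxxxxxxxxxxd8", so if the value is always 20 characters
you would still need to trim the trailing "d8".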
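
A small sketch putting the two command-line approaches side by side.
The awk line is my guess at the kind of command the other responder
had in mind, not necessarily what was posted, and the second cut
assumes the value you want is always the first 20 characters of the
field.

```shell
# Sample record from the original question, assumed to be one line.
line='d5:filesd20:xxxxxxxxxxxxxxxxxxxxd8:completei2e10:downloadedi0e10:incompletei4e'

# Third ':'-separated field, first with awk, then with cut.
third_awk=$(printf '%s\n' "$line" | awk -F':' '{ print $3 }')
third_cut=$(printf '%s\n' "$line" | cut -d':' -f3)

# Keep only the first 20 characters of that field.
hash=$(printf '%s\n' "$line" | cut -d':' -f3 | cut -c1-20)

printf '%s\n%s\n%s\n' "$third_awk" "$third_cut" "$hash"
```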

I hope this was helpful,
-Danny Robert
daniel.robert at acm.org

P.S.:  This is my first post to this user list, having moved to Boston
about a year ago.  Just thought I'd say "hi".
