[Discuss] Weird awk processing

Jerry Feldman gaf at blu.org
Thu May 2 15:52:33 EDT 2013


On 05/02/2013 01:11 AM, David Rosenstrauch wrote:
> Just stumbled upon the most bizarre awk problem.  mawk and gawk are 
> showing 2 different results for the same code.  Can anyone shed any 
> light?
>
> TIA!
>
> DR
>
> ---
>
> sense at ip-10-98-190-45.job:/sense/work/feature-summary/debugging$ cat 
> sample.txt
> 32e49398e024dcb79a319c62ceb213ae3e824f77        2
> 32e4cb91fdefe6103d73f1d6e43ecd8430f85334        2
> 32e4cb91fdefe6103d73f1d6e43ecd8430f85334        132
> 32e5434c41a8e2f0178fd19bd868758af6eb67c0        2
> 32e56067        10
> 32e56067        79
> 32e56122        59
> 32e57aacfd27f7fde61184052cb35551213c7cd6        5
>
> sense at ip-10-98-190-45.job:/sense/work/feature-summary/debugging$ cat 
> ../totals-by-label.awk
> #!/usr/bin/awk -f
>
> BEGIN {FS="\t"; prev_label = "";}
> {
>         curr_label=$1;
>         count=$2;
>         if (prev_label != "" && curr_label != prev_label) {
>                 output();
>         }
>         prev_label=curr_label;
>         tot += count;
> }
> END { output(); }
>
> function output() {
>         print prev_label"\t"tot;
>         tot = 0;
> }
>
> sense at ip-10-98-190-45.job:/sense/work/feature-summary/debugging$ cat 
> sample.txt | mawk -f ../totals-by-label.awk
> 32e49398e024dcb79a319c62ceb213ae3e824f77        2
> 32e4cb91fdefe6103d73f1d6e43ecd8430f85334        134
> 32e5434c41a8e2f0178fd19bd868758af6eb67c0        2
> 32e56122        148
> 32e57aacfd27f7fde61184052cb35551213c7cd6        5
>
> sense at ip-10-98-190-45.job:/sense/work/feature-summary/debugging$ cat 
> sample.txt | gawk -f ../totals-by-label.awk
> 32e49398e024dcb79a319c62ceb213ae3e824f77        2
> 32e4cb91fdefe6103d73f1d6e43ecd8430f85334        134
> 32e5434c41a8e2f0178fd19bd868758af6eb67c0        2
> 32e56067        89
> 32e56122        59
> 32e57aacfd27f7fde61184052cb35551213c7cd6        5
> _
>
I get the same results from both mawk and gawk. The first thing I did 
was run sample.txt through sed to convert the spaces to tabs. The gawk 
result you got is certainly the correct one.
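
A conversion of that kind could be sketched as follows. The exact sed 
command is not shown in this message, so the helper name spaces_to_tabs 
and the output file name sample.tsv are assumptions made for 
illustration. The tab is built with printf so the sketch does not rely 
on sed understanding a \t escape.

```shell
# Hypothetical reconstruction: the original sed command is not shown in the message.
# Build a literal tab with printf so the sed expression does not depend on a \t escape.
tab=$(printf '\t')

# Replace the first run of one or more spaces on each line with a single tab.
spaces_to_tabs() {
    sed "s/  */${tab}/"
}

# Example use (file names are illustrative):
#   spaces_to_tabs < sample.txt > sample.tsv
```

Only the first run of spaces on each line is replaced, which is enough 
for two-column data like sample.txt.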

[gaf at gaf awk]$ cat sample.txt | gawk -f totals-by-label.awk
2e49398e024dcb79a319c62ceb213ae3e824f77 2
32e4cb91fdefe6103d73f1d6e43ecd8430f85334        134
32e5434c41a8e2f0178fd19bd868758af6eb67c0        2
32e56067        89
32e56122        59
32e57aacfd27f7fde61184052cb35551213c7cd6        5

[gaf at gaf awk]$ cat sample.txt | mawk -f totals-by-label.awk
2e49398e024dcb79a319c62ceb213ae3e824f77 2
32e4cb91fdefe6103d73f1d6e43ecd8430f85334        134
32e5434c41a8e2f0178fd19bd868758af6eb67c0        2
32e56067        89
32e56122        59
32e57aacfd27f7fde61184052cb35551213c7cd6        5
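
One possible hardening of the script, offered as a sketch only: labels 
such as 32e56067 and 32e56122 have the form of numbers in exponent 
notation, and awk compares two fields numerically when both look like 
numbers. Whether that is what caused the differing mawk output is an 
assumption; nothing in this message confirms it. Appending an empty 
string to $1 forces the label comparison to be a string comparison in 
any awk. The wrapper name totals_by_label is hypothetical.

```shell
# Sketch only: same totals-by-label logic as the posted script,
# with the label forced to a string before it is compared.
totals_by_label() {
    awk '
    BEGIN { FS = "\t"; prev_label = "" }
    {
        curr_label = $1 ""      # concatenating "" yields a plain string value
        if (prev_label != "" && curr_label != prev_label)
            output()
        prev_label = curr_label
        tot += $2
    }
    END { output() }

    # Print the running total for the previous label, then reset it.
    function output() {
        print prev_label "\t" tot
        tot = 0
    }
    '
}

# Example use (expects tab-separated input):
#   totals_by_label < sample.txt
```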

-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id:3BC1EB90
PGP Key fingerprint: 49E2 C52A FC5A A31F 8D66  C0AF 7CEA 30FC 3BC1 EB90



