I know this is for CentOS stuff, but I'm at a loss on how to build a script that does what I need it to do. It's probably really logically simple, I'm just not seeing it. Hopefully someone will take pity on me and at least give me a big hint.

I have a file with two columns 'email' and 'total' like this:

me@example.com 20
me@example.com 40
you@domain.com 100
you@domain.com 30

I need to get the total number of messages for each email address. This type of code has always been the hardest for me for whatever reason, and honestly, I don't write many scripts these days. I'm struggling to get pseudocode that works, much less a working script.

I know this is off topic, and if it gets modded out, that's fine. I just can't wrap my brain around it.

-- 
Mark Haney
Network Engineer at NeoNova
919-460-3330 option 1
mark.haney at neonova.net
www.neonova.net

On Wed, Oct 25, 2017 at 9:02 AM, Mark Haney <mark.haney at neonova.net> wrote:
> I know this is for CentOS stuff, but I'm at a loss on how to build a script
> that does what I need it to do. It's probably really logically simple, I'm
> just not seeing it. Hopefully someone will take pity on me and at least
> give me a big hint.
>
> I have a file with two columns 'email' and 'total' like this:
>
> me@example.com 20
> me@example.com 40
> you@domain.com 100
> you@domain.com 30
>
> I need to get the total number of messages for each email address. This
> type of code has always been the hardest for me for whatever reason, and
> honestly, I don't write many scripts these days. I'm struggling to get
> pseudocode that works, much less a working script. I know this is off topic,
> and if it gets modded out, that's fine. I just can't wrap my brain around
> it.
>

Here is a Python solution:

#!/usr/bin/python
# python 2 (did not check if it works)
f = open('yourfilename')
D = {}
for line in f:
    email, num = line.split()
    if email in D:
        D[email] = D[email] + num
    else:
        D[email] = num
f.close()
for key in D:
    print key, D[key]

On 10/25/2017 12:33 PM, Robert Arkiletian wrote:
> Here is a Python solution:
>
> #!/usr/bin/python
> # python 2 (did not check if it works)
> f = open('yourfilename')
> D = {}
> for line in f:
>     email, num = line.split()
>     if email in D:
>         D[email] = D[email] + num
>     else:
>         D[email] = num
> f.close()
> for key in D:
>     print key, D[key]

That gets me closer, I think. It's concatenating the number of messages, but it's a start. Thanks.

-- 
Mark Haney

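The concatenation comes from line.split() returning strings, so "+" joins "20" and "40" into "2040" instead of adding them. A minimal Python 3 sketch of the same script with the count converted by int() might look like the following. The function name sum_by_address and the in-memory sample are illustrative assumptions, and it assumes each line is an address without spaces followed by a count.

```python
#!/usr/bin/env python3
# Sketch only: same structure as the script above, with int() applied to the count.
import io


def sum_by_address(lines):
    """Return {address: total} for whitespace-separated 'address count' lines."""
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # assumption: silently skip blank or malformed lines
        email, num = fields[0], int(fields[1])  # int() is the actual fix
        if email in totals:
            totals[email] = totals[email] + num
        else:
            totals[email] = num
    return totals


# Sample data from the original post, held in memory for illustration;
# a real run would pass an open file object instead.
sample = io.StringIO(
    "me@example.com 20\n"
    "me@example.com 40\n"
    "you@domain.com 100\n"
    "you@domain.com 30\n"
)
for email, total in sum_by_address(sample).items():
    print(email, total)
```

Because a file object iterates line by line, `with open('yourfilename') as f: sum_by_address(f)` should work the same way on the real data.
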
On Wed, 2017-10-25 at 12:02 -0400, Mark Haney wrote:
> I know this is for CentOS stuff, but I'm at a loss on how to build a
> script that does what I need it to do. It's probably really logically
> simple, I'm just not seeing it. Hopefully someone will take pity on me
> and at least give me a big hint.
>
> I have a file with two columns 'email' and 'total' like this:
>
> me@example.com 20
> me@example.com 40
> you@domain.com 100
> you@domain.com 30
>
> I need to get the total number of messages for each email address. This
> type of code has always been the hardest for me for whatever reason, and
> honestly, I don't write many scripts these days. I'm struggling to get
> pseudocode that works, much less a working script. I know this is off
> topic, and if it gets modded out, that's fine. I just can't wrap my
> brain around it.
>

Not bash but perl:

#####
#!/usr/bin/perl

my %dd;
while (<>) {
    my @f = split;
    $dd{$f[0]}{COUNT} += $f[1];
}

print "\nSums:\n";
for (keys %dd) {
    print "$_\t $dd{$_}{COUNT}\n";
};
####

It takes the data on stdin, sums it into an associative array and prints out the result.

Results:

######
$ ./ppp
me@example.com 20
me@example.com 40
you@domain.com 100
you@domain.com 30

Sums:
you@domain.com   130
me@example.com   60
######

I'm sure some perl monk can come up with a single line command to do the same thing.

P.

On Oct 25, 2017, at 10:02 AM, Mark Haney <mark.haney at neonova.net> wrote:
>
> I have a file with two columns 'email' and 'total' like this:
>
> me@example.com 20
> me@example.com 40
> you@domain.com 100
> you@domain.com 30
>
> I need to get the total number of messages for each email address.

This screams out for associative arrays. (Also called hashes, dictionaries, maps, etc.)

That does limit you to CentOS 7+, or maybe 6+, as I recall. CentOS 5 is definitely out, as that ships Bash 3, which lacks this feature.

#!/bin/bash
declare -A totals

while read -r line
do
    # $'\t ' is a real tab plus a space; "\t " would split on backslash and the letter t
    IFS=$'\t ' read -r -a elems <<< "$line"
    email=${elems[0]}
    subtotal=${elems[1]}

    # -i makes n an integer, so the assignment below is evaluated arithmetically
    declare -i n=${totals[$email]}
    n=n+$subtotal
    totals[$email]=$n
done < stats

for k in "${!totals[@]}"
do
    printf "%6d %s\n" "${totals[$k]}" "$k"
done

You're making things hard on yourself by insisting on Bash, by the way. This solution is better expressed in Perl, Python, Ruby, Lua, JavaScript, probably dozens of languages.

Although "not my question", thanks, I learned a lot about array processing from your example.

----- Original Message -----
From: "warren" <warren at etr-usa.com>
To: "centos" <centos at centos.org>
Sent: Wednesday, October 25, 2017 11:47:12 AM
Subject: Re: [CentOS] [OT] Bash help

Warren Young wrote:
> On Oct 25, 2017, at 10:02 AM, Mark Haney <mark.haney at neonova.net> wrote:
>>
>> I have a file with two columns 'email' and 'total' like this:
>>
>> me@example.com 20
>> me@example.com 40
>> you@domain.com 100
>> you@domain.com 30
>>
>> I need to get the total number of messages for each email address.
>
> This screams out for associative arrays. (Also called hashes,
> dictionaries, maps, etc.)
>
> That does limit you to CentOS 7+, or maybe 6+, as I recall. CentOS 5 is
> definitely out, as that ships Bash 3, which lacks this feature.
<snip>

Associative arrays? Awk! Awk! (No, I am not a seagull...)

sort file | awk '{ array[$1] += $2; } END { for (i in array) { print i "\t" array[i]; } }'

        mark "associative arrays, how do I love thee? Let me tot the arrays..."

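The awk array accumulates totals whatever order the lines arrive in, so the sort is not what makes the one-liner work. What sorting does provide is that equal addresses end up on adjacent lines, and that allows a running sum with no associative array at all. A rough Python sketch of that idea, using itertools.groupby, is below; the function name sum_sorted and the sample list are illustrative assumptions.

```python
from itertools import groupby


def sum_sorted(lines):
    """Yield (address, total) pairs from lines already sorted by address."""
    # Split each non-blank line into [address, count]
    rows = (line.split() for line in lines if line.strip())
    # groupby only merges *adjacent* equal keys, hence the sorted-input requirement
    for addr, group in groupby(rows, key=lambda fields: fields[0]):
        yield addr, sum(int(fields[1]) for fields in group)


# Deliberately unsorted sample; sorted() plays the role of `sort file`
data = [
    "you@domain.com 100",
    "me@example.com 20",
    "you@domain.com 30",
    "me@example.com 40",
]
for addr, total in sum_sorted(sorted(data)):
    print(f"{addr}\t{total}")
```

Only one group is held in memory at a time, which may matter for very large inputs; for a file of this size the dictionary approach is simpler.
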
On 10/25/2017 12:47 PM, Warren Young wrote:
>
> You're making things hard on yourself by insisting on Bash, by the way. This solution is better expressed in Perl, Python, Ruby, Lua, JavaScript, probably dozens of languages.

Yeah, you're right, I am. An associative array was the first thing I thought of, then realized BASH doesn't do those. I honestly expected there to be a fairly straightforward way to do it in BASH, but I was sadly mistaken. In my defense, I gave virtually no thought to the logic of what I was trying to do until after I'd committed significant time to a BASH script. (Well, maybe that's not a defense, but an indictment.)

As I said, I don't do much scripting anymore as the majority of my time is spent DB tuning and Ansible automation. Not really an excuse, and I appreciate your indulgence(s) in giving me a hand. As embarrassed as I am, I'll just go sit in the corner the rest of the day.

Thanks again.

-- 
Mark Haney

On Wed, Oct 25, 2017 at 10:47:12AM -0600, Warren Young wrote:
> On Oct 25, 2017, at 10:02 AM, Mark Haney <mark.haney at neonova.net> wrote:
> >
> > I have a file with two columns 'email' and 'total' like this:
> >
> > me@example.com 20
> > me@example.com 40
> > you@domain.com 100
> > you@domain.com 30
> >
> > I need to get the total number of messages for each email address.
>
> This screams out for associative arrays. (Also called hashes, dictionaries, maps, etc.)
>
> That does limit you to CentOS 7+, or maybe 6+, as I recall. CentOS 5 is definitely out, as that ships Bash 3, which lacks this feature.
>
> #!/bin/bash
> declare -A totals
>
> while read -r line
> do
>     IFS=$'\t ' read -r -a elems <<< "$line"
>     email=${elems[0]}
>     subtotal=${elems[1]}
>
>     declare -i n=${totals[$email]}
>     n=n+$subtotal
>     totals[$email]=$n
> done < stats
>
> for k in "${!totals[@]}"
> do
>     printf "%6d %s\n" "${totals[$k]}" "$k"
> done

A slightly different approach, written for ksh but it seems to also work with bash 4.

typeset -A arr

while read addr cnt
do
    # ${arr[$addr]:-0} yields 0 the first time an address is seen
    arr[$addr]=$(( ${arr[$addr]:-0} + cnt ))
done < "${1}"

for a in "${!arr[@]}"
do
    printf "%6d %s\n" "${arr[$a]}" "$a"
done

Jon
-- 
Jon H. LaBadie                 jon at jgcomp.com
11226 South Shore Rd.          (703) 787-0688 (H)
Reston, VA 20190               (703) 935-6720 (C)

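The ${arr[$addr]:-0} expansion is what removes the need for an "is this key already present?" branch: a missing entry simply reads as 0. For comparison, the same default-value idiom in Python is dict.get with a fallback. The sketch below also reproduces the "%6d %s" report layout; the names tally and report are illustrative assumptions, and sorting the output by address is an addition, since the shell loops above print in whatever order the shell returns the keys.

```python
def tally(lines):
    """Accumulate counts per address; a missing key reads as 0, like ${arr[$addr]:-0}."""
    totals = {}
    for line in lines:
        addr, cnt = line.split()
        totals[addr] = totals.get(addr, 0) + int(cnt)
    return totals


def report(totals):
    """Format each entry like printf "%6d %s", sorted by address for stable output."""
    return ["%6d %s" % (count, addr) for addr, count in sorted(totals.items())]


lines = [
    "me@example.com 20",
    "me@example.com 40",
    "you@domain.com 100",
    "you@domain.com 30",
]
print("\n".join(report(tally(lines))))
```
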
> > You're making things hard on yourself by insisting on Bash, by the way.

I'd always assumed that shell scripting was a kind of sadomasochistic medium allowing people who don't get out much to inflict horrible torture on each other. It certainly causes me great pain every time I try to read a bash script with more than a couple of clauses. I'm just taking over a bunch of bash CI plumbing that seems to have been written by a committee of Manson family members.

On Wed, Oct 25, 2017 at 9:47 AM, Warren Young <warren at etr-usa.com> wrote:
>
> This screams out for associative arrays. (Also called hashes, dictionaries, maps, etc.)
>
> That does limit you to CentOS 7+, or maybe 6+, as I recall. CentOS 5 is definitely out, as that ships Bash 3, which lacks this feature.

Nonsense. Every POSIX shell has an associative array called "the filesystem."

(
    hash=$(mktemp -d)
    # one file per address; each count is appended as its own line
    while read -r addr msgs; do
        echo "$msgs" >> "$hash/$addr"
    done
    cd "$hash"
    # join each file's lines with "+" and let bc do the sum
    for x in *; do
        echo "$x $(paste -s -d+ "$x" | bc)"
    done
) < msg-counts

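The trick here is that the file name acts as the hash key and the file contents hold the values, so appending to "$hash/$addr" is the "insert or update" step. Purely to make that mechanism explicit, here is a small Python sketch of the same idea; the function name sum_via_files is an illustrative assumption, and like the shell version it assumes an address is usable as a file name (an address containing "/" would break both).

```python
import os
import tempfile


def sum_via_files(lines):
    """Use a temporary directory as the hash: one file per address, one count per line."""
    totals = {}
    with tempfile.TemporaryDirectory() as bucket_dir:
        for line in lines:
            addr, msgs = line.split()
            # Assumption: addr contains no "/" and is a valid file name
            with open(os.path.join(bucket_dir, addr), "a") as bucket:
                bucket.write(msgs + "\n")
        # Each file name is a key; summing its lines gives that key's total
        for addr in sorted(os.listdir(bucket_dir)):
            with open(os.path.join(bucket_dir, addr)) as bucket:
                totals[addr] = sum(int(n) for n in bucket)
    return totals


counts = sum_via_files([
    "me@example.com 20",
    "me@example.com 40",
    "you@domain.com 100",
    "you@domain.com 30",
])
for addr, total in counts.items():
    print(addr, total)
```

Unlike the shell one-liner, the context manager removes the temporary directory when it exits.
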