Hi Friends,
I am trying to write a shell script that merges two columns into a
third one on CentOS 5. The file is long: around 31200 rows, with
around 1370 unique groups and around 12000 unique user-names.
The 1st column is the group name and the 2nd column is the user-name.
1st Column (Groupname)    2nd Column (username)
admin                     ankush
admin                     amit
powerusers                dinesh
powerusers                jitendra
The desired output should be like this
admin: ankush, amit
powerusers: dinesh, jitendra
There are commands available, but I am not able to use them properly
to get the desired output. Please help me.
Thanks & Regards
Ankush
I knocked up the enclosed under Cygwin:
#!/bin/sh
(
cat <<EOTx
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOTx
) | awk '
{
    grpnm[$1] = grpnm[$1] ", " $2
}
END {
    for (i in grpnm) {
        print i ": " substr(grpnm[i], 3)
    }
}
' | sort
The meat is the AWK programme. It collects all instances of the second
column in an array indexed on the entries in the first column. At the
end of the input file it handles each element of the array in turn,
dropping the grammatically incorrect leading comma and space. The sort
just sorts the lines alphabetically, as you implied. The ( cat ... ) |
construct is just there to push in your test data.
Are the headings part of the file? If so, you may need to add the
line:
NR == 1 { next }
immediately after the awk line.
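For reading a real file rather than the inline test data, the pieces above might be put together roughly as follows. This is only a sketch: the function name merge_groups and the file names in the usage note are placeholders of mine, and the header skip is only right if the headings really are the first line of the file.

```shell
#!/bin/sh
# merge_groups: hypothetical helper (not from the original post).
# Reads "group user" rows from the files named as arguments, or from
# stdin if none are given, and prints one "group: user, user" line per
# group. ASSUMPTION: the first input line is a heading row.
merge_groups() {
    awk '
    NR == 1 { next }                 # skip the heading row
    NF >= 2 {                        # ignore blank or one-column lines
        grpnm[$1] = grpnm[$1] ", " $2
    }
    END {
        for (i in grpnm) {
            # substr(..., 3) drops the leading ", "
            print i ": " substr(grpnm[i], 3)
        }
    }
    ' "$@" | sort
}
```

It could then be called as, say, merge_groups groups.txt > merged.txt, where both file names are placeholders.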
HTH,
Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrushton at QinetiQ.com
www.QinetiQ.com
-----Original Message-----
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On
Behalf Of ankush grover
Sent: 30 December 2011 12:01
To: CentOS mailing list
Subject: [CentOS] Need help in writing a shell/bash script
Hi,

On Friday, December 30, 2011 at 9:00 PM, ankush grover wrote:
> Hi Friends,
>
> I am trying to write a shell script which can merge the 2 columns into
> 3rd one on Centos 5. The file is very long around 31200 rows having
> around 1370 unique groups and around 12000 unique user-names.
> The 1st column is the groupname and then 2nd column is the user-name.

I'm not sure I understood that "2 columns into 3rd one" there but...

> 1st Column (Groupname) 2nd Column (username)
> admin ankush
> admin amit
> powerusers dinesh
> powerusers jitendra

If that's the format of your input and...

> The desired output should be like this
>
> admin: ankush, amit
> powerusers: dinesh, jitendra

If that's your desired output, and assuming the input file is already sorted, try the following:

# -- code starts here -->
#!/bin/bash
GROUPNAMENOW=''
while read LINE
do
    GROUPNAME=$(echo $LINE | cut -d ' ' -f 1)
    USERNAME=$(echo $LINE | cut -d ' ' -f 2)
    if [ "$GROUPNAME" == "$GROUPNAMENOW" ]; then
        # Same group as the previous row: stay on the current output line
        echo -n ", $USERNAME"
    else
        # New group: finish the previous output line, if any, and start a new one
        [ -n "$GROUPNAMENOW" ] && echo
        GROUPNAMENOW=$GROUPNAME
        echo -n "$GROUPNAMENOW: $USERNAME"
    fi
done < input.txt
# Finish the last output line
[ -n "$GROUPNAMENOW" ] && echo
# <-- code ends here --

Note: Tested and worked as expected in OS X. It should work in CentOS too.

HTH,
--
- Edo -
mailto:ml2edwin at gmail.com
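The script above relies on the input already being sorted by group. If that cannot be guaranteed, one possible way around it is to sort on the first column before merging. The following is a rough sketch only: the function name merge_sorted is made up for illustration, and it assumes a sort that supports the -s (stable) option, which GNU coreutils does.

```shell
#!/bin/sh
# merge_sorted: hypothetical helper (not from the thread).
# Sorts "group user" rows by group, then merges adjacent rows that
# share a group. ASSUMPTION: sort supports -s (stable), so users stay
# in their original order within each group.
merge_sorted() {
    sort -s -k1,1 "$@" | awk '
    NF < 2 { next }                  # ignore blank or one-column lines
    $1 != prev {
        # New group: close the previous output line and start a new one
        if (started) printf "\n"
        printf "%s: %s", $1, $2
        prev = $1
        started = 1
        next
    }
    { printf ", %s", $2 }            # same group: append the user
    END { if (started) printf "\n" }
    '
}
```

Because only the current group is held at any time, this version does not need to keep the whole file in memory, at the cost of the extra sort.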
On 12/30/2011 09:00 PM, ankush grover wrote:
> I am trying to write a shell script which can merge the 2 columns into
> 3rd one on Centos 5.
> [...]
> The desired output should be like this
>
> admin: ankush, amit
> powerusers: dinesh, jitendra

Hi Ankush,

This will do what you want. But please read the comments in the code.

As a side note, this sort of thing is way more natural in Postgres. That will become more apparent as the file contents grow. In particular, the concept of appending tens of thousands of names to a single line in a file is a little crazy, as most text editors will start choking on display without a \n in there somewhere to relieve the way most of them read and display text.

#######BEGIN collator.sh
#! /bin/bash
#
# collator.sh
#
# Invocation:
# If executable and in $PATH (~/bin is a good idea):
#   collator.sh input-filename output-filename
# If not executable, not in $PATH, but in present working directory:
#   sh ./collator.sh input-filename output-filename
#
# WARNING: There is NO serious attempt at error checking implemented.
# This means you should check the contents of OUTFILE before
# using it for anything important.

INFILE=${1:?"Input filename missing, please read script comments."}
OUTFILE=${2:?"Output filename missing, please read script comments."}

# Write one "group: " line per unique group, in order of first appearance.
awk '!seen[$1]++ {print $1 ": "}' "$INFILE" > "$OUTFILE"

for GROUP in $(cut -d ':' -f 1 "$OUTFILE")
do
    # Compare column 1 exactly; a plain grep would also match substrings
    # of other group names and matching user-names.
    for NAME in $(awk -v g="$GROUP" '$1 == g {print $2}' "$INFILE")
    do
        # Append the name to the end of this group's line.
        sed -i "/^$GROUP: /s/\$/$NAME, /" "$OUTFILE"
    done
done

# Drop the trailing ", " left after the last name on each line.
sed -i 's/, $//' "$OUTFILE"
#######END collator.sh
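Since the script's own warning says to check OUTFILE before relying on it, a crude sanity check could compare counts between the two files. This is a sketch under my own assumptions, not part of collator.sh: the function name check_merged is hypothetical, and it assumes the output uses exactly the "group: name, name" layout requested in the original post.

```shell
#!/bin/sh
# check_merged: hypothetical sanity check (not part of collator.sh).
# Succeeds only if the output file holds as many user-names as the
# input has rows, and as many lines as the input has unique groups.
# ASSUMPTION: output lines look like "group: name, name, ...".
check_merged() {
    infile=$1
    outfile=$2
    # Rows and unique groups in the input (blank lines ignored)
    in_rows=$(awk 'NF >= 2 { n++ } END { print n + 0 }' "$infile")
    in_groups=$(awk 'NF >= 2 && !seen[$1]++ { n++ } END { print n + 0 }' "$infile")
    # Names and lines in the output; split() returns the number of pieces
    out_names=$(awk -F': ' '{ n += split($2, a, ", ") } END { print n + 0 }' "$outfile")
    out_groups=$(awk 'END { print NR }' "$outfile")
    [ "$in_rows" -eq "$out_names" ] && [ "$in_groups" -eq "$out_groups" ]
}
```

Matching counts do not prove the merge is right, but a mismatch is a cheap signal that names were lost or duplicated along the way.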
On Fri, Dec 30, 2011 at 6:00 AM, ankush grover <ankushcentos at gmail.com> wrote:
> I am trying to write a shell script which can merge the 2 columns into
> 3rd one on Centos 5.
> [...]
> The desired output should be like this
>
> admin: ankush, amit
> powerusers: dinesh, jitendra

Here's a perl approach:

#!/usr/bin/perl
use strict;
use warnings;

my %groups = ();
while (<>) {
    chomp;
    # split ' ' splits on any run of whitespace and skips leading whitespace
    my ($group, $name) = split ' ';
    next unless defined $name;    # skip blank or one-column lines
    push @{ $groups{$group} }, $name;
}
foreach my $group (sort keys %groups) {
    print "$group: " . join(", ", @{ $groups{$group} }) . "\n";
}

Cat or redirect the list to the program input; output is on stdout.

--
Les Mikesell
lesmikesell at gmail.com