Hi Friends,

I am trying to write a shell script on CentOS 5 which can merge the 2 columns into a 3rd one. The file is very long: around 31200 rows, with around 1370 unique groups and around 12000 unique usernames. The 1st column is the group name and the 2nd column is the username.

1st Column (Groupname)    2nd Column (username)
admin                     ankush
admin                     amit
powerusers                dinesh
powerusers                jitendra

The desired output should look like this:

admin: ankush, amit
powerusers: dinesh, jitendra

There are commands available for this, but I have not been able to use them properly to get the desired output. Please help me.

Thanks & Regards
Ankush
I knocked up the enclosed under Cygwin:

#!/bin/sh
(
cat <<EOTx
admin ankush
admin amit
powerusers dinesh
powerusers jitendra
EOTx
) | awk '
    # Append each username to the list held for its group.
    { grpnm[$1] = grpnm[$1] ", " $2 }
    END {
        for (i in grpnm) {
            # substr(..., 3) drops the leading ", ".
            print i ": " substr(grpnm[i], 3)
        }
    }
' | sort

The meat is the AWK programme. It collects all instances of the second column in an array indexed on the entries in the first column. At the end of the input file it handles each element of the array in turn, dropping the grammatically incorrect leading comma and space. The sort just sorts the lines alphabetically, as you implied. The ( cat ... ) | construct is just there to push in your test data.

Are the headings part of the file? If so, you may need to add the line

NR == 1 { next }

immediately after the awk line.

HTH,

Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrushton at QinetiQ.com
www.QinetiQ.com

-----Original Message-----
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On Behalf Of ankush grover
Sent: 30 December 2011 12:01
To: CentOS mailing list
Subject: [CentOS] Need help in writing a shell/bash script

_______________________________________________
CentOS mailing list
CentOS at centos.org
http://lists.centos.org/mailman/listinfo/centos
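For reference, the awk technique described above could be pointed at a real file rather than a here-document roughly as follows. This is only a sketch: the function name merge_groups and its optional "skip heading" argument are assumptions made for illustration, and the heading skip is simply the NR == 1 rule suggested above.

```shell
#!/bin/sh
# merge_groups: hypothetical helper, not part of the script above.
# Reads "group user" pairs on stdin and prints "group: user, user".
# Pass 1 as the first argument to skip a heading row (an assumption
# about the file layout).
merge_groups() {
    awk -v skip="${1:-0}" '
        skip && NR == 1 { next }      # drop the heading row if requested
        NF >= 2 {
            # Collect usernames per group, in input order.
            members[$1] = members[$1] ", " $2
        }
        END {
            for (g in members)
                # substr(..., 3) strips the leading ", ".
                print g ": " substr(members[g], 3)
        }
    ' | sort                          # for-in order is unspecified in awk
}
```

It would be called as "merge_groups < groups.txt > merged.txt", or as "merge_groups 1 < groups.txt" when the first row holds the headings; both file names are placeholders.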
Hi,

On Friday, December 30, 2011 at 9:00 PM, ankush grover wrote:
> Hi Friends,
>
> I am trying to write a shell script which can merge the 2 columns into
> 3rd one on Centos 5. The file is very long around 31200 rows having
> around 1370 unique groups and around 12000 unique user-names.
> The 1st column is the groupname and then 2nd column is the user-name.

I'm not sure I understood that "2 columns into 3rd one" there but...

> 1st Column (Groupname) 2nd Column (username)
> admin ankush
> admin amit
> powerusers dinesh
> powerusers jitendra

If that's the format of your input and...

> The desired output should be like this
>
> admin: ankush, amit
> powerusers: dinesh, jitendra
>

If that's your desired output, and assuming the input file is already sorted, try the following:

# -- code starts here -->
#!/bin/bash
# Assumes input.txt is sorted so that each group's rows are adjacent.
GROUPNAMENOW=''
while read -r GROUPNAME USERNAME
do
    if [ "$GROUPNAME" == "$GROUPNAMENOW" ]; then
        # Same group as the previous row: continue the current line.
        echo -n ", $USERNAME"
    else
        # New group: finish the previous line (if any) and start a new one.
        [ -n "$GROUPNAMENOW" ] && echo
        GROUPNAMENOW=$GROUPNAME
        echo -n "$GROUPNAMENOW: $USERNAME"
    fi
done < input.txt
# Finish the last line.
[ -n "$GROUPNAMENOW" ] && echo
# <-- code ends here --

Note: Tested on OS X against the sample data. It should work in CentOS too.

HTH,

--
- Edo -
mailto:ml2edwin at gmail.com
"Happy are those conscious of their spiritual need ..." - Matthew 5:3
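The loop above depends on the input already being sorted, so that all rows of a group sit next to each other. If that is not guaranteed, one option is to sort on the first column before streaming the rows. The sketch below assumes a sort that supports -s (stable sort, as GNU sort does) so that users keep their original order within a group; the function name group_sorted is made up for illustration.

```shell
#!/bin/sh
# group_sorted: hypothetical wrapper; reads "group user" rows on stdin
# in any order and prints one "group: user, user" line per group.
group_sorted() {
    # -k1,1 sorts on the group column only; -s keeps equal keys in
    # input order; LC_ALL=C makes the ordering byte-wise and predictable.
    LC_ALL=C sort -s -k1,1 | {
        current=''
        while read -r group user; do
            # Skip blank lines.
            if [ -z "$group" ]; then continue; fi
            if [ "$group" = "$current" ]; then
                # Same group as the previous row: extend the line.
                printf ', %s' "$user"
            else
                # New group: close the previous line, start the next.
                if [ -n "$current" ]; then printf '\n'; fi
                current=$group
                printf '%s: %s' "$group" "$user"
            fi
        done
        # Close the final line, if anything was printed.
        if [ -n "$current" ]; then printf '\n'; fi
    }
}
```

Usage would be along the lines of "group_sorted < input.txt". Unlike a loop that prints a newline after every second name, this handles groups with one user or with many.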
On 12/30/2011 09:00 PM, ankush grover wrote:
> There are commands available but not able to use it properly to get
> the desired output. Please help me

Hi Ankush,

This will do what you want, but please read the comments in the code.

As a side note, this sort of thing is way more natural in Postgres. That will become more apparent as the file contents grow. In particular, the concept of appending tens of thousands of names to a single line in a file is a little crazy: most text editors will start choking on display without a \n in there somewhere to relieve the way most of them read and display text.

#######BEGIN collator.sh
#! /bin/bash
#
# collator.sh
#
# Invocation:
#   If executable and in $PATH (~/bin is a good idea):
#     collator.sh input-filename output-filename
#   If not executable, not in $PATH, but in present working directory:
#     sh ./collator.sh input-filename output-filename
#
# WARNING: There is NO serious attempt at error checking implemented.
#          This means you should check the contents of OUTFILE before
#          using it for anything important. Group and user names are
#          pasted into sed expressions, so names containing "/", "&"
#          or regex characters will break it.

INFILE=${1:?"Input filename missing, please read script comments."}
OUTFILE=${2:?"Output filename missing, please read script comments."}

# One "group:" line per unique group; sort -u also copes with unsorted input.
awk '{print $1 ":"}' "$INFILE" | sort -u > "$OUTFILE"

for GROUP in $(cut -d ':' -f 1 "$OUTFILE")
do
    # Compare the group column exactly, so a group does not also pick up
    # rows where its name merely appears inside another word.
    for NAME in $(awk -v g="$GROUP" '$1 == g {print $2}' "$INFILE")
    do
        # Append the name to the end of the group's line.
        sed -i "s/^$GROUP:.*/&, $NAME/" "$OUTFILE"
    done
done

# Turn the leading ":, " on each line into ": ".
sed -i 's/:, /: /' "$OUTFILE"
#######END collator.sh
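Since the script warns that OUTFILE should be checked before it is used, here is one possible sanity check, offered as a sketch rather than as part of collator.sh. It assumes the output uses the "group: name, name" layout requested in the original post, and it only compares counts: the number of two-column rows in the input against the number of names in the output. The function name check_counts is hypothetical.

```shell
#!/bin/sh
# check_counts: hypothetical sanity check, not part of collator.sh.
# $1 = input file ("group user" rows), $2 = merged output file.
# Returns success only when both files account for the same number
# of names.
check_counts() {
    # Rows in the input that actually carry a group and a user.
    rows=$(awk 'NF >= 2 { n++ } END { print n + 0 }' "$1")
    # Names in the output: split everything after "group: " on ", ".
    # A stray trailing ", " yields an empty extra element, so it is
    # counted and shows up as a mismatch.
    names=$(awk -F': ' '{ n += split($2, parts, ", ") } END { print n + 0 }' "$2")
    [ "$rows" -eq "$names" ]
}
```

A call such as "check_counts input-filename output-filename || echo 'counts differ'" would flag dropped or duplicated names, though it cannot tell whether a name landed in the wrong group.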
On Fri, Dec 30, 2011 at 6:00 AM, ankush grover <ankushcentos at gmail.com> wrote:
> There are commands available but not able to use it properly to get
> the desired output. Please help me
>

Here's a perl approach:

#!/usr/bin/perl
use strict;
use warnings;

my %groups = ();
while (<>) {
    chomp;
    # split ' ' splits on any run of whitespace and ignores leading blanks.
    my ($group, $name) = split ' ';
    next unless defined $name;    # skip blank or one-column lines
    push @{ $groups{$group} }, $name;
}
foreach my $group (sort keys %groups) {
    # Join with ", " to match the requested output format.
    print "$group: " . join(", ", @{ $groups{$group} }) . "\n";
}

Cat or redirect the list to the program input; output is on stdout.

--
Les Mikesell
lesmikesell at gmail.com