Dear all,

I'm trying to process a log file which records the date, the username and the number of the computer accessed. The table looks like this:

> table.users
         Date UserName Machine
1  2008-11-25     John     641
2  2008-11-25    Clive     611
3  2008-11-25   Jeremy     641
4  2008-11-25     Walt     722
5  2008-11-25     Tony     645
6  2008-11-26     Tony     645
7  2008-11-26     Tony     641
8  2008-11-26     Tony     641
9  2008-11-26     Walt     641
10 2008-11-26     Walt     645
11 2008-11-30     John     641
12 2008-11-30    Clive     611
13 2008-11-30     Tony     641
14 2008-11-30     John     641
15 2008-11-30     John     641
..................etc

What I want to do is find out how many unique users logged on each day, and how many individual machines were accessed per day. In the above example, on 2008-11-25 there were 5 separate users accessing 4 machines, and on 2008-11-26 there were 2 unique users who used 2 machines (although both logged on more than once).

I've got as far as apply(table.users, 2, FUN=table), which gives me, for the date, username or machine column, how many times each value occurs, but not really what I want.

Any help appreciated,

Jabez
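A minimal base-R sketch of the per-day count described above, using tapply() with a small distinct-count function. It assumes the log is already in a data frame called table.users with the column names shown; the sample rows are rebuilt here so the block runs on its own.

```r
# Rebuild the sample log shown above (15 rows, 5 per day)
table.users <- data.frame(
  Date = rep(c("2008-11-25", "2008-11-26", "2008-11-30"), each = 5),
  UserName = c("John", "Clive", "Jeremy", "Walt", "Tony",
               "Tony", "Tony", "Tony", "Walt", "Walt",
               "John", "Clive", "Tony", "John", "John"),
  Machine = c(641, 611, 641, 722, 645,
              645, 641, 641, 641, 645,
              641, 611, 641, 641, 641),
  stringsAsFactors = FALSE
)

# For each Date, count the distinct values in the given column
users_per_day <- tapply(table.users$UserName, table.users$Date,
                        function(x) length(unique(x)))
machines_per_day <- tapply(table.users$Machine, table.users$Date,
                           function(x) length(unique(x)))

print(users_per_day)     # 5, 2 and 3 users on the three days
print(machines_per_day)  # 4, 2 and 2 machines on the three days
```

tapply() groups by the sorted unique dates, so the results come back in date order.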
On Nov 13, 2009, at 6:03 AM, Jabez Wilson wrote:

> I've got as far as apply(table.users, 2, FUN=table) which gives me
> an output of date, or username or machine and how many times they
> were accessed, but not really what I want.
> Any help appreciated

You were almost there. Just use lapply on the list object you produced:

> lapply(apply(table.users, 2, FUN=table), length)
$Date
[1] 3

$UserName
[1] 5

$Machine
[1] 4

Or if you want the individual items that you requested:

> lapply(apply(table.users, 2, FUN=table), length)$UserName
[1] 5
> lapply(apply(table.users, 2, FUN=table), length)$Machine
[1] 4

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
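The call above counts distinct values over the whole table (3 dates, 5 users and 4 machines in total), so it does not break the figures down by day. One way to get per-day figures, sketched here under the same column-name assumptions, is to split the rows by Date first and count distinct values within each piece. sapply() stands in for apply() in this sketch because apply() first converts the data frame to a character matrix.

```r
table.users <- data.frame(
  Date = rep(c("2008-11-25", "2008-11-26", "2008-11-30"), each = 5),
  UserName = c("John", "Clive", "Jeremy", "Walt", "Tony",
               "Tony", "Tony", "Tony", "Walt", "Walt",
               "John", "Clive", "Tony", "John", "John"),
  Machine = c(641, 611, 641, 722, 645,
              645, 641, 641, 641, 645,
              641, 611, 641, 641, 641),
  stringsAsFactors = FALSE
)

# Whole-table count: one number per column, not per day
overall <- sapply(table.users, function(x) length(unique(x)))
print(overall)

# Per-day count: split the rows by Date, then count distinct values per column
per_day <- sapply(split(table.users[c("UserName", "Machine")], table.users$Date),
                  function(d) sapply(d, function(x) length(unique(x))))
print(t(per_day))  # one row per date, columns UserName and Machine
```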
On Fri, 13 Nov 2009 11:03:31 +0000 (GMT) Jabez Wilson <jabezwuk at yahoo.co.uk> wrote:

> What I want to do is to find out how many unique users logged
> on each day, and how many individual machines were accessed per day.

Use the 'plyr' package:

library(plyr)
ddply(table.users, .(Date), summarise,
      users=length(unique(UserName)),
      machines=length(unique(Machine)))

--
Karl Ove Hufthammer
Thanks, that's helpful because I can see the individuals and how many times they accessed the machines. The 'plyr' solution of Karl Ove Hufthammer gives me the exact summary statistics that I'm looking for.

Jab

--- On Fri, 13/11/09, markleeds@verizon.net <markleeds@verizon.net> wrote:

From: markleeds@verizon.net <markleeds@verizon.net>
Subject: Re: Re: [R] processing log file
To: jabezwuk@yahoo.co.uk
Date: Friday, 13 November, 2009, 16:36

Hi: I think the code below does what you want, but it doesn't come out formatted very nicely. Maybe someone can show you the formatting? Good luck.

table.users <- read.table(textConnection("Date UserName Machine
2008-11-25 John 641
2008-11-25 Clive 611
2008-11-25 Jeremy 641
2008-11-25 Walt 722
2008-11-25 Tony 645
2008-11-26 Tony 645
2008-11-26 Tony 641
2008-11-26 Tony 641
2008-11-26 Walt 641
2008-11-26 Walt 645
2008-11-30 John 641
2008-11-30 Clive 611
2008-11-30 Tony 641
2008-11-30 John 641
2008-11-30 John 641"), header=TRUE, as.is=TRUE)

print(table.users)
str(table.users)

# Per day: how often each machine was used
lapply(split(table.users, table.users$Date), function(.df) {
    table(.df$Machine)
})

# Per day: how often each user logged on
lapply(split(table.users, table.users$Date), function(.df) {
    table(.df$UserName)
})

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
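On the formatting question in the message above: each per-day table() has one entry per distinct value, so its length() is exactly the count being asked for. A minimal sketch, assuming the same table.users columns, that collects those lengths into a single data frame with one row per date (summary_df is just an illustrative name):

```r
table.users <- data.frame(
  Date = rep(c("2008-11-25", "2008-11-26", "2008-11-30"), each = 5),
  UserName = c("John", "Clive", "Jeremy", "Walt", "Tony",
               "Tony", "Tony", "Tony", "Walt", "Walt",
               "John", "Clive", "Tony", "John", "John"),
  Machine = c(641, 611, 641, 722, 645,
              645, 641, 641, 641, 645,
              641, 611, 641, 641, 641),
  stringsAsFactors = FALSE
)

# Same per-day tables as in the message above
by_day <- split(table.users, table.users$Date)
machine_tables <- lapply(by_day, function(.df) table(.df$Machine))
user_tables <- lapply(by_day, function(.df) table(.df$UserName))

# One row per date; length() of each table = number of distinct values
summary_df <- data.frame(
  Date = names(by_day),
  users = unname(sapply(user_tables, length)),
  machines = unname(sapply(machine_tables, length))
)
print(summary_df)
```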
You can use aggregate to get this too:

aggregate(table.users[, c('UserName', 'Machine')], table.users['Date'],
          function(x) length(unique(x)))

On Fri, Nov 13, 2009 at 3:01 PM, Jabez Wilson <jabezwuk at yahoo.co.uk> wrote:
> Thanks, that's helpful because I can see the individuals and how many times they accessed:
> The 'plyr' solution of Karl Ove Hufthammer gives me the exact summary statistics that I'm looking for.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
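A self-contained run of the aggregate() call above on the sample rows, for checking the result; count_distinct is a hypothetical helper name standing in for the anonymous function. On this data it should give 5, 2 and 3 users and 4, 2 and 2 machines for the three dates.

```r
table.users <- data.frame(
  Date = rep(c("2008-11-25", "2008-11-26", "2008-11-30"), each = 5),
  UserName = c("John", "Clive", "Jeremy", "Walt", "Tony",
               "Tony", "Tony", "Tony", "Walt", "Walt",
               "John", "Clive", "Tony", "John", "John"),
  Machine = c(641, 611, 641, 722, 645,
              645, 641, 641, 641, 645,
              641, 611, 641, 641, 641),
  stringsAsFactors = FALSE
)

# Number of distinct values in a vector (hypothetical helper name)
count_distinct <- function(x) length(unique(x))

# aggregate(x, by, FUN): apply FUN to each column of x within each Date group
per_day <- aggregate(table.users[, c("UserName", "Machine")],
                     by = table.users["Date"], FUN = count_distinct)
print(per_day)
```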