I am not sure exactly what your data represents. For example, from
the data it appears that user1 and user2 have been logged on for
about 4 days; is that what the data is saying? If you are keeping
track of users, why not write out a file that has the start/end time
of each user's session? The first time you see a user, put an entry
in a table, and as soon as they no longer show up in a sample, write
out a record for them. With that information it is easy to create a
report of the number of unique people over time.
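
As a rough sketch of that idea (not run against your real logs),
something along these lines in base R could work. It assumes the
samples sit in one data frame with a host column, which your snippet
does not have; the host2 rows are made up purely to show the
combining step, and sessions_for_host() is just a name I picked. A
session still open at the last sample is closed at that sample's
time, which is another assumption.

```r
tz <- "America/New_York"  # canonical name for the post's "US/Eastern"

# Host 1: the six samples from the dput() output in the post.
host1 <- data.frame(
  host = "host1",
  users = c(rep("user1 & user2", 5), "user1 & user2 & user3"),
  secs = c(1175210190, 1175313886, 1175528960,
           1175529724, 1175532242, 1175546045),
  stringsAsFactors = FALSE
)

# Host 2: made-up samples, only here to illustrate combining hosts.
host2 <- data.frame(
  host = "host2",
  users = c("user2 & user4", "user2"),
  secs = as.numeric(as.POSIXct(c("2007-04-01 09:00:00",
                                 "2007-04-01 17:00:00"), tz = tz)),
  stringsAsFactors = FALSE
)

samples <- rbind(host1, host2)

# Turn one host's samples into one record per user session.
sessions_for_host <- function(s) {
  s <- s[order(s$secs), ]
  open <- numeric(0)  # named vector: user -> session start (epoch seconds)
  user <- character(0); start <- numeric(0); end <- numeric(0)
  for (i in seq_len(nrow(s))) {
    present <- strsplit(s$users[i], " & ", fixed = TRUE)[[1]]
    # Users with an open session who are missing from this sample:
    # write out a record for them and drop them from the table.
    gone <- setdiff(names(open), present)
    user <- c(user, gone)
    start <- c(start, unname(open[gone]))
    end <- c(end, rep(s$secs[i], length(gone)))
    open <- open[setdiff(names(open), gone)]
    # Users seen for the first time: put an entry in the table.
    open[setdiff(present, names(open))] <- s$secs[i]
  }
  # Assumption: sessions still open are closed at the last sample time.
  user <- c(user, names(open))
  start <- c(start, unname(open))
  end <- c(end, rep(max(s$secs), length(open)))
  data.frame(host = s$host[1], user = user, start = start, end = end,
             stringsAsFactors = FALSE)
}

sessions <- do.call(rbind, lapply(split(samples, samples$host),
                                  sessions_for_host))

# Expand each session to every calendar day it touches.
to_day <- function(secs) {
  as.integer(as.Date(as.POSIXct(secs, origin = "1970-01-01", tz = tz),
                     tz = tz))
}
first_day <- to_day(sessions$start)
last_day <- to_day(sessions$end)
by_day <- data.frame(
  user = rep(sessions$user, last_day - first_day + 1L),
  day = format(as.Date(unlist(Map(seq, first_day, last_day)),
                       origin = "1970-01-01")),
  stringsAsFactors = FALSE
)

# Unique users per day, across all hosts.
unique_users <- tapply(by_day$user, by_day$day,
                       function(u) length(unique(u)))
print(unique_users)
```

With the rows above this reports 2 unique users for 2007-03-29
through 2007-03-31 and 3 for 2007-04-01 and 2007-04-02. Days with no
sample are still counted, because the session records carry the
users across them.
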

On Tue, Jan 11, 2011 at 10:47 AM, Jason Edgecombe
<jason at rampaginggeek.com> wrote:
> Hello,
>
> I have logging information for multiple machines, which I am trying to
> summarize and graph. So far, I process each host individually, but I would
> like to summarize the user count across multiple hosts. I want to answer the
> question "how many unique users logged in on a certain day across a group of
> machines"?
>
> I'm not quite sure how to scale the data frame and analysis to summarize
> multiple hosts, though. I'm still getting a feel for using R.
>
> Here is a snippet of data for one host. The user_count column is generated
> from the users column using my custom function "usercount()". The samples
> are taken roughly once per minute and only unique samples are recorded
> (i.e. use na.locf() to uncompress the data). Samples may occur twice in the
> same minute and are rarely aligned on the same time.
>
> Here is the original data before I turn it into a zoo series and run
> na.locf() over it so I can aggregate a single host by day. I'm open to a
> better way.
>> foo
>                   users            datetime user_count
> 1         user1 & user2 2007-03-29 19:16:30          2
> 2         user1 & user2 2007-03-31 00:04:46          2
> 3         user1 & user2 2007-04-02 11:49:20          2
> 4         user1 & user2 2007-04-02 12:02:04          2
> 5         user1 & user2 2007-04-02 12:44:02          2
> 6 user1 & user2 & user3 2007-04-02 16:34:05          3
>
>> dput(foo)
> structure(list(users = c("user1 & user2", "user1 & user2", "user1 & user2",
> "user1 & user2", "user1 & user2", "user1 & user2 & user3"), datetime =
> structure(c(1175210190, 1175313886, 1175528960, 1175529724, 1175532242,
> 1175546045), class = c("POSIXt", "POSIXct"), tzone = "US/Eastern"),
> user_count = c(2, 2, 2, 2, 2, 3)), .Names = c("users", "datetime",
> "user_count"), row.names = c(NA, 6L), class = "data.frame")
>
>
> Thanks,
> Jason
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?