thr3ads.net - R help - [R] How can I rearange my dataframe [Feb 2010]

Alex Levitchi

2010-Feb-09 16:24 UTC

[R] How can I rearange my dataframe

An embedded and charset-unspecified text was scrubbed...
Name: ???????????
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20100209/db12d14b/attachment.pl>

jim holtman

2010-Feb-09 17:02 UTC

[R] How can I rearange my dataframe

try this:

> x <- read.table(textConnection("name nicknames value
+ 1 A A1 4
+ 2 B B1 5
+ 3 C C1 9
+ 4 B B2 2
+ 5 C C2 7
+ 6 C C3 6
+ 7 C C4 3
+ 8 B B3 6
+ 9 C C5 7"), header=TRUE)
> closeAllConnections()
> result <- do.call(rbind, lapply(split(x, x$name), function(.name){
+     data.frame(name=.name$name[1],
+         nicknames=paste(.name$nicknames, collapse=','),
+         mean=mean(.name$value))
+ }))
> result
  name      nicknames     mean
A    A             A1 4.000000
B    B       B1,B2,B3 4.333333
C    C C1,C2,C3,C4,C5 6.400000
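
Note that these means (4, 4.33, 6.4) differ from the 3, 3.33, 5.2 shown in the quoted loop output below. A likely explanation, assuming an R version before 4.0.0 where data.frame() converts strings to factors by default: cbind() coerces all three vectors to one character matrix, so "value" becomes a factor and as.numeric() returns its level codes rather than the numbers. The following is only a sketch of that assumption; stringsAsFactors=TRUE is set explicitly so it behaves the same on current R, and the names bad, good, codes, wrong_means and right_means are made up for illustration.

```r
name <- c("A","B","C","B","C","C","C","B","C")
nicknames <- c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
value <- c(4,5,9,2,7,6,3,6,7)

# cbind() builds a character matrix, so 'value' ends up as a factor here
bad <- data.frame(cbind(name, nicknames, value), stringsAsFactors = TRUE)
codes <- as.numeric(bad$value)        # level codes, not the original numbers
wrong_means <- tapply(codes, bad$name, mean)

# passing the vectors directly to data.frame() keeps 'value' numeric
good <- data.frame(name, nicknames, value)
right_means <- tapply(good$value, good$name, mean)
```

If that assumption holds, wrong_means reproduces the loop's figures and right_means matches the result above; as.numeric(as.character(f)) is the usual way to recover the numbers from such a factor f.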

On Tue, Feb 9, 2010 at 11:24 AM, Alex Levitchi <alex.levitchi at cbm.fvg.it> wrote:
> Hello
> I recently began working with R, so I am not very experienced.
> I cannot find a clean way to process my data frame, which is a much larger one.
> It looks similar to this:
>
>> name=c("A","B","C","B","C","C","C","B","C")
>> nicknames=c("A1","B1","C1","B2","C2","C3","C4","B3","C5")
>> value=c(4,5,9,2,7,6,3,6,7)
>> table=data.frame(cbind(name,nicknames,value))
>> table
> name nicknames value
> 1 A A1 4
> 2 B B1 5
> 3 C C1 9
> 4 B B2 2
> 5 C C2 7
> 6 C C3 6
> 7 C C4 3
> 8 B B3 6
> 9 C C5 7
>
> I need to rearrange it as follows:
> - the first column should contain only the unique names; I have done this and it looks like
> 1 A
> 2 B
> 3 C
>
> - the second column should contain the different 'nicknames' which correspond to each single A, B or C
> name nickname value
> 1 A A1
> 2 B B1,B2,B3
> 3 C C1,C2,C3,C4,C5
>
> - the third one should contain the mean value of the numbers which correspond to the same A, B or C
> 1 A A1 mean(c(4))
> 2 B B1,B2,B3 mean(c(5,2,6))
> 3 C C1,C2,C3,C4,C5 mean(c(9,7,6,3,7))
>
> I did this using a 'for' loop.
> To be clear, I created three data frames, one for each column, and combine them at the end:
>
>> ulist=which(!duplicated(table$name)) # I extract the list of positions in which I don't have duplications
>> name1=data.frame(table$name[ulist]) # I extract the list of unique names
>> nicknames1=data.frame(row.names(1:length(ulist))) # I create a data frame of dimension equal to the unique list length
>> value1=data.frame(row.names(1:length(ulist))) # I create a data frame of dimension equal to the unique list length
>
>> for(i in 1:length(ulist)) {
> position=which(as.character(name1[i,1])==table$name)
> nicknames1[i,1]=toString(table$nicknames[position])
> value1[i,1]=mean(as.numeric(table$value[position]))
> }
>> fin=cbind(name1,nicknames1,value1)
>> colnames(fin)=c("NAME","NICKNAME","VALUE")
>> fin
> NAME NICKNAME VALUE
> 1 A A1 3.000000
> 2 B B1, B2, B3 3.333333
> 3 C C1, C2, C3, C4, C5 5.200000
>
> It works successfully. But in general I work with much larger data frames (tens of thousands of rows or more),
> so my loop is too slow (e.g., a data frame of 20000 rows and 3 columns is processed in about 10 minutes).
> I intend to integrate it into a function, so the time will obviously be even longer.
>
> If someone can suggest how to modify what I have done, or another way to do it, please send me a message.
>
> Kind regards to all who develop and maintain R sources for such dummies as me
> Alex Levitchi
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

David Winsemius

2010-Feb-09 17:09 UTC

[R] How can I rearange my dataframe

On Feb 9, 2010, at 11:24 AM, Alex Levitchi wrote:
> - the second column should contain different 'nicknames' which  
> correspond to the single A, B or C
> name nickname value
> 1 A A1
> 2 B B1,B2,B3
> 3 C C1,C2,C3,C4,C5
Data frames are not designed to hold items of irregular length. Lists are
the data structure best suited to this type of data. tapply() is one
function useful for collecting elements of one structure based on the
contents of another ("name"):

(I renamed your table object "table1" to avoid confusion with the  
table function.)

 > tapply(table1$nicknames, table1$name, list)
$A
[1] A1
Levels: A1 B1 B2 B3 C1 C2 C3 C4 C5

$B
[1] B1 B2 B3
Levels: A1 B1 B2 B3 C1 C2 C3 C4 C5

$C
[1] C1 C2 C3 C4 C5
Levels: A1 B1 B2 B3 C1 C2 C3 C4 C5

The process of tabulating has created factor variables, which some
would see as a good thing, but which was perhaps not desired. Since you now
have a list, you can sequentially apply the as.character function to
recover only the character vectors:

 > lapply(tapply(table1$nicknames, table1$name, list), as.character)
$A
[1] "A1"

$B
[1] "B1" "B2" "B3"

$C
[1] "C1" "C2" "C3" "C4" "C5"
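
A possible shortcut, sketched under the assumption that the columns are still available as plain character vectors (as in the original post, before data.frame() is called): base R's split() builds the same kind of named list in one step, with no factor levels to strip afterwards. The name by_name is made up for illustration.

```r
name <- c("A","B","C","B","C","C","C","B","C")
nicknames <- c("A1","B1","C1","B2","C2","C3","C4","B3","C5")

# split() groups the elements of 'nicknames' by the values of 'name'
# and returns a named list of character vectors
by_name <- split(nicknames, name)
by_name$B
```

If the column is already a factor, split(as.character(table1$nicknames), table1$name) should give the same list.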

Then I saw the rest of your request, so forget the above and see if
this two-liner looks a bit simpler.

 > tcollapse <- tapply(table1$nicknames, table1$name, paste, collapse=", ")
# gets you the strings separated by commas and spaces.

 > cbind(names(tcollapse), tcollapse, lapply(tapply(table1$nicknames, table1$name, list), length))
       tcollapse
A "A" "A1"                 1
B "B" "B1, B2, B3"         3
C "C" "C1, C2, C3, C4, C5" 5

You can obviously name them whatever you like.
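
The third column above holds the number of nicknames per name, whereas the original request asked for the mean of "value". A minimal sketch of that variant follows, assuming table1 is built with a numeric value column (it is constructed directly here rather than through cbind()); tmean is a made-up name, and fin with its column names is borrowed from the original post.

```r
table1 <- data.frame(
  name      = c("A","B","C","B","C","C","C","B","C"),
  nicknames = c("A1","B1","C1","B2","C2","C3","C4","B3","C5"),
  value     = c(4,5,9,2,7,6,3,6,7),
  stringsAsFactors = FALSE
)

# one tapply() per summary; both results are ordered by the sorted group names
tcollapse <- tapply(table1$nicknames, table1$name, paste, collapse = ", ")
tmean     <- tapply(table1$value, table1$name, mean)

fin <- data.frame(NAME     = names(tcollapse),
                  NICKNAME = as.vector(tcollapse),
                  VALUE    = as.vector(tmean),
                  stringsAsFactors = FALSE)
fin
```

Because both tapply() calls group by the same vector, their results line up row by row, so no explicit loop over the unique names is needed.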

-- 
David
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
