Thanks for your reply. I was trying to simplify it a little, but must
have got it wrong. Here is the real dataframe, TF2list:
?str(TF2list)
'data.frame':??? 152 obs. of? 2 variables:
?$ Regulator: Factor w/ 87 levels
"AT1G02065","AT1G13960",..: 17 6 6 54
54 82 82 82 82 82 ...
?$ hits???? : Factor w/ 97 levels
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
__truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
?? And the first few lines resulting from dput(head(TF2list)):
dput(head(TF2list))
structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
82L), .Label = c("AT1G02065", "AT1G13960",
"AT1G18860", "AT1G23380",
"AT1G29280", "AT1G29860", "AT1G30650",
"AT1G55600", "AT1G62300",
"AT1G62990", "AT1G64000", "AT1G66550",
"AT1G66560", "AT1G66600",
"AT1G68150", "AT1G69310", "AT1G69490",
"AT1G69810", "AT1G70510", ...
This is another way of looking at the first 4 entries (Regulator is
tab-separated from hits):
Regulator
? hits
1
AT1G69490
?AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
2
AT1G29860
?AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
3
AT1G2986
?AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
?? So, the goal would be to
first: Transpose the existing dataframe so that the factor Regulator
becomes a column name (column 1 name = AT1G69490, column2 name
AT1G29860, etc.) and the hits associated with each Regulator become
rows. Hits is a comma separated 'list' ( I do not not know if
technically it is an R list.), so it would have to be comma
'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
col 1 row 2 - AT5G24410, etc); like this :
AT1G69490
AT4G31950
AT5G24110
AT1G05675
AT5G64905
... I did not include all the rows)
I think it would be best to actually make the first entry a separate
dataframe ( 1 column with name = AT1G69490 and number of rows depending
on the number of hits), then make the second column (column name =
AT1G29860, and number of rows depending on the number of hits) into a
new dataframe and do a full join of of the two dataframes; continue by
making the third column (column name = AT1G2986) into a dataframe and
full join it with the previous; continue for the 152 observations so
that then end result is a dataframe with 152 columns and number of rows
depending on the entry with the greatest number of hits. The full joins
I can do with dplyr, but getting up to that point seems rather difficult.
This would get me what my ultimate goal would be; each Regulator is a
column name (152 columns) and a given row has either NA or the same hit.
?? This seems very difficult to me, but I appreciate any attempt.
Matthew
On 4/30/2019 4:34 PM, David L Carlson wrote:> External Email - Use Caution
>
> I think we need more information. Can you give us the structure of the data
with str(YourDataFrame). Alternatively you could copy a small piece into your
email message by copying and pasting the results of the following code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say
"hits" is a factor with a variable number of elements. If each value
of "hits" was a single character string, it would only have 2 factor
levels not 6 and your efforts to parse the string would make more sense.
Transposing to a data frame would only be possible if each column was padded
with NAs to make them equal in length. Since your example tries use the name
TF2list, it is possible that you do not have a data frame but a list and you
have no factor levels, just character vectors.
>
> If you are not familiar with R, it may be helpful to tell us what your
overall goal is rather than an intermediate step. Very likely R can easily
handle what you want by doing things a different way.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
> -----Original Message-----
> From: R-help<r-help-bounces at r-project.org> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 2:25 PM
> To: r-help (r-help at r-project.org)<r-help at r-project.org>
> Subject: [R] transpose and split dataframe
>
> I have a data frame that is a lot bigger but for simplicity sake we can
> say it looks like this:
>
> Regulator??? hits
> AT1G69490??? AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980??? AT2G85403,AT4G89223
>
> ?? In other words:
>
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits???????? : Factor w/ 6 levels
>
> ? I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>
> ? I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>
> Matthew
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
If you read the data frame with read.csv() or one of the other read() functions,
use the asis=TRUE argument to prevent conversion to factors. If not do the
conversion first:
# Convert factors to characters
DataMatrix <- sapply(TF2list, as.character)
# Split the vector of hits
DataList <- sapply(DataMatrix[, 2], strsplit, split=",")
# Use the values in Regulator to name the parts of the list
names(DataList) <- DataMatrix[,"Regulator"]
# Now create a data frame
# How long is the longest list of hits?
mx <- max(sapply(DataList, length))
# Now add NAs to vectors shorter than mx
DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x))))
# Finally convert back to a data frame
TF2list2 <- do.call(data.frame, DataList2)
Try this on a portion of the list, say 25 lines and print each object to see
what is happening.
----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Matthew
Sent: Tuesday, April 30, 2019 4:31 PM
To: r-help at r-project.org
Subject: [R] Fwd: Re: transpose and split dataframe
Thanks for your reply. I was trying to simplify it a little, but must
have got it wrong. Here is the real dataframe, TF2list:
?str(TF2list)
'data.frame':??? 152 obs. of? 2 variables:
?$ Regulator: Factor w/ 87 levels
"AT1G02065","AT1G13960",..: 17 6 6 54
54 82 82 82 82 82 ...
?$ hits???? : Factor w/ 97 levels
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
__truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
?? And the first few lines resulting from dput(head(TF2list)):
dput(head(TF2list))
structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
82L), .Label = c("AT1G02065", "AT1G13960",
"AT1G18860", "AT1G23380",
"AT1G29280", "AT1G29860", "AT1G30650",
"AT1G55600", "AT1G62300",
"AT1G62990", "AT1G64000", "AT1G66550",
"AT1G66560", "AT1G66600",
"AT1G68150", "AT1G69310", "AT1G69490",
"AT1G69810", "AT1G70510", ...
This is another way of looking at the first 4 entries (Regulator is
tab-separated from hits):
Regulator
? hits
1
AT1G69490
?AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
2
AT1G29860
?AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
3
AT1G2986
?AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
?? So, the goal would be to
first: Transpose the existing dataframe so that the factor Regulator
becomes a column name (column 1 name = AT1G69490, column2 name
AT1G29860, etc.) and the hits associated with each Regulator become
rows. Hits is a comma separated 'list' ( I do not not know if
technically it is an R list.), so it would have to be comma
'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
col 1 row 2 - AT5G24410, etc); like this :
AT1G69490
AT4G31950
AT5G24110
AT1G05675
AT5G64905
... I did not include all the rows)
I think it would be best to actually make the first entry a separate
dataframe ( 1 column with name = AT1G69490 and number of rows depending
on the number of hits), then make the second column (column name =
AT1G29860, and number of rows depending on the number of hits) into a
new dataframe and do a full join of of the two dataframes; continue by
making the third column (column name = AT1G2986) into a dataframe and
full join it with the previous; continue for the 152 observations so
that then end result is a dataframe with 152 columns and number of rows
depending on the entry with the greatest number of hits. The full joins
I can do with dplyr, but getting up to that point seems rather difficult.
This would get me what my ultimate goal would be; each Regulator is a
column name (152 columns) and a given row has either NA or the same hit.
?? This seems very difficult to me, but I appreciate any attempt.
Matthew
On 4/30/2019 4:34 PM, David L Carlson wrote:> External Email - Use Caution
>
> I think we need more information. Can you give us the structure of the data
with str(YourDataFrame). Alternatively you could copy a small piece into your
email message by copying and pasting the results of the following code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say
"hits" is a factor with a variable number of elements. If each value
of "hits" was a single character string, it would only have 2 factor
levels not 6 and your efforts to parse the string would make more sense.
Transposing to a data frame would only be possible if each column was padded
with NAs to make them equal in length. Since your example tries use the name
TF2list, it is possible that you do not have a data frame but a list and you
have no factor levels, just character vectors.
>
> If you are not familiar with R, it may be helpful to tell us what your
overall goal is rather than an intermediate step. Very likely R can easily
handle what you want by doing things a different way.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
> -----Original Message-----
> From: R-help<r-help-bounces at r-project.org> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 2:25 PM
> To: r-help (r-help at r-project.org)<r-help at r-project.org>
> Subject: [R] transpose and split dataframe
>
> I have a data frame that is a lot bigger but for simplicity sake we can
> say it looks like this:
>
> Regulator??? hits
> AT1G69490??? AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980??? AT2G85403,AT4G89223
>
> ?? In other words:
>
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits???????? : Factor w/ 6 levels
>
> ? I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>
> ? I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>
> Matthew
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thank you very much, David and Jim for your work and solutions. I have been working through both of them to better learn R. They both proceed through a similar logic except David's starts with a character matrix and Jim's with a dataframe, and both end with equivalent dataframes (? identical(tmmdf, TF2list2)) returns TRUE? ). They have both been very helpful. However, there is one attribute of my intended final dataframe that is missing. Looking at part of the final dataframe: ?head(tmmdf) ? AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350 1 *AT4G31950* *AT4G31950*?? AT5G64905 *AT4G31950* AT5G64905 *AT4G31950* 2 AT5G24110 AT5G24110?? AT1G21120 AT5G24110?? AT1G14540 AT5G24110 3 AT1G26380 AT1G05675?? AT1G07160 AT1G05675?? AT1G21120 AT1G05675 Row 1 has *AT4G31950* in columns 1,2,4 and 6, but AT4G31950 in columns 3 and 5. What I was aiming at would be that each row would have a unique entry so that AT4G31950 is row 1 columns 1,2,4 and 6, and NA is row 1 columns 3 and 5. AT4G31950 is row 2 columns 3 and 5 and NA is row 2 columns 1,2,4 and 6. So, it would look like this: ?head(intended_df) ? AT1G69490 AT1G29860 AT1G29860.1 AT4G18170 AT4G18170.1 AT5G46350 1 AT4G31950 AT4G31950???? NA ? ? ? ? ? ? ?? AT4G31950? ? ?? NA ? ? ? ? AT4G31950 2 ? ?? NA ? ? ? ? ? ? ?? NA ? ? ? ? ? AT4G31950 ? ? ? NA ? ? ? ? ?? AT4G31950 ? ?? NA I have been trying to adjust the code to get my intended result basically by trying to build a dataframe one column at a time from each entry in the character matrix, but have not got anything near working yet. Matthew On 4/30/2019 6:29 PM, David L Carlson wrote> If you read the data frame with read.csv() or one of the other read() functions, use the asis=TRUE argument to prevent conversion to factors. If not do the conversion first: > > # Convert factors to characters > DataMatrix <- sapply(TF2list, as.character) > # Split the vector of hits > DataList <- sapply(DataMatrix[, 2], strsplit, split=",") > # Use the values in Regulator to name the parts of the list > names(DataList) <- DataMatrix[,"Regulator"] > > # Now create a data frame > # How long is the longest list of hits? > mx <- max(sapply(DataList, length)) > # Now add NAs to vectors shorter than mx > DataList2 <- lapply(DataList, function(x) c(x, rep(NA, mx-length(x)))) > # Finally convert back to a data frame > TF2list2 <- do.call(data.frame, DataList2) > > Try this on a portion of the list, say 25 lines and print each object to see what is happening. > > ---------------------------------------- > David L Carlson > Department of Anthropology > Texas A&M University > College Station, TX 77843-4352 > > > > > > -----Original Message----- > From: R-help <r-help-bounces at r-project.org> On Behalf Of Matthew > Sent: Tuesday, April 30, 2019 4:31 PM > To: r-help at r-project.org > Subject: [R] Fwd: Re: transpose and split dataframe > > Thanks for your reply. I was trying to simplify it a little, but must > have got it wrong. Here is the real dataframe, TF2list: > > ?str(TF2list) > 'data.frame':??? 152 obs. of? 2 variables: > ?$ Regulator: Factor w/ 87 levels "AT1G02065","AT1G13960",..: 17 6 6 54 > 54 82 82 82 82 82 ... > ?$ hits???? : Factor w/ 97 levels > "AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"| > __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ... > > ?? And the first few lines resulting from dput(head(TF2list)): > > dput(head(TF2list)) > structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L, > 82L), .Label = c("AT1G02065", "AT1G13960", "AT1G18860", "AT1G23380", > "AT1G29280", "AT1G29860", "AT1G30650", "AT1G55600", "AT1G62300", > "AT1G62990", "AT1G64000", "AT1G66550", "AT1G66560", "AT1G66600", > "AT1G68150", "AT1G69310", "AT1G69490", "AT1G69810", "AT1G70510", ... > > This is another way of looking at the first 4 entries (Regulator is > tab-separated from hits): > > Regulator > ? hits > 1 > AT1G69490 > ?AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830 > 2 > AT1G29860 > ?AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135 > > 3 > AT1G2986 > ?AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830 > > ?? So, the goal would be to > > first: Transpose the existing dataframe so that the factor Regulator > becomes a column name (column 1 name = AT1G69490, column2 name > AT1G29860, etc.) and the hits associated with each Regulator become > rows. Hits is a comma separated 'list' ( I do not not know if > technically it is an R list.), so it would have to be comma > 'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950, > col 1 row 2 - AT5G24410, etc); like this : > > AT1G69490 > AT4G31950 > AT5G24110 > AT1G05675 > AT5G64905 > > ... I did not include all the rows) > > I think it would be best to actually make the first entry a separate > dataframe ( 1 column with name = AT1G69490 and number of rows depending > on the number of hits), then make the second column (column name > AT1G29860, and number of rows depending on the number of hits) into a > new dataframe and do a full join of of the two dataframes; continue by > making the third column (column name = AT1G2986) into a dataframe and > full join it with the previous; continue for the 152 observations so > that then end result is a dataframe with 152 columns and number of rows > depending on the entry with the greatest number of hits. The full joins > I can do with dplyr, but getting up to that point seems rather difficult. > > This would get me what my ultimate goal would be; each Regulator is a > column name (152 columns) and a given row has either NA or the same hit. > > ?? This seems very difficult to me, but I appreciate any attempt. > > Matthew > > On 4/30/2019 4:34 PM, David L Carlson wrote: >> External Email - Use Caution >> >> I think we need more information. Can you give us the structure of the data with str(YourDataFrame). Alternatively you could copy a small piece into your email message by copying and pasting the results of the following code: >> >> dput(head(YourDataFrame)) >> >> The data frame you present could not be a data frame since you say "hits" is a factor with a variable number of elements. If each value of "hits" was a single character string, it would only have 2 factor levels not 6 and your efforts to parse the string would make more sense. Transposing to a data frame would only be possible if each column was padded with NAs to make them equal in length. Since your example tries use the name TF2list, it is possible that you do not have a data frame but a list and you have no factor levels, just character vectors. >> >> If you are not familiar with R, it may be helpful to tell us what your overall goal is rather than an intermediate step. Very likely R can easily handle what you want by doing things a different way. >> >> ---------------------------------------- >> David L Carlson >> Department of Anthropology >> Texas A&M University >> College Station, TX 77843-4352 >> >> >> >> -----Original Message----- >> From: R-help<r-help-bounces at r-project.org> On Behalf Of Matthew >> Sent: Tuesday, April 30, 2019 2:25 PM >> To: r-help (r-help at r-project.org)<r-help at r-project.org> >> Subject: [R] transpose and split dataframe >> >> I have a data frame that is a lot bigger but for simplicity sake we can >> say it looks like this: >> >> Regulator??? hits >> AT1G69490??? AT4G31950,AT5G24110,AT1G26380,AT1G05675 >> AT2G55980??? AT2G85403,AT4G89223 >> >> ?? In other words: >> >> data.frame : 2 obs. of 2 variables >> $Regulator: Factor w/ 2 levels >> $hits???????? : Factor w/ 6 levels >> >> ? I want to transpose it so that Regulator is now the column headings >> and each of the AGI numbers now separated by commas is a row. So, >> AT1G69490 is now the header of the first column and AT4G31950 is row 1 >> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of >> column 2 and AT2G85403 is row 1 of column 2, etc. >> >> ? I have tried playing around with strsplit(TF2list[2:2]) and >> strsplit(as.character(TF2list[2:2]), but I am getting nowhere. >> >> Matthew >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.[[alternative HTML version deleted]]
We still have only the toy version of your data from your first email. The
second email used dput() as I suggested, but you truncated the results so it is
useless for testing purposes.
Use the following code after creating DataList (up to mx <- ... ) in my
earlier answer:
n <- sapply(DataList, length)
hits <- unname(unlist(DataList))
Regulator <- unname(unlist(mapply(rep, names(DataList), times=n)))
DataTable <- table(hits, Regulator)
# Regulator
# hits AT1G69490 AT2G55980
# AT1G05675 1 0
# AT1G26380 1 0
# AT2G85403 0 1
# AT4G31950 1 0
# AT4G89223 0 1
# AT5G24110 1 0
Now the Regulators and the hits will be listed in alphabetical order. The table
has 0's for Regulators that do not have a particular hit. If you want NAs:
DataTable[DataTable==0] <- NA
print(DataTable, na.print="NA")
# Regulator
# hits AT1G69490 AT2G55980
# AT1G05675 1 NA
# AT1G26380 1 NA
# AT2G85403 NA 1
# AT4G31950 1 NA
# AT4G89223 NA 1
# AT5G24110 1 NA
If you need a data frame instead of a table:
as.data.frame.matrix(DataTable)
----------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Matthew
Sent: Tuesday, April 30, 2019 4:31 PM
To: r-help at r-project.org
Subject: [R] Fwd: Re: transpose and split dataframe
Thanks for your reply. I was trying to simplify it a little, but must
have got it wrong. Here is the real dataframe, TF2list:
?str(TF2list)
'data.frame':??? 152 obs. of? 2 variables:
?$ Regulator: Factor w/ 87 levels
"AT1G02065","AT1G13960",..: 17 6 6 54
54 82 82 82 82 82 ...
?$ hits???? : Factor w/ 97 levels
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
__truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
?? And the first few lines resulting from dput(head(TF2list)):
dput(head(TF2list))
structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
82L), .Label = c("AT1G02065", "AT1G13960",
"AT1G18860", "AT1G23380",
"AT1G29280", "AT1G29860", "AT1G30650",
"AT1G55600", "AT1G62300",
"AT1G62990", "AT1G64000", "AT1G66550",
"AT1G66560", "AT1G66600",
"AT1G68150", "AT1G69310", "AT1G69490",
"AT1G69810", "AT1G70510", ...
This is another way of looking at the first 4 entries (Regulator is
tab-separated from hits):
Regulator
? hits
1
AT1G69490
?AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
2
AT1G29860
?AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
3
AT1G2986
?AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
?? So, the goal would be to
first: Transpose the existing dataframe so that the factor Regulator
becomes a column name (column 1 name = AT1G69490, column2 name
AT1G29860, etc.) and the hits associated with each Regulator become
rows. Hits is a comma separated 'list' ( I do not not know if
technically it is an R list.), so it would have to be comma
'unseparated' with each entry becoming a row (col 1 row 1 = AT4G31950,
col 1 row 2 - AT5G24410, etc); like this :
AT1G69490
AT4G31950
AT5G24110
AT1G05675
AT5G64905
... I did not include all the rows)
I think it would be best to actually make the first entry a separate
dataframe ( 1 column with name = AT1G69490 and number of rows depending
on the number of hits), then make the second column (column name =
AT1G29860, and number of rows depending on the number of hits) into a
new dataframe and do a full join of of the two dataframes; continue by
making the third column (column name = AT1G2986) into a dataframe and
full join it with the previous; continue for the 152 observations so
that then end result is a dataframe with 152 columns and number of rows
depending on the entry with the greatest number of hits. The full joins
I can do with dplyr, but getting up to that point seems rather difficult.
This would get me what my ultimate goal would be; each Regulator is a
column name (152 columns) and a given row has either NA or the same hit.
?? This seems very difficult to me, but I appreciate any attempt.
Matthew
On 4/30/2019 4:34 PM, David L Carlson wrote:> External Email - Use Caution
>
> I think we need more information. Can you give us the structure of the data
with str(YourDataFrame). Alternatively you could copy a small piece into your
email message by copying and pasting the results of the following code:
>
> dput(head(YourDataFrame))
>
> The data frame you present could not be a data frame since you say
"hits" is a factor with a variable number of elements. If each value
of "hits" was a single character string, it would only have 2 factor
levels not 6 and your efforts to parse the string would make more sense.
Transposing to a data frame would only be possible if each column was padded
with NAs to make them equal in length. Since your example tries use the name
TF2list, it is possible that you do not have a data frame but a list and you
have no factor levels, just character vectors.
>
> If you are not familiar with R, it may be helpful to tell us what your
overall goal is rather than an intermediate step. Very likely R can easily
handle what you want by doing things a different way.
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
>
>
> -----Original Message-----
> From: R-help<r-help-bounces at r-project.org> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 2:25 PM
> To: r-help (r-help at r-project.org)<r-help at r-project.org>
> Subject: [R] transpose and split dataframe
>
> I have a data frame that is a lot bigger but for simplicity sake we can
> say it looks like this:
>
> Regulator??? hits
> AT1G69490??? AT4G31950,AT5G24110,AT1G26380,AT1G05675
> AT2G55980??? AT2G85403,AT4G89223
>
> ?? In other words:
>
> data.frame : 2 obs. of 2 variables
> $Regulator: Factor w/ 2 levels
> $hits???????? : Factor w/ 6 levels
>
> ? I want to transpose it so that Regulator is now the column headings
> and each of the AGI numbers now separated by commas is a row. So,
> AT1G69490 is now the header of the first column and AT4G31950 is row 1
> of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header of
> column 2 and AT2G85403 is row 1 of column 2, etc.
>
> ? I have tried playing around with strsplit(TF2list[2:2]) and
> strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
>
> Matthew
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Matthew,
I'm not sure whether you want something like your initial request or
David's solution. The result of this can be transformed into the
latter:
mmdf<-read.table(text="Regulator hits
AT1G69490
AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
AT1G29860
AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
AT1G2986
AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830",
header=TRUE,stringsAsFactors=FALSE)
# split the second column at the commas
hitsplit<-strsplit(mmdf$hits,",")
# define a function that will fill with NAs
NAfill<-function(x,n) return(x[1:n])
# get the maximum length of hits
maxlen<-max(unlist(lapply(hitsplit,length)))
# fill the list with NAs
hitsplit<-lapply(hitsplit,NAfill,maxlen)
# get all the sorted hits
allhits<-sort(unique(unlist(hitsplit)))
tmmdf<-as.data.frame(matrix(NA,ncol=length(hitsplit),nrow=length(allhits)))
# change the names of the list
names(tmmdf)<-mmdf$Regulator
# replace all NA values in tmmdf where they appear in hitsplit
for(column in 1:length(hitsplit)) {
hitmatches<-match(hitsplit[[column]],allhits)
hitmatches<-hitmatches[!is.na(hitmatches)]
tmmdf[hitmatches,column]<-allhits[hitmatches]
}
Jim
On Fri, May 3, 2019 at 12:43 AM David L Carlson <dcarlson at tamu.edu>
wrote:>
> We still have only the toy version of your data from your first email. The
second email used dput() as I suggested, but you truncated the results so it is
useless for testing purposes.
>
> Use the following code after creating DataList (up to mx <- ... ) in my
earlier answer:
>
> n <- sapply(DataList, length)
> hits <- unname(unlist(DataList))
> Regulator <- unname(unlist(mapply(rep, names(DataList), times=n)))
> DataTable <- table(hits, Regulator)
>
> # Regulator
> # hits AT1G69490 AT2G55980
> # AT1G05675 1 0
> # AT1G26380 1 0
> # AT2G85403 0 1
> # AT4G31950 1 0
> # AT4G89223 0 1
> # AT5G24110 1 0
>
> Now the Regulators and the hits will be listed in alphabetical order. The
table has 0's for Regulators that do not have a particular hit. If you want
NAs:
>
> DataTable[DataTable==0] <- NA
> print(DataTable, na.print="NA")
> # Regulator
> # hits AT1G69490 AT2G55980
> # AT1G05675 1 NA
> # AT1G26380 1 NA
> # AT2G85403 NA 1
> # AT4G31950 1 NA
> # AT4G89223 NA 1
> # AT5G24110 1 NA
>
> If you need a data frame instead of a table:
>
> as.data.frame.matrix(DataTable)
>
> ----------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Matthew
> Sent: Tuesday, April 30, 2019 4:31 PM
> To: r-help at r-project.org
> Subject: [R] Fwd: Re: transpose and split dataframe
>
> Thanks for your reply. I was trying to simplify it a little, but must
> have got it wrong. Here is the real dataframe, TF2list:
>
> str(TF2list)
> 'data.frame': 152 obs. of 2 variables:
> $ Regulator: Factor w/ 87 levels
"AT1G02065","AT1G13960",..: 17 6 6 54
> 54 82 82 82 82 82 ...
> $ hits : Factor w/ 97 levels
>
"AT1G05675,AT3G12910,AT1G22810,AT1G14540,AT1G21120,AT1G07160,AT5G22520,AT1G56250,AT2G31345,AT5G22530,AT4G11170,A"|
> __truncated__,..: 65 57 90 57 87 57 56 91 31 17 ...
>
> And the first few lines resulting from dput(head(TF2list)):
>
> dput(head(TF2list))
> structure(list(Regulator = structure(c(17L, 6L, 6L, 54L, 54L,
> 82L), .Label = c("AT1G02065", "AT1G13960",
"AT1G18860", "AT1G23380",
> "AT1G29280", "AT1G29860", "AT1G30650",
"AT1G55600", "AT1G62300",
> "AT1G62990", "AT1G64000", "AT1G66550",
"AT1G66560", "AT1G66600",
> "AT1G68150", "AT1G69310", "AT1G69490",
"AT1G69810", "AT1G70510", ...
>
> This is another way of looking at the first 4 entries (Regulator is
> tab-separated from hits):
>
> Regulator
> hits
> 1
> AT1G69490
>
AT4G31950,AT5G24110,AT1G26380,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G79680,AT3G02840,AT5G25260,AT5G57220,AT2G37430,AT2G26560,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT5G05300,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT5G52760,AT5G66020,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT2G02010,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT2G40180,AT1G59865,AT4G35180,AT4G15417,AT1G51820,AT1G06135,AT1G36622,AT5G42830
> 2
> AT1G29860
>
AT4G31950,AT5G24110,AT1G05675,AT3G12910,AT5G64905,AT1G22810,AT1G14540,AT1G79680,AT1G07160,AT3G23250,AT5G25260,AT1G53625,AT5G57220,AT2G37430,AT3G54150,AT1G56250,AT3G23230,AT1G16420,AT1G78410,AT4G22030,AT1G69930,AT4G03460,AT4G11470,AT5G25250,AT5G36925,AT4G14450,AT2G30750,AT1G16150,AT1G02930,AT2G19190,AT4G11890,AT1G72520,AT4G31940,AT5G37490,AT4G08555,AT5G66020,AT5G26920,AT3G57460,AT4G23220,AT3G15518,AT2G43620,AT1G35210,AT5G46295,AT1G17147,AT1G11925,AT2G39200,AT1G02920,AT4G35180,AT4G15417,AT1G51820,AT4G40020,AT1G06135
>
> 3
> AT1G2986
>
AT5G64905,AT1G21120,AT1G07160,AT5G25260,AT1G53625,AT1G56250,AT2G31345,AT4G11170,AT1G66090,AT1G26410,AT3G55840,AT1G69930,AT4G03460,AT5G25250,AT5G36925,AT1G26420,AT5G42380,AT1G16150,AT2G22880,AT1G02930,AT4G11890,AT1G72520,AT5G66020,AT2G43620,AT2G44370,AT4G15975,AT1G35210,AT5G46295,AT1G11925,AT2G39200,AT1G02920,AT4G14370,AT4G35180,AT4G15417,AT2G18690,AT5G11140,AT1G06135,AT5G42830
>
> So, the goal would be to
>
> first: Transpose the existing dataframe so that the factor Regulator
> becomes a column name (column 1 name = AT1G69490, column2 name
> AT1G29860, etc.) and the hits associated with each Regulator become
> rows. Hits is a comma separated 'list' ( I do not not know if
> technically it is an R list.), so it would have to be comma
> 'unseparated' with each entry becoming a row (col 1 row 1 =
AT4G31950,
> col 1 row 2 - AT5G24410, etc); like this :
>
> AT1G69490
> AT4G31950
> AT5G24110
> AT1G05675
> AT5G64905
>
> ... I did not include all the rows)
>
> I think it would be best to actually make the first entry a separate
> dataframe ( 1 column with name = AT1G69490 and number of rows depending
> on the number of hits), then make the second column (column name >
AT1G29860, and number of rows depending on the number of hits) into a
> new dataframe and do a full join of of the two dataframes; continue by
> making the third column (column name = AT1G2986) into a dataframe and
> full join it with the previous; continue for the 152 observations so
> that then end result is a dataframe with 152 columns and number of rows
> depending on the entry with the greatest number of hits. The full joins
> I can do with dplyr, but getting up to that point seems rather difficult.
>
> This would get me what my ultimate goal would be; each Regulator is a
> column name (152 columns) and a given row has either NA or the same hit.
>
> This seems very difficult to me, but I appreciate any attempt.
>
> Matthew
>
> On 4/30/2019 4:34 PM, David L Carlson wrote:
> > External Email - Use Caution
> >
> > I think we need more information. Can you give us the structure of the
data with str(YourDataFrame). Alternatively you could copy a small piece into
your email message by copying and pasting the results of the following code:
> >
> > dput(head(YourDataFrame))
> >
> > The data frame you present could not be a data frame since you say
"hits" is a factor with a variable number of elements. If each value
of "hits" was a single character string, it would only have 2 factor
levels not 6 and your efforts to parse the string would make more sense.
Transposing to a data frame would only be possible if each column was padded
with NAs to make them equal in length. Since your example tries use the name
TF2list, it is possible that you do not have a data frame but a list and you
have no factor levels, just character vectors.
> >
> > If you are not familiar with R, it may be helpful to tell us what your
overall goal is rather than an intermediate step. Very likely R can easily
handle what you want by doing things a different way.
> >
> > ----------------------------------------
> > David L Carlson
> > Department of Anthropology
> > Texas A&M University
> > College Station, TX 77843-4352
> >
> >
> >
> > -----Original Message-----
> > From: R-help<r-help-bounces at r-project.org> On Behalf Of
Matthew
> > Sent: Tuesday, April 30, 2019 2:25 PM
> > To: r-help (r-help at r-project.org)<r-help at r-project.org>
> > Subject: [R] transpose and split dataframe
> >
> > I have a data frame that is a lot bigger but for simplicity sake we
can
> > say it looks like this:
> >
> > Regulator hits
> > AT1G69490 AT4G31950,AT5G24110,AT1G26380,AT1G05675
> > AT2G55980 AT2G85403,AT4G89223
> >
> > In other words:
> >
> > data.frame : 2 obs. of 2 variables
> > $Regulator: Factor w/ 2 levels
> > $hits : Factor w/ 6 levels
> >
> > I want to transpose it so that Regulator is now the column
headings
> > and each of the AGI numbers now separated by commas is a row. So,
> > AT1G69490 is now the header of the first column and AT4G31950 is row 1
> > of column 1, AT5G24110 is row 2 of column 1, etc. AT2G55980 is header
of
> > column 2 and AT2G85403 is row 1 of column 2, etc.
> >
> > I have tried playing around with strsplit(TF2list[2:2]) and
> > strsplit(as.character(TF2list[2:2]), but I am getting nowhere.
> >
> > Matthew
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting
guidehttp://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.