I'm doing some string manipulation on a vector of file names, and noticed
something curious. When I strsplit the vector, I get a list of
character vectors. The list is numbered, as lists are. When I cast that
list as a data frame with 'as.data.frame()', the resulting columns have
names derived from the original filenames.

Example code is below. My question is, where are these names stored
in the list? Are there methods that can access this from the list?
Is there a way to preserve them verbatim?

Thanks
-Ed

> example.names
[1] "con1-1-masked-bottom-green.tsv" "con1-1-masked-bottom-red.tsv"
[3] "con1-1-masked-top-green.tsv"    "con1-1-masked-top-red.tsv"
> example.list <- strsplit(example.names, "-")
> example.list
[[1]]
[1] "con1"      "1"         "masked"    "bottom"    "green.tsv"

[[2]]
[1] "con1"    "1"       "masked"  "bottom"  "red.tsv"

[[3]]
[1] "con1"      "1"         "masked"    "top"       "green.tsv"

[[4]]
[1] "con1"    "1"       "masked"  "top"     "red.tsv"

> example.df <- as.data.frame(example.list)
> example.df
  c..con1....1....masked....bottom....green.tsv..
1 con1
2 1
3 masked
4 bottom
5 green.tsv
  c..con1....1....masked....bottom....red.tsv..
1 con1
2 1
3 masked
4 bottom
5 red.tsv
  c..con1....1....masked....top....green.tsv..
1 con1
2 1
3 masked
4 top
5 green.tsv
  c..con1....1....masked....top....red.tsv..
1 con1
2 1
3 masked
4 top
5 red.tsv
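A quick way to check whether the split list carries any hidden names is to inspect its attributes directly. The sketch below is a minimal illustration that rebuilds example.names from the printed output above; attributes(), names() and str() are standard base R functions.

```r
# Rebuild the vector shown in the session above
example.names <- c("con1-1-masked-bottom-green.tsv", "con1-1-masked-bottom-red.tsv",
                   "con1-1-masked-top-green.tsv", "con1-1-masked-top-red.tsv")
example.list <- strsplit(example.names, "-")

# attributes() returns every attribute attached to an object.
# For this list it is NULL, so no "names" attribute is stored anywhere.
print(attributes(example.list))   # NULL
print(names(example.list))        # NULL

# The list holds nothing but the split pieces, indexed by position.
str(example.list)
```

The [[1]], [[2]], ... labels in the printed list are element positions, not stored names.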
They aren't being stored; they are being generated on the fly. You can
create the same names using make.names():

example.names <- c("con1-1-masked-bottom-green.tsv", "con1-1-masked-bottom-red.tsv",
    "con1-1-masked-top-green.tsv", "con1-1-masked-top-red.tsv")
example.list <- strsplit(example.names, "-")
as.data.frame(example.list)

> make.names(example.list)
[1] "c..con1....1....masked....bottom....green.tsv.." "c..con1....1....masked....bottom....red.tsv.."
[3] "c..con1....1....masked....top....green.tsv.."    "c..con1....1....masked....top....red.tsv.."

But you'll probably get a more usable result if you set names explicitly,
for instance:

names(example.list) <- example.names
as.data.frame(example.list)

Note that characters that are not legal in column names are changed for
you. You can disable that behavior with check.names=FALSE if you use
data.frame() rather than as.data.frame().

Sarah

On Mon, Apr 18, 2016 at 4:21 PM, Ed Siefker <ebs15242 at gmail.com> wrote:
> Example code is below. My question is, where are these names stored
> in the list? Are there methods that can access this from the list?
> Is there a way to preserve them verbatim?
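Both of Sarah's points can be checked in a short session. This is a sketch assuming current base R behaviour: it rebuilds the generated names by deparsing each unnamed element and passing that text through make.names(), then shows that data.frame(..., check.names = FALSE) keeps the filenames unchanged. The variables auto, deparsed, generated and verbatim are illustrative names, not part of the original thread.

```r
example.names <- c("con1-1-masked-bottom-green.tsv", "con1-1-masked-bottom-red.tsv",
                   "con1-1-masked-top-green.tsv", "con1-1-masked-top-red.tsv")
example.list <- strsplit(example.names, "-")

# Column names that as.data.frame() invents for the unnamed list
auto <- names(as.data.frame(example.list))

# Rebuild them by hand: deparse each element back to R source text,
# e.g. c("con1", "1", "masked", "bottom", "green.tsv"), then make it syntactic
deparsed  <- vapply(example.list, deparse, character(1))
generated <- make.names(deparsed, unique = TRUE)
print(identical(auto, generated))  # TRUE
print(generated[1])                # "c..con1....1....masked....bottom....green.tsv.."

# Naming the list helps, but as.data.frame() still turns "-" into "."
names(example.list) <- example.names
print(names(as.data.frame(example.list)))

# data.frame() with check.names = FALSE keeps the filenames verbatim
verbatim <- data.frame(example.list, check.names = FALSE)
print(identical(names(verbatim), example.names))  # TRUE
```

On R versions before 4.0.0 the columns come back as factors, as in the str() output in Jim's reply; adding stringsAsFactors = FALSE keeps them as character vectors.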
You can always add those names to the list: is this what you are after?

> example.names <- c("con1-1-masked-bottom-green.tsv","con1-1-masked-bottom-red.tsv"
+ , "con1-1-masked-top-green.tsv", "con1-1-masked-top-red.tsv")
> example.list <- strsplit(example.names, "-")
> names(example.list) <- example.names
> example.df <- as.data.frame(example.list)
> example.df
  con1.1.masked.bottom.green.tsv con1.1.masked.bottom.red.tsv con1.1.masked.top.green.tsv
1                           con1                         con1                        con1
2                              1                            1                           1
3                         masked                       masked                      masked
4                         bottom                       bottom                         top
5                      green.tsv                      red.tsv                   green.tsv
  con1.1.masked.top.red.tsv
1                      con1
2                         1
3                    masked
4                       top
5                   red.tsv
> str(example.df)
'data.frame': 5 obs. of 4 variables:
 $ con1.1.masked.bottom.green.tsv: Factor w/ 5 levels "1","bottom","con1",..: 3 1 5 2 4
 $ con1.1.masked.bottom.red.tsv  : Factor w/ 5 levels "1","bottom","con1",..: 3 1 4 2 5
 $ con1.1.masked.top.green.tsv   : Factor w/ 5 levels "1","con1","green.tsv",..: 2 1 4 5 3
 $ con1.1.masked.top.red.tsv     : Factor w/ 5 levels "1","con1","masked",..: 2 1 3 5 4

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.