I have a list of strings of different lengths and would like to split each string by the underscore "_":

pc_m2_45_ssp3_wheat
pc_m2_45_ssp3_wheat
ssp3_maize
m2_wheat

I would like to separate each part of the string into different columns, such as

pc m2 45 ssp3 wheat

Because the strings have different numbers of parts, I would like NA in the columns for the variables that have fewer parts, such as

NA NA NA m2 wheat

I have tried unlist(strsplit(x, "_")) to split. It works for one variable but not for the list; it gives me a "non-character argument" error. I would highly appreciate any help. Thank you!
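The "non-character argument" error is what strsplit() reports when its first argument is a list rather than a character vector. A minimal sketch, using two of the strings above as assumed sample data (the names x, res and parts are only illustrative):

```r
# Assumed sample data: the strings stored in a list, as in the question
x <- list("pc_m2_45_ssp3_wheat", "ssp3_maize")

# strsplit() insists on a character vector, so passing the list fails;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(strsplit(x, "_"), error = function(e) conditionMessage(e))
res
# [1] "non-character argument"

# Flattening the list to a character vector first avoids the error and
# gives one character vector of pieces per input string
parts <- strsplit(unlist(x), "_")
parts[[2]]
# [1] "ssp3"  "maize"
```

The result is still a list of vectors of unequal length, so the padding with NA has to happen in a separate step.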
Try this:

mylist <- list("pc_m2_45_ssp3_wheat", "pc_m2_45_ssp3_wheat", "ssp3_maize", "m2_wheat")
# split every string on "_"
mylist <- lapply(mylist, function(x) unlist(strsplit(x, split="_")))
# one position per distinct piece, in order of first appearance
allstrings <- unique(unlist(mylist))
# index x itself, so each piece lands in the position of its own value
lapply(mylist, function(x) x[match(allstrings, x)])

[[1]]
[1] "pc"    "m2"    "45"    "ssp3"  "wheat" NA

[[2]]
[1] "pc"    "m2"    "45"    "ssp3"  "wheat" NA

[[3]]
[1] NA      NA      NA      "ssp3"  NA      "maize"

[[4]]
[1] NA      "m2"    NA      NA      "wheat" NA

Hope this helps,
Adrian

On Sun, Jan 17, 2016 at 10:56 PM, Miluji Sb <milujisb at gmail.com> wrote:
> I have a list of strings of different lengths and would like to split each
> string by underscore "_"
>
> pc_m2_45_ssp3_wheat
> pc_m2_45_ssp3_wheat
> ssp3_maize
> m2_wheat
>
> I would like to separate each part of the string into different columns
> such as
>
> pc m2 45 ssp3 wheat
>
> But because of the different lengths - I would like NA in the columns for
> the variables have fewer parts such as
>
> NA NA NA m2 wheat
>
> I have tried unlist(strsplit(x, "_")) to split, it works for one variable
> but not for the list - gives me "non-character argument" error. I would
> highly appreciate any help. Thank you!
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania
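Since the question asks for columns, the list produced by the match() approach in Adrian's reply can be bound into a table with one column per distinct piece. A sketch only, assuming each split vector is indexed by match(allstrings, x); the names parts and aligned are illustrative and not from the thread:

```r
strings <- c("pc_m2_45_ssp3_wheat", "pc_m2_45_ssp3_wheat",
             "ssp3_maize", "m2_wheat")
parts <- strsplit(strings, "_")

# Distinct pieces in order of first appearance: one column each
allstrings <- unique(unlist(parts))

# For every string, place each piece under the column of the same name;
# sapply() returns one column per string, so transpose to one row per string
aligned <- t(sapply(parts, function(x) x[match(allstrings, x)]))
colnames(aligned) <- allstrings

aligned[3, ]
#      pc      m2      45    ssp3   wheat   maize
#      NA      NA      NA  "ssp3"      NA "maize"
```

Note that this layout has six columns rather than five, because "wheat" and "maize" each get their own column.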
str_1 <- list("pc_m2_45_ssp3_wheat", "pc_m2_45_ssp3_wheat", "ssp3_maize", "m2_wheat")
str_2 <- strsplit(unlist(str_1), "_")
max.length <- max(sapply(str_2, length))
# extend every vector to max.length; the added trailing positions become NA
str_3 <- lapply(lapply(str_2, unlist), "length<-", max.length)
str_3

See: stackoverflow.com/questions/27995639/i-have-a-numeric-list-where-id-like-to-add-0-or-na-to-extend-the-length-of-the

Hope this helps,
Bill

William Michels, Ph.D.
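Once every vector has the same length, as after the "length<-" step in Bill's reply, the list can be stacked into a matrix with rbind. A short sketch under that assumption (the name padded is illustrative, and lengths() needs R 3.2.0 or later):

```r
str_2 <- strsplit(c("pc_m2_45_ssp3_wheat", "pc_m2_45_ssp3_wheat",
                    "ssp3_maize", "m2_wheat"), "_")
max.length <- max(lengths(str_2))

# `length<-` extends each vector to max.length, filling the end with NA
str_3 <- lapply(str_2, `length<-`, max.length)

# Stack the equal-length vectors into a 4 x 5 character matrix
padded <- do.call(rbind, str_3)
padded[3, ]
# [1] "ssp3"  "maize" NA      NA      NA
```

The NA values end up on the right with this approach, whereas the question asks for them on the left, so a left-padding step is still needed for that layout.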
Hi Miluji,
While the other answers are correct in general, I noticed that your request was for the elements of an incomplete string to be placed in the same positions as in the complete strings. Perhaps this will help:

strings<-list("pc_m2_45_ssp3_wheat","pc_m2_45_ssp3_wheat",
 "ssp3_maize","m2_wheat","pc_m2_45_ssp3_maize")
split_strings<-strsplit(unlist(strings),"_")
max_length <- max(sapply(split_strings,length))
complete_sets<-split_strings[sapply(split_strings,length)==max_length]
element_sets<-list()

# build a list with the unique elements of each complete string
for(i in 1:max_length)
 element_sets[[i]]<-unique(sapply(complete_sets,"[",i))

# function to guess the position of the elements in a partial string
# and return them in the hopefully correct positions
fill_strings<-function(split_string,max_length,element_sets) {
 if(length(split_string) < max_length) {
  new_split_string<-rep(NA,max_length)
  for(i in 1:length(split_string)) {
   # loop over the positions, not over the complete strings
   for(j in 1:length(element_sets)) {
    # exact match against the values seen at position j
    if(split_string[i] %in% element_sets[[j]])
     new_split_string[j]<-split_string[i]
   }
  }
  return(new_split_string)
 }
 return(split_string)
}

# however, if you know that the incomplete strings will always
# be composed of the last elements in the complete strings
fill_strings<-function(split_string,max_length) {
 lenstring<-length(split_string)
 if(lenstring < max_length)
  split_string<-c(rep(NA,max_length-lenstring),split_string)
 return(split_string)
}

sapply(split_strings,fill_strings,list(max_length,element_sets))

Jim
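For the simpler case Jim describes, where incomplete strings always consist of the last elements, the left-padding can be written so that it returns the table of columns directly. A minimal sketch; left_pad and result are hypothetical names, and the four sample strings are those from the original question:

```r
strings <- c("pc_m2_45_ssp3_wheat", "pc_m2_45_ssp3_wheat",
             "ssp3_maize", "m2_wheat")
split_strings <- strsplit(strings, "_")
max_length <- max(lengths(split_strings))

# Hypothetical helper: put the NAs in front so the pieces stay right-aligned
left_pad <- function(x, n) c(rep(NA_character_, n - length(x)), x)

# vapply() checks that every result is a character vector of length
# max_length; the result has one column per string, so transpose it
result <- t(vapply(split_strings, left_pad, character(max_length),
                   n = max_length))
result[4, ]
# [1] NA      NA      NA      "m2"    "wheat"
```

Wrapping the matrix in as.data.frame() would give the separate columns requested in the question.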
Thank you everyone for the code and the link. They work well!

Mr. Lemon, thank you for the detailed code and the explanations. I appreciate it. One thing though: in the last line,

sapply(split_strings,fill_strings,list(max_length,element_sets))

should it be unlist instead of list? I get this error:

Error in FUN(X[[i]], ...) : (list) object cannot be coerced to type 'integer'

Thanks again!

On Mon, Jan 18, 2016 at 9:19 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
> sapply(split_strings,fill_strings,list(max_length,element_sets))
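On the question about the last line: sapply() hands any extra arguments on to the function one by one, so wrapping them in list() passes a single list as max_length, which would explain the coercion error. Passing the arguments separately is probably what was intended. A sketch under that assumption, with the position-matching function given the hypothetical name fill_by_position so that it is not overwritten by the second definition:

```r
strings <- c("pc_m2_45_ssp3_wheat", "ssp3_maize", "m2_wheat",
             "pc_m2_45_ssp3_maize")
split_strings <- strsplit(strings, "_")
max_length <- max(lengths(split_strings))
complete_sets <- split_strings[lengths(split_strings) == max_length]

# Values observed at each position of the complete strings
element_sets <- lapply(seq_len(max_length),
                       function(i) unique(sapply(complete_sets, "[", i)))

# Hypothetical name; same idea as the first fill_strings() in the thread
fill_by_position <- function(split_string, max_length, element_sets) {
  if (length(split_string) == max_length) return(split_string)
  out <- rep(NA_character_, max_length)
  for (j in seq_along(element_sets)) {
    hit <- split_string[split_string %in% element_sets[[j]]]
    if (length(hit) > 0) out[j] <- hit[1]
  }
  out
}

# Extra arguments are passed separately, not wrapped in list()
filled <- sapply(split_strings, fill_by_position, max_length, element_sets)
t(filled)[3, ]
# [1] NA      "m2"    NA      NA      "wheat"
```

With the two-argument fill_strings() from the thread, the corresponding call would be sapply(split_strings, fill_strings, max_length).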