Dear Eric,

I am sending this from Gmail, as you suggested. I hope everyone can see this email now. I have also attached the first 50 rows of Fight.csv.

Output: I want to run a market basket analysis on this data to find rules, which is what I am currently learning. Once I have the data in transactional format, I can run the algorithm and keep learning. This little problem has become a barrier in my path. I can alienate the string in Excel, but I wanted to do it in R. After some research I tried this:

x <- substr(x, 1, nchar(x) - 1)

but it was not successful, and neither were the many other things I tried; the data is still not in transactional format. Hence I have now reached out to the experts. Many thanks.

Hello dear R community,

I would like to ask for a little help, please. I have a dataset in a CSV file. When I read it into R, it looks as follows:

V1
tropical fruit"
whole milk"
pip fruit"
other vegetables"
whole milk"
rolls/buns"

The issue is that the data in the CSV file also appears with the quotation marks, and I can't get rid of them. I want to do it in R. The quotes only appear at the end of each string. The dataset has many rows; this is just a sample. My intention is to get rid of the quotes and then split the strings at the "/", i.e. rolls/buns should become rolls in one column and buns in another.

I know I am missing something very simple, but could you please show me how to do this? If someone could throw some light on this, please. I read the data in with a simple read.csv statement:

> x <- read.csv("Fight.csv", stringsAsFactors = F, header = F)
> str(x)

Output:

> str(calc)
'data.frame': 38765 obs. of 1 variable:
 $ V1: chr "tropical fruit\"" "whole milk\"" "pip fruit\"" "other vegetables\"" ...

Many thanks in advance for your help.

Kind regards,
Sam.
Hi Sam,

My code below adds new columns to your data frame, so you keep the original column for comparison. (This could also help in case a few rows in the full set don't work.)

> x <- read.csv("Fight.csv", stringsAsFactors = FALSE, header = FALSE)
> x$V3 <- sub("\"", "", x$V1)             # remove the "
> iV <- grep("/", x$V3)                   # indices of the rows that have / in the name
> x$V4 <- rep(NA_character_, nrow(x))     # x has no V2 column, so start from NA
> x$V4[iV] <- sub(".*/", "", x$V3[iV])    # remove up to and including the /
> x$V3[iV] <- sub("/.*", "", x$V3[iV])    # remove from the / onward
> x

HTH,
Eric

On Tue, Apr 14, 2020 at 1:55 PM Soumyadip Bhattacharyya <s.b.sam2801 at gmail.com> wrote:
> [...]
Hi,

The attachment did not go through; only a limited set of attachment types is allowed, see the Posting guide.

I am not sure R is the best tool for removing a few characters. If " is at the end of all your strings:

> dput(test)
structure(list(V1 = c("adfvadfg\"", "sdfasd\"", "vafdv\"", "hjk/tiuk\"")), class = "data.frame", row.names = c(NA, -4L))

> test2 <- sapply(test, function(x) substr(x, 1, nchar(x) - 1))

A combination of sapply and substr removes the trailing ".

> test2
     V1
[1,] "adfvadfg"
[2,] "sdfasd"
[3,] "vafdv"
[4,] "hjk/tiuk"

Splitting at / is simpler, but the result is a list:

> sapply(test2, function(x) strsplit(x, "/"))
$adfvadfg
[1] "adfvadfg"

$sdfasd
[1] "sdfasd"

$vafdv
[1] "vafdv"

$`hjk/tiuk`
[1] "hjk"  "tiuk"

Converting that to a data frame I leave to you; I believe it is covered several times on Stack Exchange. A plain as.data.frame is not the best option.

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces at r-project.org> On Behalf Of Soumyadip Bhattacharyya
> Sent: Tuesday, April 14, 2020 12:55 PM
> To: r-help-request at r-project.org; r-help-owner at r-project.org; r-help at r-project.org; ericjberger at gmail.com
> Subject: [R] A simple string alienation problem
>
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
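For the list-to-data-frame step that Petr leaves open, one possible approach is sketched below. It is only a sketch under assumptions: the sample strings are the ones from Petr's dput(), while the names `clean`, `parts`, `width`, `padded` and `items`, and the column names item1, item2, are made up for illustration. It pads each split result with NA to a common length and binds the rows.

```r
# Sample data mirroring the dput() output in the thread
test <- data.frame(V1 = c("adfvadfg\"", "sdfasd\"", "vafdv\"", "hjk/tiuk\""),
                   stringsAsFactors = FALSE)

# Remove one trailing double quote, if present
clean <- sub("\"$", "", test$V1)

# Split on a literal "/"; the result is a list with one character vector per row
parts <- strsplit(clean, "/", fixed = TRUE)

# Pad every vector with NA up to the length of the longest one
width <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))

# Bind the padded vectors row-wise and convert the matrix to a data frame
items <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(items) <- paste0("item", seq_len(width))  # hypothetical column names
```

Padding first means rows without a "/" get NA in the second column instead of a recycled value, which is the usual surprise when binding vectors of unequal length.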
I'm very confused by the phrase "string alienation". You mention two problems:

(1) Remove " from a string.
    sub('"', '', vector.of.strings)
    will do that. See ?grep for details.

(2) Split a string at occurrences of /.
    strsplit(vector.of.strings, "/")
    will do that. It gives you a list of vectors of strings. See ?strsplit for details.
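As a small illustration of the two calls above, here is a sketch on a few strings copied from the original post. The variable names (`v`, `no_quote`, `no_last`, `pieces`) are hypothetical. It also shows the substr() idea from the question applied to a character vector; one likely reason the original attempt failed is that it was applied to the whole data frame `x` rather than to the column `x$V1`.

```r
# Sample strings copied from the thread; the vector name `v` is hypothetical
v <- c("tropical fruit\"", "whole milk\"", "rolls/buns\"")

# Anchoring the pattern with $ removes the quote only when it is the last character
no_quote <- sub("\"$", "", v)

# The substr() idea from the question, applied to a character vector;
# it drops the last character unconditionally, whatever that character is
no_last <- substr(v, 1, nchar(v) - 1)

# fixed = TRUE treats "/" as a literal string instead of a regular expression
pieces <- strsplit(no_quote, "/", fixed = TRUE)
```

Here both removal variants give the same result because every string ends in a quote; on mixed data the anchored sub() is the safer of the two.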