I have a file that basically carries three datasets of differing
lengths. To make this a single downloadable file, the creator of the
file has used both NUL (hex 00) and space (hex 20) to normalize the
lengths.

Below is the function that I am writing. I am using sed to replace the
hex characters. First, to get past the NULs, I use sed to replace hex 00
with hex 20. This has worked. Once the NULs are removed, I can
successfully parse the file with readLines and str_sub. The final step
before delimiting the file and making it nice and tidy is to remove the
hex 20 characters. I am using the same strategy to eliminate the spaces;
the sed command works in a shell but does not work in the R function.
What am I doing wrong? I have dput some of the nastier lines with hex 20
characters below my code.

Any advice is appreciated.

Glenn

arm <- function(filepath){
  callpath <- paste(filepath, "arm.txt", sep ="")
  ARMReturn <- paste(filepath, "arm.csv", sep = "")
  ARMPoolReturnPath <- paste(filepath,"armatpool.csv", sep = "")
  ARMNextChgReturnPath <- paste(filepath,"nexratechangedate.csv", sep = "")
  ARMFirstPmtReturnPath <- paste(filepath,"firstpaymentdate.csv", sep = "")

  # This file contains NUL hex characters. Before parsing the file, replace
  # the hex NUL x00 with space x20 and save as a csv file. Use system command
  sedcommand <- paste("sed -e 's/\\x00/\\x20/g' <",
                      filepath, "arm.txt",
                      ">", "arm.csv", sep = " ")
  system(sedcommand)

  # read the arm quartile data to a file; once skipNuls, then length of each
  # record set changes and the data map provided by FNMA is no longer valid
  # with respect to the length of each embedded data set
  data <- readLines(ARMReturn, encoding = "ascii")

  quartile <- NULL
  numchar <- nchar(x = data, type = "chars")
  start <- c(seq(1, numchar, 399))
  end <- c(seq(399, numchar, 399))
  quartile <- str_sub(data, start[1:length(start)], end[1:length(end)])
  write(quartile, ARMReturn)

  # The file has been parsed according to length 400 for each data element.
  # The next step is to remove all the trailing white space hex character
  # x20

  sedcommand2 <- paste("sed -e '/\\x20/d' <",
                       filepath, "arm.csv",
                       ">", "arm2.csv", sep = "")
  system(sedcommand2)
} # end of function


c(" 555556 WS320021201006125{000378{000348{ ",
  " 555556 WS320021201006250{000954{000880{ ",
  " 555556 WS320021201005625{001062{000983{ ",
  " 555556 WS320030101005250{000027{000025{ ",
  " 555556 WS320030101006500{000033{000030{ ",
  " 555556 WS320030101005125{000061{000056{ ",
  " 555556 WS320030101005375{000095{000088{ ",
  " 555556 WS320030101005350{000217{000200{ ",
  " 555556 WS320030101006125{000400{000369{ ",
  " 555556 WS320030101005310{000439{000406{ ",
  " 555556 WS320030101006000{000573{000529{ ")

On Sat, Sep 10, 2016 at 07:23:37PM +0000, Glenn Schultz wrote:
> ...
> Below is the function that I am writing. I am using sed to replace the
> hex characters. First, to get past the NULs, I use sed to replace hex 00
> with hex 20. This has worked. Once the NULs are removed, I can
> successfully parse the file with readLines and str_sub. The final step
> before delimiting the file and making it nice and tidy is to remove the
> hex 20 characters. I am using the same strategy to eliminate the spaces;
> the sed command works in a shell but does not work in the R function.
> What am I doing wrong? I have dput some of the nastier lines with hex 20
> characters below my code.

I believe that you will find that the sed "d" command deletes the
"pattern space" (in a simple text file, it would delete the line) in
which the specified regular expression is found.

I suspect that you actually want to eliminate the "space" characters
themselves, so rather than:

> ...
> # The file has been parsed according to length 400 for each data element.
> # The next step is to remove all the trailing white space hex character
> # x20
>
> sedcommand2 <- paste("sed -e '/\\x20/d' <",

what is wanted is:

sedcommand2 <- paste("sed -e 's/\\x20//g' <",

> ...

Note that you might consider using R's gsub() function to perform that
"space elimination" both natively and a bit earlier.

Peace,
david
-- 
David H. Wolfskill    r at catwhisker.org

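To make the difference between the two sed commands concrete, here is a minimal sketch of the same operations done natively in R, along the lines of the gsub() suggestion above. The first sample record is taken from the original post (with extra trailing padding added); the second record, "NOPADDING", and all variable names are made up for illustration.

```r
# Two sample records: one space-padded (from the post), one made-up
# record that contains no spaces at all.
x <- c(" 555556 WS320021201006125{000378{000348{   ",
       "NOPADDING")

# What sed '/\x20/d' does: drop every LINE that contains a space.
dropped <- x[!grepl(" ", x, fixed = TRUE)]

# What sed 's/\x20//g' does: delete the space CHARACTERS, keep the lines.
no_spaces <- gsub(" ", "", x, fixed = TRUE)

# If only the padding should go, trimws() strips leading and trailing
# whitespace and leaves interior spaces alone.
trimmed <- trimws(x)
```

Since every padded record contains at least one space, the "d" form would leave an empty or nearly empty output file, which may be what "does not work" looked like. Note also that removing all spaces fuses the fields together, so trimming only the padding may be closer to what is wanted before delimiting.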
Excerpts from Glenn Schultz's message of 2016-09-10 19:23:37 +0000:
> I have a file that basically carries three datasets of differing
> lengths. To make this a single downloadable file, the creator of the
> file has used both NUL (hex 00) and space (hex 20) to normalize the
> lengths.
>
> Below is the function that I am writing. I am using sed to replace
> the hex characters. First, to get past the NULs, I use sed to replace
> hex 00 with hex 20. This has worked. Once the NULs are removed, I can
> successfully parse the file with readLines and str_sub. The final step
> before delimiting the file and making it nice and tidy is to remove
> the hex 20 characters. I am using the same strategy to eliminate the
> spaces; the sed command works in a shell but does not work in the R
> function. What am I doing wrong? I have dput some of the nastier
> lines with hex 20 characters below my code.
>
> Any advice is appreciated.

You can use readLines(pipe(sedcommand)) to get the filtered dataset.

I didn't understand what kind of filtering you are doing; it seems
confused to me. But, as someone pointed out, the 'd' command deletes the
whole pattern space, so if you are trying to substitute, use:

s/pattern//g   # effectively removing pattern from the text

Best of luck,
Marco
-- 
Marco Arthur @ (M)arco Creatives

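A minimal sketch of the readLines(pipe(...)) idea follows. It assumes a sed executable is on the PATH, and that the command writes to standard output rather than redirecting to a file with ">" as the original sedcommand does (otherwise there is nothing for R to read). The temporary file, the sample records (taken from the post, with trailing padding added), and the variable names are illustrative only.

```r
# Write a small space-padded sample file to work on.
tmp <- tempfile(fileext = ".txt")
writeLines(c(" 555556 WS320021201006125{000378{000348{   ",
             " 555556 WS320021201006250{000954{000880{      "),
           tmp)

# Build a sed command that prints to stdout; 's/ *$//' removes
# trailing spaces only. shQuote() protects the path for the shell.
cmd <- paste("sed -e 's/ *$//'", shQuote(tmp))

# Read sed's output straight into R, with no intermediate file.
con <- pipe(cmd)
cleaned <- readLines(con)
close(con)
```

This avoids the intermediate arm.csv / arm2.csv files, at the cost of still depending on an external program.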
If you think you might want to put this function into a package, it
would be much better to use gsub instead of passing the job off to an
external program, because non-POSIX operating systems (Windows) will be
a headache to support.

On September 10, 2016 12:23:37 PM PDT, Glenn Schultz <glennmschultz at me.com> wrote:
>I have a file that basically carries three datasets of differing
>lengths. To make this a single downloadable file, the creator of the
>file has used both NUL (hex 00) and space (hex 20) to normalize the
>lengths.
>
>Below is the function that I am writing. I am using sed to replace the
>hex characters. First, to get past the NULs, I use sed to replace hex 00
>with hex 20. This has worked. Once the NULs are removed, I can
>successfully parse the file with readLines and str_sub. The final step
>before delimiting the file and making it nice and tidy is to remove the
>hex 20 characters. I am using the same strategy to eliminate the
>spaces; the sed command works in a shell but does not work in the R
>function. What am I doing wrong? I have dput some of the nastier
>lines with hex 20 characters below my code.
>
>Any advice is appreciated.
>
>Glenn
>
>...
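
As a rough illustration of the portability point, the whole clean-up can be sketched in base R with no call to sed. This is only a sketch under stated assumptions: the helper name clean_fixed_width(), its arguments, and the demonstration file are hypothetical, the input is assumed to be ASCII with "\n" line endings, and the default width of 399 simply mirrors the step used in the posted function.

```r
# Hypothetical helper; the name, arguments, and default width are illustrative.
clean_fixed_width <- function(path, width = 399L) {
  # Read raw bytes so that embedded NULs survive the read.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # Same effect as sed 's/\x00/\x20/g': turn every NUL into a space.
  bytes[bytes == as.raw(0)] <- as.raw(32)
  # Convert to text and split into lines (assumes "\n" line endings).
  lines <- strsplit(rawToChar(bytes), "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]
  # Cut each line into consecutive fixed-width records.
  records <- unlist(lapply(lines, function(line) {
    starts <- seq(1L, nchar(line), by = width)
    substring(line, starts, starts + width - 1L)
  }))
  # Strip only the trailing padding, leaving interior spaces alone.
  sub(" +$", "", records)
}

# Tiny demonstration file: two 4-byte records, one padded with NULs
# and one padded with spaces, followed by a newline.
demo <- tempfile()
writeBin(c(charToRaw("AB"), as.raw(c(0, 0)), charToRaw("CD  "), as.raw(10)), demo)
result <- clean_fixed_width(demo, width = 4L)
```

Reading with readBin() sidesteps the NUL problem entirely, and sub() or gsub() then replaces the second sed call, so the function behaves the same on Windows as on POSIX systems.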