I got hundreds of csv files. The real format of each csv file is as follows:

aa(cm)
1, 2 , 3,

bb(mm)
1, 2, 3,
4, 5, 6,
7, 8, 9,

cc(mm)
3, 4, 5,
7, 5, 9,
6, 5, 8,

How can I use read.table or read.csv to convert the csv files to a tidy data frame format as follows:

aa, bb, cc
1, 1, 3
1, 2, 4
1, 3, 5
2, 4, 7
2, 5, 5
2, 6, 9
3, 7, 6
3, 8, 5
3, 9, 8

Many thanks.
On 06/10/2019 7:29 a.m., vod vos via R-help wrote:
> How can I use read.table or read.csv to convert the csv files
> to a tidy data frame format as follow:

You'll need more than those two functions to do the transformation you want. To work out what you need, write out the process in detail in English (or another natural language), not in code. For example:

1. Read aa from file 1.
2. Read bb from file 2.
3. Read cc from file 3.
4. Expand all vectors to the same length.
5. Combine them into a single dataframe.

Then work out each step separately. I think you'll want to use something like scan("filename", skip = 1, sep = ",") in steps 1, 2, and 3, but this will add NA values at the end of each line because of the final comma, so you could do this:

  aa <- scan("file1", skip = 1, sep = ",")
  aa <- aa[!is.na(aa)]

and similarly for the others.

I don't know the rules for expanding that you'll need in your real data, but for your example step 4 could be

  aa <- rep(aa, each = 3)

Then step 5 could be

  result <- data.frame(aa, bb, cc)

Duncan Murdoch
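Duncan's five steps can be checked on the example data without any files. This is a sketch that assumes the three blocks are already available as separate pieces of text: `scan(text = ...)` stands in for reading `file1`, `file2` and `file3`, and the expansion factor of 3 is taken from the example, not from a general rule.

```r
# Steps 1-3: scan each block; the trailing comma on every line yields an NA
aa <- scan(text = "1, 2 , 3,", sep = ",", quiet = TRUE)
bb <- scan(text = c("1, 2, 3,", "4, 5, 6,", "7, 8, 9,"), sep = ",", quiet = TRUE)
cc <- scan(text = c("3, 4, 5,", "7, 5, 9,", "6, 5, 8,"), sep = ",", quiet = TRUE)

# drop the NAs produced by the trailing commas
aa <- aa[!is.na(aa)]
bb <- bb[!is.na(bb)]
cc <- cc[!is.na(cc)]

# Step 4: expand aa to the length of bb and cc (factor 3 comes from the example)
aa <- rep(aa, each = 3)

# Step 5: combine into a single data frame
result <- data.frame(aa, bb, cc)
result
```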
The problem is that aa, bb and cc are all in a single csv file, which contains no blank lines. The single csv file looks like list output:

aa(cm)
1, 2 , 3,
bb(mm)
1, 2, 3,
4, 5, 6,
7, 8, 9,
cc(mm)
3, 4, 5,
7, 5, 9,
6, 5, 8,

---- On 06 Oct 2019 05:08:41 -0700, Duncan Murdoch <murdoch.duncan at gmail.com> wrote ----
> 1. Read aa from file 1.
> 2. Read bb from file 2.
> 3. Read cc from file 3.
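Because all blocks share one file, the header lines themselves can mark where each block starts. The sketch below uses a different technique from the replies in this thread: it flags lines containing letters as headers and uses `cumsum()` with `split()` to group the data lines that follow each header. It assumes the file starts with a header and that header lines are the only lines containing letters (numbers in scientific notation such as `1e-3` would break this); the example text is embedded directly instead of being read from a real file.

```r
# Example content as described above (no blank lines); for a real file use
# lns <- readLines("file.csv")   (the file name is a placeholder)
lns <- c("aa(cm)", "1, 2 , 3,",
         "bb(mm)", "1, 2, 3,", "4, 5, 6,", "7, 8, 9,",
         "cc(mm)", "3, 4, 5,", "7, 5, 9,", "6, 5, 8,")

# TRUE for header lines such as "aa(cm)"
is_title <- grepl("[[:alpha:]]", lns)
# block id: increases by one at every header line
block_id <- cumsum(is_title)

# group the data lines (not the headers) by block, named after the headers
blocks <- split(lns[!is_title], block_id[!is_title])
names(blocks) <- lns[is_title]

# turn each block into a numeric vector, dropping the trailing-comma NAs
values <- lapply(blocks, function(x) {
  v <- scan(text = x, sep = ",", quiet = TRUE)
  v[!is.na(v)]
})
str(values)
```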
Hello,

It is not clear if all files have

* a first block with just one data line;
* all other blocks with as many rows as there are numbers in that first data line.

If so, maybe something like this?

  lns <- readLines("strange.csv")
  # drop empty lines
  lns <- lns[nchar(lns) > 0]
  # remove the trailing comma on each data line
  lns <- sub(",$", "", lns)
  # header lines (e.g. "aa(cm)") are the only ones containing letters
  i_title <- grep("[[:alpha:]]", lns)

  tmp <- lapply(seq_along(i_title), function(i){
    # data lines lie between this header and the next one (or end of file)
    tmp <- if(i < length(i_title)){
      lns[(i_title[i] + 1):(i_title[i + 1] - 1)]
    }else{
      lns[(i_title[i] + 1):length(lns)]
    }
    list(n = length(tmp), text = unlist(strsplit(tmp, ",")))
  })

  # number of rows in the longest block
  n <- max(sapply(tmp, '[[', 'n'))
  tmp <- lapply(tmp, function(x) as.numeric(x$text))
  # repeat each value of the first block once per row of the other blocks
  tmp[[1]] <- rep(tmp[[1]], each = n)
  res <- do.call(cbind.data.frame, tmp)
  names(res) <- lns[i_title]
  res

If you have hundreds of files, you should make a function out of the code above.

Hope this helps,

Rui Barradas

At 12:29 on 06/10/19, vod vos via R-help wrote:
> How can I use read.table or read.csv to convert the csv files
> to a tidy data frame format as follow:
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
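Rui suggests turning the code into a function when there are hundreds of files. A minimal sketch follows; the function name `read_strange_csv`, the folder path and the added `file` column are hypothetical, and it assumes every file has the same layout as the example (one header line per block, a one-line first block, trailing commas, no empty blocks).

```r
# Hypothetical wrapper around the approach above: one file -> one data frame
read_strange_csv <- function(filename) {
  lns <- readLines(filename)
  lns <- lns[nchar(trimws(lns)) > 0]     # drop empty or blank lines
  lns <- sub(",[[:space:]]*$", "", lns)  # drop the trailing comma
  i_title <- grep("[[:alpha:]]", lns)    # header lines contain letters
  ends <- c(i_title[-1] - 1, length(lns))
  blocks <- lapply(seq_along(i_title), function(i) {
    rows <- lns[(i_title[i] + 1):ends[i]]
    list(n = length(rows), values = as.numeric(unlist(strsplit(rows, ","))))
  })
  n <- max(sapply(blocks, `[[`, "n"))
  cols <- lapply(blocks, `[[`, "values")
  cols[[1]] <- rep(cols[[1]], each = n)  # expand the one-line first block
  names(cols) <- lns[i_title]
  res <- data.frame(cols, check.names = FALSE)
  res$file <- basename(filename)         # remember where each row came from
  res
}

# Usage over a folder (the path is a placeholder):
# files <- list.files("path/to/csv", pattern = "\\.csv$", full.names = TRUE)
# all_data <- do.call(rbind, lapply(files, read_strange_csv))
```

Stacking with `rbind` only works if every file yields the same column names, which is another assumption worth checking on the real data.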
The csv file was exported from Windows (DOS format), so its line endings differ from Unix ones.

---- On 07 Oct 2019 01:18:54 -0700, <vodvos at zoho.com> wrote ----
> I am going mad trying to import this strange csv format.
>
> The real csv file is attached now. There are a huge number of raw data points.
>
> Many thanks.
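On the worry about Windows line endings: the `readLines()` documentation states that LF, CRLF and CR are all accepted as end-of-line markers whatever mode the connection is opened in, so a file exported on Windows should not need converting first. A small check, writing the same lines once with Unix and once with Windows line endings to temporary files (the sample lines are taken from the example above):

```r
lines <- c("aa(cm)", "1, 2 , 3,", "bb(mm)", "1, 2, 3,")

# write the same content with LF and with CRLF line endings
unix_file <- tempfile(fileext = ".csv")
dos_file <- tempfile(fileext = ".csv")
writeBin(charToRaw(paste0(lines, "\n", collapse = "")), unix_file)
writeBin(charToRaw(paste0(lines, "\r\n", collapse = "")), dos_file)

# readLines() strips either kind of line ending, so both reads agree
identical(readLines(unix_file), readLines(dos_file))
```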
Hello,

OK, I had some spare time. Try

  readCSVFile <- function(filename){
    lns <- readLines(filename)
    # drop empty lines and all spaces
    lns <- lns[nchar(lns) > 0]
    lns <- gsub(" ", "", lns)
    # the real file uses ";" as separator; remove the trailing one
    lns <- sub(";$", "", lns)
    # header lines are the only ones containing letters
    i_title <- grep("[[:alpha:]]", lns)
    # data lines of every block except the first
    blocks <- lapply(seq_along(i_title)[-1], function(i){
      j <- i_title[i] + 1
      k <- if(i == length(i_title)) length(lns) else i_title[i + 1] - 1
      lns[j:k]
    })
    # n = number of fields in the first data row of the second block
    n <- length(unlist(strsplit(blocks[[1]][1], ";")))
    first <- unlist(strsplit(lns[i_title[1] + 1], ";"))
    first <- as.numeric(first)
    first <- rep(first, each = n)
    # convert to numeric so the columns are not left as character
    # (cbind.data.frame turned character into factors before R 4.0.0)
    blocks <- lapply(blocks, function(x){
      as.numeric(unlist(strsplit(x, ";")))
    })
    res <- do.call(cbind.data.frame, blocks)
    res <- cbind.data.frame(first, res)
    # drop a trailing unit in square brackets from the headers
    names(res) <- sub("\\[.*\\]$", "", lns[i_title])
    res
  }

  df1 <- readCSVFile("strange.csv")

If this function doesn't do it, please try to make an effort on your own. R-help is not a code-writing service; it's a mailing list for *questions* about R code.

Hope this helps,

Rui Barradas

At 09:18 on 07/10/19, vodvos at zoho.com wrote:
> I am going mad trying to import this strange csv format.
>
> The real csv file is attached now. There are a huge number of raw data points.
>
> Many thanks.
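On converting the split fields to numbers: if they are kept as character strings, `cbind.data.frame()` in R versions before 4.0.0 turned them into factors (the old `stringsAsFactors = TRUE` default), and calling `as.numeric()` on a factor returns its internal level codes rather than the values. A short illustration of the pitfall and the usual fix, using made-up values:

```r
# a factor built from numeric text, as data.frame() did by default before R 4.0.0
f <- factor(c("10", "5", "10"))

# wrong: level codes (levels are sorted as text, so "10" comes before "5")
as.numeric(f)                 # 1 2 1

# right: convert the labels, not the codes
as.numeric(as.character(f))   # 10 5 10
```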