Hello,

My question is about data handling.

I have a data set that is laid out as follows:

4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

Each set consists of 29 variables; the example above shows two sets. My data contain about 1000 such sets (of 29 variables). When I "scan" this data set, the result comes out with 7 columns, and so far I have not been able to read the table column-wise, so I cannot analyze the data. I would like to know whether there is a way to solve this problem in R, say, by rearranging the columns or increasing the number of columns of the data matrix.

I would also like to know how to name each column of the data so that the individual columns can be used separately.

Sincerely.
Dear Yoko,

If you're sure that the data are complete, then

    data <- matrix(scan("file-name"), ncol=29)

should do the trick. Then, to name the columns of the data matrix,

    colnames(data) <- c("one", "two", etc.)

[Of course, you'd substitute meaningful names.]

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Yoko Nakajima
> Sent: Wednesday, April 13, 2005 7:56 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] data manipulation
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
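The caveat about the data being complete can be checked before any reshaping. The following is only a minimal sketch, assuming every record is supposed to hold the same number of values; the helper name `check_complete` and the inline toy text are illustrative and do not come from the thread.

```r
# Hypothetical helper: verify that a flat vector of scanned values
# splits evenly into records of `n_vars` values each.
check_complete <- function(values, n_vars = 29) {
  remainder <- length(values) %% n_vars
  if (remainder != 0) {
    stop("expected a multiple of ", n_vars, " values, but ",
         remainder, " are left over")
  }
  length(values) / n_vars  # number of complete records
}

# Toy input standing in for the real file: 2 records of 3 values each.
values <- scan(text = "1 2 3\n4 5 6", quiet = TRUE)
n_records <- check_complete(values, n_vars = 3)
```

With the real file, one would pass the result of scan("file-name") and keep the default of 29 values per record.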
On Wed, 2005-04-13 at 20:56 -0400, Yoko Nakajima wrote:
> This means that 29 variables are together as a set. You saw two sets
> of them in example. I have about 1000 sets (of 29 variables) in my
> data. When I "scan" this data set, the result comes with 7 columns and
> it is not possible, so far, to read the table by column wise, and thus
> it is not possible to analyze the data. I would like to know whether
> there is a way to solve this problem, say, by arranging columns or
> increasing the number of columns of data matrix by R.
>
> Also, I would like to know how you could name each column of the data
> so that you could use the individual column separately.

You probably changed some default setting in scan(). By default, it treats white space as the field delimiter.
Using your data above, which I saved in a file called 'test.dat':

> mat <- matrix(scan("test.dat"), ncol = 29)
Read 58 items
> dim(mat)
[1]  2 29
> colnames(mat) <- paste("Col", 1:29, sep = "")
> mat
     Col1 Col2    Col3    Col4    Col5   Col6    Col7    Col8    Col9
[1,]    4   17  1.0000 -0.1668 -0.5062 0.3640 -0.5081  0.8142 -0.0445
[2,]    1    1 -5.1536 -2.3412  0.9621 0.3678 -0.2227 -0.0389 -0.0578
       Col10   Col11   Col12   Col13   Col14  Col15 Col16 Col17   Col18
[1,] -0.1175  0.8673 -0.0796 -0.1716 -0.7014 0.5611     1     2 -5.1536
[2,] -0.1232 -0.1033 -0.0341 -0.1801  0.6578 4.0000    17     1 -0.1668
       Col19  Col20   Col21   Col22   Col23   Col24   Col25   Col26
[1,] -2.3412 0.9621  0.3678 -0.2227 -0.0389 -0.0578 -0.1232 -0.1033
[2,] -0.5062 0.3640 -0.5081  0.8142 -0.0445 -0.1175  0.8673 -0.0796
       Col27   Col28  Col29
[1,] -0.0341 -0.1801 0.6578
[2,] -0.1716 -0.7014 0.5611

In this case, 'mat' is a matrix with 2 rows and 29 columns. You can restructure this differently as per your requirements.

HTH,

Marc Schwartz
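The point about white space can be illustrated without a file. This is a small sketch that assumes a version of R in which scan() accepts the `text` argument; the two strings are made-up stand-ins for a wrapped and an unwrapped layout of the same numbers.

```r
# Two layouts of the same six numbers: wrapped over three lines,
# and written on a single line.
wrapped <- "4 1 17\n-5.1536 -0.1668\n0.5611"
single  <- "4 1 17 -5.1536 -0.1668 0.5611"

# With the default sep = "", any run of spaces, tabs, or newlines
# separates fields, so line breaks carry no structural meaning.
x_wrapped <- scan(text = wrapped, quiet = TRUE)
x_single  <- scan(text = single, quiet = TRUE)

# Both calls return the same flat numeric vector.
same_values <- identical(x_wrapped, x_single)
```

Because scan() returns one flat vector either way, the number of values per line in the file does not determine the shape of the result; the shape comes entirely from how the vector is passed to matrix().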
Hello,

May I ask a further question?

I have realized that

    data <- matrix(scan("file-name"), ncol=29)

reads the data differently than I expected: with this code, (4, 1) is the first column, (17, 1) is the second column, (1, -5.1536) is the third, and so on (please see the data below). Therefore, my data set is not in the right order when I use this code.

It needs to be read with (4, 4) as the first column, (1, 1) as the second column, (17, 17) as the third, and so on (i.e., the values from 4 to 0.5611 make up the first row, the next values from 4 to 0.5611 make up the second row, and so on). So what I need is:

V1 V2 V3 ... V29
 4  1 17 ... 0.5611
 4  1 17 ... 0.5611

What I have now is:

V1 V2 V3      ... V29
 4 17  1      ... 0.6578
 1  1 -5.1536 ... 0.5611

[My data set may have around 1000 such sets (29 variables times around 1000 sets). I paste only two of them here.]

4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

I do need 29 columns, but the data were arranged differently than I intended by "ncol=29". Is there any way I can handle this problem in R?

I would very much appreciate it if you could let me know. My guess is that I should probably rearrange the data set in Excel or similar. I found this problem by using "data.entry(data)". As it stands, I cannot analyze this data set.

Thank you very much in advance.

Sincerely,
Yoko.
You just need to try harder in reading the documentation. Try:

    data <- matrix(scan("file-name"), ncol=29, byrow=TRUE)

Andy

> From: Yoko Nakajima
>
> I need 29 columns. This is true. But the data was read differently by
> "ncol=29". Is there any way I can handle this problem by R?
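Putting the thread's suggestions together, the whole workflow might look like the sketch below. The two records are the ones posted in the thread, inlined through scan()'s `text` argument so that the example runs without a file; the column names V1 to V29 and the conversion to a data frame are illustrative choices, not something the posters prescribed.

```r
# The two records from the original post, inlined as text. With real
# data, replace `text = raw` by the path to the file.
raw <- "4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673 -0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611"

values <- scan(text = raw, quiet = TRUE)

# matrix() fills column by column unless byrow = TRUE; with byrow = TRUE
# each consecutive run of 29 values becomes one row.
dat <- matrix(values, ncol = 29, byrow = TRUE)

# Placeholder names; substitute meaningful ones for real analysis.
colnames(dat) <- paste0("V", 1:29)

# A data frame lets each variable be referred to by name.
df <- as.data.frame(dat)
fourth <- df$V4  # the value that differs between the two records
```

Here dat[, "V1"] holds the first value of every record, which is the arrangement asked for in the follow-up question; keeping the result as a matrix is equally valid if only numeric operations are needed.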