on 11/20/2008 02:28 PM Tul Gan wrote:
> Hi!
> I am new to R. Can somebody help me in reformatting huge output
> files, i.e., rearranging sets of columns in a specific order?
> For example: I have data for three compounds 1, 2 and 3
> file1:
> ID CA1 CA3 CA2 MA2 MA1 MA3
> 1 14 15 13 7 12 3
> 2 19 7 12 10 14 5
> 3 21 12 19 6 8 9
>
> to
> File 2:
>
> ID CA1 CA2 CA3 MA1 MA2 MA3
> 1 14 13 15 12 7 3
> 2 19 12 7 14 10 5
> 3 21 19 12 8 6 9
>
> or File3:
> ID CA1 MA1 CA2 MA2 CA3 MA3
> 1 14 12 13 7 15 3
> 2 19 14 12 10 7 5
> 3 21 8 19 6 12 9
>
> Thanks for your help,
> Tul Gan
A general approach to the first case, where 'DF' is the initial data
frame:
> DF[, c("ID", sort(names(DF)[-1]))]
ID CA1 CA2 CA3 MA1 MA2 MA3
1 1 14 13 15 12 7 3
2 2 19 12 7 14 10 5
3 3 21 19 12 8 6 9
Essentially, you are sorting the column names other than "ID", then
reordering the columns of 'DF' using indexing.
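For anyone who wants to follow along, here is a minimal sketch that rebuilds the example data and applies the same indexing. The values are copied from file1 in the original post; the name 'DF1' is just a placeholder of mine. One caveat worth keeping in mind: sort() compares the names as strings, so with more than nine compounds a name like "CA10" would land before "CA2".

```r
# Rebuild the example data from the original post (values copied from file1).
DF <- data.frame(ID  = 1:3,
                 CA1 = c(14, 19, 21),
                 CA3 = c(15, 7, 12),
                 CA2 = c(13, 12, 19),
                 MA2 = c(7, 10, 6),
                 MA1 = c(12, 14, 8),
                 MA3 = c(3, 5, 9))

# First case: keep "ID" first, then the remaining columns in sorted name order.
# 'DF1' is a hypothetical name for the result.
DF1 <- DF[, c("ID", sort(names(DF)[-1]))]

# The data values travel with their column names; only the order changes.
names(DF1)
```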
> names(DF)[-1]
[1] "CA1" "CA3" "CA2" "MA2" "MA1" "MA3"
> sort(names(DF)[-1])
[1] "CA1" "CA2" "CA3" "MA1" "MA2" "MA3"
The second case is a little more complicated, since you are sorting on
the numeric part of each name first and on the alpha part second, rather
than on the name as a whole.
If there is a predictable sequence of names, you could do something like:
> paste(c("CA", "MA"), rep(1:3, each = 2), sep = "")
[1] "CA1" "MA1" "CA2" "MA2" "CA3" "MA3"
Thus:
> DF[, c("ID", paste(c("CA", "MA"), rep(1:3, each = 2), sep = ""))]
ID CA1 MA1 CA2 MA2 CA3 MA3
1 1 14 12 13 7 15 3
2 2 19 14 12 10 7 5
3 3 21 8 19 6 12 9
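Since this approach depends on every generated name actually being present, it may be worth checking before indexing. A rough sketch, assuming the same example data; the names 'wanted', 'absent' and 'DF3' are placeholders of mine, not anything from the original post:

```r
# Example data from the original post (values copied from file1).
DF <- data.frame(ID  = 1:3,
                 CA1 = c(14, 19, 21),
                 CA3 = c(15, 7, 12),
                 CA2 = c(13, 12, 19),
                 MA2 = c(7, 10, 6),
                 MA1 = c(12, 14, 8),
                 MA3 = c(3, 5, 9))

# Generate the expected names: CA1, MA1, CA2, MA2, CA3, MA3.
wanted <- paste(c("CA", "MA"), rep(1:3, each = 2), sep = "")

# Any generated name that is not a column of DF would make the indexing fail,
# so stop early with a readable message instead.
absent <- setdiff(wanted, names(DF))
if (length(absent) > 0) {
  stop("columns not found in DF: ", paste(absent, collapse = ", "))
}

DF3 <- DF[, c("ID", wanted)]
```

If 'absent' is non-empty, the names do not follow the assumed pattern and splitting the existing names is the safer route.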
If the sequence is not predictable, such that you would generate
non-existing column names with the above, then we need to split the
existing column names and sort them separately. Something along the
lines of the following:
# use gsub() to split out the alpha and numeric values. Note that we
# need to create a data frame to handle an alpha and a numeric column
names.df <- data.frame(Alpha = gsub("[[:digit:]]", "", names(DF)[-1]),
                       Digits = as.numeric(gsub("[[:alpha:]]", "",
                                                names(DF)[-1])))
> names.df
Alpha Digits
1 CA 1
2 CA 3
3 CA 2
4 MA 2
5 MA 1
6 MA 3
Now, get the indices for the sorted rows, based first upon the digits
value, subsorted by the Alpha value:
> order(names.df$Digits, names.df$Alpha)
[1] 1 5 3 4 2 6
Now use those values to index the columns in 'DF'. Remember that we need
to add 1 to each index, to account for the "ID" column, which is not
included in 'names.df':
> c(1, order(names.df$Digits, names.df$Alpha) + 1)
[1] 1 2 6 4 5 3 7
Thus:
> DF[, c(1, order(names.df$Digits, names.df$Alpha) + 1)]
ID CA1 MA1 CA2 MA2 CA3 MA3
1 1 14 12 13 7 15 3
2 2 19 14 12 10 7 5
3 3 21 8 19 6 12 9
Note that we could also solve the first case by reversing the sorting
levels from names.df:
> DF[, c(1, order(names.df$Alpha, names.df$Digits) + 1)]
ID CA1 CA2 CA3 MA1 MA2 MA3
1 1 14 13 15 12 7 3
2 2 19 12 7 14 10 5
3 3 21 19 12 8 6 9
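If you need to do this for many files, the steps above could be wrapped in a small helper. This is only a sketch: the function name 'reorder_by_name_parts' and its 'digits_first' argument are made up for illustration, and it assumes that the first column is the ID and that every other name is letters followed by digits.

```r
# Hypothetical helper: reorder all columns after the first by the alpha and
# numeric parts of their names.
reorder_by_name_parts <- function(df, digits_first = TRUE) {
  nm <- names(df)[-1]
  alpha  <- gsub("[[:digit:]]", "", nm)              # e.g. "CA", "MA"
  digits <- as.numeric(gsub("[[:alpha:]]", "", nm))  # e.g. 1, 3, 2
  idx <- if (digits_first) order(digits, alpha) else order(alpha, digits)
  # Add 1 to skip over the first (ID) column, which is kept in place.
  df[, c(1, idx + 1)]
}

# Example data from the original post (values copied from file1).
DF <- data.frame(ID  = 1:3,
                 CA1 = c(14, 19, 21),
                 CA3 = c(15, 7, 12),
                 CA2 = c(13, 12, 19),
                 MA2 = c(7, 10, 6),
                 MA1 = c(12, 14, 8),
                 MA3 = c(3, 5, 9))

by_digits <- reorder_by_name_parts(DF)                        # File3 layout
by_alpha  <- reorder_by_name_parts(DF, digits_first = FALSE)  # File 2 layout
```

Because the digits are compared as numbers here, this ordering should also behave sensibly once the compound numbers reach double digits.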
See ?gsub, ?sort and ?order
HTH,
Marc Schwartz