Morway, Eric
2014-Feb-17 16:47 UTC
[R] Return rows with largest, cumulative, between column differences
Using the small data set below (which is a small snippet of a much larger dataset), I'd like to get the row(s) that corresponds to the largest, cumulative, absolute difference between the value in "base" and each of the remaining columns. For example, in row 2, the three absolute differences of abs(12.9664-12.9663), abs(12.9664-12.9666), abs(12.9664-12.9665), sum to 0.0004 (0.0001+0.0002+ 0.0001). Row 3 would be abs(12.9853-12.9847)+abs(12.9853-12.9856)+abs(12.9853-12.9852)=0.001. For the example data set provided, a check in excel revealed that rows 22-25, 28, 31, and 33 all had values of 0.0025, the largest cumulative residual among the three columns subtracted from the "base" column. Because the larger dataset will have more "perturb" columns to be included in the analysis, I'm hoping any offered solution will take that into account. Thanks, Eric edm <- read.table(textConnection(" t_ib3__0001 12.94150 12.94150 12.94180 12.94160 t_ib3__0002 12.96640 12.96630 12.96660 12.96650 t_ib3__0003 12.98530 12.98470 12.98560 12.98520 t_ib3__0004 13.00940 13.00890 13.00970 13.00930 t_ib3__0005 13.02840 13.02880 13.02930 13.02930 t_ib3__0006 13.05390 13.05430 13.05480 13.05470 t_ib3__0007 13.07650 13.07690 13.07740 13.07730 t_ib3__0008 13.10210 13.10240 13.10290 13.10280 t_ib3__0009 13.12530 13.12560 13.12610 13.12610 t_ib3__0010 13.15260 13.15290 13.15340 13.15340 t_ib3__0011 13.17750 13.17780 13.17830 13.17830 t_ib3__0012 13.20850 13.20870 13.20930 13.20930 t_ib3__0013 13.23710 13.23740 13.23800 13.23800 t_ib3__0014 13.27270 13.27290 13.27350 13.27360 t_ib3__0015 13.30560 13.30580 13.30650 13.30660 t_ib3__0016 13.34390 13.34400 13.34480 13.34490 t_ib3__0017 13.37220 13.37230 13.37310 13.37330 t_ib3__0018 13.41300 13.41310 13.41400 13.41410 t_ib3__0019 13.44310 13.44310 13.44400 13.44420 t_ib3__0020 13.48640 13.48640 13.48730 13.48760 t_ib3__0021 13.52740 13.52690 13.52820 13.52850 t_ib3__0022 13.57090 13.57030 13.57170 13.57200 t_ib3__0023 13.61340 13.61290 13.61430 13.61450 t_ib3__0024 13.65490 13.65430 13.65570 13.65600 t_ib3__0025 13.69530 13.69470 13.69610 13.69640 t_ib3__0026 13.73420 13.73350 13.73490 13.73520 t_ib3__0027 13.77090 13.77020 13.77160 13.77190 t_ib3__0028 13.80700 13.80630 13.80780 13.80800 t_ib3__0029 13.83910 13.83840 13.83980 13.84000 t_ib3__0030 13.87330 13.87260 13.87400 13.87430 t_ib3__0031 13.90590 13.90490 13.90650 13.90680 t_ib3__0032 13.93780 13.93690 13.93840 13.93870 t_ib3__0033 13.96830 13.96730 13.96890 13.96920 t_ib3__0034 13.99800 13.99710 13.99860 13.99890 t_ib3__0035 14.02680 14.02590 14.02740 14.02760 t_ib3__0036 14.05460 14.05360 14.05510 14.05540 t_ib3__0037 14.08110 14.08010 14.08160 14.08180 t_ib3__0038 14.10740 14.10640 14.10790 14.10810 t_ib3__0039 14.13180 14.13080 14.13230 14.13250 t_ib3__0040 14.15660 14.15560 14.15700 14.15720 t_ib3__0041 14.18040 14.17940 14.18080 14.18100 t_ib3__0042 14.20320 14.20210 14.20350 14.20380 t_ib3__0043 14.22450 14.22350 14.22490 14.22510 t_ib3__0044 14.24700 14.24600 14.24730 14.24760 t_ib3__0045 14.26880 14.26780 14.26920 14.26940 t_ib3__0046 14.29040 14.28930 14.29070 14.29090 t_ib3__0047 14.31150 14.31040 14.31170 14.31190 t_ib3__0048 14.33210 14.33100 14.33230 14.33250 t_ib3__0049 14.35200 14.35100 14.35230 14.35250 t_ib3__0050 14.37130 14.37030 14.37160 14.37170 "),header=FALSE,col.names=c("name","base","perturb1","perturb2","perturb3")) [[alternative HTML version deleted]]
arun
2014-Feb-17 17:17 UTC
[R] Return rows with largest, cumulative, between column differences
Hi, Try: indx <-? which(round(rowSums(abs(sweep(edm[,grep("perturb",colnames(edm))],1,edm[,"base"],`-`))),4) >=0.0025) ##if there is threshold value ?indx #[1] 22 23 24 25 28 31 33 edm[indx,] A.K. On Monday, February 17, 2014 11:49 AM, "Morway, Eric" <emorway at usgs.gov> wrote: Using the small data set below (which? is a small snippet of a much larger dataset), I'd like to get the row(s) that corresponds to the largest, cumulative, absolute difference between the value in "base" and each of the remaining columns. For example, in row 2, the three absolute differences of abs(12.9664-12.9663), abs(12.9664-12.9666), abs(12.9664-12.9665), sum to 0.0004 (0.0001+0.0002+ 0.0001). Row 3 would be abs(12.9853-12.9847)+abs(12.9853-12.9856)+abs(12.9853-12.9852)=0.001. For the example data set provided, a check in excel revealed that rows 22-25, 28, 31, and 33 all had values of 0.0025, the largest cumulative residual among the three columns subtracted from the "base" column. Because the larger dataset will have more "perturb" columns to be included in the analysis, I'm hoping any offered solution will take that into account. Thanks, Eric edm <- read.table(textConnection(" t_ib3__0001? ? ? ? ? ? 12.94150 12.94150? ? ? 12.94180? ? ? 12.94160 t_ib3__0002? ? ? ? ? ? 12.96640? ? ? 12.96630? ? ? 12.96660 12.96650 t_ib3__0003? ? ? ? ? ? 12.98530? ? ? 12.98470? ? ? 12.98560 12.98520 t_ib3__0004? ? ? ? ? ? 13.00940? ? ? 13.00890? ? ? 13.00970 13.00930 t_ib3__0005? ? ? ? ? ? 13.02840? ? ? 13.02880? ? ? 13.02930 13.02930 t_ib3__0006? ? ? ? ? ? 13.05390? ? ? 13.05430? ? ? 13.05480 13.05470 t_ib3__0007? ? ? ? ? ? 13.07650? ? ? 13.07690? ? ? 13.07740 13.07730 t_ib3__0008? ? ? ? ? ? 13.10210? ? ? 13.10240? ? ? 13.10290 13.10280 t_ib3__0009? ? ? ? ? ? 13.12530? ? ? 13.12560? ? ? 13.12610 13.12610 t_ib3__0010? ? ? ? ? ? 13.15260? ? ? 13.15290? ? ? 13.15340 13.15340 t_ib3__0011? ? ? ? ? ? 13.17750? ? ? 13.17780? ? ? 13.17830 13.17830 t_ib3__0012? ? ? ? ? ? 13.20850? ? ? 13.20870? ? ? 13.20930 13.20930 t_ib3__0013? ? ? ? ? ? 13.23710? ? ? 13.23740? ? ? 13.23800 13.23800 t_ib3__0014? ? ? ? ? ? 13.27270? ? ? 13.27290? ? ? 13.27350 13.27360 t_ib3__0015? ? ? ? ? ? 13.30560? ? ? 13.30580? ? ? 13.30650 13.30660 t_ib3__0016? ? ? ? ? ? 13.34390? ? ? 13.34400? ? ? 13.34480 13.34490 t_ib3__0017? ? ? ? ? ? 13.37220? ? ? 13.37230? ? ? 13.37310 13.37330 t_ib3__0018? ? ? ? ? ? 13.41300? ? ? 13.41310? ? ? 13.41400 13.41410 t_ib3__0019? ? ? ? ? ? 13.44310? ? ? 13.44310? ? ? 13.44400 13.44420 t_ib3__0020? ? ? ? ? ? 13.48640? ? ? 13.48640? ? ? 13.48730 13.48760 t_ib3__0021? ? ? ? ? ? 13.52740? ? ? 13.52690? ? ? 13.52820 13.52850 t_ib3__0022? ? ? ? ? ? 13.57090? ? ? 13.57030? ? ? 13.57170 13.57200 t_ib3__0023? ? ? ? ? ? 13.61340? ? ? 13.61290? ? ? 13.61430 13.61450 t_ib3__0024? ? ? ? ? ? 13.65490? ? ? 13.65430? ? ? 13.65570 13.65600 t_ib3__0025? ? ? ? ? ? 13.69530? ? ? 13.69470? ? ? 13.69610 13.69640 t_ib3__0026? ? ? ? ? ? 13.73420? ? ? 13.73350? ? ? 13.73490 13.73520 t_ib3__0027? ? ? ? ? ? 13.77090? ? ? 13.77020? ? ? 13.77160 13.77190 t_ib3__0028? ? ? ? ? ? 13.80700? ? ? 13.80630? ? ? 13.80780 13.80800 t_ib3__0029? ? ? ? ? ? 13.83910? ? ? 13.83840? ? ? 13.83980 13.84000 t_ib3__0030? ? ? ? ? ? 13.87330? ? ? 13.87260? ? ? 13.87400 13.87430 t_ib3__0031? ? ? ? ? ? 13.90590? ? ? 13.90490? ? ? 13.90650 13.90680 t_ib3__0032? ? ? ? ? ? 13.93780? ? ? 13.93690? ? ? 13.93840 13.93870 t_ib3__0033? ? ? ? ? ? 13.96830? ? ? 13.96730? ? ? 13.96890 13.96920 t_ib3__0034? ? ? ? ? ? 13.99800? ? ? 13.99710? ? ? 13.99860 13.99890 t_ib3__0035? ? ? ? ? ? 14.02680? ? ? 14.02590? ? ? 14.02740 14.02760 t_ib3__0036? ? ? ? ? ? 14.05460? ? ? 14.05360? ? ? 14.05510 14.05540 t_ib3__0037? ? ? ? ? ? 14.08110? ? ? 14.08010? ? ? 14.08160 14.08180 t_ib3__0038? ? ? ? ? ? 14.10740? ? ? 14.10640? ? ? 14.10790 14.10810 t_ib3__0039? ? ? ? ? ? 14.13180? ? ? 14.13080? ? ? 14.13230 14.13250 t_ib3__0040? ? ? ? ? ? 14.15660? ? ? 14.15560? ? ? 14.15700 14.15720 t_ib3__0041? ? ? ? ? ? 14.18040? ? ? 14.17940? ? ? 14.18080 14.18100 t_ib3__0042? ? ? ? ? ? 14.20320? ? ? 14.20210? ? ? 14.20350 14.20380 t_ib3__0043? ? ? ? ? ? 14.22450? ? ? 14.22350? ? ? 14.22490 14.22510 t_ib3__0044? ? ? ? ? ? 14.24700? ? ? 14.24600? ? ? 14.24730 14.24760 t_ib3__0045? ? ? ? ? ? 14.26880? ? ? 14.26780? ? ? 14.26920 14.26940 t_ib3__0046? ? ? ? ? ? 14.29040? ? ? 14.28930? ? ? 14.29070 14.29090 t_ib3__0047? ? ? ? ? ? 14.31150? ? ? 14.31040? ? ? 14.31170 14.31190 t_ib3__0048? ? ? ? ? ? 14.33210? ? ? 14.33100? ? ? 14.33230 14.33250 t_ib3__0049? ? ? ? ? ? 14.35200? ? ? 14.35100? ? ? 14.35230 14.35250 t_ib3__0050? ? ? ? ? ? 14.37130? ? ? 14.37030? ? ? 14.37160 14.37170 "),header=FALSE,col.names=c("name","base","perturb1","perturb2","perturb3")) ??? [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.