thr3ads.net - R help - [R] help for fine mappting [Jun 2016]

greg holly

2016-Jun-15 15:20 UTC

[R] help for fine mappting

Dear all,


I am sorry for posting again. I have had help from Jim, Bert, Jeff and PIKAL
on a similar issue before. I tried to adapt Jim's code to the real data, but
it did not work. I am now posting the first two rows of an imitation of the
real data in dput() format (please see the bottom of this message). I have two
data sets, map and ref. Data set map has more than 27 million rows and ref has
about 560 rows. I need to carry out two different tasks. My R code for these
tasks is given below, but it does not work properly. I would sincerely
appreciate your help.



Regards,

Greg



Task 1)

For example, the columns of row 1 in ref are chr1, 6457839 and 6638389. I need
R code that takes the first row of ref (chr1, 6457839 and 6638389), sums the
column map$post_prob over the SNPs that fall between 6457839 and 6638389, and
gives the number of those map$snp whose cumulative sum is > 0.85. Then it
should do the same for the second, third, ... rows of ref. At the end I would
like a table like the one given below (need_output). Please note that all the
values in ref exist in the map$CHR and map$POS columns.
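
One possible reading of Task 1 is "how many SNPs in the region are needed, taking the largest post_prob first, before the running total passes 0.85". The sketch below illustrates that reading only. The toy data, the helper name n_snps_to_threshold and the inclusive region boundaries are assumptions made for illustration, not something stated in the thread.

```r
# Toy stand-in for `map`; all values are invented for illustration.
map <- data.frame(
  CHR = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  POS = c(100L, 150L, 200L, 900L, 150L),
  post_prob = c(0.50, 0.10, 0.40, 0.95, 0.99),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of SNPs needed for the cumulative post_prob
# (largest values first) to exceed `threshold` inside one region.
n_snps_to_threshold <- function(map, chr, start, end, threshold = 0.85) {
  # assumption: both region boundaries are inclusive
  in_region <- map$CHR == chr & map$POS >= start & map$POS <= end
  pp <- sort(map$post_prob[in_region], decreasing = TRUE)
  hit <- which(cumsum(pp) > threshold)
  # NA when the region is empty or never reaches the threshold
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}

n_snps <- n_snps_to_threshold(map, "chr1", 100L, 200L)
print(n_snps)
```

If the intended meaning is instead "count every SNP in the region and report whether the total exceeds 0.85", sum() and length() on the same selection are enough.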



Task 2)

Again as an example, the columns of row 1 in ref are chr1, 6457839 and
6638389. I need R to give me the minimum map$p for chr1 between 6457839 and
6638389 (there are many SNPs in each region and I would like to choose the one
with the smallest p). Then it should do the same for the second, third, ...
rows of ref.
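
Task 2 amounts to subsetting one region and picking the row with the smallest p. A minimal sketch on invented toy data follows; the inclusive boundaries are an assumption. which.min() returns the index of the first minimum, so the whole row (snp, POS, ...) can be kept rather than only the p value.

```r
# Toy stand-in for `map`; all values are invented for illustration.
map <- data.frame(
  CHR = c("chr1", "chr1", "chr1", "chr2"),
  snp = c("rsA", "rsB", "rsC", "rsD"),
  POS = c(100L, 150L, 200L, 150L),
  p = c(0.7, 0.001, 0.05, 1e-8),
  stringsAsFactors = FALSE
)

# SNPs on chr1 between 100 and 200 (assumption: inclusive boundaries)
region <- map[map$CHR == "chr1" & map$POS >= 100L & map$POS <= 200L, ]

# Row of the SNP with the smallest p value in that region
top_snp <- region[which.min(region$p), ]
print(top_snp)
```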



Then put the results of Task 1 and Task 2 into the need_output table.




# R code modified from Jim's

map2 <- map[order(map$CHR, map$POS, -map$post_prob), ]

# get a field for the counts
ref$n <- NA

# and a field for the minimum p values
ref$min_p <- NA

# get the number of rows in "ref"
nref <- dim(ref)[1]

for (i in 1:nref) {
  CHR <- which(map2$CHR == ref$CHR[i])
  POS_start <- which(map2$POS == ref$POS_start[i])
  POS_end <- which(map2$POS == ref$POS_end[i])
  cat("CHR", "CHR", "POS_start", POS_start, "POS_end", POS_end, "\n")

  # get the range of matches
  POSrange <- range(c(CHR, POS_start, POS_end))

  # convert this to a sequence spanning all matches
  allPOS <- POSrange[1]:POSrange[2]
  ref$n[i] <- sum(map2$post_prob[allPOS] > 0.99)
  ref$min_p[i] <- min(map2$p[allPOS])
}
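
A likely reason this loop misbehaves can be seen on a toy table (invented values). which(map2$CHR == ...) returns every row of the chromosome, and which(map2$POS == ...) can match the same position on other chromosomes, so range() over all of these indices spans far more rows than the region. In addition, sum(post_prob > 0.99) counts SNPs whose individual post_prob exceeds 0.99, which is not a cumulative sum compared with 0.85. The sketch contrasts the index range as posted with a single logical condition; the inclusive boundaries are an assumption.

```r
# Toy stand-in for the sorted `map2`; all values are invented.
map2 <- data.frame(
  CHR = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  POS = c(10L, 20L, 30L, 40L, 20L, 30L),
  stringsAsFactors = FALSE
)

chr_rows   <- which(map2$CHR == "chr1")  # rows 1 2 3 4: the whole chromosome
start_rows <- which(map2$POS == 20L)     # rows 2 5: row 5 is on chr2
end_rows   <- which(map2$POS == 30L)     # rows 3 6: row 6 is on chr2

# As in the posted loop: the range covers rows 1 to 6, i.e. everything
as_posted <- range(c(chr_rows, start_rows, end_rows))

# One combined condition keeps only chr1 rows with 20 <= POS <= 30
intended <- which(map2$CHR == "chr1" & map2$POS >= 20L & map2$POS <= 30L)

print(as_posted)
print(intended)
```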





dput(map)

structure(list(CHR = structure(c(1L, 1L), .Label = "chr1", class = "factor"),
    snp = structure(1:2, .Label = c("rs4747841", "rs4749917"), class = "factor"),
    Allel1 = structure(1:2, .Label = c("A", "T"), class = "factor"),
    Allel2 = structure(c(2L, 1L), .Label = c("C", "G"), class = "factor"),
    fr = c(0.551, 0.436), effec = c(-0.0011, 0.0011), SE = c(0.0029, 0.0029),
    p = c(0.7, 0.7), POS = c(9960129L, 9960259L),
    post_prob = c(1.248817e-158, 1.248817e-158)),
    .Names = c("CHR", "snp", "Allel1", "Allel2", "fr", "effec", "SE", "p",
    "POS", "post_prob"), class = "data.frame", row.names = c(NA, -2L))





dput(ref)

structure(list(CHR = structure(1:2, .Label = c("chr10", "chr14"), class = "factor"),
    POS_start = c(6457839L, 21005246L), POS_end = c(6638389L, 21550658L)),
    .Names = c("CHR", "POS_start", "POS_end"), class = "data.frame",
    row.names = c(NA, -2L))





dput(need_output)

structure(list(CHR = structure(1:2, .Label = c("chr1", "chr22"), class = "factor"),
    POS = c(312127953L, 46487552L), POS_start = c(32036927L, 45766451L),
    POS_end = c(3232240262, 46801601),
    snp = structure(1:2, .Label = c("rs1143427", "rs55958907"), class = "factor"),
    alle1l = structure(1:2, .Label = c("G", "T"), class = "factor"),
    allel2 = structure(1:2, .Label = c("A", "G"), class = "factor"),
    fr = c(0.278, 0.974), effec = c(0.6, 0.106), SE = c(0.015, 0.027),
    P = c(0.000156, 7.63e-05), post_prob = c(0.229, 0.125), n = c(612L, 4218L)),
    .Names = c("CHR", "POS", "POS_start", "POS_end", "snp", "alle1l", "allel2",
    "fr", "effec", "SE", "P", "post_prob", "n"), class = "data.frame",
    row.names = c(NA, -2L))

	[[alternative HTML version deleted]]

greg holly

2016-Jun-16 00:45 UTC

[R] Fwd: help for fine mappting

Dear all,


Unfortunately I have not received any response to my earlier questions. The
job is time-sensitive, so I would greatly appreciate your help soon.


Regards,

Greg




PIKAL Petr

2016-Jun-16 07:16 UTC

[R] help for fine mappting

Hi

From the posted ref and map you cannot obtain the final file you need; they
have nothing in common.

See my answers in line.
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of greg holly
> Sent: Wednesday, June 15, 2016 5:21 PM
> To: r-help at r-project.org
> Subject: [R] help for fine mappting
>
> dear all;
>
>
> I am sorry for this posting. I have got help from Jim, Bert, Jeff and PIKAL
> on similar issue before. I tried to modify Jim`s code to the real data but
> it did not work. Now I am posting first two rows the imitation of real data
> using dput() format (please see at the bottom).  I have two data sets,
> data=map and data=ref. The first to rows of each data set are given below.
> Data map has more than 27 million and data ref has about 560 rows.
> Basically I need run two different tasks. My R codes for these task are
> given below but they do not work properly. I sincerely do appreciate your
> helps.
>
>
>
> Regards,
>
> Greg
>
>
>
> Task 1)
>
> For example, the first and second columns for row 1 in data ref are chr1,
> 6457839 and 6638389. So I need write an R code normally first look the
> first row in ref (which they are chre1 6457839  and 6638389) than summing
> the column of "map$post_prob" and give the number of map$snp falls
> between
> 6457839  and 6638389 that  their cumulative sum is >0.85. Then do the same
> for the second, third....in ref. At the end I would like a table gave below
> (need_ouput). Please notice the all value specified info in ref data file
> are exist in map$CHR and map$POS columns.
If I understand correctly, you need to get (here for the first row of ref)

sel <- map$POS >= ref$POS_start[1] & map$POS < ref$POS_end[1]
result1 <- sum( map$post_prob[sel] )

and then check whether the result is > 0.85 (but in your final table
post_prob is below this threshold). Then compute

result2 <- length( map$post_prob[sel] )

and add the results to the final table.
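
Indexing a single row of ref matters here. If map$POS is compared with the whole ref$POS_start column, R recycles the shorter vector along the longer one, so each SNP is tested against a different region's boundary. A small sketch with invented numbers:

```r
# Invented positions and region starts, for illustration only.
pos    <- c(5L, 15L, 25L, 35L)
starts <- c(10L, 30L)

# Whole column: `starts` is recycled as 10, 30, 10, 30
recycled <- pos >= starts

# One region at a time: every position is compared with 10
first_only <- pos >= starts[1L]

print(recycled)
print(first_only)
```

R gives no warning in this case because the longer length is a multiple of the shorter one, which makes the mistake easy to miss.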
>
>
>
> Task2)
>
> Again example, the first and second columns for row 1 in data ref are chr1,
> 6457839 and 6638389. So I need that R gives me the minimum map$p for the
> 2
> chr1, 6457839 and 6638389 (as there are many snps between these regions
> and
> would like choose the smallest one in those regions. Than do the same for
> the second, third....rows in ref.
Your task 2 can be done alongside task1
result3 <- min( map$p[sel] )
>
>
>
> Then put the results of Task1 and Task2 into need_ouput file
Again, if I understand correctly, your result data frame should have the same
number of rows as the ref data frame. I wonder how you want to put POS, snp,
allele... and the other multiple values from the map data frame there. How do
you want to summarise them?

Two final comments:

Do not post in HTML; you can see that your code came through rather scrambled
because of the behaviour of HTML mail.
When posting examples, it is preferable that they can be used directly with
the code we are trying to write to help you solve your task, especially if you
want a quick answer.

Cheers
Petr
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
________________________________
This e-mail and any documents attached to it may be confidential and are
intended only for its intended recipients.
If you received this e-mail by mistake, please immediately inform its sender.
Delete the contents of this e-mail with all attachments and its copies from your
system.
If you are not the intended recipient of this e-mail, you are not authorized to
use, disseminate, copy or disclose this e-mail in any manner.
The sender of this e-mail shall not be liable for any possible damage caused by
modifications of the e-mail or by delay with transfer of the email.

In case that this e-mail forms part of business dealings:
- the sender reserves the right to end negotiations about entering into a
contract in any time, for any reason, and without stating any reasoning.
- if the e-mail contains an offer, the recipient is entitled to immediately
accept such offer; The sender of this e-mail (offer) excludes any acceptance of
the offer on the part of the recipient containing any amendment or variation.
- the sender insists on that the respective contract is concluded only upon an
express mutual agreement on all its aspects.
- the sender of this e-mail informs that he/she is not authorized to enter into
any contracts on behalf of the company except for cases in which he/she is
expressly authorized to do so in writing, and such authorization or power of
attorney is submitted to the recipient or the person represented by the
recipient, or the existence of such authorization is known to the recipient of
the person represented by the recipient.

PIKAL Petr

2016-Jun-16 13:28 UTC

[R] help for fine mappting

Hi

Did you test my suggestions? If not, why not? If yes, in what respect did they
not work?

sel <- map$POS >= ref$POS_start[1] & map$POS < ref$POS_end[1]
result1 <- sum( map$post_prob[sel] )
result2 <- length( map$post_prob[sel] )
result3 <- min( map$p[sel] )

should give you the desired values. It is up to you how you want to organise
them, as from your examples I do not have the faintest idea what you want to do.
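
One edge case is worth checking before looping over all regions: when no SNP falls inside a region, sum() and length() quietly return 0, but min() of an empty vector returns Inf with a warning. A small sketch with invented values; returning NA for an empty region is an assumption about what is wanted.

```r
# Invented p values and an empty selection, for illustration only.
p   <- c(0.7, 0.001)
sel <- c(FALSE, FALSE)

n_hits <- length(p[sel])  # 0
total  <- sum(p[sel])     # 0

# Guard min() so an empty region yields NA instead of Inf plus a warning
smallest <- if (any(sel)) min(p[sel]) else NA_real_

print(c(n_hits, total, smallest))
```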

And keep your responses on the R-help list; I have cc'd it.

Cheers
Petr

From: greg holly [mailto:mak.hholly at gmail.com]
Sent: Thursday, June 16, 2016 3:06 PM
To: PIKAL Petr <petr.pikal at precheza.cz>
Subject: Re: [R] help for fine mappting

Hi PIKAL,

Thanks so much for writing. I am sorry if I did not explain precisely. All the
information in the ref file exists in the map file, so they do have values in
common. The ref file has about 560 rows and the map file has 27 million rows.
That is, the CHR column is common to both, and all the values given in the
ref$POS_start and ref$POS_end columns exist in map$POS.

Thanks in advance,

Greg


PIKAL Petr

2016-Jun-17 08:05 UTC

[R] help for fine mappting

Hi Greg

It seems to me that spending some time with an R tutorial would be the way
forward for you.

Something like the following could work, but without some fake (but realistic)
data it is untested.

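result <- data.frame(ss = rep(NA, nrow(ref)), ll = rep(NA, nrow(ref)),
                     mm = rep(NA, nrow(ref)))

for (i in 1:nrow(ref)) {
  # SNPs on the chromosome of region i that lie inside the region.
  # ref$CHR must be indexed by i as well, otherwise the whole column is
  # recycled; as.character() avoids comparing factors with different levels.
  # SNP_chr is the chromosome column name used in Greg's reply (CHR in the
  # posted example).
  sel <- as.character(map$SNP_chr) == as.character(ref$CHR[i]) &
    map$POS >= ref$POS_start[i] & map$POS < ref$POS_end[i]
  result1 <- sum( map$post_prob[sel] )     # summed post_prob in the region
  result2 <- length( map$post_prob[sel] )  # number of SNPs in the region
  result3 <- min( map$p[sel] )             # smallest p value in the region
  result[i, ] <- c(result1, result2, result3)
}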
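
With about 27 million rows in map and 560 regions, the loop rescans the whole map column for every region. One alternative, sketched here in base R on invented toy data, is to split map by chromosome once and then search only the matching chromosome. The helper name summarise_region, the output column names and the choice to keep the top SNP's name are assumptions; the half-open interval (start <= POS < end) follows the code above.

```r
# Toy stand-ins for `map` and `ref`; all values are invented.
map <- data.frame(
  CHR = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
  POS = c(100L, 150L, 200L, 150L, 400L),
  p = c(0.7, 0.001, 0.05, 1e-8, 0.3),
  post_prob = c(0.5, 0.1, 0.4, 0.9, 0.05),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  CHR = c("chr1", "chr2"),
  POS_start = c(100L, 100L),
  POS_end = c(250L, 250L),
  stringsAsFactors = FALSE
)

# Split once so that each region scans only its own chromosome
map_by_chr <- split(map, as.character(map$CHR))

# Hypothetical helper: one summary row for region i of `ref`
summarise_region <- function(i) {
  chr_map <- map_by_chr[[as.character(ref$CHR[i])]]
  if (is.null(chr_map)) {
    region <- map[0, ]  # chromosome absent from map: empty region
  } else {
    keep <- chr_map$POS >= ref$POS_start[i] & chr_map$POS < ref$POS_end[i]
    region <- chr_map[keep, ]
  }
  has_snps <- nrow(region) > 0L
  best <- region[which.min(region$p), ]
  data.frame(
    ref[i, ],
    n = nrow(region),
    sum_post_prob = sum(region$post_prob),
    min_p = if (has_snps) best$p else NA_real_,
    top_snp = if (has_snps) best$snp else NA_character_,
    stringsAsFactors = FALSE
  )
}

result <- do.call(rbind, lapply(seq_len(nrow(ref)), summarise_region))
print(result)
```

The result has one row per row of ref, which matches the shape of the requested output. Other columns of the top SNP (alleles, fr, effec, SE) could be carried along in the same way.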

Cheers
Petr


From: greg holly [mailto:mak.hholly at gmail.com]
Sent: Thursday, June 16, 2016 10:04 PM
To: PIKAL Petr <petr.pikal at precheza.cz>
Subject: Re: [R] help for fine mappting

Hi Petr,

I had a chance to try your code. Once again, thanks a lot. It seems that
result3 is correct after I modified "sel" to

sel <- map$SNP_chr==ref$CHR & map$POS >= ref$POS_start[1] & map$POS < ref$POS_end[1]

in your code:

sel <- map$POS >= ref$POS_start[1] & map$POS < ref$POS_end[1]
result1 <- sum( map$post_prob[sel] )
result2 <- length( map$post_prob[sel] )
result3 <- min( map$p[sel] )

but result3 is the output for only the first row of the ref file. I need the
results for the other rows too; there are 560 rows in the ref file. I think I
need a loop, which is the more difficult part for me as I am a beginner in R.
In addition, at the end I need output like the following, with 560 rows.

All the best
Greg

structure(list(CHR = structure(1:2, .Label = c("chr1", "chr22"), class = "factor"),
    POS = c(312127953L, 46487552L), POS_start = c(32036927L, 45766451L),
    POS_end = c(3232240262, 46801601),
    snp = structure(1:2, .Label = c("rs1143427", "rs55958907"), class = "factor"),
    alle1l = structure(1:2, .Label = c("G", "T"), class = "factor"),
    allel2 = structure(1:2, .Label = c("A", "G"), class = "factor"),
    fr = c(0.278, 0.974), effec = c(0.6, 0.106), SE = c(0.015, 0.027),
    P = c(0.000156, 7.63e-05), post_prob = c(0.229, 0.125), n = c(612L, 4218L)),
    .Names = c("CHR", "POS", "POS_start", "POS_end", "snp", "alle1l", "allel2",
    "fr", "effec", "SE", "P", "post_prob", "n"), class = "data.frame",
    row.names = c(NA, -2L))

On Thu, Jun 16, 2016 at 9:28 AM, PIKAL Petr <petr.pikal at
precheza.cz<mailto:petr.pikal at precheza.cz>> wrote:
Hi

Did you test my suggestions? If not, why not? If yes, in what respect they did
not work?

sel <- map$POS >= ref$POS_start[1] & map$POS < ref$POS_end[1]
result1 <- sum( map$post_prob[sel] )
result2 <- length( map$post_prob[sel] )
result3 <- min( map$p[sel] )

should give you desired values. It is up to you how do you want to organise
them, as from your examples I do not have faintest idea what you want to do.

And keep your responds to r help list, I cc?d it.

Cheers
Petr

From: greg holly [mailto:mak.hholly at gmail.com<mailto:mak.hholly at
gmail.com>]
Sent: Thursday, June 16, 2016 3:06 PM
To: PIKAL Petr <petr.pikal at precheza.cz<mailto:petr.pikal at
precheza.cz>>
Subject: Re: [R] help for fine mappting

Hi PIKAL;

Thanks so much your writing. I am sorry if I could not explain precisely. All
information in ref file are exist in map file. So they are in common. Ref file
has about 560 and map file has 27 million rows.That is CHR column common in both
and all value given ref$POS_start & ref$POS_end columns  are exist in
map$POS.

Thanks in advance,

Greg

On Thu, Jun 16, 2016 at 3:16 AM, PIKAL Petr <petr.pikal at
precheza.cz<mailto:petr.pikal at precheza.cz>> wrote:
Hi

From posted ref and map you cannot obtain final file need, they have nothing in
common.

answers see in line
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of greg holly
> Sent: Wednesday, June 15, 2016 5:21 PM
> To: r-help at r-project.org
> Subject: [R] help for fine mappting
>
> dear all;
>
>
> I am sorry for this posting. I have got help from Jim, Bert, Jeff and PIKAL
> on a similar issue before. I tried to modify Jim's code for the real data but
> it did not work. Now I am posting the first two rows of an imitation of the
> real data in dput() format (please see at the bottom). I have two data sets,
> data=map and data=ref. The first two rows of each data set are given below.
> Data map has more than 27 million rows and data ref has about 560 rows.
> Basically I need to run two different tasks. My R code for these tasks is
> given below but it does not work properly. I sincerely appreciate your
> help.
>
>
>
> Regards,
>
> Greg
>
>
>
> Task 1)
>
> For example, the first and second columns for row 1 in data ref are chr1,
> 6457839 and 6638389. So I need to write R code that first looks at the
> first row in ref (which is chr1, 6457839 and 6638389), then sums
> the column "map$post_prob" and gives the number of map$snp that fall
> between 6457839 and 6638389 such that their cumulative sum is >0.85. Then
> do the same for the second, third... rows in ref. At the end I would like a
> table like the one given below (need_output). Please notice that all values
> specified in the ref data file exist in the map$CHR and map$POS columns.

If I understand correctly, you need to get

sel <- map$POS >= ref$POS_start[1] & map$POS < ref$POS_end[1]
result1 <- sum( map$post_prob[sel] )

and then check whether the result is >0.85 (but in your final table post_prob
is below this threshold). Then compute

result2 <- length( map$post_prob[sel] )

and add the results to the final table.
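
Task 1 can also be read another way: as asking how many SNPs in a region are needed before their cumulative post_prob exceeds 0.85. That reading is an assumption, not something confirmed in the thread. Under it, a rough sketch for one region could look like this, where `post_prob` is a hypothetical vector standing in for map$post_prob[sel]:

```r
# Hypothetical posterior probabilities for the SNPs inside one region.
post_prob <- c(0.05, 0.60, 0.10, 0.20, 0.05)

# Rank SNPs from most to least probable and accumulate the probabilities.
cum <- cumsum(sort(post_prob, decreasing = TRUE))

# Number of top-ranked SNPs needed before the running total exceeds 0.85;
# NA if the region never reaches the threshold.
n_needed <- if (any(cum > 0.85)) which(cum > 0.85)[1] else NA_integer_
```

Sorting in decreasing order gives the smallest set of SNPs that reaches the threshold; if the SNPs should instead be accumulated in position order, drop the sort().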
>
>
>
> Task2)
>
> Again as an example, the first and second columns for row 1 in data ref are
> chr1, 6457839 and 6638389. So I need R to give me the minimum map$p for
> chr1 between 6457839 and 6638389 (as there are many snps in these regions
> and I would like to choose the smallest one in each region). Then do the
> same for the second, third... rows in ref.
Your task 2 can be done alongside task1
result3 <- min( map$p[sel] )
>
>
>
> Then put the results of Task1 and Task2 into the need_output file
Again, if I understand correctly, your result data frame should have the same
number of rows as the ref data frame. I wonder how you want to put POS, snp,
allele... and the other multiple values from the map data frame in there. How
do you want to summarise them?
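
One possible way to summarise, sketched under assumptions: loop over the rows of ref, and for each region keep the SNP with the smallest p as a "lead" row alongside the region totals. Choosing the lead SNP this way is only a guess at what is wanted; the toy data and the helper name `summarise_region` are hypothetical.

```r
# Hypothetical toy data; only the column names come from the thread.
map <- data.frame(
  CHR = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  POS = c(100L, 150L, 300L, 120L, 180L),
  snp = c("rsA", "rsB", "rsC", "rsD", "rsE"),
  p = c(0.04, 0.001, 0.5, 0.2, 0.03),
  post_prob = c(0.30, 0.60, 0.05, 0.90, 0.02),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  CHR = c("chr1", "chr2"),
  POS_start = c(100L, 100L),
  POS_end = c(200L, 200L),
  stringsAsFactors = FALSE
)

# Summarise region i of ref; returns NULL when no SNP falls inside it.
summarise_region <- function(i) {
  sel <- which(map$CHR == ref$CHR[i] &
                 map$POS >= ref$POS_start[i] &
                 map$POS < ref$POS_end[i])
  if (length(sel) == 0L) return(NULL)
  hits <- map[sel, ]
  lead <- hits[which.min(hits$p), ]  # SNP with the smallest p value
  data.frame(
    CHR = ref$CHR[i],
    POS_start = ref$POS_start[i],
    POS_end = ref$POS_end[i],
    lead_snp = lead$snp,
    lead_POS = lead$POS,
    min_p = lead$p,
    sum_post_prob = sum(hits$post_prob),
    n = nrow(hits),
    stringsAsFactors = FALSE
  )
}

# One row per region that contains at least one SNP.
out <- do.call(rbind, lapply(seq_len(nrow(ref)), summarise_region))
```

With 27 million rows in map, scanning the whole table once per region may be slow; splitting map by chromosome first, so that each region only scans its own chromosome, would probably help.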

Two final comments:

Do not post in HTML; you can see that the code below is rather scrambled due to
the behaviour of HTML mail.
If you post examples, it would be preferable that they can be used directly
with the code we are trying to find to help you solve your task, especially if
you want a quick answer.

Cheers
Petr
>
> #R codes modified from Jim
>
> map2<-map[order(map$CHR, map$POS, -map$post_prob),]
>
> # get a field for the counts
> ref$n<-NA
>
> # and a field for the minimum p values
> ref$min_p<-NA
>
> # get the number of rows in "ref"
> nref<-dim(ref)[1]
>
> for(i in 1:nref) {
>   CHR<- which(map2$CHR==ref$CHR[i])
>   POS_start<-which(map2$POS==ref$POS_start[i])
>   POS_end<-which(map2$POS==ref$POS_end[i])
>   cat("CHR", "CHR"," POS_start",POS_start,"POS_end",POS_end,"\n")
>
>   # get the range of matches
>   POSrange<-range(c(CHR,POS_start,POS_end))
>
>   # convert this to a sequence spanning all matches
>   allPOS<-POSrange[1]:POSrange[2]
>   ref$n[i]<-sum(map2$post_prob[allPOS] > 0.99)
>   ref$min_p[i]<-min(map2$p[allPOS])
> }
>
> dput(map)
>
> structure(list(CHR = structure(c(1L, 1L), .Label = "chr1", class = "factor"),
>     snp = structure(1:2, .Label = c("rs4747841", "rs4749917"), class = "factor"),
>     Allel1 = structure(1:2, .Label = c("A", "T"), class = "factor"),
>     Allel2 = structure(c(2L, 1L), .Label = c("C", "G"), class = "factor"),
>     fr = c(0.551, 0.436), effec = c(-0.0011, 0.0011), SE = c(0.0029, 0.0029),
>     p = c(0.7, 0.7), POS = c(9960129L, 9960259L),
>     post_prob = c(1.248817e-158, 1.248817e-158)),
>     .Names = c("CHR", "snp", "Allel1", "Allel2", "fr", "effec", "SE", "p",
>     "POS", "post_prob"), class = "data.frame", row.names = c(NA, -2L))
>
> dput(ref)
>
> structure(list(CHR = structure(1:2, .Label = c("chr10", "chr14"), class = "factor"),
>     POS_start = c(6457839L, 21005246L), POS_end = c(6638389L, 21550658L)),
>     .Names = c("CHR", "POS_start", "POS_end"), class = "data.frame",
>     row.names = c(NA, -2L))
>
> dput(need_output)
>
> structure(list(CHR = structure(1:2, .Label = c("chr1", "chr22"), class = "factor"),
>     POS = c(312127953L, 46487552L), POS_start = c(32036927L, 45766451L),
>     POS_end = c(3232240262, 46801601),
>     snp = structure(1:2, .Label = c("rs1143427", "rs55958907"), class = "factor"),
>     alle1l = structure(1:2, .Label = c("G", "T"), class = "factor"),
>     allel2 = structure(1:2, .Label = c("A", "G"), class = "factor"),
>     fr = c(0.278, 0.974), effec = c(0.6, 0.106), SE = c(0.015, 0.027),
>     P = c(0.000156, 7.63e-05), post_prob = c(0.229, 0.125), n = c(612L, 4218L)),
>     .Names = c("CHR", "POS", "POS_start", "POS_end", "snp", "alle1l", "allel2",
>     "fr", "effec", "SE", "P", "post_prob", "n"), class = "data.frame",
>     row.names = c(NA, -2L))
