Thomas Subia
2019-Dec-26 17:54 UTC
[R] Checking for similar file names in two different directories
Colleagues,

I have two locations where my data resides.
One folder is for data taken under treatment A
One folder is for data taken under treatment B

"G:\ 0020-49785 10806.xls"
"Q:\ 301864 4519 10806.xls"

Here the 10806 is the part which is common to both directories.

Is there a way to have R extract parts common to both directories?

Thomas Subia
Statistician / Senior Quality Engineer
ASQ CQE

IMG Companies
225 Mountain Vista Parkway
Livermore, CA 94551
T. (925) 273-1106
F. (925) 273-1111
E. tsubia at imgprecision.com

Precision Manufacturing for Emerging Technologies
imgprecision.com

The contents of this message, together with any attachments, are intended only for the use of the individual or entity to which they are addressed and may contain information that is legally privileged, confidential and exempt from disclosure. If you are not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this message, or any attachment, is strictly prohibited. If you have received this message in error, please notify the original sender or IMG Companies, LLC at Tel: 925-273-1100 immediately by telephone or by return E-mail and delete this message, along with any attachments, from your computer. Thank you.
Bert Gunter
2019-Dec-26 18:48 UTC
[R] Checking for similar file names in two different directories
?list.files and ?regexp

Warning: the following is obviously untested:

Gfiles <- list.files("G:", pattern = ".*10806\\.xls$")

should then give you a character vector of the names of the files you want to feed to read.xls() or whatever function in the currently favored package reads Excel files these days.

Cheers,
Bert
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
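A minimal sketch of the list.files() suggestion above, for readers who want something they can run. The two temporary directories and the extra file names are hypothetical stand-ins for the G: and Q: locations; only the two "10806" names come from the original question. It assumes the common part is already known and sits directly before ".xls".

```r
# Hypothetical stand-ins for the two data locations, so this runs anywhere.
dir_g <- tempfile("G_")
dir_q <- tempfile("Q_")
dir.create(dir_g)
dir.create(dir_q)
file.create(file.path(dir_g, c("0020-49785 10806.xls", "0020-49786 10807.xls")))
file.create(file.path(dir_q, c("301864 4519 10806.xls", "301864 4520 10999.xls")))

# Build the pattern from the known common part; "\\." matches a literal dot.
key <- "10806"
pattern <- paste0(key, "\\.xls$")

# full.names = TRUE returns paths that can be passed straight to a reader.
g_files <- list.files(dir_g, pattern = pattern, full.names = TRUE)
q_files <- list.files(dir_q, pattern = pattern, full.names = TRUE)

basename(g_files)
basename(q_files)
```

Without full.names = TRUE, list.files() returns bare file names, which only resolve if the working directory happens to be the folder that was listed.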
Rui Barradas
2019-Dec-26 18:59 UTC
[R] Checking for similar file names in two different directories
Hello,

I am not sure if the following code is what you need, but maybe you can get some inspiration from it.

# backslashes must be doubled inside R string literals
x <- c("G:\\ 0020-49785 10806.xls", "Q:\\ 301864 4519 10806.xls")
y <- strsplit(x, split = "[^[:alnum:]]+")
eq <- sapply(y[[1]], `==`, y[[2]])
i <- apply(eq, 1, function(e) Reduce(`|`, e))
y[[1]][i]
#[1] "10806" "xls"

This returns "10806" but also returns the file extension "xls". And it could be made to loop through a vector of filenames.

Hope this helps,

Rui Barradas
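One possible variation on the token-splitting idea above, offered as a sketch rather than as Rui's method: strip the extension before splitting so that "xls" cannot appear among the shared tokens, and use intersect() folded over the list so that any number of file names can be compared. The drive prefixes are left out here on the assumption that only the file names matter.

```r
# The two example file names from the question, without the drive prefixes.
x <- c("0020-49785 10806.xls", "301864 4519 10806.xls")

# Drop the extension first so "xls" is not reported as a common token.
stems <- tools::file_path_sans_ext(x)

# Split each stem into runs of alphanumeric characters.
tokens <- strsplit(stems, split = "[^[:alnum:]]+")

# Keep only the tokens that occur in every file name.
common <- Reduce(intersect, tokens)
common
#[1] "10806"
```

Because Reduce() folds intersect() over the whole list, the same code works unchanged for a longer vector x.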
Richard O'Keefe
2019-Dec-27 01:34 UTC
[R] Checking for similar file names in two different directories
I think you had better start by defining what you mean by "similar". Examples are good, but not enough.
Bert Gunter
2019-Dec-27 04:22 UTC
[R] Checking for similar file names in two different directories
AHA! -- I think I now see what you mean. My previous suggestion was almost useless, as it assumes you already know what the "common" parts are ... but you don't.

However, if the filename part at the end is separated by a space from the preceding part of the filename, i.e. like "stuff xxxxxxx.xls", then something like the following example would work, I think:

## Read in *all* the filenames from both directories as I previously suggested.
Gfiles <- list.files("G:")
Qfiles <- list.files("Q:")

Suppose this gave you (a simplified example):

> Gfiles
 [1] "kjqdx 157.xls" "aorgz 287.xls" "ioldc 380.xls" "fpnxr 509.xls"
 [5] "wytcg 853.xls" "xujos 964.xls" "xdeto 217.xls" "nqriu 574.xls"
 [9] "jclir 480.xls" "fndyu 769.xls"
> Qfiles
 [1] "vexrb 509.xls" "jxeio 770.xls" "zhmwf 920.xls" "cajdq 287.xls"
 [5] "nwdic 259.xls" "sqjkb 889.xls" "brhfu 157.xls" "uyirq 574.xls"
 [9] "ijfqm 480.xls" "nedhj 982.xls"

## all that's important is the " xxx.xls" at the end
## extract the filename part, omitting the ".xls", using regexes
> Gnm <- sub("^.+ (.+)\\.xls$","\\1",Gfiles)
> Qnm <- sub("^.+ (.+)\\.xls$","\\1",Qfiles)
> Gnm
 [1] "157" "287" "380" "509" "853" "964" "217" "574" "480" "769"
> Qnm
 [1] "509" "770" "920" "287" "259" "889" "157" "574" "480" "982"
> ## The 'common' parts are:
> intersect(Gnm,Qnm)
[1] "157" "287" "509" "574" "480"

You can now use these as I described previously to extract your common files. A similar strategy can be used for any other definition of "common" you wish to use *provided* you can uniquely and specifically define "common" to match in the filenames.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
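The last step above, going from the common keys back to the files themselves, might look roughly like the following. This is only a sketch: the file names are a shortened copy of the simplified example, typed in by hand in place of list.files() output, and the `pairs` data frame with its column names is a hypothetical layout. It assumes each key occurs at most once per directory, since match() reports only the first hit.

```r
# Shortened copy of the simplified example, standing in for list.files() output.
Gfiles <- c("kjqdx 157.xls", "aorgz 287.xls", "ioldc 380.xls", "fpnxr 509.xls")
Qfiles <- c("vexrb 509.xls", "jxeio 770.xls", "cajdq 287.xls", "brhfu 157.xls")

# Key = the text between the last space and ".xls".
Gnm <- sub("^.+ (.+)\\.xls$", "\\1", Gfiles)
Qnm <- sub("^.+ (.+)\\.xls$", "\\1", Qfiles)

common <- intersect(Gnm, Qnm)

# match() gives the position of each common key in each vector, so every
# row pairs the G file and the Q file that share that key.
pairs <- data.frame(
  key = common,
  G   = Gfiles[match(common, Gnm)],
  Q   = Qfiles[match(common, Qnm)],
  stringsAsFactors = FALSE
)
pairs
```

Each row of `pairs` can then be handed to whatever Excel reader is in use, after prepending the directory with file.path().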