Dear Members,

I have recently started studying SQL and MySQL. My question is: what exactly is SQL used for? Whatever can be done in SQL, like subsetting and filtering data sets, can also be done in R. What, then, is the advantage of SQL? It is OK if you tag this question as off-topic, but I couldn't find any information on the web. Can you please refer me to some online resources that shed some light on this? Finally, how does SQL complement R? Are the two dependent on each other?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI
Others may know more than I do, but roughly:

(1) SQL provides access to relational database management systems, which are much more robust and handle large-scale data;
(2) methods based on SQL will often handle data that are too large to fit in memory.

R complements SQL by providing a much larger set of statistical tools. The typical workflow is to use SQL queries for the extraction, subsetting, and simple manipulation of data, reducing it to a manageable size for analysis on a single machine, and then to use R to analyze it.

See also various packages (arrow, dbplyr) that provide alternative big-data workflows.

cheers
  Ben Bolker

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
* E-mail is sent at my convenience; I don't expect replies outside of working hours.
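A minimal sketch of the workflow Ben describes (SQL reduces the data, R analyzes it), assuming the DBI and RSQLite packages are installed. An in-memory SQLite database stands in for a real server such as MySQL, and the built-in `mtcars` data stands in for a large table; the query and the model are illustrative only.

```r
# Sketch of the "SQL to reduce, R to analyze" workflow.
# Assumes DBI and RSQLite are installed; an in-memory SQLite database
# stands in for a real server such as MySQL, and mtcars for a large table.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)  # load the stand-in table into the database

# SQL side: keep only the rows and columns needed for the analysis
small <- dbGetQuery(con, "
  SELECT mpg, wt, cyl
  FROM mtcars
  WHERE cyl IN (4, 6)
")
dbDisconnect(con)

# R side: fit a model to the reduced data
fit <- lm(mpg ~ wt + factor(cyl), data = small)
summary(fit)
```

With a real server only the connection call would change (for example a different DBI backend); the division of labour stays the same.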
Some people prefer SQL syntax. Also, SQL implementations are generally intrinsically linked with persistent disk storage, so they work straightforwardly with data sets larger than RAM. Finally, most implementations support shared access to the data from multiple clients.

A long time ago, on a computer with little RAM, I used SQL a lot, but I have not really used it for many years now. Your needs may vary.
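To illustrate the point about persistent disk storage, here is a sketch assuming DBI and RSQLite are installed. The database file comes from `tempfile()` (a hypothetical stand-in for a real database file), and the built-in `airquality` data is loaded one month at a time, the way one might load a file too large for RAM in chunks; only the small aggregated result is read back into R.

```r
# Sketch: data persisted on disk and aggregated by the database engine.
# Assumes DBI and RSQLite are installed; the file path is a temporary,
# hypothetical stand-in for a real database file.
library(DBI)

db_path <- tempfile(fileext = ".sqlite")

# Load the data in chunks (one month at a time), as one might with a file
# too large to read into RAM at once; here the chunks are tiny.
con <- dbConnect(RSQLite::SQLite(), db_path)
for (chunk in split(airquality, airquality$Month)) {
  dbWriteTable(con, "airquality", chunk, append = TRUE)
}
dbDisconnect(con)

# A later session reconnects: the table is still on disk, and only
# the aggregated result is brought into R.
con <- dbConnect(RSQLite::SQLite(), db_path)
monthly <- dbGetQuery(con, "
  SELECT Month, AVG(Temp) AS mean_temp, COUNT(*) AS n
  FROM airquality
  GROUP BY Month
  ORDER BY Month
")
dbDisconnect(con)
monthly
```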
Just a slight technical note; Ben already gave you a good answer, imo. The note is: R is Turing complete, which means that *anything* any language can compute, R could be programmed to compute too. The point is what can be done well in R and what can be done (often much) better with other tools, as Ben explained.

Cheers,
Bert
Akshay,

Your question has way too many answers.

SQL has a long history, and early versions came long before R arrived on the scene. There is a huge installed base of hardware and software dedicated to managing databases, and it has features that most R programs do not even dream of. Besides easily handling massive amounts of data, and sometimes letting you tweak queries to run more efficiently, it deals with issues such as managing multiple people accessing and changing the data at about the same time, or rolling the data back to an earlier checkpoint.

R came along later and, as Ben pointed out, adds all kinds of things SQL does not have and likely does not need, or alternate ways to do things.

For many people now, the workflow is to use a programming language (R is not the only one used) that has been enhanced with packages or modules allowing fairly general access to one or many databases running various versions of SQL. The programmer uses this API in many ways. Sometimes it is just a way to tell the database what to do without much other processing: you open a connection to the server, run a query that gets translated to SQL (or provide the actual SQL yourself), and let the remote (or local) machine do much of the work.

For example, imagine a database with terabytes of data when all you want is a few rows/columns that meet your query. In R alone, you might have to open a collection of huge CSV files, fill more memory than you have, and do the query somehow. If the data is remote, that means receiving a huge amount of data. Using SQL divides the work, so some parts are done on the database side and some parts locally.

Why use a local MySQL? Part of the answer is that you get a fairly optimized and debugged system that does the job well and lets the programmer focus on the parts they need to add within R, like complex analyses. Part is portability: you can later move the data off your machine and, with minor changes, your program should still work.
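A sketch of the "query that gets translated to SQL" idea, assuming the DBI, RSQLite, dplyr, and dbplyr packages are installed. `tbl()` creates a lazy reference to a database table, dbplyr translates the dplyr verbs into SQL, and no rows are transferred until `collect()`. The in-memory SQLite database is a stand-in for a remote server.

```r
# Sketch: dplyr code is translated to SQL and run by the database.
# Assumes DBI, RSQLite, dplyr and dbplyr are installed; the in-memory
# SQLite database stands in for a remote server.
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

cars_db <- tbl(con, "mtcars")  # lazy reference; no rows are read yet

query <- cars_db %>%
  filter(hp > 150) %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_mpg = mean(mpg, na.rm = TRUE))

show_query(query)          # print the SQL that dbplyr generated

result <- collect(query)   # the database runs the query; results come to R
dbDisconnect(con)
result
```

Alternatively, the same work can be done by writing the SQL by hand and passing it to `dbGetQuery()`; the translated version is mainly a convenience for people who already think in dplyr.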
And there are many other scenarios, such as gathering data from different sources: connecting to multiple remote databases, getting filtered data, doing an analysis across that data, and perhaps updating them. R used in ways like this provides lots of flexibility.

But part of the question is like asking why there are a hundred programming languages still in use out there. Why do we need so many? In short, we don't necessarily need all or even most of them, but they exist because various people developed and used them, and it is not trivial to get people to switch and maybe abandon all the older software or try to rewrite it.

Having said that, I think a large fraction of R users have never had any particular reason to learn SQL. Many have never used it directly or even indirectly. I know someone I have programmed for who has an expert run a SQL query and save the results in CSV files, and then works directly in R on those files. I have pointed out to them that their life could be even easier if they got a more focused dump of the SQL data, with some of the added processing done in SQL and a smaller amount of data coming into the R side.

I also note that languages like R and Python can have parts that run fairly slowly. Arguably, most versions of SQL have been tuned over decades ...
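Avi's scenario of pulling filtered data from several databases and analyzing it together might look like the following sketch, assuming DBI and RSQLite are installed. The two in-memory SQLite databases are hypothetical stand-ins for separate remote servers, each holding part of `mtcars`; the t-test is just an example analysis.

```r
# Sketch: filter in several databases, combine and analyze in R.
# Assumes DBI and RSQLite are installed; two in-memory SQLite databases are
# hypothetical stand-ins for separate remote servers.
library(DBI)

con_a <- dbConnect(RSQLite::SQLite(), ":memory:")
con_b <- dbConnect(RSQLite::SQLite(), ":memory:")
cols <- c("mpg", "hp", "am")
dbWriteTable(con_a, "cars", mtcars[mtcars$am == 0, cols])  # automatics
dbWriteTable(con_b, "cars", mtcars[mtcars$am == 1, cols])  # manuals

# The same filtering query runs on each server; only matching rows travel
sql <- "SELECT mpg, hp, am FROM cars WHERE hp > 100"
part_a <- dbGetQuery(con_a, sql)
part_b <- dbGetQuery(con_b, sql)
dbDisconnect(con_a)
dbDisconnect(con_b)

# R combines the pieces and does the statistics
combined <- rbind(part_a, part_b)
t.test(mpg ~ am, data = combined)
```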
Looks like an assignment question. If so, do your homework yourself. Google is your friend.

el
Dear Akshay,

I believe my grey hair allows me to help answer your question.

SQL, and its progenitor SEQUEL, were developed specifically to manipulate relational databases. SEQUEL was developed in the early 1970s (equivalent to the historical bronze age), shortly after Codd introduced the concept of a relational database (see https://en.wikipedia.org/wiki/Relational_database); Codd later summarized the requirements for a relational system in his 12 rules (see https://en.wikipedia.org/wiki/Codd%27s_12_rules). At the time, the concept of a relational database, and of a language dedicated to manipulating one, was revolutionary. The concept was clearly needed, important, and well used; Oracle, a commercial relational database system using SQL, made Larry Ellison more than a quarter billionaire.

S, one of the progenitors of R, was developed slightly later. Beginning in 1975-76, John Chambers, Rick Becker, and colleagues at Bell Labs developed S as a language for statistical computing and data analysis. It was NOT developed specifically for the manipulation of relational databases. S had modest success in academia. S-PLUS, a commercial version of S, was developed in 1988 by the company Statistical Sciences, whose founder, R. Douglas Martin, was a professor of statistics at the University of Washington, Seattle.

S was also the progenitor of R. R was developed by Ross Ihaka and Robert Gentleman, faculty members at the University of Auckland, in 1993. Given the ubiquity of R in academia, it is clear that S, much like SQL, has been extraordinarily successful.

John

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382
The advantages of SQL are that:

- it can be used from many languages, so if you know SQL you can easily move that part of your code to Python, say, and vice versa
- it is widely used
- it can handle data stored outside of R and possibly otherwise too large for R
- some SQL databases support multiple concurrent users
- depending on the database, it may be used to communicate the data to others
- one big difference is that SQL is declarative rather than imperative. You tell it what you want rather than how to get it, and the optimizer will figure out how to create a low-level query to get the result. For example, in the simple case below it will simply scan the BOD table, but in more complex cases there may be many approaches and it will try to find a good one:

```r
library(sqldf)

sqldf("explain query plan select * from BOD")
##   id parent notused   detail
## 1  2      0       0 SCAN BOD

sqldf("explain select * from BOD")
##    addr       opcode p1 p2 p3   p4 p5 comment
## 1     0         Init  0 10  0 <NA>  0      NA
## 2     1     OpenRead  0  2  0    2  0      NA
## 3     2       Rewind  0  9  0 <NA>  0      NA
## 4     3       Column  0  0  1 <NA>  0      NA
## 5     4 RealAffinity  1  0  0 <NA>  0      NA
## 6     5       Column  0  1  2 <NA>  0      NA
## 7     6 RealAffinity  2  0  0 <NA>  0      NA
## 8     7    ResultRow  1  2  0 <NA>  0      NA
## 9     8         Next  0  3  0 <NA>  1      NA
## 10    9         Halt  0  0  0 <NA>  0      NA
## 11   10  Transaction  0  0  1    0  1      NA
## 12   11         Goto  0  1  0 <NA>  0      NA
```

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
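To see the optimizer pick a different strategy for the same declarative query, here is a sketch that uses DBI and RSQLite directly rather than sqldf (assuming both packages are installed). The index name `idx_bod_time` is hypothetical, and the exact wording of the plan depends on the SQLite version (older versions print `SCAN TABLE BOD` rather than `SCAN BOD`).

```r
# Sketch: the same declarative query, two different execution plans.
# Assumes DBI and RSQLite are installed; idx_bod_time is a hypothetical name.
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "BOD", BOD)

plan_sql <- "EXPLAIN QUERY PLAN SELECT * FROM BOD WHERE Time = 2"

before <- dbGetQuery(con, plan_sql)   # no index yet: full table scan
dbExecute(con, "CREATE INDEX idx_bod_time ON BOD(Time)")
after <- dbGetQuery(con, plan_sql)    # optimizer can now search the index
dbDisconnect(con)

before$detail
after$detail
```

The SELECT statement never changes; only the database's choice of how to execute it does, which is the practical meaning of "declarative".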