Phillip Heinrich
2020-Jan-10 18:31 UTC
[R] Data Carpentry - Creating a New SQLite Database
Working my way through a tutorial named Data Carpentry (https://datacarpentry.org/R-ecology-lesson/). for the most part it is excellent but I?m stuck on the very last section (https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html). First, below are the packages I have loaded: [1] "forcats" "stringr" "purrr" "readr" "tidyr" "tibble" "ggplot2" "tidyverse" "dbplyr" "RMySQL" "DBI" [12] "dplyr" "RSQLite" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base" > Second, Second, is the text of the last section of the last chapter titled ?Creating a New SQLite Database?. Second, below is the text from the tutorial. The black type is from the tutorial. The green and blue is the suggested R code. My comments are in red. Creating a new SQLite database So far, we have used a previously prepared SQLite database. But we can also use R to create a new database, e.g. from existing csv files. Let?s recreate the mammals database that we?ve been working with, in R. First let?s download and read in the csv files. We?ll import tidyverse to gain access to the read_csv() function. download.file("https://ndownloader.figshare.com/files/3299483", "data_raw/species.csv") download.file("https://ndownloader.figshare.com/files/10717177", "data_raw/surveys.csv") download.file("https://ndownloader.figshare.com/files/3299474", "data_raw/plots.csv") library(tidyverse) species <- read_csv("data_raw/species.csv")No problem here. I?m pulling three databases from the Web and saving them to a folder on my hard drive. (...data_raw/species.csv) etc.surveys <- read_csv("data_raw/surveys.csv") plots <- read_csv("data_raw/plots.csv")Again no problem. I?m just creating an R data files. But here is where I loose it. I?m creating something named my_db_file from another file named portal-database-output with an sqlite extension and then creating my_db from the My_db_file. Not sure where the sqlite extension file came from. Creating a new SQLite database with dplyr is easy. You can re-use the same command we used above to open an existing .sqlite file. The create = TRUE argument instructs R to create a new, empty database instead. Caution: When create = TRUE is added, any existing database at the same location is overwritten without warning. my_db_file <- "data/portal-database-output.sqlite" my_db <- src_sqlite(my_db_file, create = TRUE)Currently, our new database is empty, it doesn?t contain any tables: my_db#> src: sqlite 3.29.0 [data/portal-database-output.sqlite] #> tbls:To add tables, we copy the existing data.frames into the database one by one: copy_to(my_db, surveys) copy_to(my_db, plots) my_dbI can follow the directions to fill in my_db but I have no idea how to access the tables. The text from the tutorial below says to check the location of our database. Huh! Can someone give me some direction. Thanks. If you check the location of our database you?ll see that data is automatically being written to disk. R and dplyr not only provide easy ways to query existing databases, they also allows you to easily create your own databases from flat files! Here is where I loose it. [[alternative HTML version deleted]]
On Fri, 10 Jan 2020 11:31:58 -0700 "Phillip Heinrich" <herd_dog at cox.net> wrote:> below is the text from the tutorial. The black type is from the > tutorial. The green and blue is the suggested R code. My comments > are in redR-help is a plain text mailing list, so the markup has been stripped off (and since HTML-enabled mail clients don't quite care how the plain text version of the e-mail looks, some paragraph breaks had to go, too).> etc.surveys <- read_csv("data_raw/surveys.csv") > plots <- read_csv("data_raw/plots.csv")> Again no problem. I?m just creating an R data files.Note that it is not files that you are creating by running read_csv(), but variables (of type "tibble", which is like "data.frame", either of which should have been covered in earlier chapters in a good tutorial) in the R environment. The files you downloaded previously are opened in read only mode and are never changed.> my_db_file <- "data/portal-database-output.sqlite"> I?m creating something named my_db_file from another file named > portal-database-output with an sqlite extension and then creating > my_db from the My_db_file.This something is just a text string that happens to contain a *path* to a file. Just like the variable `greeting` in the following snippet: greeting <- "Hello world" print(greeting) See [1] for more info on character vectors in R.> Not sure where the sqlite extension file came from.The authors of the tutorial decided that the file to be created should be named like this. Feel free to change the extension (or the path) to anything else: neither R, nor SQLite cares about it much (but the file manager you use may display a different icon for it or become confused if you name it .txt or .pdf).> I can follow the directions to fill in my_db but I have no idea > how to access the tables.What exactly do you mean by "access"? At this point my_db should be a dplyr "src" object, so the tools described in dplyr vignettes [2] should be applicable. Try calling tbl() on it and passing the name of one of the tables you have just created. Also try running: example("src_sqlite")> The text from the tutorial below says to check the location of our > database. Huh! Can someone give me some direction.The variable my_db_file contains the location of the file where the database is stored. This is the same variable that you passed to the src_sqlite() function. -- Best regards, Ivan [1] https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Character-vectors [2] https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html
Please note that tidyverse packages have their own support resources at RStudio, whence they came; e.g. here: https://education.rstudio.com/learn/beginner/ You may also do better asking about issues that concern them at their support site: https://support.rstudio.com/hc/en-us though, as you already found out, there are folks here who may help also. Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Fri, Jan 10, 2020 at 10:32 AM Phillip Heinrich <herd_dog at cox.net> wrote:> Working my way through a tutorial named Data Carpentry ( > https://datacarpentry.org/R-ecology-lesson/). for the most part it is > excellent but I?m stuck on the very last section ( > https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html). > > First, below are the packages I have loaded: > [1] "forcats" "stringr" "purrr" "readr" "tidyr" "tibble" > "ggplot2" "tidyverse" "dbplyr" "RMySQL" "DBI" > [12] "dplyr" "RSQLite" "stats" "graphics" "grDevices" "utils" > "datasets" "methods" "base" > > > > > > > Second, > Second, is the text of the last section of the last chapter titled > ?Creating a New SQLite Database?. > Second, below is the text from the tutorial. The black type is from the > tutorial. The green and blue is the suggested R code. My comments are in > red. > Creating a new SQLite database > So far, we have used a previously prepared SQLite database. But we can > also use R to create a new database, e.g. from existing csv files. Let?s > recreate the mammals database that we?ve been working with, in R. First > let?s download and read in the csv files. We?ll import tidyverse to gain > access to the read_csv() function. > > download.file("https://ndownloader.figshare.com/files/3299483", > "data_raw/species.csv") > download.file("https://ndownloader.figshare.com/files/10717177", > "data_raw/surveys.csv") > download.file("https://ndownloader.figshare.com/files/3299474", > "data_raw/plots.csv") > library(tidyverse) > species <- read_csv("data_raw/species.csv")No problem here. I?m pulling > three databases from the Web and saving them to a folder on my hard drive. > (...data_raw/species.csv) etc.surveys <- read_csv("data_raw/surveys.csv") > plots <- read_csv("data_raw/plots.csv")Again no problem. I?m just creating > an R data files. But here is where I loose it. I?m creating something > named my_db_file from another file named portal-database-output with an > sqlite extension and then creating my_db from the My_db_file. Not sure > where the sqlite extension file came from. Creating a new SQLite database > with dplyr is easy. You can re-use the same command we used above to open > an existing .sqlite file. The create = TRUE argument instructs R to create > a new, empty database instead. > > Caution: When create = TRUE is added, any existing database at the same > location is overwritten without warning. > > my_db_file <- "data/portal-database-output.sqlite" > my_db <- src_sqlite(my_db_file, create = TRUE)Currently, our new database > is empty, it doesn?t contain any tables: > > my_db#> src: sqlite 3.29.0 [data/portal-database-output.sqlite] > #> tbls:To add tables, we copy the existing data.frames into the database > one by one: > > copy_to(my_db, surveys) > copy_to(my_db, plots) > my_dbI can follow the directions to fill in my_db but I have no idea how > to access the tables. The text from the tutorial below says to check the > location of our database. Huh! Can someone give me some direction. > Thanks. > > > > > > If you check the location of our database you?ll see that data is > automatically being written to disk. R and dplyr not only provide easy ways > to query existing databases, they also allows you to easily create your own > databases from flat files! > > > > Here is where I loose it. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]
Hi Phillip, Skipping to the last few lines of your email, did you download a program to look at Sqlite databases (independent of R) as listed below? Maybe that program ("DB Browser for SQLite") and/or the instructions below can help you locate your database directory: https://datacarpentry.org/semester-biology/computer-setup/ https://datacarpentry.org/semester-biology/materials/sql-for-dplyr-users/ If you do have that program, and you're still seeing an error, you might consider looking for similar issues at the appropriate 'datacarpentry' repository on Github (or posting a new issue yourself): https://github.com/datacarpentry/R-ecology-lesson/issues Finally, I really feel you'll benefit from reading over the documents pertaining to "R Data Import/Export" on the www.r-project.org website. No disrespect to the people at 'datacarpentry', but you'll find similar (and possibly, easier) R code to follow at section 4.3.1 'Packages using DBI' : https://cran.r-project.org/doc/manuals/r-release/R-data.html HTH, Bill. W. Michels, Ph.D. On Fri, Jan 10, 2020 at 10:32 AM Phillip Heinrich <herd_dog at cox.net> wrote:> > Working my way through a tutorial named Data Carpentry (https://datacarpentry.org/R-ecology-lesson/). for the most part it is excellent but I?m stuck on the very last section (https://datacarpentry.org/R-ecology-lesson/05-r-and-databases.html). > > First, below are the packages I have loaded: > [1] "forcats" "stringr" "purrr" "readr" "tidyr" "tibble" "ggplot2" "tidyverse" "dbplyr" "RMySQL" "DBI" > [12] "dplyr" "RSQLite" "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base" > > > > > > > Second, > Second, is the text of the last section of the last chapter titled ?Creating a New SQLite Database?. > Second, below is the text from the tutorial. The black type is from the tutorial. The green and blue is the suggested R code. My comments are in red. > Creating a new SQLite database > So far, we have used a previously prepared SQLite database. But we can also use R to create a new database, e.g. from existing csv files. Let?s recreate the mammals database that we?ve been working with, in R. First let?s download and read in the csv files. We?ll import tidyverse to gain access to the read_csv() function. > > download.file("https://ndownloader.figshare.com/files/3299483", > "data_raw/species.csv") > download.file("https://ndownloader.figshare.com/files/10717177", > "data_raw/surveys.csv") > download.file("https://ndownloader.figshare.com/files/3299474", > "data_raw/plots.csv") > library(tidyverse) > species <- read_csv("data_raw/species.csv")No problem here. I?m pulling three databases from the Web and saving them to a folder on my hard drive. (...data_raw/species.csv) etc.surveys <- read_csv("data_raw/surveys.csv") plots <- read_csv("data_raw/plots.csv")Again no problem. I?m just creating an R data files. But here is where I loose it. I?m creating something named my_db_file from another file named portal-database-output with an sqlite extension and then creating my_db from the My_db_file. Not sure where the sqlite extension file came from. Creating a new SQLite database with dplyr is easy. You can re-use the same command we used above to open an existing .sqlite file. The create = TRUE argument instructs R to create a new, empty database instead. > > Caution: When create = TRUE is added, any existing database at the same location is overwritten without warning. > > my_db_file <- "data/portal-database-output.sqlite" > my_db <- src_sqlite(my_db_file, create = TRUE)Currently, our new database is empty, it doesn?t contain any tables: > > my_db#> src: sqlite 3.29.0 [data/portal-database-output.sqlite] > #> tbls:To add tables, we copy the existing data.frames into the database one by one: > > copy_to(my_db, surveys) > copy_to(my_db, plots) > my_dbI can follow the directions to fill in my_db but I have no idea how to access the tables. The text from the tutorial below says to check the location of our database. Huh! Can someone give me some direction. Thanks. > > > > > > If you check the location of our database you?ll see that data is automatically being written to disk. R and dplyr not only provide easy ways to query existing databases, they also allows you to easily create your own databases from flat files! > > > > Here is where I loose it. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.