Vivek Sutradhara
2015-Nov-20 17:38 UTC
[R] exporting tables from an access database using parallel foreach
Hi
I want to extract data from a Microsoft Access database having many tables with more than 1e7 rows. I find that the following code works to export a table to an .rds file:

#####################
setwd('C:/sFolder')
library(RODBC); library(DBI)
ch <- odbcConnect("sample")

# No. of rows in the table not known
rowN <- 1e6  # no. of rows fetched per chunk
db <- sqlFetch(ch, "Table1", max = rowN, as.is = TRUE)
file <- paste0('Table1', 1, '.rds')
saveRDS(db, file)

rm(db); gc()  # garbage collection to free up the memory

# To successively obtain more chunks from the Access database
for (i in 2:10) {
  df <- sqlFetchMore(ch, "Table1", max = rowN, as.is = TRUE)
  file <- paste0('Table1', i, '.rds')
  saveRDS(df, file)
  done <- dim(df)[1] < rowN   # last (short) chunk reached?
  rm(df); gc()
  if (done) break
}
odbcCloseAll()
##############################

I would like to know the following:
1. Is there any way to extract data from a table by just specifying a row-number range? I have extracted data before; instead of repeating the operations, I would like to obtain data from, let's say, only the 8e6 to 9e6 row range. I cannot do this now: I have to use the sqlFetchMore command successively. I would like to know if it is possible to go straight to the 8e6 to 9e6 row range.
2. Is it possible to use the foreach package in the extraction step (in place of the for loop above)? I am planning to use the foreach command in parallel later for processing the data in the multiple files. I just wonder if it is possible to do parallel processing for the data extraction also.
Thanks,
Vivek Sutradhara
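On question 1, one approach (not in the original post) is to bypass sqlFetch()/sqlFetchMore() and send SQL directly with RODBC's sqlQuery(). This only jumps straight to a row range if the table has a sortable key; the column name `ID` below is an assumption, so substitute whatever key your table actually has. A minimal sketch:

```r
library(RODBC)
ch <- odbcConnect("sample")

# Fetch a row range directly, assuming an indexed numeric key
# column named ID (hypothetical - replace with your own key).
qry <- "SELECT * FROM Table1 WHERE ID BETWEEN 8000000 AND 9000000"
df  <- sqlQuery(ch, qry, as.is = TRUE)

# Access SQL has no OFFSET clause, so without a key column the
# usual workaround is nested TOP queries over a sorted column:
#   SELECT TOP 1000000 * FROM
#     (SELECT TOP 9000000 * FROM Table1 ORDER BY SomeCol)
#   ORDER BY SomeCol DESC

odbcClose(ch)
```

Note that without an ORDER BY, row order in a relational table is not guaranteed, so "rows 8e6 to 9e6" is only well defined relative to some sort key.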
John McKown
2015-Nov-20 18:09 UTC
[R] exporting tables from an access database using parallel foreach
A possibility could be to not use ODBC, but the CRAN package openxlsx ( https://cran.revolutionanalytics.com/web/packages/openxlsx/index.html ). Then use the read.xlsx() function.

<quote>
Description
Read data from an Excel file or Workbook object into a data.frame

Usage
read.xlsx(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE,
  rowNames = FALSE, detectDates = FALSE, skipEmptyRows = TRUE,
  rows = NULL, cols = NULL, check.names = FALSE, namedRegion = NULL)

Arguments
xlsxFile      An xlsx file or Workbook object.
sheet         The name or index of the sheet to read data from.
startRow      First row to begin looking for data. Empty rows at the top of a file are always skipped, regardless of the value of startRow.
colNames      If TRUE, the first row of data will be used as column names.
rowNames      If TRUE, the first column of data will be used as row names.
detectDates   If TRUE, attempt to recognise dates and perform conversion.
skipEmptyRows If TRUE, empty rows are skipped; else empty rows after the first row containing data will return a row of NAs.
rows          A numeric vector specifying which rows in the Excel file to read. If NULL, all rows are read.
cols          A numeric vector specifying which columns in the Excel file to read. If NULL, all columns are read.
check.names   Logical. If TRUE, the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names.
namedRegion   A named region in the Workbook. If not NULL, the startRow, rows and cols parameters are ignored.
</quote>

--
Schrodinger's backup: The condition of any backup is unknown until a restore is attempted.
Yoda of Borg, we are. Futile, resistance is, yes. Assimilated, you will be.
He's about as useful as a wax frying pan.
10 to the 12th power microphones = 1 Megaphone
Maranatha! <><
John McKown
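A minimal sketch of the suggestion above, using the rows= argument to read only a slice of a sheet (the file name is hypothetical, and, as the thread later establishes, this only applies if the data can first be exported to .xlsx at all, since an Excel sheet tops out at about 1e6 rows):

```r
library(openxlsx)  # install.packages("openxlsx") if needed

# Read a row slice from a (hypothetical) workbook instead of
# loading the entire sheet into memory.
df <- read.xlsx("sample.xlsx", sheet = 1,
                rows = 1:100000,    # row range of interest
                colNames = TRUE, detectDates = TRUE)
```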
Vivek Sutradhara
2015-Nov-20 18:32 UTC
[R] exporting tables from an access database using parallel foreach
Hi John,
Thanks a lot for your quick reply, and thanks for drawing my attention to the openxlsx package. I will certainly look into it when I work with Excel. But right now my problems are with Microsoft Access. There are huge tables there which I am not able to export to Excel, csv or text files with native Access methods. The only solution that has worked so far is to incrementally extract data with the help of RODBC. This was a huge leap in my attempts to export the tables. Once I have the data in the form of .rds files (which are compressed as well), I have found that it is much easier to work with them.

But my wishes have suddenly expanded, and I want to find out if it is possible to go beyond the normal capabilities of RODBC (the sqlFetch command does not have a provision for specifying a row-number range). I am a newbie with parallel methods (using the 4 cores on my PC), but I am hoping to progress with that for processing the data from the multiple chunks (the first step will be just to filter and gather the data of relevance). I hope that I have explained what I am looking for.
Thanks,
Vivek
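On the parallel-processing plan described above: a single ODBC connection cannot safely be shared across workers, so the extraction itself is best kept sequential (sqlFetchMore() depends on connection state). But once the chunks are on disk as .rds files, processing them with foreach is straightforward. A minimal sketch, assuming ten chunk files named as in the original loop and a hypothetical per-chunk filter step:

```r
library(foreach)
library(doParallel)

cl <- makeCluster(4)   # the 4 cores mentioned above
registerDoParallel(cl)

files <- paste0("Table1", 1:10, ".rds")

# Each worker reads one chunk, filters it, and returns the result;
# .combine = rbind stacks the filtered pieces into one data frame.
result <- foreach(f = files, .combine = rbind) %dopar% {
  chunk <- readRDS(f)
  chunk[chunk$value > 0, ]  # hypothetical filter - replace as needed
}

stopCluster(cl)
```

Each worker holds at most one chunk in memory, so the peak footprint stays near one chunk per core rather than the full 1e7-row table.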
John McKown
2015-Nov-20 18:33 UTC
[R] exporting tables from an access database using parallel foreach
My apologies, you wrote "Access" and I read "Excel". I really should not play a game on my smartphone while speed-reading emails.

--
John McKown