thr3ads.net - similar to: "read.table() versus scan()"

Displaying 20 results from an estimated 40000 matches similar to: "read.table() versus scan()"

Repeating the same calculation across multiple pairs of variables

2011 Mar 05

Hi all, I frequently encounter datasets that require me to repeat the same calculation across many variables. For example, given a dataset with total employment variables and manufacturing employment variables for the years 1990-2010, I might have to calculate manufacturing's share of total employment in each year. I find it cumbersome to have to manually define a share for each year and
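One way the repeated share calculation described in this excerpt could be sketched in R is to build the paired column names from the vector of years and divide the two column blocks in a single step. The data and the column names (`total_1990`, `manuf_1990`, and so on) are assumptions made for illustration; the original post does not show its variable names.

```r
# Hypothetical data: the naming scheme total_<year> / manuf_<year> is an assumption.
years <- 1990:1992
emp <- data.frame(
  total_1990 = c(100, 200), total_1991 = c(110, 210), total_1992 = c(120, 220),
  manuf_1990 = c(20, 50),   manuf_1991 = c(22, 42),   manuf_1992 = c(30, 55)
)

# Build the paired column names from the year vector.
total_cols <- paste0("total_", years)
manuf_cols <- paste0("manuf_", years)

# Dividing two data frames of the same shape works element by element,
# so every year's share is computed in one step.
shares <- emp[manuf_cols] / emp[total_cols]
names(shares) <- paste0("share_", years)

# Attach the new share columns to the original data.
emp <- cbind(emp, shares)
```

The same pattern extends to 1990-2010 by changing `years`; it relies on both sets of columns following a regular naming scheme.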

Seeking a more efficient way to read in a file

2008 Jan 02

Hi. I have a matrix stored in a large, tab-delimited flat file. The first row contains column names. Because the matrix is symmetric, the file has lower triangular format, so the second row contains one number, the third row two numbers, etc. In general, row k+1 contains k numbers; the matrix has 3000 rows, so the file has 3001 rows. The file has variable length records, so each row ends
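A minimal sketch of one possible approach to the layout described in this excerpt: read the header line and the numbers with two `scan()` calls, then fill a square matrix. The tiny three-row file is hypothetical, and the sketch assumes the rows carry no trailing delimiter; the excerpt is cut off before it says how each row ends, so a real file may need extra handling.

```r
# Write a small hypothetical file in the layout described above:
# a header line, then row k+1 holding k tab-separated numbers.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1", "2\t3", "4\t5\t6"), path)

# First line: column names.
col_names <- scan(path, what = "", sep = "\t", nlines = 1, quiet = TRUE)
n <- length(col_names)

# Remaining lines: the lower triangle (diagonal included) as one numeric vector.
vals <- scan(path, skip = 1, sep = "\t", quiet = TRUE)
stopifnot(length(vals) == n * (n + 1) / 2)

# R fills matrices column by column, so filling the upper triangle column-wise
# places the values exactly where a row-wise lower triangle would mirror them.
m <- matrix(0, n, n, dimnames = list(col_names, col_names))
m[upper.tri(m, diag = TRUE)] <- vals

# Mirror the upper triangle into the lower triangle.
m[lower.tri(m)] <- t(m)[lower.tri(m)]
```

Reading everything into one vector avoids parsing variable-length records line by line, which is usually the slow part with `read.table()`.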

Efficient way to determine if a data frame has missing observations

2011 Feb 02

I have a data set covering a large number of cities with values for characteristics such as land area, population, and employment. The problem I have is that some cities lack observations for some of the characteristics and I'd like a quick way to determine which cities have missing data. For example:

moving onto returning a data.frame?

2003 May 21

I've been studying some of the code and I'm still a little shaky on the proper method for returning a data.frame from a C function (which is my ultimate goal here). I've started some code that I've "stolen" from the archives and I'm running into crashes, etc. I've been trying to glean some insight from the src/main/scan.c file and didn't find many comments in

How does the cex parameter scale circles?

2011 Mar 09

I'm wondering how the cex parameter is used to scale circles (i.e. does it scale the radius, diameter, area, circumference, etc.?). In my case I'm using lattice with filled circles (pch=19). Based on example, it looks like R scales the radius of the circle:
library(lattice)
dta<-data.frame(x=rep(1,6),y=rep(1,6),sz=c(1,2,4,8,16,32))

sqldf for Very Large Tab Delimited Files

2012 Feb 02

Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. The station is given in the first column. I am trying to learn and use the sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken the iris dataset as an

CairoPDF, Fonts, and Windows 7

2011 Apr 16

Hi All: I have some basic questions about the Cairo graphics engine. I'm trying to use the Cairo package to produce PDF output, mainly because I perceive it to be easy to use with a wide variety of fonts. But right now, I'm stuck trying to figure out what fonts are available to be used with Cairo, specifically the CairoPDF function. I've been able to successfully produce some test PDFs

polr probit versus stata oprobit

2004 Nov 11

Dear All, I have been struggling to understand why for the housing data in MASS library R and stata give coef. estimates that are really different. I also tried to come up with many many examples myself (see below, of course I did not have the set.seed command included) and all of my `random' examples seem to give very similar output. For the housing data, I have changed the data into numeric

Details of subassignment (for vectors and data frames)

2011 Aug 12

Hi All: I'm looking to find out a bit more about how subassignment actually works and am hoping someone with knowledge of the details can fill me in (I've looked at the source code, but my knowledge of C is lacking). In the case of vectors, my reading of ?"[" would indicate that for a vector, vec <- 1:25, vec[c(1,5,25)] <- c(101,102,103) is functionally the same as

Convert the output of by() to a data frame

2011 Feb 08

I'd like to summarize several variables in a data frame, for multiple groups, and store the results in a data.frame. To do so, I'm using by(). For example:
df<-data.frame(a=1:10,b=11:20,c=21:30,grp1=c("x","y"),grp2=c("x","y"),grp3=c("x","y"))
dfsum<-by(df[c("a","b","c")],

How does the data.frame function generate column names?

2011 Jan 23

Hi all, I'm a new R user and am confused about how R behaves when converting a vector to a data frame when using the data.frame function. I'm specifically interested in cases where the vector is expressed as a subset of another data frame. For example, say I want to create a data frame from the last three rows of the third column of the data frame, d, that I've created below:

More on scan: extra field at end of line

2000 Dec 26

Suppose I have a file "data1" containing:
450 390 467 654 30 542 334 432 421 357 497 493 550 549 467 575 578 342 446 547 534 495 979 479
I can read this file with:
scan("data1")
Read 24 items
 [1] 450 390 467 654  30 542 334 432 421 357 497 493 550 549 467 575 578 342 446
[20] 547 534 495 979 479
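For comparison, here is a small sketch of how `scan()` and `read.table()` treat the same whitespace-delimited file. The two-line file and its four-column layout are assumptions made for illustration; the excerpt does not show how "data1" is split into lines.

```r
# Hypothetical file with two lines of four numbers each.
path <- tempfile()
writeLines(c("450 390 467 654", "30 542 334 432"), path)

# scan() returns every field as one flat numeric vector, ignoring line breaks.
x <- scan(path, quiet = TRUE)

# If the number of columns is known, the row layout can be restored afterwards.
m <- matrix(x, ncol = 4, byrow = TRUE)

# read.table() keeps the row/column structure and returns a data frame.
df <- read.table(path)
```

`scan()` is the lower-level of the two and leaves the reshaping to the caller, while `read.table()` infers the columns from the file itself.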

Advice needed on awkward tables

2010 May 11

Dear r-help list members, I am quite new to R, and hope to seek advice from you about a problem I have been cracking my head over. Apologies if this seems like a simple problem. I have essentially two tables. The first (Table A) is a standard patient clinicopathological data table, where rows correspond to patient IDs and columns correspond to clinical features. Records in this table are stored

Preferred way to create bubble plots?

2011 Mar 07

I have to create a number of bubble plots, and am wondering what methods folks prefer for this task. I've been experimenting with the symbols() function, with text() to provide plot labels. Any opinions on the relative merits of this method versus others? One criterion would be the ability to fine-tune the placement of text labels. I would like to use lattice, but haven't found a way to

suggestions regarding reading in a messy file

2011 Jul 12

I have a file in stata format, which I have read in, and I am trying to create a text file. I have exported the data using various delimiters, but I'm unable to read it back in. I originally read in the file with:
library(foreign)
myData <- read.dta("mydata.dta")
I then exported it with write.table using commas, tabs, and exclamation marks as delimiters. When I was unable to

Problem with read.table and scan

2001 Sep 24

I have just installed R on a Windows NT system. Unfortunately I am unable to open any of the data files I wish to work with. I have tried using read.table and scan and to the best of my knowledge am using the correct syntax. The error message I receive is Error in file(file, "r") : cannot open file [file name] I have the data in text files in white-space delimited form. I put them

scan() vs readChar() speed

2012 Apr 01

Dear list, I am trying to find a fast solution to read moderately large (1 to 10 million entries) text files containing only tab-delimited numeric values. My test file is the following,
nr <- 1000
nc <- 5000
m <- matrix(round(rnorm(nr*nc),3),nr=nr)
write.table(m, file = "a.txt", append=FALSE, row.names = FALSE, col.names = FALSE)
scan() is faster than

Very odd problem

2010 Nov 01

I had previously tried to migrate our PDC to a new machine by simply copying the config over and such. That failed miserably but luckily the various home servers (BDC's in samba speak I think) took up the slack. So after much debate, this weekend we moved the PDC back to the original machine. We never moved LDAP off of the original machine, as only samba functions moved. I now know I did

UTF-16 input and read.delim/scan

2012 May 18

Hi all, I am running 64-bit R 2.15.0 on Windows 7. I am trying to use read.delim to read from a file that has 2-byte unicode (CJK) characters. Here is an example of the data (it is tab-delimited if that gets messed up):
HITId HITTypeId Title
2Q69Z6KW4ZMAGKKFRT6Q4ONO6MJF68 2LVJ1LY58B72OP36GNBHH16YF7RS7Z 看看句子，写写想法请看以下的句子，再回答问
So read.delim (code below) doesn't read in correctly. It reads

WG: Samba 2.2.7a and XP pro

2003 Apr 12

Dear Michael, I just have the same problem you have with samba 2.2.3a on a Suse 8.0 machine and win xp pro. Besides, I performed all the steps below. From my postings I learned that the prob might have something to do with a wrong mapping in smbusers file. It seems to me that the workstation account is mapped to user nobody. Up to now I was not able to solve the prob. Do you have any ideas
