Hi all,

I have a simple problem. I am stuck working with SPSS data (.sav) imported using "read.spss".
I imported the data (z) without any problem. After importing, the first column doesn't contain any NA, but when I take a subset of it (like: z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]), lots of NAs appear (even in the first column).

The .sav file is the output of Compustat (WRDS).

It is frustrating; I can't find the mistake.

Thank you in advance for your help,
Elham
Hi Elham,

You are not giving us much to go on here. Show us the commands that (a) confirm there are no NAs in the first column of z and (b) output a row of z that has an NA in the first column. Here's how one might do this:

(a) sum(is.na(z[,1]))
(b) z[ match(TRUE, z[,8] %in% c("11","12","14")), ]

Eric

On Wed, Aug 26, 2020 at 3:56 PM Elham Daadmehr <e.daadmehr at gmail.com> wrote:
> I imported data (z) without any problem. After importing, the first column
> doesn't contain any "NA". but when I choose a subset of it (like:
> z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]), lots of NA appears (even in the
> first column).
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Thanks for your reply. You're right, here is what I did:

> library(foreign)
> sz201401=read.spss("/Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav", to.data.frame=TRUE)
Warning message:
In read.spss("/Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav", :
  /Users/e.daadmehr/Desktop/Term/LastLast/untitled folder/2014/1.sav: Compression bias (0) is not the usual value of 100
> z =sz201401
> is.list(z)
[1] TRUE
> z=as.data.frame(z)
> is.data.frame(z)
[1] TRUE
> z=z[,-c(10)]
> sum(is.na(z[,1]))
[1] 0
> z1=z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]
> sum(is.na(z1[,1]))
[1] 399

My file is not compressed.

Thank you in advance,
Elham

On Wed, Aug 26, 2020 at 3:31 PM Eric Berger <ericjberger at gmail.com> wrote:
> Show us the commands that (a) confirm there are no NA's in the first
> column of z
> and (b) output a row of z that has an NA in the first column.
Offhand, I suspect that the NAs are in the 8th column.

> On 26 Aug 2020, at 10:57 , Elham Daadmehr <e.daadmehr at gmail.com> wrote:
>
> I imported data (z) without any problem. After importing, the first column
> doesn't contain any "NA". but when I choose a subset of it (like:
> z[z[,8]=="11"|z[,8]=="12"|z[,8]=="14",]), lots of NA appears (even in the
> first column).

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
Good point! :-)

On Wed, Aug 26, 2020 at 5:55 PM peter dalgaard <pdalgd at gmail.com> wrote:
> Offhand, I suspect that the NAs are in the 8th column.
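
Peter's suspicion can be illustrated with a minimal sketch on made-up data. The data frame, its column names (id, code), and its values below are assumptions for illustration only, not the Compustat file. If the column used in the condition contains NA, then == returns NA for those rows, and an NA in a logical row index produces a row filled entirely with NA. That would plausibly explain NAs showing up in the first column of the subset even though the first column of z has none.

```r
# Toy stand-in for the imported data; names and values are hypothetical.
z <- data.frame(
  id   = 1:6,
  code = c("11", NA, "12", "13", NA, "14"),
  stringsAsFactors = FALSE
)

sum(is.na(z$id))    # the first column has no NA
sum(is.na(z$code))  # the column used in the condition does

# NA == "11" is NA, and NA | NA | NA stays NA, so the logical index has NAs.
keep <- z$code == "11" | z$code == "12" | z$code == "14"

# Each NA in the row index yields one all-NA row in the result.
z_bad <- z[keep, ]
sum(is.na(z_bad$id))

# %in% returns FALSE when the value is NA and NA is not in the lookup set,
# so no all-NA rows are created.
z_ok <- z[z$code %in% c("11", "12", "14"), ]
sum(is.na(z_ok$id))

# which() drops NA from a logical vector and gives the same rows.
z_ok2 <- z[which(keep), ]
```

On the real data, sum(is.na(z[,8])) would show whether the 8th column contains NA; if it does, the %in% form already used in Eric's suggestion (b), or wrapping the condition in which(), should avoid the extra rows.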