thr3ads.net - R help - [R] Stack overflow in R 2.10.0 with sub() [Oct 2009]

Kenneth Roy Cabrera Torres

2009-Oct-27 12:15 UTC

[R] Stack overflow in R 2.10.0 with sub()

Hi R developers:

Congratulations on the new R 2.10.0 release.

It is a huge effort! Thank you for your work and dedication.

I would like to know how to make this "strip blanks" call
work again (it works in R 2.9.2):

alumnos$AL_NUME_ID<-sub("(^ +)|( +$)","",alumnos$AL_NUME_ID),)
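For reference, here is what this pattern does to a few toy strings (the vector x below is made up for illustration). sub() replaces only the first match, so a string with blanks at both ends keeps its trailing blanks, while gsub(), which the later messages in this thread switch to, strips both ends. trimws() does the same job but only exists in R 3.2.0 and later, not in the R 2.10.0 discussed here.

```r
# Hypothetical toy input: no blanks, leading, trailing, and both.
x <- c("a", " a", "a ", " a ")

# sub() replaces only the FIRST match, so " a " loses just its leading blank.
s1 <- sub("(^ +)|( +$)", "", x)
print(s1)  # values: "a", "a", "a", "a "

# gsub() replaces every match, so blanks are stripped from both ends.
s2 <- gsub("(^ +)|( +$)", "", x)
print(s2)  # values: "a", "a", "a", "a"

# Equivalent built-in helper (R >= 3.2.0 only).
s3 <- trimws(x)
```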

"alumnos" is a data frame with 900,000 rows and 72 columns,
and "alumnos$AL_NUME_ID" is a character variable read from
a MySQL database.

The system shows me this message:

Error: C produce desborde de pila en 'segfault'
  [i.e., "segfault from C stack overflow"]

It looks like a stack overflow problem, but it works in R 2.9.2!

Thank you for your help, and again, thank you for your work!!!

Kenneth

Duncan Murdoch

2009-Oct-27 12:53 UTC

On 10/27/2009 8:15 AM, Kenneth Roy Cabrera Torres wrote:
> Hi R developers:
> 
> Congratulations on the new R 2.10.0 release.
> 
> It is a huge effort! Thank you for your work and dedication.
> 
> I would like to know how to make this "strip blanks" call
> work again (it works in R 2.9.2):
> 
> alumnos$AL_NUME_ID<-sub("(^ +)|( +$)","",alumnos$AL_NUME_ID),)
> 
> "alumnos" is a data frame with 900,000 rows and 72 columns,
> and "alumnos$AL_NUME_ID" is a character variable read from
> a MySQL database.
> 
> The system shows me this message:
> 
> Error: C produce desborde de pila en 'segfault'
>   [i.e., "segfault from C stack overflow"]
> 
> It looks like a stack overflow problem, but it works in R 2.9.2!
> 
> Thank you for your help, and again, thank you for your work!!!
I just tried that (after fixing the typo at the end of the line) and it 
worked on these vectors:

x <- c("a", " a", "a ", " a ")
y <- rep(x, 900000)

So there is something about your dataset that is causing the problem. 
Can you narrow it down?  Here are some tests:

1.  Check that it is the value that is causing the problem, not the 
manner of getting it:

x <- alumnos$AL_NUME_ID
y <- sub("(^ +)|( +$)","",x)

2.  See if it is in the first half of the data:

x <- alumnos$AL_NUME_ID
x <- x[seq_len(length(x)/2)]
y <- sub("(^ +)|( +$)","",x)

3.  See if it is in the second half:

x <- alumnos$AL_NUME_ID
x <- x[-seq_len(length(x)/2)]
y <- sub("(^ +)|( +$)","",x)

If you can narrow it down to a particularly short vector that causes the 
error, that would be very helpful.  It's likely to be somewhat tedious, 
because I imagine those segfaults will terminate R; I'd suggest using 
save.image() a lot when things are working, so you can restart after a 
crash.
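The halving procedure above can be automated. This is a minimal sketch under two assumptions: the failure shows up as an ordinary R error that tryCatch() can intercept (a genuine segfault still terminates the session, so this only helps with errors such as the "invalid in this locale" messages later in the thread), and once a prefix of the vector fails, every longer prefix fails too. The helper names find_failing_prefix() and strip_fails() are hypothetical.

```r
# Hypothetical helper: smallest n such that fails(x[seq_len(n)]) is TRUE,
# assuming failure is monotone (a failing prefix stays failing when extended).
find_failing_prefix <- function(x, fails) {
  if (!fails(x)) return(NA_integer_)  # whole vector works: nothing to find
  lo <- 0L          # invariant: x[seq_len(lo)] works (empty prefix assumed OK)
  hi <- length(x)   # invariant: x[seq_len(hi)] fails
  while (hi - lo > 1L) {
    mid <- (lo + hi) %/% 2L
    if (fails(x[seq_len(mid)])) hi <- mid else lo <- mid
  }
  hi
}

# Hypothetical predicate: TRUE if the strip-blanks call signals an R error.
strip_fails <- function(v) {
  res <- tryCatch(gsub("(^ +)|( +$)", "", v), error = function(e) e)
  inherits(res, "error")
}
```

For example, find_failing_prefix(x, strip_fails) returns the index of the first element whose inclusion makes the call fail, or NA if the whole vector works; each step halves the search range, so a vector of about 450,000 elements needs roughly 19 calls.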

Duncan Murdoch

Kenneth Roy Cabrera Torres

2009-Oct-27 14:46 UTC

Dr. Murdoch:

I am puzzled!
As you advised, I did this:

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
y <- gsub("(^ +)|( +$)","",x)

And it fails.

But, trying to locate the problem, I did this:

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
x <- x[seq_len(length(x)/2)]
y <- gsub("(^ +)|( +$)","",x)

works

x <- as.character(alumnos$AL_NUME_ID)
x <- x[-seq_len(length(x)/2)]
x <- x[-seq_len(length(x)/2)]
y <- gsub("(^ +)|( +$)","",x)

works

Now both work!!!

So I am puzzled!!! I cannot locate the problem.
Thank you for your advice.

Kenneth

Kenneth Roy Cabrera Torres

2009-Oct-27 18:16 UTC

On Tue, 27-10-2009 at 10:47 -0700, Phil Spector wrote:
> What happens if you type
> 
> Sys.setlocale('LC_ALL','C')
> 
> before using gsub or grep?

When I do that, R hangs and doesn't show any message.
> 
>  					- Phil Spector
>  					 Statistical Computing Facility
>  					 Department of Statistics
>  					 UC Berkeley
>  					 spector at stat.berkeley.edu
> 
> 
> On Tue, 27 Oct 2009, Kenneth Roy Cabrera Torres wrote:
> 
> > Thank you very much for your interest.
> >
> > I did this:
> > x <- as.character(alumnos$AL_NUME_ID)
> > x <- x[-seq_len(length(x)/2)]
> > save(x, file="x.RData")
> >
> > I exited R, restarted it, and ran:
> >
> > load("x.RData")
> > y <- gsub("(^ +)|( +$)","",x)
> >
> > It shows me:
> >
> > Error in gsub("(^ +)|( +$)", "", x) :
> >  input string 66644 is invalid in this locale
> >
> > I removed that string (it contains an unusual character (?)).
> >
> > So I reran the command without that element:
> >
> > y <- gsub("(^ +)|( +$)","",x[-c(66644)])
> >
> > I got this:
> > Error in gsub("(^ +)|( +$)", "", x[-c(66644)]) :
> >  input string 160689 is invalid in this locale
> >
> > I reran it, also excluding this second invalid string (I use
> > position 160690 because removing element 66644 shifted the later
> > indices of x by one):
> >
> > y <- gsub("(^ +)|( +$)","",x[-c(66644,160690)])
> > Error: C produce desborde de pila en 'segfault'
> >   [i.e., "segfault from C stack overflow"]
> >
> > And it fails.
> >
> > I also repeated the whole process, converting the encoding first:
> >
> > x <- iconv(as.character(alumnos$AL_NUME_ID),"latin1","UTF-8")
> > x <- x[-seq_len(length(x)/2)]
> > save(x, file="x.RData")
> >
> > Then I exited, restarted R, and typed:
> >
> > load("x.RData")
> > y <- gsub("(^ +)|( +$)","",x)
> >
> > It fails again, but this time without the "invalid string" errors.
> >
> > Then I did this:
> >
> > load("x.RData")
> > y <- gsub("(^ +)|( +$)","",x[1:160690])
> >
> > and it works; then I typed
> >
> > y <- gsub("(^ +)|( +$)","",x[1:200000])  # length(x) is 454035
> >
> > and it works...
> >
> > But when I started a manual binary search, I found something that
> > still puzzles me:
> >
> > y <- gsub("(^ +)|( +$)","",x[1:261570])
> >
> > works, but sometimes fails (after restarting R); it always fails
> > with an upper index greater than 262000.
> >
> > I don't see anything unusual around 261570:
> >
> > x[261560:261580]
> > [1] "21444777             " "1147585              " "255202522            "
> > [4] "25852100             " "24258550             " "A8D0251207           "
> > [7] "34681811             " "19121345             " "16921329             "
> > [10] "20442195             " "14506482             " "44332211             "
> > [13] "35049122             " "34326340             " "35182366             "
> > [16] "33288742             " "34958795             " "1017147202           "
> > [19] "3306985              " "33048501             " "33295073             "
> >
> > I am sending you the x.RData file to see if you can
> > reproduce my problem.
> >
> > This information may be useful:
> >
> > sessionInfo()
> >
> > R version 2.10.0 (2009-10-26)
> > x86_64-unknown-linux-gnu
> >
> > locale:
> > [1] LC_CTYPE=es_CO.UTF-8       LC_NUMERIC=C
> > [3] LC_TIME=es_CO.UTF-8        LC_COLLATE=es_CO.UTF-8
> > [5] LC_MONETARY=C              LC_MESSAGES=es_CO.UTF-8
> > [7] LC_PAPER=es_CO.UTF-8       LC_NAME=C
> > [9] LC_ADDRESS=C               LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > R.Version()
> >
> > $platform
> > [1] "x86_64-unknown-linux-gnu"
> > $arch
> > [1] "x86_64"
> > $os
> > [1] "linux-gnu"
> > $system
> > [1] "x86_64, linux-gnu"
> > $status
> > [1] ""
> > $major
> > [1] "2"
> > $minor
> > [1] "10.0"
> > $year
> > [1] "2009"
> > $month
> > [1] "10"
> > $day
> > [1] "26"
> > $`svn rev`
> > [1] "50208"
> > $language
> > [1] "R"
> > $version.string
> > [1] "R version 2.10.0 (2009-10-26)"
> >
> > gcc --version and g++ --version show:
> >
> > gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3
> > Copyright (C) 2008 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions. There is
> > NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> > PURPOSE.
> >
> > When I compiled R, I used only this configure option:
> >
> > ./configure --enable-R-shlib
> > make
> > sudo make install
> >
> > At the moment I have a 22 GB swap partition and 4 GB of RAM
> > (monitoring the system shows that swap is not being used).
> >
> > Again, thank you very much for your help.
> >
> > Kenneth
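Rather than deleting invalid strings one error message at a time (66644, then 160689, ...), they can all be located in a single pass. This is a minimal sketch using validUTF8(), which only exists in R 3.3.0 and later (not in the R 2.10.0 used in this thread), and it assumes, as the iconv() call above does, that the offending strings are Latin-1. Unlike converting the whole vector with iconv(), it re-encodes only the invalid elements, so values that are already valid UTF-8 are left untouched. The sample vector is made up.

```r
# Hypothetical sample: the "\xe9" and "\xf1" escapes create raw Latin-1
# bytes, which are not valid UTF-8.
x <- c("21444777", "caf\xe9", "A8D0251207", "ni\xf1o")

# Indices of every element that is not valid UTF-8, found in one pass.
bad <- which(!validUTF8(x))
print(bad)  # 2 4

# Re-encode only the offending elements, assuming they are Latin-1.
x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
stopifnot(all(validUTF8(x)))

# The strip-blanks call now sees only valid strings.
y <- gsub("(^ +)|( +$)", "", x)
```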