Schwab,Wilhelm K
2010-Oct-13 17:13 UTC
[R] [OT] (slightly) - OpenOffice Calc and text files
Hello all, I had a very strange looking problem that turned out to be due to unexpected (by me at least) format changes to one of my data files. We have a small lab study in which each run is represented by a row in a tab-delimited file; each row identifies a repetition of the experiment and associates it with some subjective measurements and times from our notes that get used to index another file with lots of automatically collected data. In short, nothing shocking. In a moment of weakness, I opened the file using (I think it's version 3.2) of OpenOffice Calc to edit something that I had mangled when I first entered it, saved it (apparently the mistake), and reran my analysis code. The results were goofy, and the problem was in my code that runs before R ever sees the data. That code was confused by things that I would like to ensure don't happen again, and I suspect that some of you might have thoughts on it. The problems specifically: (1) OO seems to be a little stingy about producing tab-delimited text; there is stuff online about using the csv and editing the filter and folks (presumably like us) saying that it deserves to be a separate option. (2) Dates that I had formatted as YYYY got chopped to YY (did we not learn anything last time?<g>) and times that I had formatted in 24 hours ended up AM/PM. Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. Bill
How about emacs? albyn On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab,Wilhelm K wrote:> Hello all, > <.....> > Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. > > Bill > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Albyn Jones Reed College jones at reed.edu
Peter Langfelder
2010-Oct-13 17:41 UTC
[R] [OT] (slightly) - OpenOffice Calc and text files
On Wed, Oct 13, 2010 at 10:13 AM, Schwab,Wilhelm K <bschwab at anest.ufl.edu> wrote:> Hello all, > > I had a very strange looking problem that turned out to be due to unexpected (by me at least) format changes to one of my data files. ?We have a small lab study in which each run is represented by a row in a tab-delimited file; each row identifies a repetition of the experiment and associates it with some subjective measurements and times from our notes that get used to index another file with lots of automatically collected data. ?In short, nothing shocking. > > In a moment of weakness, I opened the file using (I think it's version 3.2) of OpenOffice Calc to edit something that I had mangled when I first entered it, saved it (apparently the mistake), and reran my analysis code. ?The results were goofy, and the problem was in my code that runs before R ever sees the data. ?That code was confused by things that I would like to ensure don't happen again, and I suspect that some of you might have thoughts on it. > > The problems specifically: > > (1) OO seems to be a little stingy about producing tab-delimited text; there is stuff online about using the csv and editing the filter and folks (presumably like us) saying that it deserves to be a separate option. > > (2) Dates that I had formatted as YYYY got chopped to YY (did we not learn anything last time?<g>) and times that I had formatted in 24 hours ended up AM/PM. > > Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? ?If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) ?I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc.No idea about Calc, I use it regularly but only to view files (and that mostly csv, not tab-delinited). The most primitive solution is to use a plain text editor such as vi that will save everything as it loaded it except for what you change. The second most primitive idea (or maybe not so primitive after all) is to read the table into R and manually fix it there such as table$column[row] = "ABCD" (this is my favorite way of changing things :)). The third most primitive idea which I have actually never used but which may be viable is to load it into R and use the fix() function that pulls up a rather primitive but functional data editor. Peter
On Oct 13, 2010, at 12:13 PM, Schwab,Wilhelm K wrote:> Hello all, > > I had a very strange looking problem that turned out to be due to unexpected (by me at least) format changes to one of my data files. We have a small lab study in which each run is represented by a row in a tab-delimited file; each row identifies a repetition of the experiment and associates it with some subjective measurements and times from our notes that get used to index another file with lots of automatically collected data. In short, nothing shocking. > > In a moment of weakness, I opened the file using (I think it's version 3.2) of OpenOffice Calc to edit something that I had mangled when I first entered it, saved it (apparently the mistake), and reran my analysis code. The results were goofy, and the problem was in my code that runs before R ever sees the data. That code was confused by things that I would like to ensure don't happen again, and I suspect that some of you might have thoughts on it. > > The problems specifically: > > (1) OO seems to be a little stingy about producing tab-delimited text; there is stuff online about using the csv and editing the filter and folks (presumably like us) saying that it deserves to be a separate option. > > (2) Dates that I had formatted as YYYY got chopped to YY (did we not learn anything last time?<g>) and times that I had formatted in 24 hours ended up AM/PM. > > Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. > > BillI don't use OpenOffice (soon to be LibreOffice) much these days, but one of the things that you can try, is when you go to save the file as a CSV and edit the filter, there is an option there "Save cell content as shown". If that is checked, then any cell formatting that has been applied, either by default or by your actions, will be retained in the exported data. If that is unchecked, then the 'raw' data is exported to the file. I just tried it here (on OSX) and with the option checked, the years were exported with the default two digits. The years were exported with four digits with the box unchecked. Unfortunately, I had no joy with a time field. The AM/PM formatting was retained with the box checked or unchecked.>From what I can tell from a quick search, these default formats are determined by the language/locale settings.On Linux, a spreadsheet based alternative would be Gnumeric (http://projects.gnome.org/gnumeric/) and of course, there is always Emacs, which I have now used on Windows, Linux and OSX. HTH, Marc Schwartz
On Wed, Oct 13, 2010 at 10:13 AM, Schwab,Wilhelm K <bschwab at anest.ufl.edu> wrote:> Hello all, > > I had a very strange looking problem that turned out to be due to unexpected (by me at least) format changes to one of my data files. ?We have a small lab study in which each run is represented by a row in a tab-delimited file; each row identifies a repetition of the experiment and associates it with some subjective measurements and times from our notes that get used to index another file with lots of automatically collected data. ?In short, nothing shocking. > > In a moment of weakness, I opened the file using (I think it's version 3.2) of OpenOffice Calc to edit something that I had mangled when I first entered it, saved it (apparently the mistake), and reran my analysis code. ?The results were goofy, and the problem was in my code that runs before R ever sees the data. ?That code was confused by things that I would like to ensure don't happen again, and I suspect that some of you might have thoughts on it. > > The problems specifically: > > (1) OO seems to be a little stingy about producing tab-delimited text; there is stuff online about using the csv and editing the filter and folks (presumably like us) saying that it deserves to be a separate option.It is, but it is doable (you can manually edit the extension to .txt and edit the field and then choose tab or a _few_ other options that your heart desires. Importing should be similar.> > (2) Dates that I had formatted as YYYY got chopped to YY (did we not learn anything last time?<g>) and times that I had formatted in 24 hours ended up AM/PM.The "general" cell format can be quite convenient, but usually seems one of the most awful creations in both Excel and Calc. Rants aside, try forcing the cell format (I like text because it generally just treats it asis then).> > Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? ?If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) ?I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. > > Bill > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Schwab,Wilhelm K
2010-Oct-13 18:28 UTC
[R] [OT] (slightly) - OpenOffice Calc and text files
It will get a good look, as will gnumeric - thanks to all! Bill ________________________________________ From: Albyn Jones [jones at reed.edu] Sent: Wednesday, October 13, 2010 2:14 PM To: Schwab,Wilhelm K Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files emacs shows you exactly what is there, nothing more nor less. it isn't a spreadsheet, but tabs will align columns. albyn On Wed, Oct 13, 2010 at 01:53:46PM -0400, Schwab,Wilhelm K wrote:> Albyn, > > I'll look into it. In fact, I have a small book on it that I bought in my very early days of using Linux. I quickly found TeX Maker (for the obvious), Code::Blocks for C/C++ and I would not have started the move without a working Smalltalk (http://pharo-project.org/home). > > For editing data files, I really just want something that shows data in an understandable grid and does not do weird stuff thinking it's being helpful. > > Bill > > > ________________________________________ > From: Albyn Jones [jones at reed.edu] > Sent: Wednesday, October 13, 2010 1:39 PM > To: Schwab,Wilhelm K > Cc: r-help at r-project.org > Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files > > How about emacs? > > albyn > > On Wed, Oct 13, 2010 at 01:13:03PM -0400, Schwab,Wilhelm K wrote: > > Hello all, > > <.....> > > Have any of you found a nice (or at least predictable) way to use OO Calc to edit files like this? If it insists on thinking for me, I wish it would think in 24 hour time and 4 digit years :) I work on Linux, so Excel is off the table, but another spreadsheet or text editor would be a viable option, as would configuration changes to Calc. > > > > Bill > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Albyn Jones > Reed College > jones at reed.edu > >-- Albyn Jones Reed College jones at reed.edu
David Winsemius
2010-Oct-13 18:52 UTC
[R] [OT] (slightly) - OpenOffice Calc and text files
On Oct 13, 2010, at 1:13 PM, Schwab,Wilhelm K wrote:> Hello all, > > I had a very strange looking problem that turned out to be due to > unexpected (by me at least) format changes to one of my data files. > We have a small lab study in which each run is represented by a row > in a tab-delimited file; each row identifies a repetition of the > experiment and associates it with some subjective measurements and > times from our notes that get used to index another file with lots > of automatically collected data. In short, nothing shocking. > > In a moment of weakness, I opened the file using (I think it's > version 3.2) of OpenOffice Calc to edit something that I had mangled > when I first entered it, saved it (apparently the mistake), and > reran my analysis code. The results were goofy, and the problem was > in my code that runs before R ever sees the data. That code was > confused by things that I would like to ensure don't happen again, > and I suspect that some of you might have thoughts on it. > > The problems specifically: > > (1) OO seems to be a little stingy about producing tab-delimited > text; there is stuff online about using the csv and editing the > filter and folks (presumably like us) saying that it deserves to be > a separate option.You have been little stingy yourself about describing what you did. I see no specifics about the actual data used as input nor the specific operations. I just opened an OO.o Calc workbook and dropped a character vector, "1969-12-31 23:59:50" copied from help(POSIXct) into a2. I then copied it to a3 and formatted it to be in the precanned format, MM/DD/YYYY HH:MM:SS , noticed that it had not been interpreted as a data-time vlaue at all so entered =TODAY()+TIME(13;0;0) in a4 and =TIME(13;0;0) in a5, formated to a user specified custom time format of YYYY-MM-DD HH:MM:SS Copied a5 to c1:c5 saved to a text-csv file specifying the field separator as tab and the text-delimiter as '"' and got: ""time" 1899-12-30 13:00:00 "1969-12-31 23:59:50" 1899-12-30 13:00:00 "1969-12-31 23:59:50" 1899-12-30 13:00:00 2010-10-13 13:00:00 1899-12-30 13:00:00 1899-12-30 13:00:00 1899-12-30 13:00:00 This handling of dates and times does not seem particularly difficult to elicit andseems to represent dates in YYYY and times in "military time".> > (2) Dates that I had formatted as YYYY got chopped to YY (did we not > learn anything last time?<g>) and times that I had formatted in 24 > hours ended up AM/PM. > > Have any of you found a nice (or at least predictable) way to use OO > Calc to edit files like this?I didn't do anything I thought was out of the ordinary and so cannot reproduce your problem. (This was on a Mac, but OO.o is probably going to behave the same across *NIX cultures.) -- David> If it insists on thinking for me, I wish it would think in 24 hour > time and 4 digit years :)Is it possible that you have not done enough thinking for _it_?> I work on Linux, so Excel is off the table, but another spreadsheet > or text editor would be a viable option, as would configuration > changes to Calc. > > Bill > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
---------- Forwarded message ---------- From: John C Frain <frainj at gmail.com> Date: 14 October 2010 18:20 Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files To: "Schwab,Wilhelm K" <bschwab at anest.ufl.edu> Just a basic comment on Open Office Calc. ?I find Open Office Calc and excellent program for viewing and minor processing of data - much better than Excel. In particular it is a very flexible tool for transferring data between different packages. ?However one must take a certain amount of care in its use. If you import a csv or text file into open office it is reformatted by ?Calc. ?If you save this as csv or text or similar then the default is to save exactly as displayed. ?If I wish to retain more significant data or dates in a particular format use the various format facilities in Calc to ensure that what you require is displayed in Calc before you save as csv or similar. ?There is also a useful facility to edit the output facilities before saving. ?Calc or Excel or Write or Word do add formatting that any plain text editor (Scite Notepad++ Emacs etc.) do not. Best Regards John On 13 October 2010 21:27, Schwab,Wilhelm K <bschwab at anest.ufl.edu> wrote:> I know *what* happened (Calc reformatted the data in ways I did not want or expect). ? It is not end-of-line conventions; they reformatted the data leaving the structure intact. ?As to why/how, that could depend on the sequence of operations, so I thought to ask here to see if you had collectively either found something specific to do or to avoid. > > Gnumeric is now freshly installed and will get some testing; if I don't care for it, I'll look more at emacs. ?I don't ask much of a spreadsheet (show/edit a grid and maybe hide/show columns for complex data sets), but it would be nice if it did not reformat everything every time I open a file :( > > So far, gnumeric successfully opened a file; I will be a little less trusting when it comes to saving one. ?Thanks!! > > Bill > > > > > ________________________________________ > From: Mike Marchywka [marchywka at hotmail.com] > Sent: Wednesday, October 13, 2010 3:10 PM > To: dwinsemius at comcast.net; Schwab,Wilhelm K > Cc: r-help at r-project.org > Subject: RE: [R] [OT] (slightly) - OpenOffice Calc and text files > > ---------------------------------------- >> From: dwinsemius at comcast.net >> To: bschwab at anest.ufl.edu >> Date: Wed, 13 Oct 2010 14:52:21 -0400 >> CC: r-help at r-project.org >> Subject: Re: [R] [OT] (slightly) - OpenOffice Calc and text files >> >> >> On Oct 13, 2010, at 1:13 PM, Schwab,Wilhelm K wrote: >> >> > Hello all, >> > >> > I had a very strange looking problem that turned out to be due to >> > unexpected (by me at least) format changes to one of my data files. >> > We have a small lab study in which each run is represented by a row >> > in a tab-delimited file; each row identifies a repetition of the >> > experiment and associates it with some subjective measurements and >> > times from our notes that get used to index another file with lots >> > of automatically collected data. In short, nothing shocking. >> > >> > In a moment of weakness, I opened the file using (I think it's >> > version 3.2) of OpenOffice Calc to edit something that I had mangled >> > when I first entered it, saved it (apparently the mistake), and >> > reran my analysis code. The results were goofy, and the problem was >> > in my code that runs before R ever sees the data. That code was >> > confused by things that I would like to ensure don't happen again, >> > and I suspect that some of you might have thoughts on it. >> > >> > The problems specifically: >> > >> > (1) OO seems to be a little stingy about producing tab-delimited >> > text; there is stuff online about using the csv and editing the >> > filter and folks (presumably like us) saying that it deserves to be >> > a separate option. >> >> You have been little stingy yourself about describing what you did. I >> see no specifics about the actual data used as input nor the specific >> operations. I just opened an OO.o Calc workbook and dropped a >> character vector, "1969-12-31 23:59:50" copied from help(POSIXct) into > > > >> > Have any of you found a nice (or at least predictable) way to use OO >> > Calc to edit files like this? >> >> I didn't do anything I thought was out of the ordinary and so cannot >> reproduce your problem. (This was on a Mac, but OO.o is probably going >> to behave the same across *NIX cultures.) >> >> -- >> David >> >> > If it insists on thinking for me, I wish it would think in 24 hour >> > time and 4 digit years :) >> >> Is it possible that you have not done enough thinking for _it_? >> >> > I work on Linux, so Excel is off the table, but another spreadsheet >> > or text editor would be a viable option, as would configuration >> > changes to Calc. >> > >> > Bill > > Probably instead of guessing and seeing how various things react, you > could go get a utility like octal dump or open in an editor that > has a hex mode and see what happened. This could be anything- crlf convention, > someone turned it to unicode, etc. On linux or cygwin I think you have > "od" available. Then of course, if you know what R likes, you can use > sed to fix it... > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- John C Frain Economics Department Trinity College Dublin Dublin 2 Ireland www.tcd.ie/Economics/staff/frainj/home.html mailto:frainj at tcd.ie mailto:frainj at gmail.com -- John C Frain Economics Department Trinity College Dublin Dublin 2 Ireland www.tcd.ie/Economics/staff/frainj/home.html mailto:frainj at tcd.ie mailto:frainj at gmail.com