With help from several people, I used file.choose() to get my file name, and read.csv() to read in the file as KurtzData. Then when I print KurtzData, the last several lines look like this:

39  5/31/22            16.0     341     1.75525  0.0201  0.0214    7.00
40  6/28/22 2:00 PM     0.0     215     0.67950  0.0156  0.0294      NA
41  7/25/22 11:00 AM   11.9    1943.5        NA      NA  0.0500    7.80
42  8/31/22             0       220.5        NA      NA  0.0700   30.50
43  9/28/22             0.067    10.9        NA      NA  0.0700   10.20
44  10/26/22            0.086   237          NA      NA  0.1550   45.00
45  1/12/23 1:00 PM    36.26  24196          NA      NA  0.7500  283.50
46  2/14/23 1:00 PM    20.71     55          NA      NA  0.0500    2.40
47        NA      NA      NA      NA
48        NA      NA      NA      NA
49        NA      NA      NA      NA

Then the NAs go down to one numbered 973. Where did those extras likely come from, and how do I get rid of them? I assume I need to get rid of all the lines after #46 to do calculations and graphics, no?

David
Dear David

To get the first 46 rows, just do

    KurtzData[1:46, ]

However, really you want to find out why it happened. It looks as though the .csv file you read has lots of blank lines at the end. I would open it in an editor to check that.

Michael
Many Excel spreadsheets have a lot of garbage outside the range of the data. Sometimes it is visible if you know where to look; sometimes it is blank cells. Perhaps at some point you (or the file creator) accidentally entered a number in line 973. Then Excel will think the sheet has 973 lines. I don't know the best way to tell Excel that those lines are pure garbage.

That's why old fogies like me recommend that you do as little as possible in Excel. Get the data into a reliable form as soon as possible.

Once it is an R data frame, you can delete lines using negative indices. In this case use

    fixed <- KurtzData[-(47:nrow(KurtzData)), ]

which will create a new data frame with only rows 1 to 46.

Duncan Murdoch
David,
You have choices depending on your situation and plans.
Obviously the ideal solution is to make sure any CSV you save your Excel data into contains exactly what you want. So if your original Excel file contains things like a blank character down around row 973, get rid of it, or else all lines up to there may be picked up and turned into NA rows. I suggest deleting all the extra lines as a first try.
The other method to try is simply to read in the file and keep only complete cases. But your data shows that valid rows can have NA in some columns, such as the one for 7/25, so complete.cases() is not a good choice here.
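A minimal sketch of why complete.cases() misfires here (KurtzData is the data frame from the original post):

    ok <- complete.cases(KurtzData)   # TRUE only for rows with no NA in ANY column
    KurtzData[ok, ]                   # would also discard real rows such as 7/25/22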
So, since your first column (or maybe second) seems to be a date, and I think that is not optional, simply filter your data.frame to remove all rows where is.na(DF$COL) is TRUE, or use some similar stratagem such as checking whether all columns are NA.
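For example, assuming the date column is named Date (adjust the name to whatever read.csv() actually produced):

    KurtzData <- KurtzData[!is.na(KurtzData$Date), ]          # keep only rows with a date
    # or, drop only rows that are NA in every single column:
    KurtzData <- KurtzData[rowSums(!is.na(KurtzData)) > 0, ]

One caveat: if the date column was read as character, the junk rows may hold empty strings "" rather than NA, in which case test with nzchar(KurtzData$Date) instead.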
My guess is that you may have re-used an Excel file and put new, shorter data in it, or that the file has been edited and something was left where it should not be, perhaps something non-numeric.
Another idea is to NOT use the CSV route and instead use one of the many packages that can carefully read the data from a native Excel format such as an XLSX file, where you can specify which tab you want and where on the sheet you want to read from. You can point it at the precise rectangular area you want.
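For example, with the readxl package (the file name, sheet name, and cell range below are placeholders; adjust them to your workbook):

    library(readxl)   # install.packages("readxl") if needed
    KurtzData <- read_excel("KurtzData.xlsx", sheet = "Sheet1",
                            range = "A1:H47")   # header row plus 46 data rows

Because read_excel() only looks inside the range you give it, the garbage down at row 973 never enters R at all.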
And, of course, there are an assortment of cut/paste ways to get the data into your R program, albeit if the data can change and you need to run the analysis again, these are less useful. Here is an example making use of the fact that, on Windows, text copied from Excel is tab-separated.
# note: the whitespace between columns in `text` is a single tab character
text="A	B
1	0
2	1
3	2
4	3
5	4
6	5
7	6
8	7
9	8
10	9
"
df=read.csv(text=text, sep="\t")
df
    A  B
1   1  0
2   2  1
3   3  2
4   4  3
5   5  4
6   6  5
7   7  6
8   8  7
9   9  8
10 10  9