Ranjeet Kumar Jha
2022-Jul-25 13:02 UTC
[R] Need to insert various rows of data from a data frame after particular rows from another dataframe
Hello Everyone,
I have a dataset in a particular format in the "dacnet_yield_update till 2019.xlsx" file, into which I need to insert the rows for 2018-2019 and 2019-2020 for those districts whose data are available in "Kharif crops yield_18-19.xlsx". I need to insert these two rows for every district, where data is available in the latter Excel file, just after the corresponding crop group data for that district.
I have put the data files at the given link:
https://drive.google.com/drive/u/0/folders/1dNmGTI8_c9PK1QqmfIjnpbyzuiCXgxFC
Please help me solve this problem.
Regards and Thanks,
Ranjeet
CALUM POLWART
2022-Jul-27 06:22 UTC
[R] Need to insert various rows of data from a data frame after particular rows from another dataframe
It is not very clear what you are trying to do, but I'd have thought dplyr's left_join() might be a solution for you. The base R equivalent is merge(). rbind() or cbind() might be able to do it too.
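To make the merge() suggestion concrete, here is a minimal sketch of a left join in base R. The two small data frames and their column names (district, crop, yield_2017, yield_2018) are hypothetical stand-ins, not the layout of the poster's actual files.

```r
# Hypothetical stand-ins for the two spreadsheets; all column names are assumed.
old <- data.frame(
  district   = c("A", "A", "B"),
  crop       = c("rice", "maize", "rice"),
  yield_2017 = c(2.1, 1.4, 2.5)
)
new <- data.frame(
  district   = c("A", "B"),
  crop       = c("rice", "rice"),
  yield_2018 = c(2.3, 2.6)
)

# all.x = TRUE keeps every row of `old` (a left join);
# rows of `old` with no match in `new` get NA in yield_2018.
joined <- merge(old, new, by = c("district", "crop"), all.x = TRUE)
print(joined)
```

Note that this adds the new data as an extra column rather than as extra rows, so it only fits if one column per year is acceptable.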
PIKAL Petr
2022-Jul-27 06:30 UTC
[R] Need to insert various rows of data from a data frame after particular rows from another dataframe
Hi.
From what you say, plain rbind() could be used, if the columns in both sets are the same and in the same order. After that you can reorder the resulting data frame as you wish with order(). AFAIK, for most functions the row order in a data frame does not matter.
Cheers
Petr
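The rbind()-then-order() approach might look like the following sketch. It assumes both data sets have already been brought into the same long format; the column names (district, crop, year, yield) and the values are invented for illustration.

```r
# Hypothetical long-format data; column names and values are assumed.
old <- data.frame(
  district = c("A", "A", "B", "B"),
  crop     = "rice",
  year     = c(2016, 2017, 2016, 2017),
  yield    = c(2.0, 2.1, 2.4, 2.5)
)
new <- data.frame(
  district = c("A", "B"),
  crop     = "rice",
  year     = c(2018, 2018),
  yield    = c(2.3, 2.6)
)

# Stack the new rows under the old ones, then sort so that each
# district/crop group ends with its most recent year.
combined <- rbind(old, new)
combined <- combined[order(combined$district, combined$crop, combined$year), ]
rownames(combined) <- NULL
print(combined)
```

Sorting after stacking gives the "insert after the matching group" effect without any explicit row insertion.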
Ebert,Timothy Aaron
2022-Jul-27 12:34 UTC
[R] Need to insert various rows of data from a data frame after particular rows from another dataframe
I think what you want is full_join() from the dplyr package.
https://www.rdocumentation.org/packages/dplyr/versions/0.7.8/topics/join
The only requirement is that both data frames must have a column in common
wherein the data are entered in the same way. So the column labeled "state"
needs to have "ANANTAPUR" in both data frames rather than "ANANTAPUR" in one
data frame and "Anantapur" in the other.
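A small sketch of why the key column has to be entered identically in both data frames. It uses base R merge() with all = TRUE, which behaves like a full join; the data frames and column names here are made up for illustration.

```r
# Hypothetical data: the same place is spelled in different case in each frame.
a <- data.frame(district = c("ANANTAPUR", "CHITTOOR"),
                yield_2017 = c(1.2, 0.9), stringsAsFactors = FALSE)
b <- data.frame(district = c("Anantapur", "Kurnool"),
                yield_2018 = c(1.3, 1.1), stringsAsFactors = FALSE)

# Without normalisation, "ANANTAPUR" and "Anantapur" are treated as different keys.
naive <- merge(a, b, by = "district", all = TRUE)

# Normalise case and stray whitespace before joining.
a$district <- toupper(trimws(a$district))
b$district <- toupper(trimws(b$district))
fixed <- merge(a, b, by = "district", all = TRUE)

print(nrow(naive))  # one row per distinct spelling
print(nrow(fixed))  # one row per actual district
```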
Reformat your Excel spreadsheet to remove the heading rows. What is now the
first column should become three columns: State, then Crop, then District,
rather than headings. The first row can contain variable names. I would make
variable names simple (like "Area" and "Production"), but some like more
information, so "Area_Ha" and "Production_TN_per_Ha" would also work. It is
best not to use special symbols in variable names and to keep each variable
name as one string of characters (no spaces). Including such characters will
eventually cause problems.
https://www.w3schools.com/r/r_variables.asp#:~:text=Variable%20Names&text=Rules%20for%20R%20variables%20are,be%20followed%20by%20a%20digit.
One caveat to the rules in the link is that R lets you name a variable T or
F. R defaults to interpreting these as TRUE and FALSE. The problem arises when
the programmer reassigns them and then tries to use T or F as Boolean values
in other parts of the program.
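A short illustration of the T/F pitfall, as a sketch with made-up values:

```r
# T and F are ordinary variables that start out as TRUE and FALSE,
# so they can be overwritten. TRUE and FALSE are reserved words and cannot.
T <- 0

x <- c(5, 3, NA)
wrong <- sum(x, na.rm = T)     # T is now 0, which is treated as FALSE
right <- sum(x, na.rm = TRUE)  # spelling it out is always safe

print(wrong)
print(right)

rm(T)  # remove the local T so the built-in value is visible again
```

Writing TRUE and FALSE in full avoids the problem entirely.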
Tim
Richard O'Keefe
2022-Jul-28 06:41 UTC
[R] Need to insert various rows of data from a data frame after particular rows from another dataframe
I'm retired, and I had an hour on my hands while tea cooked and my
granddaughter did her homework, and I just *love* showing off how helpful I
am.
Good news: someone finally looked at your data.
(That would be me.)
Bad news: it's going to be a lot of work to do what you want to, and YOU
SHOULDN'T EVEN TRY because it won't make any sense, as we would all have
known at once had you been clear about the structure of your data in the
beginning.
You have two files.
    dacnet_yield_update till 2019.csv
is a straightforward "data" file with structure
    crop           factor(arhar,bajra,cotton,gram,maize,moong,mustard,
                          potato,rice,soyabean,sugarcane,urad,wheat),
    season         factor(kharif,rabi),
    state.id       integer(1201..1235),
    state.name     factor, # 34 states
    district.id    integer(15001..15648),
    district.name  factor,
    year           integer(1998..2017),
    yield          decimal(0.001 .. 314.736, precision=3)
The one problem is that the file name is misleading. It says "till 2019",
but includes no data for 2019 or 2018.
Ah, but the other file! That's not a "data" file intended for machine use
at all. It's a "display" file intended for human beings to look at and go
"wow, gosh, lookit them numbahs". It's the kind of thing that gets
included as an appendix in an official report which seems as if perversely
designed to impede the development of insight as much as possible.
Amongst other difficulties:
- the same column contains state names, district names, crop names, and
assorted junk;
- state names are not coded the same way in the two files;
- district names are in UPPERCASE in the .xls files and have numbers
prefixed to them for no apparent reason;
- crop names are not coded the same way in the two files;
- yields are not coded the same way in the two files (3 digit precision in
one, 2 digit precision in the other) and I have some doubt as to whether
they are measuring the same thing;
- above all, years appear to be CALENDAR years in the .csv file (e.g.,
2017) but FINANCIAL years in the .xls file (e.g., 2018-19).
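For what it is worth, the purely mechanical part of reconciling the keys could be sketched in R as below. The exact form of the numeric prefixes ("1. ", "12.", "3 ") and the choice to map a financial year to its starting calendar year are assumptions made for illustration only; whether that year mapping is meaningful is exactly the open question.

```r
# Hypothetical examples of district labels as they might appear in the display file.
raw <- c("1. ANANTAPUR", "12.CHITTOOR", "3 Kurnool")

# Strip a leading number plus any dots or spaces, then normalise case.
clean <- toupper(trimws(sub("^[0-9]+[. ]*", "", raw)))

# Financial years such as "2018-19": take the first four digits as the
# starting year (an assumption, not a claim about what the data mean).
fy <- c("2018-19", "2019-20")
start_year <- as.integer(substr(fy, 1, 4))

print(clean)
print(start_year)
```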
Now I could wrangle the .xls file into something closer to the .csv file
easily enough. I'd do it by converting the .xls to .csv, then writing a
script in AWK. *BUT* my uncertainty that "yield" means the same thing in
the two files and my certainty that "year" does NOT mean the same thing
make it unrewarding to do so.
The .xls file is the end product of some process that derived it from data
better structured for computation. It seems like a better use of your time
to go and look for the original data.
It also seems like a good use of your time to make certain you know what
the fields of the .csv file actually mean. ARE those calendar-year yields, or
just part of a financial year? Why are only the yields of interest and not
the area planted? How are the yields computed?