Ulisses.Camargo
2011-Apr-05 15:06 UTC
[R] Help to check data before putting it in a database
The example scene:

I have a database with stats about each goal scored by my soccer team. This database (a data frame in R) is organized in rows (goals), with a set of columns containing data about each goal (player name, tactical position, etc.). For now, I will call this database "data.frame1".

I need to feed "data.frame1" with new information about my team's goals. I will call this new information "data.frame2". This set of new goals is organized in the same way as "data.frame1" (same number of columns).

Where help is needed:

I need a way to check the player-name column in "data.frame2" before feeding it into "data.frame1". That is, I want to compare the player name on each row of "data.frame2" against the player names that already exist in a column of "data.frame1". I need R to do two main things:

First, the rows of "data.frame2" with player names that already exist in "data.frame1" must be added to "data.frame1".

Second, the rows of "data.frame2" with player names that do not exist in "data.frame1" must be listed in an output to be manually checked and corrected. After this verification, the corrected rows and the rows with genuinely new player names must be incorporated into "data.frame1".

I want to guarantee that there will be no rows with wrong player names in my database. At the same time, my script must allow new information (new player names) to be added.

Is there somebody who could help me with this?

Thanks for your attention
Best wishes
Ulisses

--
View this message in context: http://r.789695.n4.nabble.com/Help-to-check-data-before-putting-it-in-a-database-tp3428318p3428318.html
Sent from the R help mailing list archive at Nabble.com.
Hi Ulisses,

Look at ?match and ?rbind. If you do not want to do it by hand, you can make a little function as below.

HTH,

Josh

d1 <- data.frame(goals = 4:1, players = LETTERS[1:4])
d2 <- data.frame(goals = c(1, 3, 2, 5), players = LETTERS[3:6])

## split "new" by whether its "check" column matches a value in "old"
f <- function(old, new, check) {
  index <- new[, check] %in% old[, check]
  dat <- rbind(old, new[index, ])  # rows with known names are appended
  tocheck <- new[!index, ]         # rows with unknown names are set aside
  list(merged = dat, tocheck = tocheck)
}

dmerged <- f(d1, d2, "players")

## check "tocheck" and once it is correct
dfinal <- do.call("rbind", dmerged)

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
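One possible way to carry out the "check tocheck and once it is correct" step from the reply above is sketched here. It is only an illustration under assumptions: the player names, the typo "Cossta", and the vector approved_new are all made up, and the manual correction is simulated in code rather than done by hand. Columns are created with stringsAsFactors = FALSE so that names can be edited as plain character strings.

```r
# Hypothetical data: existing goals and an incoming batch with one typo
# ("Cossta") and one genuinely new player ("Lima").
old <- data.frame(goals = c(2, 1), players = c("Silva", "Costa"),
                  stringsAsFactors = FALSE)
new <- data.frame(goals = c(1, 3, 2), players = c("Silva", "Cossta", "Lima"),
                  stringsAsFactors = FALSE)

# Split the incoming rows by whether the name is already known.
known <- new$players %in% old$players
accepted <- rbind(old, new[known, ])
tocheck <- new[!known, ]
print(tocheck)  # rows to inspect by hand

# Manual step, simulated here: fix the typo and confirm the new player.
tocheck$players[tocheck$players == "Cossta"] <- "Costa"
approved_new <- "Lima"  # hypothetical list of confirmed new names

# Re-verify before appending: every remaining name must be known or approved.
ok <- tocheck$players %in% c(old$players, approved_new)
stopifnot(all(ok))

final <- rbind(accepted, tocheck)
```

Re-running the check after the correction means a name that is still misspelled stops the script instead of slipping into the database.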
Jeff Newmiller
2011-Apr-05 15:36 UTC
[R] Help to check data before putting it in a database
I would recommend using R to check your input, identify bad input, and load only the data that passes validation. Then go back to some other tool to edit the data, and save, reload, and re-verify the edited data.

To find non-matching values, you can use merge() with the all.x argument together with is.na(), or the ! and %in% logical operators.

If you are determined to modify the data in R, then you probably need the Tk interface (the tcltk package), the use of which is not really a topic for this forum.

---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil@dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
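The two checks mentioned in this reply (merge() with all.x plus is.na(), and ! with %in%) might look roughly like the sketch below. It reuses the toy data from Josh's example; the lookup table, its marker column `known`, and the other object names are assumptions introduced only for illustration.

```r
# Toy data as in the earlier example in this thread.
old <- data.frame(goals = 4:1, players = LETTERS[1:4],
                  stringsAsFactors = FALSE)
new <- data.frame(goals = c(1, 3, 2, 5), players = LETTERS[3:6],
                  stringsAsFactors = FALSE)

# Lookup table of known names with a marker column (hypothetical name).
lookup <- data.frame(players = unique(old$players), known = TRUE,
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of `new`; `known` is NA where no name matched.
checked <- merge(new, lookup, by = "players", all.x = TRUE)
unmatched <- checked[is.na(checked$known), c("goals", "players")]
matched <- checked[!is.na(checked$known), c("goals", "players")]

# The same rows found with ! and %in%, without building a lookup table.
unmatched2 <- new[!(new$players %in% old$players), ]
```

Note that merge() sorts the result by the key column by default, so the %in% form is the simpler choice when the original row order of the new data matters.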