Hi,

1. I have scraped some data from the web, subset shown below:

> dput(temp.data)
c("Armenia", "Armenia", "43827", "39200", "35700", "36700", "39341",
"30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
"0", "0", "0", "0", "Austria", "Austria", "135417", "166200",
"144500", "147300", "163211", "162536", "155412", "133667", "134962",
"146440", "131188", "100001", "100000", "80000", "35000")

2. The corresponding list of countries is as follows:

> dput(raw.country)
c("Armenia", "Austria", "Belarus", "Belgium", "Brazil", "Bulgaria",
"Canada", "Castile-Leon (Hiszania)", "Catalonia", "Chile", "Colombia",
"Costarica", "Croatia", "Cyprus", "Czech Republic", "Ecuador",
"Estonia", "Finland", "France", "Georgia", "Germany", "Ghana",
"Greece", "Hungary", "Indonesia", "Iran", "Ireland", "Israel",
"Italy", "Kazakhstan", "Kyrgyzstan", "Latvia", "Lithuania", "Macedonia",
"Malaysia", "Mexico", "Moldova", "Mongolia", "Netherland", "Norway",
"Pakistan", "Panama", "Paraguay", "Peru", "Poland", "Portugal",
"Puertorico", "Romania", "Russia", "Serbia", "Slovakia", "Slovenia",
"Spain", "Sweden", "Switzerland", "Tunisia", "Ukraine", "United Kingdom",
"USA", "Venezuela", "Vltava", "World Total")

3. I want to organize the data into a data frame, where each row will
contain the 20 values for the corresponding country. It needs to ignore
the country name, which appears twice. Something like:

Armenia "43827", "39200", "35700", "36700", "39341",
"30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
"0", "0", "0", "0",

"Austria", "135417", "166200",
"144500", "147300", "163211", "162536", "155412", "133667", "134962",
"146440", "131188", "100001", "100000", "80000", "35000"

and so on.

Thanks
Your data rows have different numbers of columns. Thus your problem is not sufficiently specified.

B.

On Mar 24, 2016, at 6:30 AM, Burhan ul haq <ulhaqz at gmail.com> wrote:
> 3. I want to organize the data into a data frame, where each row will
> contain the 20 values for the corresponding country.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
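The mismatch in row lengths can be checked directly from the posted sample. The following is only a minimal sketch: it assumes that no data value coincides with a country name, and the names is.name, runs and n.values are illustrative, not from the original post. It uses rle() to measure the runs of non-name entries that follow each pair of country names.

```r
# Sample data exactly as posted
temp.data <- c("Armenia", "Armenia", "43827", "39200", "35700", "36700", "39341",
  "30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
  "0", "0", "0", "0", "Austria", "Austria", "135417", "166200",
  "144500", "147300", "163211", "162536", "155412", "133667", "134962",
  "146440", "131188", "100001", "100000", "80000", "35000")

# Only the two countries present in the sample (assumption: a value never
# equals a country name)
is.name <- temp.data %in% c("Armenia", "Austria")

# Runs alternate: names (TRUE), values (FALSE), names, values, ...
runs <- rle(is.name)

# Length of each run of values, labelled with the name that precedes it
n.values <- runs$lengths[!runs$values]
names(n.values) <- temp.data[cumsum(runs$lengths)[runs$values]]

print(n.values)
# Armenia Austria
#      21      15
```

So the sample has 21 values for Armenia and 15 for Austria, neither of which is the 20 mentioned in the question.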
Hi Burhan,

As all of your values seem to be character, perhaps:

country.df<-as.data.frame(matrix(temp.data,ncol=22,byrow=TRUE)[,2:22])

if there really are 2 country names and 20 values for each country. This drops the first copy of the country name and keeps the second one together with the 20 values. As Boris has pointed out, there are different numbers of values following the country names in your example.

Jim

On Thu, Mar 24, 2016 at 9:30 PM, Burhan ul haq <ulhaqz at gmail.com> wrote:
> 3. I want to organize the data into a data frame, where each row will
> contain the 20 values for the corresponding country.
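To illustrate the reshaping idea when every country really does have the same number of values, here is a small sketch on made-up input. The vector regular, the count n.val and the column name country are hypothetical and chosen only to keep the example short; they are not from the thread. The posted sample itself has 40 elements, which is not a multiple of 22, so matrix() would warn and recycle values if applied to it as is.

```r
# Hypothetical, regular input: two copies of the name, then n.val values
n.val <- 3
regular <- c("Armenia", "Armenia", "1", "2", "3",
             "Austria", "Austria", "4", "5", "6")

# One row per country: 2 name columns followed by n.val value columns
m <- matrix(regular, ncol = n.val + 2, byrow = TRUE)

# Drop both name columns and convert the remaining values to numeric
vals <- matrix(as.numeric(m[, -(1:2)]), nrow = nrow(m))

# Keep one copy of the name as an ordinary column
country.df <- data.frame(country = m[, 1], vals, stringsAsFactors = FALSE)

print(country.df)
```

Converting to numeric is a design choice for this sketch; keeping the values as character, as in the reply above, works the same way without the as.numeric() step.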
Hi!

As Boris explained, if you do not always have the same number of values per country, you need to provide more details, e.g. should the empty cells be filled with NA?

But if you do always have 20 values per country (unlike in your sample data), then this could work for you:

mydf <- data.frame(matrix(temp.data, nrow=2, ncol=22, byrow=TRUE))

Here nrow=2 matches the two countries in your sample; for the full data it has to be the number of countries. You can then subset to remove the 1st column (the duplicated country name):

mydf[-1]

HTH,
Ivan

--
Ivan Calandra, PhD
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calandra at univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

On 24/03/2016 at 11:30, Burhan ul haq wrote:
> 3. I want to organize the data into a data frame, where each row will
> contain the 20 values for the corresponding country.
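If the answer to the NA question is yes, one possible way to handle unequal row lengths is sketched below. This is an assumption-laden illustration, not something the thread settles: it assumes missing values should become NA, that they are always missing at the end of a row, and that no data value equals a country name. The names is.name, start, group, values, n.max and padded are illustrative.

```r
# Sample data exactly as posted
temp.data <- c("Armenia", "Armenia", "43827", "39200", "35700", "36700", "39341",
  "30571", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", " 0",
  "0", "0", "0", "0", "Austria", "Austria", "135417", "166200",
  "144500", "147300", "163211", "162536", "155412", "133667", "134962",
  "146440", "131188", "100001", "100000", "80000", "35000")
raw.country <- c("Armenia", "Austria")  # subset of the full country list

# Mark country names, and the first name of each duplicated pair
is.name <- temp.data %in% raw.country
start <- is.name & !c(FALSE, head(is.name, -1))

# Block id for every element: increases by one at each new country
group <- cumsum(start)

# Numeric values per country (as.numeric tolerates the stray space in " 0")
values <- split(as.numeric(temp.data[!is.name]), group[!is.name])

# Right-pad shorter rows with NA up to the longest row (assumption)
n.max <- max(lengths(values))
padded <- t(sapply(values, function(v) c(v, rep(NA, n.max - length(v)))))

country.df <- data.frame(country = temp.data[start], padded,
                         stringsAsFactors = FALSE)
dim(country.df)
```

With the posted sample this gives one row per country and 21 value columns, with the last six cells of the Austria row set to NA. Whether trailing NA is the right interpretation depends on how the scraped table was laid out, which only the original source can tell.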