dear Jim, Thanks for your reply and your proposition. I forgot to provide the header of the dataframe, here it is: ===============================================================================Byte-by-byte Description of file: lvg_table2.dat -------------------------------------------------------------------------------- Bytes Format Units Label Explanations -------------------------------------------------------------------------------- 1- 18 A18 --- Name Galaxy name in well-known catalogs 20- 21 I2 h RAh Hour of Right Ascension (J2000) 22- 23 I2 min RAm Minute of Right Ascension (J2000) 24- 27 F4.1 s RAs Second of Right Ascension (J2000) 28 A1 --- DE- Sign of the Declination (J2000) 29- 30 I2 deg DEd Degree of Declination (J2000) 31- 32 I2 arcmin DEm Arcminute of Declination (J2000) 33- 34 I2 arcsec DEs Arcsecond of Declination (J2000) 36- 40 F5.2 kpc a26 ? Major linear diameter (1) 42- 43 I2 deg inc ? Inclination 45- 47 I3 km/s Vm ? Amplitude of rotational velocity (2) 49- 52 F4.2 mag AB ? Internal B band extinction (3) 54- 58 F5.1 mag BMag ? Absolute B band magnitude (4) 60- 63 F4.1 mag/arcsec2 SBB ? Average B band surface brightness (5) 65- 69 F5.2 [solLum] logKLum ? Log K_S_ band luminosity (6) 71- 75 F5.2 [solMass] logM26 ? Log mass within Holmberg radius (7) 77 A1 --- l_logMHI Limit flag on logMHI 78- 82 F5.2 [solMass] logMHI ? Log hydrogen mass (8) 84- 87 I4 km/s VLG ? Radial velocity (9) 89- 92 F4.1 --- Theta1 ? Tidal index (10) 94-116 A23 --- MD Main disturber name (11) 118-121 F4.1 --- Theta5 ? Another tidal index (12) 123-127 F5.2 [-] Thetaj ? Log K band luminosity density (13) -------------------------------------------------------------------------------- The idea for me is to select only the galaxy name and the logMHI values for these galaxies, so quite a simple job when the dataset is tidy enough. I was thinking as usual to use select from dplyr. That is why I was just asking how to read this kind of files which, for me so far, are uncommon. Doing what you propose, it formats most of the columns correctly except few ones, I will see how I can change some width to get it correctly: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 (chr) (chr) (dbl) (int) (dbl) (dbl) (chr) (dbl) (chr) (chr) (int) (chr) (chr) (chr) (chr) (dbl) (chr) 1 UGC12894 000022.5+392944 2.78 33 21 0 -13.3 25.2 7.5 8 8.1 7 7.9 2 61 9 -1. 3 NGC7640 -1 0 0.12 2 WLM 000158.1-152740 3.25 90 22 0 -14.1 24.8 7.7 0 8.2 7 7.8 4 -1 6 0. 0 MESSIER031 0 2 1.75 3 And XVIII 000214.5+450520 0.69 17 9 0 -8.7 26.8 6.4 4 6.7 8 < 6.6 5 -4 4 0. 5 MESSIER031 0 6 1.54 4 PAndAS-03 000356.4+405319 0.10 17 NA 0 -3.6 27.8 4.3 8 NA NA NA 2. 8 MESSIER031 2 8 1.75 5 PAndAS-04 000442.9+472142 0.05 22 NA 0 -6.6 23.1 5.5 9 NA NA -10 8 2. 5 MESSIER031 2 5 1.75 6 PAndAS-05 000524.1+435535 0.06 31 NA 0 -4.5 25.6 4.7 5 NA NA 10 3 2. 8 MESSIER031 2 8 1.75 7 ESO409-015 000531.8-280553 3.00 78 23 0 -14.6 24.1 8.1 0 8.2 5 8.1 0 76 9 -2. 0 NGC0024 -1 5 -2.05 8 AGC748778 000634.4+153039 0.61 70 3 0 -10.4 24.9 6.3 9 5.7 0 6.6 4 48 6 -1. 9 NGC0253 -1 5 -2.72 9 And XX 000730.7+350756 0.20 33 5 0 -5.8 27.1 5.2 6 5.7 0 NA -18 2 2. 4 MESSIER031 2 4 1.75 Cheers, thanks again Jean-Philippe On 05/10/2017 16:49, jim holtman wrote:> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87, > + 92, 114, 121, 127) > > read_fwf(input, fwf_widths(diff(start)))-- Jean-Philippe Fontaine PhD Student in Astroparticle Physics, Gran Sasso Science Institute (GSSI), Viale Francesco Crispi 7, 67100 L'Aquila, Italy Mobile: +393487128593, +33615653774
Since you have an authoritative description of the format, by all means use that - not a guess based on a visual inspection of where data appears in a sample row. B.> On Oct 5, 2017, at 11:02 AM, jean-philippe <jeanphilippe.fontaine at gssi.infn.it> wrote: > > dear Jim, > > Thanks for your reply and your proposition. > > I forgot to provide the header of the dataframe, here it is: > ===============================================================================> Byte-by-byte Description of file: lvg_table2.dat > -------------------------------------------------------------------------------- > Bytes Format Units Label Explanations > -------------------------------------------------------------------------------- > 1- 18 A18 --- Name Galaxy name in well-known catalogs > 20- 21 I2 h RAh Hour of Right Ascension (J2000) > 22- 23 I2 min RAm Minute of Right Ascension (J2000) > 24- 27 F4.1 s RAs Second of Right Ascension (J2000) > 28 A1 --- DE- Sign of the Declination (J2000) > 29- 30 I2 deg DEd Degree of Declination (J2000) > 31- 32 I2 arcmin DEm Arcminute of Declination (J2000) > 33- 34 I2 arcsec DEs Arcsecond of Declination (J2000) > 36- 40 F5.2 kpc a26 ? Major linear diameter (1) > 42- 43 I2 deg inc ? Inclination > 45- 47 I3 km/s Vm ? Amplitude of rotational velocity (2) > 49- 52 F4.2 mag AB ? Internal B band extinction (3) > 54- 58 F5.1 mag BMag ? Absolute B band magnitude (4) > 60- 63 F4.1 mag/arcsec2 SBB ? Average B band surface brightness (5) > 65- 69 F5.2 [solLum] logKLum ? Log K_S_ band luminosity (6) > 71- 75 F5.2 [solMass] logM26 ? Log mass within Holmberg radius (7) > 77 A1 --- l_logMHI Limit flag on logMHI > 78- 82 F5.2 [solMass] logMHI ? Log hydrogen mass (8) > 84- 87 I4 km/s VLG ? Radial velocity (9) > 89- 92 F4.1 --- Theta1 ? Tidal index (10) > 94-116 A23 --- MD Main disturber name (11) > 118-121 F4.1 --- Theta5 ? Another tidal index (12) > 123-127 F5.2 [-] Thetaj ? Log K band luminosity density (13) > -------------------------------------------------------------------------------- > > The idea for me is to select only the galaxy name and the logMHI values for these galaxies, so quite a simple job when the dataset is tidy enough. I was thinking as usual to use select from dplyr. > That is why I was just asking how to read this kind of files which, for me so far, are uncommon. > > Doing what you propose, it formats most of the columns correctly except few ones, I will see how I can change some width to get it correctly: > > X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 > (chr) (chr) (dbl) (int) (dbl) (dbl) (chr) (dbl) (chr) (chr) (int) (chr) (chr) (chr) (chr) (dbl) (chr) > 1 UGC12894 000022.5+392944 2.78 33 21 0 -13.3 25.2 7.5 8 8.1 7 7.9 2 61 9 -1. 3 NGC7640 -1 0 0.12 > 2 WLM 000158.1-152740 3.25 90 22 0 -14.1 24.8 7.7 0 8.2 7 7.8 4 -1 6 0. 0 MESSIER031 0 2 1.75 > 3 And XVIII 000214.5+450520 0.69 17 9 0 -8.7 26.8 6.4 4 6.7 8 < 6.6 5 -4 4 0. 5 MESSIER031 0 6 1.54 > 4 PAndAS-03 000356.4+405319 0.10 17 NA 0 -3.6 27.8 4.3 8 NA NA NA 2. 8 MESSIER031 2 8 1.75 > 5 PAndAS-04 000442.9+472142 0.05 22 NA 0 -6.6 23.1 5.5 9 NA NA -10 8 2. 5 MESSIER031 2 5 1.75 > 6 PAndAS-05 000524.1+435535 0.06 31 NA 0 -4.5 25.6 4.7 5 NA NA 10 3 2. 8 MESSIER031 2 8 1.75 > 7 ESO409-015 000531.8-280553 3.00 78 23 0 -14.6 24.1 8.1 0 8.2 5 8.1 0 76 9 -2. 0 NGC0024 -1 5 -2.05 > 8 AGC748778 000634.4+153039 0.61 70 3 0 -10.4 24.9 6.3 9 5.7 0 6.6 4 48 6 -1. 9 NGC0253 -1 5 -2.72 > 9 And XX 000730.7+350756 0.20 33 5 0 -5.8 27.1 5.2 6 5.7 0 NA -18 2 2. 4 MESSIER031 2 4 1.75 > > > Cheers, thanks again > > > Jean-Philippe > On 05/10/2017 16:49, jim holtman wrote: >> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87, >> + 92, 114, 121, 127) >> > read_fwf(input, fwf_widths(diff(start))) > > -- > Jean-Philippe Fontaine > PhD Student in Astroparticle Physics, > Gran Sasso Science Institute (GSSI), > Viale Francesco Crispi 7, > 67100 L'Aquila, Italy > Mobile: +393487128593, +33615653774 > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
You should be able to use that header information to create the correct parameters to the read_fwf function to read in the data. Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Thu, Oct 5, 2017 at 11:02 AM, jean-philippe <jeanphilippe.fontaine at gssi.infn.it> wrote:> dear Jim, > > Thanks for your reply and your proposition. > > I forgot to provide the header of the dataframe, here it is: > ===============================================================================> Byte-by-byte Description of file: lvg_table2.dat > -------------------------------------------------------------------------------- > Bytes Format Units Label Explanations > -------------------------------------------------------------------------------- > 1- 18 A18 --- Name Galaxy name in well-known catalogs > 20- 21 I2 h RAh Hour of Right Ascension (J2000) > 22- 23 I2 min RAm Minute of Right Ascension (J2000) > 24- 27 F4.1 s RAs Second of Right Ascension (J2000) > 28 A1 --- DE- Sign of the Declination (J2000) > 29- 30 I2 deg DEd Degree of Declination (J2000) > 31- 32 I2 arcmin DEm Arcminute of Declination (J2000) > 33- 34 I2 arcsec DEs Arcsecond of Declination (J2000) > 36- 40 F5.2 kpc a26 ? Major linear diameter (1) > 42- 43 I2 deg inc ? Inclination > 45- 47 I3 km/s Vm ? Amplitude of rotational velocity (2) > 49- 52 F4.2 mag AB ? Internal B band extinction (3) > 54- 58 F5.1 mag BMag ? Absolute B band magnitude (4) > 60- 63 F4.1 mag/arcsec2 SBB ? Average B band surface brightness (5) > 65- 69 F5.2 [solLum] logKLum ? Log K_S_ band luminosity (6) > 71- 75 F5.2 [solMass] logM26 ? Log mass within Holmberg radius (7) > 77 A1 --- l_logMHI Limit flag on logMHI > 78- 82 F5.2 [solMass] logMHI ? Log hydrogen mass (8) > 84- 87 I4 km/s VLG ? Radial velocity (9) > 89- 92 F4.1 --- Theta1 ? Tidal index (10) > 94-116 A23 --- MD Main disturber name (11) > 118-121 F4.1 --- Theta5 ? Another tidal index (12) > 123-127 F5.2 [-] Thetaj ? Log K band luminosity density (13) > -------------------------------------------------------------------------------- > > The idea for me is to select only the galaxy name and the logMHI values for > these galaxies, so quite a simple job when the dataset is tidy enough. I was > thinking as usual to use select from dplyr. > That is why I was just asking how to read this kind of files which, for me > so far, are uncommon. > > Doing what you propose, it formats most of the columns correctly except few > ones, I will see how I can change some width to get it correctly: > > X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 > X11 X12 X13 X14 X15 X16 X17 > (chr) (chr) (dbl) (int) (dbl) (dbl) (chr) (dbl) (chr) > (chr) (int) (chr) (chr) (chr) (chr) (dbl) (chr) > 1 UGC12894 000022.5+392944 2.78 33 21 0 -13.3 25.2 7.5 8 8.1 > 7 7.9 2 61 9 -1. 3 NGC7640 -1 0 0.12 > 2 WLM 000158.1-152740 3.25 90 22 0 -14.1 24.8 7.7 0 8.2 > 7 7.8 4 -1 6 0. 0 MESSIER031 0 2 1.75 > 3 And XVIII 000214.5+450520 0.69 17 9 0 -8.7 26.8 6.4 4 6.7 > 8 < 6.6 5 -4 4 0. 5 MESSIER031 0 6 1.54 > 4 PAndAS-03 000356.4+405319 0.10 17 NA 0 -3.6 27.8 4.3 8 > NA NA NA 2. 8 MESSIER031 2 8 1.75 > 5 PAndAS-04 000442.9+472142 0.05 22 NA 0 -6.6 23.1 5.5 9 > NA NA -10 8 2. 5 MESSIER031 2 5 1.75 > 6 PAndAS-05 000524.1+435535 0.06 31 NA 0 -4.5 25.6 4.7 5 > NA NA 10 3 2. 8 MESSIER031 2 8 1.75 > 7 ESO409-015 000531.8-280553 3.00 78 23 0 -14.6 24.1 8.1 0 8.2 > 5 8.1 0 76 9 -2. 0 NGC0024 -1 5 -2.05 > 8 AGC748778 000634.4+153039 0.61 70 3 0 -10.4 24.9 6.3 9 5.7 > 0 6.6 4 48 6 -1. 9 NGC0253 -1 5 -2.72 > 9 And XX 000730.7+350756 0.20 33 5 0 -5.8 27.1 5.2 6 5.7 > 0 NA -18 2 2. 4 MESSIER031 2 4 1.75 > > > Cheers, thanks again > > > Jean-Philippe > On 05/10/2017 16:49, jim holtman wrote: >> >> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87, >> + 92, 114, 121, 127) >> > read_fwf(input, fwf_widths(diff(start))) > > > -- > Jean-Philippe Fontaine > PhD Student in Astroparticle Physics, > Gran Sasso Science Institute (GSSI), > Viale Francesco Crispi 7, > 67100 L'Aquila, Italy > Mobile: +393487128593, +33615653774 >
dear Jim, Yes I fixed the problem. Thanks again all of you for your contribution! This worked : start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 70, 76, 78, 83, 88, + 93, 114, 122, 127) data1<-read_fwf("lvg_table2.txt",skip=70, fwf_widths(diff(start))) Well now I know how to deal with fixed-width files :) Cheers Jean-Philippe On 05/10/2017 18:42, jim holtman wrote:> You should be able to use that header information to create the > correct parameters to the read_fwf function to read in the data. > > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > Tell me what you want to do, not how you want to do it. > > > On Thu, Oct 5, 2017 at 11:02 AM, jean-philippe > <jeanphilippe.fontaine at gssi.infn.it> wrote: >> dear Jim, >> >> Thanks for your reply and your proposition. >> >> I forgot to provide the header of the dataframe, here it is: >> ===============================================================================>> Byte-by-byte Description of file: lvg_table2.dat >> -------------------------------------------------------------------------------- >> Bytes Format Units Label Explanations >> -------------------------------------------------------------------------------- >> 1- 18 A18 --- Name Galaxy name in well-known catalogs >> 20- 21 I2 h RAh Hour of Right Ascension (J2000) >> 22- 23 I2 min RAm Minute of Right Ascension (J2000) >> 24- 27 F4.1 s RAs Second of Right Ascension (J2000) >> 28 A1 --- DE- Sign of the Declination (J2000) >> 29- 30 I2 deg DEd Degree of Declination (J2000) >> 31- 32 I2 arcmin DEm Arcminute of Declination (J2000) >> 33- 34 I2 arcsec DEs Arcsecond of Declination (J2000) >> 36- 40 F5.2 kpc a26 ? Major linear diameter (1) >> 42- 43 I2 deg inc ? Inclination >> 45- 47 I3 km/s Vm ? Amplitude of rotational velocity (2) >> 49- 52 F4.2 mag AB ? Internal B band extinction (3) >> 54- 58 F5.1 mag BMag ? Absolute B band magnitude (4) >> 60- 63 F4.1 mag/arcsec2 SBB ? Average B band surface brightness (5) >> 65- 69 F5.2 [solLum] logKLum ? Log K_S_ band luminosity (6) >> 71- 75 F5.2 [solMass] logM26 ? Log mass within Holmberg radius (7) >> 77 A1 --- l_logMHI Limit flag on logMHI >> 78- 82 F5.2 [solMass] logMHI ? Log hydrogen mass (8) >> 84- 87 I4 km/s VLG ? Radial velocity (9) >> 89- 92 F4.1 --- Theta1 ? Tidal index (10) >> 94-116 A23 --- MD Main disturber name (11) >> 118-121 F4.1 --- Theta5 ? Another tidal index (12) >> 123-127 F5.2 [-] Thetaj ? Log K band luminosity density (13) >> -------------------------------------------------------------------------------- >> >> The idea for me is to select only the galaxy name and the logMHI values for >> these galaxies, so quite a simple job when the dataset is tidy enough. I was >> thinking as usual to use select from dplyr. >> That is why I was just asking how to read this kind of files which, for me >> so far, are uncommon. >> >> Doing what you propose, it formats most of the columns correctly except few >> ones, I will see how I can change some width to get it correctly: >> >> X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 >> X11 X12 X13 X14 X15 X16 X17 >> (chr) (chr) (dbl) (int) (dbl) (dbl) (chr) (dbl) (chr) >> (chr) (int) (chr) (chr) (chr) (chr) (dbl) (chr) >> 1 UGC12894 000022.5+392944 2.78 33 21 0 -13.3 25.2 7.5 8 8.1 >> 7 7.9 2 61 9 -1. 3 NGC7640 -1 0 0.12 >> 2 WLM 000158.1-152740 3.25 90 22 0 -14.1 24.8 7.7 0 8.2 >> 7 7.8 4 -1 6 0. 0 MESSIER031 0 2 1.75 >> 3 And XVIII 000214.5+450520 0.69 17 9 0 -8.7 26.8 6.4 4 6.7 >> 8 < 6.6 5 -4 4 0. 5 MESSIER031 0 6 1.54 >> 4 PAndAS-03 000356.4+405319 0.10 17 NA 0 -3.6 27.8 4.3 8 >> NA NA NA 2. 8 MESSIER031 2 8 1.75 >> 5 PAndAS-04 000442.9+472142 0.05 22 NA 0 -6.6 23.1 5.5 9 >> NA NA -10 8 2. 5 MESSIER031 2 5 1.75 >> 6 PAndAS-05 000524.1+435535 0.06 31 NA 0 -4.5 25.6 4.7 5 >> NA NA 10 3 2. 8 MESSIER031 2 8 1.75 >> 7 ESO409-015 000531.8-280553 3.00 78 23 0 -14.6 24.1 8.1 0 8.2 >> 5 8.1 0 76 9 -2. 0 NGC0024 -1 5 -2.05 >> 8 AGC748778 000634.4+153039 0.61 70 3 0 -10.4 24.9 6.3 9 5.7 >> 0 6.6 4 48 6 -1. 9 NGC0253 -1 5 -2.72 >> 9 And XX 000730.7+350756 0.20 33 5 0 -5.8 27.1 5.2 6 5.7 >> 0 NA -18 2 2. 4 MESSIER031 2 4 1.75 >> >> >> Cheers, thanks again >> >> >> Jean-Philippe >> On 05/10/2017 16:49, jim holtman wrote: >>> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87, >>> + 92, 114, 121, 127) >>> > read_fwf(input, fwf_widths(diff(start))) >> >> -- >> Jean-Philippe Fontaine >> PhD Student in Astroparticle Physics, >> Gran Sasso Science Institute (GSSI), >> Viale Francesco Crispi 7, >> 67100 L'Aquila, Italy >> Mobile: +393487128593, +33615653774 >>-- Jean-Philippe Fontaine PhD Student in Astroparticle Physics, Gran Sasso Science Institute (GSSI), Viale Francesco Crispi 7, 67100 L'Aquila, Italy Mobile: +393487128593, +33615653774