Hi all.

I've got a bit of a puzzle and I was wondering if anyone has any insight
as to what's going on.

I added some code to ssh's known_hosts handling that checks if the last
byte in the file is a newline, and if not, it adds one before writing
the new record. I also wrote a regression test for this and in most
cases this works fine.

On some platforms (Solaris, OpenIndiana and AIX) however, the test fails
because it adds two newlines instead of the expected one. Basically if
I fseek to the end, read a byte and write a byte the first byte will
be duplicated. I reduced it to this test case:

    #include <stdio.h>
    int main(void)
    {
        FILE *f = fopen("testfile", "w");
        putc('A', f);
        fclose(f);

        f = fopen("testfile", "a+");    /* same behaviour for r+ */
        fseek(f, -1L, SEEK_END);
        printf("c=%d\n", fgetc(f));
        /* fseek(f, 0, SEEK_END); -- with this it behaves as expected */
        /* fflush(f); -- this too */
        fputc('B', f);
    }

    $ gcc test.c && ./a.out; od -x -c testfile
    c=65
    0000000 4141 0042
              A   A   B
    0000003

I wrote two bytes and read one but somehow ended up with three bytes in
the file? (This example is from Solaris 11 but AIX does the same thing).

On most platforms this behaves as expected:

    $ cc test.c && ./a.out; od -x -c testfile
    c=65
    0000000 4241
              A   B
    0000002

Now I could just add the fseek, but as far as I can tell I shouldn't
have to, and I don't understand why. I've read the specs and the man
pages and haven't found anything that would explain this behaviour.

I'm curious if anyone has a) any insight as to what's going on, or b)
any additional examples of where it fails?

Thanks.

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
    Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
Darren Tucker wrote this message on Fri, Feb 17, 2023 at 19:07 +1100:
> I've got a bit of a puzzle and I was wondering if anyone has any insight
> as to what's going on.
>
> I added some code to ssh's known_hosts handling that checks if the last
> byte in the file is a newline, and if not, it adds one before writing
> the new record. I also wrote a regression test for this and in most
> cases this works fine.
>
> On some platforms (Solaris, OpenIndiana and AIX) however, the test fails
> because it adds two newlines instead of the expected one. Basically if
> I fseek to the end, read a byte and write a byte the first byte will
> be duplicated. I reduced it to this test case:
>
> #include <stdio.h>
> int main(void)
> {
>     FILE *f = fopen("testfile", "w");
>     putc('A', f);
>     fclose(f);
>
>     f = fopen("testfile", "a+");    /* same behaviour for r+ */
>     fseek(f, -1L, SEEK_END);
>     printf("c=%d\n", fgetc(f));
>     /* fseek(f, 0, SEEK_END); -- with this it behaves as expected */
>     /* fflush(f); -- this too */
>     fputc('B', f);
> }
>
> $ gcc test.c && ./a.out; od -x -c testfile
> c=65
> 0000000 4141 0042
>           A   A   B
> 0000003
>
> I wrote two bytes and read one but somehow ended up with three bytes in
> the file? (This example is from Solaris 11 but AIX does the same thing).
>
> On most platforms this behaves as expected:
>
> $ cc test.c && ./a.out; od -x -c testfile
> c=65
> 0000000 4241
>           A   B
> 0000002
>
> Now I could just add the fseek, but as far as I can tell I shouldn't
> have to, and I don't understand why. I've read the specs and the man
> pages and haven't found anything that would explain this behaviour.
>
> I'm curious if anyone has a) any insight as to what's going on, or b)
> any additional examples of where it fails?

Can you check the return value of fseek to see if it's non-zero? It
should be zero if successful, but likely non-zero on the failing systems.

I decided to check the C standard for fseek's behavior, and it looks like
SEEK_END is not specified on text streams, only binary streams (at least
in C99 and C2X).

For text streams, this is what it says is supported (C99 7.19.9.2):

    For a text stream, either offset shall be zero, or offset shall be a
    value returned by an earlier successful call to the ftell function on
    a stream associated with the same file and whence shall be SEEK_SET.

Does it work if you open the file in binary mode? a+b or r+b?

--
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
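
Both suggestions in this message (checking what fseek returns, and trying
binary mode) can be tried with a small probe along the lines of the sketch
below. The helper name probe_last_byte and its parameters are hypothetical
and not part of the original test program; the mode string is passed in so
that "a+", "r+", "a+b" and "r+b" can all be compared on the same system.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the original test case): open `path` with
 * the given fopen() mode, seek to the last byte, read it, and report what
 * fseek() and fgetc() returned.
 * Returns 0 if the file could be opened, -1 otherwise.
 */
int probe_last_byte(const char *path, const char *mode, int *seek_rc, int *ch)
{
    FILE *f = fopen(path, mode);

    if (f == NULL)
        return -1;

    /* fseek() returns 0 on success and non-zero on failure. */
    *seek_rc = fseek(f, -1L, SEEK_END);

    /* fgetc() returns the byte as an int, or EOF if nothing was read. */
    *ch = fgetc(f);

    fclose(f);
    return 0;
}
```

Calling it once per mode and printing seek_rc and ch should show whether
the seek itself fails on the affected platforms, or whether it succeeds
and the surprise only appears at the later write.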
Hi,

On Fri, Feb 17, 2023 at 07:07:12PM +1100, Darren Tucker wrote:
> On some platforms (Solaris, OpenIndiana and AIX) however, the test fails
> because it adds two newlines instead of the expected one. Basically if
> I fseek to the end, read a byte and write a byte the first byte will
> be duplicated. I reduced it to this test case:

Having read the thread, the only explanation I can come up with is that
mixing fgetc()/fputc() will somehow confuse the stdio buffering.

Googling suggests that this might be a "SysV" thing (Wikipedia lists
"four System V variants: IBM's AIX, ... HP-UX, and Oracle's Solaris,
plus ... illumos").

Of course this made me curious so I ran your (latest) test program on
an ancient SCO OpenServer 5 system I have access to, which is SysVR3
with some R4 bits. No gcc there, but "stdio surprises" are more a libc
thing anyway.

And lo and behold...

    bigbox:/usr/gert$ cc fgetc-test.c
    bigbox:/usr/gert$ ./a.out
    fputc A=65
    0000000 0041
              A
    0000001
    fseek=0
    c=65
    fputc B = 66
    bigbox:/usr/gert$ od -x -c testfile
    0000000 4141 0042
              A   A   B
    0000003

Adding a fflush(f) call after the fgetc(f) will fix the problem the same
way as you reported for Solaris/AIX.

    bigbox:/usr/gert$ od -x -c testfile
    0000000 4241
              A   B
    0000002

... most interesting.

gert
--
"If was one thing all people took for granted, was conviction that if you
 feed honest figures into a computer, honest figures come out. Never doubted
 it myself till I met a computer with a sense of humor."
                             Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany                      gert at greenie.muc.de
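
For what it's worth, the C standard does seem to cover this case. For
streams opened in update mode ("+"), the fopen description (C99 7.19.5.3)
says that input shall not be directly followed by output without an
intervening call to a file positioning function, unless the input
operation encounters end-of-file. In the test case fgetc() returns the
last byte without actually hitting end-of-file, so the fputc() that
follows falls under that rule, and the commented-out fseek(f, 0, SEEK_END)
is the call the standard asks for. Below is a minimal sketch of the
"add a newline if missing, then append" logic with that call in place.
The helper name append_record is hypothetical and this is not the actual
OpenSSH code; it opens the file in binary mode, as suggested earlier in
the thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper (illustration only, not the OpenSSH implementation):
 * append `record` plus a newline to `path`, first adding a newline if the
 * file is non-empty and does not already end with one.
 * Returns 0 on success, -1 on error.
 */
int append_record(const char *path, const char *record)
{
    FILE *f = fopen(path, "a+b");   /* binary update mode, writes append */
    int c, need_nl = 0, rc = -1;

    if (f == NULL)
        return -1;

    /* Look at the last byte; the seek fails if the file is empty. */
    if (fseek(f, -1L, SEEK_END) == 0) {
        c = fgetc(f);
        if (c == EOF)
            goto out;
        need_nl = (c != '\n');
    }

    /*
     * Input must not be directly followed by output on an update stream,
     * so reposition before writing even though the offset is unchanged.
     */
    if (fseek(f, 0L, SEEK_END) != 0)
        goto out;

    if (need_nl && fputc('\n', f) == EOF)
        goto out;
    if (fputs(record, f) == EOF || fputc('\n', f) == EOF)
        goto out;
    rc = 0;
out:
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

fflush() also made the problem go away in the reports above, but as far as
I can tell ISO C only defines fflush() for streams whose most recent
operation was not input, so the positioning call looks like the safer of
the two workarounds.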