> On Jul 10, 2015 5:29 PM, "poma via Syslinux" <syslinux at zytor.com> wrote: > > > The same as with the ISOLINUX, stable and git. > > Only this time has nothing to do with the menu. > > 1) EXTLINUX is no longer a discrete variant. The installer extlinux now > installs SYSLINUX. > 2) William Kensington already saw a similar behavior wherein an ISOLINUX > image failed before parsing the config. > 3) It feels like this is a moving target where gcc keeps changing and > different results get reported. > 4) We'll likely need to make code akin to the old tracers (on screen or > serial) to log progress. > > --GeneThere are at least 7 source files in Syslinux 6.03 including an expression similar to: >= ' ' and there would be many more files in the list if we were to modify or relax the search criteria by just one character. Among those files (just a sample of them), there are: com32/elflink/ldlinux/cli.c com32/cmenu/libmenu/tui.c gnu-efi/gnu-efi-3.0/lib/console.c memdisk/inflate.c and: com32/elflink/ldlinux/readconfig.c com32/lib/sys/argv.c com32/rosh/rosh.c core/dmi.c core/fs/pxe/urlparse.c There are, of course, more. Not all of them might be relevant to this gcc5 compatibility issue, but they might be candidates for review / improvements (or, in other words, potential source of troubles related to the matter in question). If the Syslinux source code can be made more compatible with gcc v.5+, I think it should, specially if there are no regressions introduced by such potential changes. Having said that, let's not forget that the range of inadequate behaviors are triggered by using gcc v. 5+, and not seen in every single hardware / firmware. This doesn't mean that gcc and/or the firmware are the problem with certainty; just that those can be / are the triggers (which is a positive thing, if we get to see code improvements). Just as Syslinux should try to "workaround" a buggy BIOS issue whenever possible, and it should also "work nicely" with gcc 5+ if possible (or "as nicely as possible"), I think that gcc should also try to "work nicely" with others, and that users should check for firmware ("BIOS" / "UEFI") updates and report their problems to relevant manufacturers. Sure, most of that is not in our hands, but I hope we don't expect "everything" from Syslinux (sometimes without even a clear feedback / report, just with complains and demands). Users should report to the hardware / firmware manufacturers, update their firmware versions, and (IMHO) it would be beneficial if gcc's devs try to help with some tips / clues, or even some workaround parameter (considering that gcc 5+ seems to be triggering the issue whereas prior versions are not). One thing that Syslinux could do is to improve debugging reports / logs / methods. As examples (not necessarily related to the current troubleshooting issue only), output dots every X bytes during kernel's load / hand over, and introducing a "standardized" way for c32 files (plus "ldlinux.*") to include (although, not necessarily "report") the exact version (+ building date + commit + origin + debug_degree_flag +...). Finally, a small reminder. Since the issue is only present on specific hardware / firmware, whatever might seem to "solve" the behavior in one system might not be really a solution for others. "This/that works/fails for me" might be a useful report, rather than "this change works for me so this is the solution that Syslinux must implement right now for everyone". Let's be as helpful and as thorough as possible. Regards, Ady.> _______________________________________________ > Syslinux mailing list > Submissions to Syslinux at zytor.com > Unsubscribe or set options at: > http://www.zytor.com/mailman/listinfo/syslinux >
Hi, Gene Cumm wrote:> > 3) It feels like this is a moving target where gcc keeps changing and > > different results get reported.Do we have indications that different versions of gcc5 cause different behavior on the same build and boot machines ? Ady wrote:> Since the issue is only present on specific > hardware / firmware, whatever might seem to "solve" the behavior in one > system might not be really a solution for others.One should at least determine whether the problem is with the machines which build SYSLINUX or with those which try to boot it. (E.g. is a ISO with gcc5-built ISOLINUX bootable on some machines and not on others ? Does one machine build good ISOs with gcc5 and the other produces bad ones ?) My best theory is still that SYSLINUX would have some C code parts which allow the compilers to produce non-equivalent machine code. I see complaints in the output of cppcheck 1.67, which could match this criterium. (Maybe one should subscribe SYSLINUX to Coverty Scan ?) ----------------------------------------------------------------- [com32/gpllib/dmi/dmi_memory.c:205]: (error) Undefined behavior: Variable 'type' is used as parameter and destination in s[n]printf(). [com32/gpllib/dmi/dmi_memory.c:217]: (error) Undefined behavior: Variable 'connection' is used as parameter and destination in s[n]printf(). [com32/gpllib/dmi/dmi_memory.c:247]: (error) Undefined behavior: Variable 'size' is used as parameter and destination in s[n]printf(). [com32/gpllib/dmi/dmi_memory.c:249]: (error) Undefined behavior: Variable 'size' is used as parameter and destination in s[n]printf(). The complaint says it all: It is an invitation for trouble to use the same memory as destination and source of a copy operation. ----------------------------------------------------------------- [com32/gpllib/dmi/dmi.c:322] -> [com32/gpllib/dmi/dmi.c:321]: (warning) Array 'type[13]' accessed at index 13, which is out of bounds. Otherwise condition 'code<=13' is redundant. is caused by missing subtraction of 1 in strlcpy(dmi->base_board.type, type[code], It should be strlcpy(dmi->base_board.type, type[code - 1], ----------------------------------------------------------------- [com32/gpllib/dmi/dmi_processor.c:381] -> [com32/gpllib/dmi/dmi_processor.c:380]: (warning) Array 'upgrade[23]' accessed at index 24, which is out of bounds. Otherwise condition 'code<=25' is redundant. is caused by missing commas in lines "Socket 939" /* 0x12 */ "Socket LGA1366" /* 0x19 */ ----------------------------------------------------------------- [com32/hdt/hdt-cli.c:609]: (error) Possible null pointer dereference: argv Hard to say whether this is really wrong code. It depends on the two while-loops in parse_command_line() whether the second allocates as many items as the the first one predicted. It is very optimistic about success of malloc(). ----------------------------------------------------------------- [com32/hdt/hdt-cli.c:90]: (error) Memory is allocated but not initialized: new Might be a false positive. ----------------------------------------------------------------- [com32/lib/jpeg/tinyjpeg.c:314] -> [com32/lib/jpeg/tinyjpeg.c:309]: (warning) Array 'DCT[64]' accessed at index 64, which is out of bounds. Otherwise condition 'j>=64' is redundant. Probably a false positive. (What is the reason of macro "__unlikely()" which is defined as "(!!(x))" ?) ----------------------------------------------------------------- [com32/lib/syslinux/pxe_dns.c:56]: (error) Uninitialized variable: q False positive. cppcheck seems to not understand the use of the return value of sscanf(). ----------------------------------------------------------------- [com32/libupload/upload_tftp.c:138]: (error) Uninitialized struct member: tftp.srv_ip Looks like a valid complaint. But from where to get a valid tftp.srv_ip ? ----------------------------------------------------------------- [com32/modules/prdhcp.c:145]: (error) Uninitialized variable: p Possibly a false positive. It depends on what pxe_get_cached_info() does to &p. ----------------------------------------------------------------- Checking com32/rosh/rosh.c: COMMAND_LINE_SIZE... [com32/rosh/rosh.c:1128]: (error) Uninitialized variable: cmdstr Checking com32/rosh/rosh.c: FILENAME_MAX... [com32/rosh/rosh.c:559]: (error) Uninitialized variable: filestr2 ... [com32/rosh/rosh.c:600]: (error) Uninitialized variable: filestr2 Possibly non-functional compile time alternatives due to unqualified macro settings by cppcheck. ----------------------------------------------------------------- [com32/sysdump/memmap.c:50]: (error) Possible null pointer dereference: buf [com32/sysdump/memmap.c:51]: (error) Possible null pointer dereference: buf [com32/sysdump/memmap.c:52]: (error) Possible null pointer dereference: buf Probably false positive. cppcheck has problems to recognize memory allocation on demand. ----------------------------------------------------------------- [core/fs/pxe/dnsresolv.c:107]: (error) Uninitialized variable: ip Probably false positive. ----------------------------------------------------------------- [core/fs/xfs/xfs_dir2.c:588]: (error) Uninitialized variable: irec ... [core/fs/xfs/xfs_dir2.c:584]: (error) Uninitialized struct member: irec.br_blockcount Hard to say. Who understands XFS structure ? ----------------------------------------------------------------- Checking core/lwip/src/core/ipv4/igmp.c: LWIP_DEBUG;LWIP_IGMP... [core/lwip/src/core/ipv4/igmp.c:813]: (error) Possible null pointer dereference: igmp Looks like a valid complaint. Will cause SIGSEGV in case of pbuf_alloc() returning 0. Question is whether the macro combination is supposed to be functional. ----------------------------------------------------------------- Checking core/lwip/src/core/ipv4/igmp.c: LWIP_DEBUG;LWIP_IGMP... [core/lwip/src/core/ipv4/igmp.c:397]: (error) Uninitialized variable: igmp Hard to say. Possibly a non-functional macro combination. ----------------------------------------------------------------- [core/serirq.c:203]: (error) Null pointer dereference Looks like confusion of memcpy() and memset(): memcpy(SerialHead, 0x0, serial_buf_size); should probably be memset(SerialHead, 0x0, serial_buf_size); ----------------------------------------------------------------- [gpxe/src/core/iobuf.c:50]: (error) Possible null pointer dereference: iobuf ... [gpxe/src/core/iobuf.c:51]: (error) Null pointer dereference Probably a false positive. cppcheck seems not to understand __alignof__() which does not dereference *iobuf but just inspects its type. ----------------------------------------------------------------- [gpxe/src/core/uri.c:154]: (error) Possible null pointer dereference: authority Indeed the code path through "else" of /* Identify net/absolute/relative path */ if ( strncmp ( path, "//", 2 ) == 0 ) { does not set variable authority. ----------------------------------------------------------------- [gpxe/src/crypto/axtls/aes.c:251]: (error) Uninitialized variable: t1 Crypto stuff is cryptic by intention. ----------------------------------------------------------------- [gpxe/src/drivers/net/e1000/e1000_hw.c:7395]: (error) Uninitialized variable: phy_data ... [gpxe/src/drivers/net/e1000/e1000_hw.c:7557]: (error) Uninitialized variable: phy_data Might be a false positive. The code path which does not set phy_data might be separated from the code paths which use it. ----------------------------------------------------------------- [gpxe/src/util/efirom.c:68]: (error) Uninitialized variable: len A function which is intended to work by clairvoyance ? static size_t file_size ( FILE *file ) { ssize_t len; return len; } (Its quality as random number generator is mediocre, too. There's too much redundancy in stack usage.) ----------------------------------------------------------------- Checking gpxe/src/util/elf2efi.c... [gpxe/src/util/elf2efi.c:352]: (error) Uninitialized variable: data_start Checking gpxe/src/util/elf2efi.c: MDE_CPU_IA32... Checking gpxe/src/util/elf2efi.c: MDE_CPU_X64... Something new: A non-functional code branch if no macro is set. ----------------------------------------------------------------- Checking gpxe/src/util/nrv2b.c: ENCODE... [gpxe/src/util/nrv2b.c:634]: (error) Array 's.best_pos[1]' accessed at index 2, which is out of bounds. [gpxe/src/util/nrv2b.c:635]: (error) Array 's.best_pos[1]' accessed at index 2, which is out of bounds. Clearly a hardcoded wrong index. #define SWD_BEST_OFF 1 ... struct ucl_swd { ... #if defined(SWD_BEST_OFF) unsigned int best_pos[ SWD_BEST_OFF ]; #endif ... }; ... #if defined(SWD_BEST_OFF) if (s->best_pos[2] == 0) s->best_pos[2] = key + 1; #endif Shall the index be [SWD_BEST_OFF - 1] ? Shall this be done only if SWD_BEST_OFF > 2 ? ----------------------------------------------------------------- Checking linux/syslinux.c: __KLIBC__... [linux/syslinux.c:123]: (error) Uninitialized variable: loop_fd Valid complaint: /* * If DO_DIRECT_MOUNT is 0, call mount(8) * If DO_DIRECT_MOUNT is 1, call mount(2) */ #ifdef __KLIBC__ # define DO_DIRECT_MOUNT 1 #else # define DO_DIRECT_MOUNT 0 /* glibc has broken losetup ioctls */ #endif ... #if DO_DIRECT_MOUNT { if (!S_ISBLK(st.st_mode)) { /* It's file, need to mount it loopback */ ... int loop_fd; for (n = 0; loop_fd < 0; n++) { ... if (ioctl(loop_fd, LOOP_SET_FD, (void *)dev_fd)) { losetup ioctls have few chance if they get called by random decision. This would be a nice suspect if __KLIBC__ is defined. ----------------------------------------------------------------- [win/syslinux.c:506]: (error) Uninitialized variable: sdn Possibly a false positive. ----------------------------------------------------------------- Have a nice day :) Thomas
Thomas Schmitt via Syslinux <syslinux at zytor.com> writes:> My best theory is still that SYSLINUX would have some C code > parts which allow the compilers to produce non-equivalent > machine code. > > I see complaints in the output of cppcheck 1.67, which could > match this criterium.Nice list, thanks! By then way there are known errors in the library code, too, see for example the short thread starting at http://www.syslinux.org/archives/2014-July/022394.html. Unfortunately I still haven't got the heart (and time) to tackle updating the included klibc. -- Regards, Feri.
On Sat, Jul 11, 2015 at 6:11 AM, Thomas Schmitt via Syslinux <syslinux at zytor.com> wrote:> Hi, > > Gene Cumm wrote: >> > 3) It feels like this is a moving target where gcc keeps changing and >> > different results get reported. > > Do we have indications that different versions of gcc5 > cause different behavior on the same build and boot machines ?William Kennington's report is definitely different than poma's original report. poma's original report would certainly be machine-specific. poma's original report that effectively skipping the skip space code resolved the issue while later reports that seemed to indicate poma couldn't recreate the issue leads me to believe there's something else changing. William's report indicated "5.2-rc". -- -Gene