Hi,

I'm seeing regular tdb corruption; typical log messages are:

tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at offset=672032

tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at offset=1111638594

tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at offset=1034396

which then prevents fileserving from working properly (N.B. the bad magic is not limited to those three tdbs). At the moment I'm running Samba 3.6.6 on FreeBSD 9.0, but I've seen exactly the same behaviour with 3.6.9 and 3.6.13, and also the same behaviour on FreeBSD 9.1 as well. I also currently have the tdb-1.2.9,1 FreeBSD port installed at present, but have seen the same problem with tdb-1.2.11,1

I found a few forum posts that suggested setting "use mmap=no" - I have tried that, but saw no change in behaviour.

Restarting samba invariably clears the problem for a while: sometimes it's just a few hours before we get further bad magic messages, sometimes it's continued working fine for ~10 days or so, and pretty much everything in between. There is no obvious pattern of which tdbs are corrupting; I've seen pretty much all of them become corrupt over the last couple of months.

The server has multiple IP addresses which samba listens on; first of all we just start smbd with

[global]
include = /data/config/samba/servers/%i

and I've attached the result from running testparm on one of those included files. It's very very slightly redacted to hide IP addresses and group names. We have another similarly-configured server (FreeBSD 9.0, Samba 3.6.6) with the same pattern of "include a config file dependent on the IP address the client connects to", and that has been running smoothly with no problems at all for over a year.
I don't think (but have not absolutely conclusively ruled out) that it's a hardware problem on the server itself; the samba service (and the associated IP addresses) is managed by heartbeat, so I've tried running samba on the two nominally identical servers in the HA cluster - I see the same problematic behaviour on both nodes.

I've also attached the output of "smbd -b", in case that is informative.

I'm kind of running out of ideas of what to try next; any and all advice will thus be gratefully received! It's been especially hard to diagnose because the corruption happens seemingly at random, and I've not been able to identify a definite action that leads to the errors. (Also, because it's a production server, I'm not keen to try to deliberately provoke errors..)

Adam

-------------- next part --------------
Build environment:
   Built by:    root at freebsd1.ch.private.cam.ac.uk
   Built on:    Fri May 3 14:57:49 BST 2013
   Built using: cc
   Build host:  FreeBSD freebsd1.ch.private.cam.ac.uk 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 root at farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
   SRCDIR:      /usr/ports/net/samba36/work/samba-3.6.6/source3
   BUILDDIR:    /usr/ports/net/samba36/work/samba-3.6.6/source3

Paths:
   SBINDIR: /usr/local/sbin
   BINDIR: /usr/local/bin
   SWATDIR: /usr/local/share/swat
   CONFIGFILE: /usr/local/etc/smb.conf
   LOGFILEBASE: /var/log/samba
   LMHOSTSFILE: /usr/local/etc/lmhosts
   LIBDIR: /usr/local/lib
   MODULESDIR: /usr/local/lib/samba
   SHLIBEXT: so
   LOCKDIR: /var/db/samba
   STATEDIR: /var/db/samba
   CACHEDIR: /var/db/samba
   PIDDIR: /var/run/samba
   SMB_PASSWD_FILE: /usr/local/etc/samba/smbpasswd
   PRIVATE_DIR: /usr/local/etc/samba
   NCALRPCDIR: /var/run/samba/ncalrpc
   NMBDSOCKETDIR: /var/nmbd

System Headers:
HAVE_SYS_ACL_H HAVE_SYS_CAPABILITY_H HAVE_SYS_CDEFS_H HAVE_SYS_EXTATTR_H HAVE_SYS_FCNTL_H HAVE_SYS_FILE_H HAVE_SYS_FILIO_H HAVE_SYS_IOCTL_H HAVE_SYS_IPC_H HAVE_SYS_MMAN_H HAVE_SYS_MOUNT_H HAVE_SYS_PARAM_H HAVE_SYS_PRIV_H HAVE_SYS_RESOURCE_H
HAVE_SYS_SELECT_H HAVE_SYS_SHM_H HAVE_SYS_SOCKET_H HAVE_SYS_SOCKIO_H HAVE_SYS_STATVFS_H HAVE_SYS_STAT_H HAVE_SYS_SYSCALL_H HAVE_SYS_SYSCTL_H HAVE_SYS_SYSLOG_H HAVE_SYS_TIME_H HAVE_SYS_TYPES_H HAVE_SYS_UIO_H HAVE_SYS_UNISTD_H HAVE_SYS_UN_H HAVE_SYS_WAIT_H

Headers:
HAVE_AIO_H HAVE_ARPA_INET_H HAVE_COM_ERR_H HAVE_CTYPE_H HAVE_DIRENT_H HAVE_DLFCN_H HAVE_EXECINFO_H HAVE_FCNTL_H HAVE_FLOAT_H HAVE_FNMATCH_H HAVE_FREEBSD_SUNACL_H HAVE_GLOB_H HAVE_GRP_H HAVE_GSSAPI_GSSAPI_H HAVE_GSSAPI_H HAVE_IFADDRS_H HAVE_KRB5_H HAVE_LANGINFO_H HAVE_LBER_H HAVE_LDAP_H HAVE_LIBINTL_H HAVE_LIMITS_H HAVE_LOCALE_H HAVE_MEMORY_H HAVE_NETDB_H HAVE_NETINET_IN_H HAVE_NETINET_IN_SYSTM_H HAVE_NETINET_IP_H HAVE_NETINET_TCP_H HAVE_NET_IF_H HAVE_NSSWITCH_H HAVE_NSS_H HAVE_POLL_H HAVE_PTHREAD_H HAVE_PWD_H HAVE_READLINE_HISTORY_H HAVE_READLINE_READLINE_H HAVE_RPCSVC_NIS_H HAVE_RPCSVC_YPCLNT_H HAVE_RPCSVC_YP_PROT_H HAVE_RPC_NETTYPE_H HAVE_RPC_RPC_H HAVE_SECURITY_PAM_APPL_H HAVE_SECURITY_PAM_MODULES_H HAVE_SETJMP_H HAVE_STDARG_H HAVE_STDBOOL_H HAVE_STDINT_H HAVE_STDIO_H HAVE_STDLIB_H HAVE_STRINGS_H HAVE_STRING_H HAVE_SYSLOG_H HAVE_TERMIOS_H HAVE_TIME_H HAVE_UNISTD_H HAVE_UTIME_H HAVE_ZLIB_H

UTMP Options:
HAVE_UTMPX_H

HAVE_* Defines:
HAVE_ACL_GET_PERM_NP HAVE_ADDR_TYPE_IN_KRB5_ADDRESS HAVE_AP_OPTS_USE_SUBKEY HAVE_ASPRINTF HAVE_ATEXIT HAVE_BACKTRACE_SYMBOLS HAVE_BER_SCANF HAVE_BER_SOCKBUF_ADD_IO HAVE_BLKCNT_T HAVE_BLKSIZE_T HAVE_BOOL HAVE_BZERO HAVE_C99_VSNPRINTF HAVE_CHECKSUM_IN_KRB5_CHECKSUM HAVE_CHFLAGS HAVE_CHMOD HAVE_CHOWN HAVE_CHROOT HAVE_CLOCK_GETTIME HAVE_CLOCK_MONOTONIC HAVE_CLOCK_REALTIME HAVE_COMPILER_WILL_OPTIMIZE_OUT_FNS HAVE_CONNECT HAVE_COPY_AUTHENTICATOR HAVE_CRYPT HAVE_DECL_ASPRINTF HAVE_DECL_KRB5_AUTH_CON_SET_REQ_CKSUMTYPE HAVE_DECL_KRB5_GET_CREDENTIALS_FOR_USER HAVE_DECL_RL_EVENT_HOOK HAVE_DECL_SNPRINTF HAVE_DECL_VASPRINTF HAVE_DECL_VSNPRINTF HAVE_DEVICE_MAJOR_FN HAVE_DEVICE_MINOR_FN HAVE_DLCLOSE HAVE_DLERROR HAVE_DLOPEN HAVE_DLSYM HAVE_DPRINTF HAVE_DUP2 HAVE_ENCTYPE_ARCFOUR_HMAC
HAVE_ENCTYPE_ARCFOUR_HMAC_MD5 HAVE_ENDNETGRENT HAVE_ENDNETGRENT_PROTOTYPE HAVE_ERRNO_DECL HAVE_ETYPE_IN_ENCRYPTEDDATA HAVE_EXECL HAVE_EXPLICIT_LARGEFILE_SUPPORT HAVE_EXTATTR_DELETE_FD HAVE_EXTATTR_DELETE_FILE HAVE_EXTATTR_DELETE_LINK HAVE_EXTATTR_GET_FD HAVE_EXTATTR_GET_FILE HAVE_EXTATTR_GET_LINK HAVE_EXTATTR_LIST_FD HAVE_EXTATTR_LIST_FILE HAVE_EXTATTR_LIST_LINK HAVE_EXTATTR_SET_FD HAVE_EXTATTR_SET_FILE HAVE_EXTATTR_SET_LINK HAVE_E_DATA_POINTER_IN_KRB5_ERROR HAVE_FCHMOD HAVE_FCHOWN HAVE_FCNTL_LOCK HAVE_FDOPENDIR HAVE_FREEADDRINFO HAVE_FREEIFADDRS HAVE_FREE_AP_REQ HAVE_FRSIZE HAVE_FSEEKO HAVE_FSID_INT HAVE_FSYNC HAVE_FTRUNCATE HAVE_FTRUNCATE_EXTEND HAVE_FUNCTION_ATTRIBUTE_DESTRUCTOR HAVE_FUNCTION_MACRO HAVE_FUTIMES HAVE_GAI_STRERROR HAVE_GETADDRINFO HAVE_GETCWD HAVE_GETDENTS HAVE_GETDIRENTRIES HAVE_GETGRENT HAVE_GETGRENT_R HAVE_GETGRENT_R_DECL HAVE_GETGRGID_R HAVE_GETGRNAM HAVE_GETGRNAM_R HAVE_GETGROUPLIST HAVE_GETHOSTBYNAME HAVE_GETIFADDRS HAVE_GETNAMEINFO HAVE_GETNETGRENT HAVE_GETNETGRENT_PROTOTYPE HAVE_GETPAGESIZE HAVE_GETPEEREID HAVE_GETPGRP HAVE_GETPWENT_R HAVE_GETPWENT_R_DECL HAVE_GETPWNAM_R HAVE_GETPWUID_R HAVE_GETRLIMIT HAVE_GETTIMEOFDAY_TZ HAVE_GLOB HAVE_GRANTPT HAVE_GSSAPI HAVE_GSS_DISPLAY_STATUS HAVE_HISTORY_LIST HAVE_HSTRERROR HAVE_ICONV HAVE_IFACE_GETIFADDRS HAVE_IF_NAMETOINDEX HAVE_IMMEDIATE_STRUCTURES HAVE_INET_ATON HAVE_INET_NTOA HAVE_INET_NTOP HAVE_INET_PTON HAVE_INITGROUPS HAVE_INITIALIZE_KRB5_ERROR_TABLE HAVE_INNETGR HAVE_INTPTR_T HAVE_ISATTY HAVE_KRB5 HAVE_KRB5_ADDRESSES HAVE_KRB5_AUTH_CON_SETKEY HAVE_KRB5_CRYPTO HAVE_KRB5_CRYPTO_DESTROY HAVE_KRB5_CRYPTO_INIT HAVE_KRB5_C_ENCTYPE_COMPARE HAVE_KRB5_C_VERIFY_CHECKSUM HAVE_KRB5_DECODE_AP_REQ HAVE_KRB5_DEPRECATED_WITH_IDENTIFIER HAVE_KRB5_ENCTYPES_COMPATIBLE_KEYS HAVE_KRB5_ENCTYPE_TO_STRING HAVE_KRB5_ENCTYPE_TO_STRING_WITH_KRB5_CONTEXT_ARG HAVE_KRB5_FREE_DATA_CONTENTS HAVE_KRB5_FREE_ERROR_CONTENTS HAVE_KRB5_FREE_HOST_REALM HAVE_KRB5_FWD_TGT_CREDS HAVE_KRB5_GET_CREDS HAVE_KRB5_GET_CREDS_OPT_ALLOC 
HAVE_KRB5_GET_CREDS_OPT_SET_IMPERSONATE HAVE_KRB5_GET_DEFAULT_IN_TKT_ETYPES HAVE_KRB5_GET_HOST_REALM HAVE_KRB5_GET_INIT_CREDS_OPT_ALLOC HAVE_KRB5_GET_INIT_CREDS_OPT_FREE HAVE_KRB5_GET_INIT_CREDS_OPT_GET_ERROR HAVE_KRB5_GET_INIT_CREDS_OPT_SET_PAC_REQUEST HAVE_KRB5_GET_KDC_CRED HAVE_KRB5_GET_PW_SALT HAVE_KRB5_GET_RENEWED_CREDS HAVE_KRB5_KEYBLOCK_KEYVALUE HAVE_KRB5_KEYTAB_ENTRY_KEYBLOCK HAVE_KRB5_KRBHST_GET_ADDRINFO HAVE_KRB5_KRBHST_INIT HAVE_KRB5_KT_COMPARE HAVE_KRB5_KT_FREE_ENTRY HAVE_KRB5_KU_OTHER_CKSUM HAVE_KRB5_MK_REQ_EXTENDED HAVE_KRB5_PRINCIPAL_COMPARE_ANY_REALM HAVE_KRB5_PRINCIPAL_GET_COMP_STRING HAVE_KRB5_PRINCIPAL_GET_REALM HAVE_KRB5_REALM_TYPE HAVE_KRB5_SESSION_IN_CREDS HAVE_KRB5_SET_DEFAULT_IN_TKT_ETYPES HAVE_KRB5_SET_REAL_TIME HAVE_KRB5_STRING_TO_KEY HAVE_KRB5_STRING_TO_KEY_SALT HAVE_KRB5_VERIFY_CHECKSUM HAVE_LBER_LOG_PRINT_FN HAVE_LCHOWN HAVE_LDAP HAVE_LDAP_ADD_RESULT_ENTRY HAVE_LDAP_INIT HAVE_LDAP_INITIALIZE HAVE_LDAP_SASL_WRAPPING HAVE_LDAP_SET_REBIND_PROC HAVE_LIBASN1 HAVE_LIBCOM_ERR HAVE_LIBGSSAPI HAVE_LIBKRB5 HAVE_LIBLBER HAVE_LIBLDAP HAVE_LIBPAM HAVE_LIBREADLINE HAVE_LIBROKEN HAVE_LIBZ HAVE_LINK HAVE_LONGLONG HAVE_LONG_LONG HAVE_LSTAT HAVE_LUTIMES HAVE_MAKEDEV HAVE_MEMCPY HAVE_MEMMEM HAVE_MEMMOVE HAVE_MEMSET HAVE_MKDIR_MODE HAVE_MKDTEMP HAVE_MKNOD HAVE_MKTIME HAVE_MLOCK HAVE_MLOCKALL HAVE_MMAP HAVE_MSGHDR_MSG_CONTROL HAVE_MUNLOCK HAVE_MUNLOCKALL HAVE_NANOSLEEP HAVE_NATIVE_ICONV HAVE_NEW_LIBREADLINE HAVE_NL_LANGINFO HAVE_NO_AIO HAVE_PAM_GET_DATA HAVE_PAM_RHOST HAVE_PAM_TTY HAVE_PATHCONF HAVE_PIPE HAVE_POLL HAVE_POSIX_ACLS HAVE_POSIX_FALLOCATE HAVE_POSIX_MEMALIGN HAVE_PRCTL HAVE_PREAD HAVE_PRINTF HAVE_PTHREAD HAVE_PTRDIFF_T HAVE_PWRITE HAVE_RAND HAVE_RANDOM HAVE_READLINK HAVE_REALPATH HAVE_RENAME HAVE_ROKEN_GETADDRINFO_HOSTSPEC HAVE_SA_FAMILY_T HAVE_SECURE_MKSTEMP HAVE_SELECT HAVE_SENDFILE HAVE_SETBUFFER HAVE_SETEGID HAVE_SETENV HAVE_SETENV_DECL HAVE_SETEUID HAVE_SETGROUPS HAVE_SETLINEBUF HAVE_SETLOCALE HAVE_SETNETGRENT HAVE_SETNETGRENT_PROTOTYPE 
HAVE_SETPGID HAVE_SETRESGID HAVE_SETRESGID_DECL HAVE_SETRESUID HAVE_SETRESUID_DECL HAVE_SETSID HAVE_SHMGET HAVE_SHM_OPEN HAVE_SIGACTION HAVE_SIGBLOCK HAVE_SIGPROCMASK HAVE_SIGSET HAVE_SIG_ATOMIC_T_TYPE HAVE_SNPRINTF HAVE_SOCKADDR_SA_LEN HAVE_SOCKETPAIR HAVE_SOCKLEN_T HAVE_SOCK_SIN_LEN HAVE_SRAND HAVE_SRANDOM HAVE_SS_FAMILY HAVE_STATVFS_F_FLAG HAVE_STAT_HIRES_TIMESTAMPS HAVE_STAT_ST_BLKSIZE HAVE_STAT_ST_BLOCKS HAVE_STAT_ST_FLAGS HAVE_STRCASECMP HAVE_STRCASESTR HAVE_STRCHR HAVE_STRDUP HAVE_STRERROR HAVE_STRERROR_R HAVE_STRFTIME HAVE_STRLCAT HAVE_STRLCPY HAVE_STRNDUP HAVE_STRNLEN HAVE_STRPBRK HAVE_STRSIGNAL HAVE_STRTOK_R HAVE_STRTOL HAVE_STRTOLL HAVE_STRTOQ HAVE_STRTOULL HAVE_STRTOUQ HAVE_STRUCT_ADDRINFO HAVE_STRUCT_IFADDRS HAVE_STRUCT_SIGEVENT HAVE_STRUCT_SIGEVENT_SIGEV_VALUE_SIGVAL_PTR HAVE_STRUCT_SIGEVENT_SIGEV_VALUE_SIVAL_PTR HAVE_STRUCT_SOCKADDR HAVE_STRUCT_SOCKADDR_IN6 HAVE_STRUCT_SOCKADDR_SA_LEN HAVE_STRUCT_SOCKADDR_STORAGE HAVE_STRUCT_STAT_ST_BIRTHTIME HAVE_STRUCT_STAT_ST_BIRTHTIMESPEC_TV_NSEC HAVE_STRUCT_STAT_ST_MTIMESPEC_TV_NSEC HAVE_STRUCT_STAT_ST_MTIM_TV_NSEC HAVE_STRUCT_STAT_ST_RDEV HAVE_STRUCT_TIMESPEC HAVE_ST_RDEV HAVE_SYMLINK HAVE_SYSCONF HAVE_SYSCTLBYNAME HAVE_SYSLOG HAVE_TIMEGM HAVE_UINTPTR_T HAVE_UNIXSOCKET HAVE_UNSETENV HAVE_USLEEP HAVE_UTIMBUF HAVE_UTIME HAVE_UTIMES HAVE_VASPRINTF HAVE_VA_COPY HAVE_VDPRINTF HAVE_VOLATILE HAVE_VSNPRINTF HAVE_VSYSLOG HAVE_WAIT4 HAVE_WAITPID HAVE_WRFILE_KEYTAB HAVE_YP_GET_DEFAULT_DOMAIN HAVE_ZLIBVERSION HAVE__Bool HAVE__CHDIR HAVE__CLOSE HAVE__DUP HAVE__DUP2 HAVE__ET_LIST HAVE__FCHDIR HAVE__FCNTL HAVE__FORK HAVE__FSTAT HAVE__LSTAT HAVE__OPEN HAVE__READ HAVE__STAT HAVE__VA_ARGS__MACRO HAVE__WRITE HAVE___GETCWD

--with Options:
WITH_ADS WITH_PAM WITH_PAM_MODULES WITH_SENDFILE WITH_WINBIND

Build Options:
BROKEN_GETGRNAM COMPILER_SUPPORTS_LL CONFIG_H_IS_FROM_SAMBA DEFAULT_DISPLAY_CHARSET DEFAULT_DOS_CHARSET DEFAULT_UNIX_CHARSET FREEBSD FREEBSD_SENDFILE_API KRB5_CREDS_OPT_FREE_REQUIRES_CONTEXT
KRB5_PRINC_REALM_RETURNS_REALM KRB5_VERIFY_CHECKSUM_ARGS LDAP_SET_REBIND_PROC_ARGS LIBREPLACE_NETWORK_CHECKS PACKAGE_BUGREPORT PACKAGE_NAME PACKAGE_STRING PACKAGE_TARNAME PACKAGE_URL PACKAGE_VERSION PAM_GET_DATA_ARG3_CONST_VOID_PP REALPATH_TAKES_NULL REPLACE_GETPASS REPLACE_STRPTIME RETSIGTYPE SEEKDIR_RETURNS_VOID SHLIBEXT SIZEOF_CHAR SIZEOF_INT SIZEOF_LONG SIZEOF_LONG_LONG SIZEOF_OFF_T SIZEOF_SHORT SIZEOF_SIZE_T SIZEOF_SSIZE_T SIZEOF_TIME_T STAT_STATVFS STAT_ST_BLOCKSIZE STDC_HEADERS STRING_STATIC_MODULES SYSCONF_SC_NGROUPS_MAX SYSCONF_SC_NPROCESSORS_ONLN SYSCONF_SC_PAGESIZE TIME_T_MAX TIME_WITH_SYS_TIME USE_SETREUID WITH_ADS WITH_PAM WITH_PAM_MODULES WITH_SENDFILE WITH_WINBIND _GNU_SOURCE auth_script_init charset_CP437_init charset_CP850_init charset_weird_init idmap_ad_init idmap_adex_init idmap_autorid_init idmap_hash_init idmap_rid_init idmap_tdb2_init loff_t offset_t static_decl_auth static_decl_charset static_decl_gpext static_decl_idmap static_decl_nss_info static_decl_pdb static_decl_perfcount static_decl_vfs static_init_auth static_init_charset static_init_gpext static_init_idmap static_init_nss_info static_init_pdb static_init_perfcount static_init_vfs vfs_acl_tdb_init vfs_acl_xattr_init vfs_audit_init vfs_cacheprime_init vfs_cap_init vfs_catia_init vfs_commit_init vfs_crossrename_init vfs_default_quota_init vfs_dirsort_init vfs_expand_msdfs_init vfs_extd_audit_init vfs_fake_perms_init vfs_full_audit_init vfs_linux_xfs_sgid_init vfs_netatalk_init vfs_preopen_init vfs_readahead_init vfs_readonly_init vfs_recycle_init vfs_scannedonly_init vfs_shadow_copy2_init vfs_shadow_copy_init vfs_smb_traffic_analyzer_init vfs_streams_depot_init vfs_streams_xattr_init vfs_syncops_init vfs_time_audit_init vfs_xattr_tdb_init vfs_zfsacl_init

Type sizes:
   sizeof(char): 1
   sizeof(int): 4
   sizeof(long): 8
   sizeof(long long): 8
   sizeof(uint8): 1
   sizeof(uint16): 2
   sizeof(uint32): 4
   sizeof(short): 2
   sizeof(void*): 8
   sizeof(size_t): 8
   sizeof(off_t): 8
   sizeof(ino_t): 4
   sizeof(dev_t): 4

Builtin modules:
pdb_ldap pdb_smbpasswd pdb_tdbsam pdb_wbc_sam idmap_ldap idmap_tdb idmap_passdb idmap_nss nss_info_template auth_sam auth_unix auth_winbind auth_wbc auth_server auth_domain auth_builtin vfs_default vfs_posixacl
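The report says there is no obvious pattern in which tdbs become corrupt. One way to look for a pattern is to tally the bad-magic messages per file and per magic value. The following is only a minimal sketch: the regular expression is inferred from the three sample messages above, the function names are made up for illustration, and the sample text stands in for the real smbd logs. It also flags offsets that are a single byte repeated four times, because the offset in the connections.tdb message (1111638594) is 0x42424242 in hex, the same value reported as the bad magic in the other two messages. What that pattern means inside tdb is not established in this thread, so the sketch only reports it.

```python
import re
from collections import Counter

# Pattern inferred from the three sample messages in the report;
# other tdb log formats are not covered.
BAD_MAGIC = re.compile(
    r"tdb\((?P<path>[^)]+)\):\s+tdb_rec_read bad magic\s+"
    r"(?P<magic>0x[0-9a-fA-F]+)\s+at\s+offset=(?P<offset>\d+)"
)


def parse_bad_magic(text):
    """Return (path, magic, offset) tuples for every bad-magic message in text."""
    return [
        (m["path"], int(m["magic"], 16), int(m["offset"]))
        for m in BAD_MAGIC.finditer(text)
    ]


def is_repeated_byte(value):
    """True if value is a non-zero 32-bit number made of one byte repeated four times."""
    if not 0 < value < 2**32:
        return False
    return len(set(value.to_bytes(4, "big"))) == 1


def summarise(text):
    """Count messages per tdb file and per magic value, and list suspicious offsets."""
    events = parse_bad_magic(text)
    per_file = Counter(path for path, _, _ in events)
    per_magic = Counter(hex(magic) for _, magic, _ in events)
    odd_offsets = [
        (path, hex(offset)) for path, _, offset in events if is_repeated_byte(offset)
    ]
    return per_file, per_magic, odd_offsets


# The three messages quoted in the report, joined onto single lines.
SAMPLE = """\
tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at offset=672032
tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at offset=1111638594
tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at offset=1034396
"""

if __name__ == "__main__":
    per_file, per_magic, odd_offsets = summarise(SAMPLE)
    print("per file:  ", dict(per_file))
    print("per magic: ", dict(per_magic))
    print("offsets that are one repeated byte:", odd_offsets)
```

To use it on real data, replace SAMPLE with the contents of the smbd log files (under /var/log/samba according to the LOGFILEBASE shown in the build output). Adding timestamps to the tally would need a second pattern, since the sample messages do not show the log header format.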
> [global]
> include = /data/config/samba/servers/%i
>
> and I've attached the result from running testparm on one of those
> included files.

...that attachment seemed to be mislaid, so it's attached here instead!

Adam
On 05/14/2013 05:59 AM, Adam Thorn wrote:
> Hi,
>
> I'm seeing regular tdb corruption; typical log messages are:
>
> tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=672032
>
> tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at
> offset=1111638594
>
> tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=1034396
>
> which then prevents fileserving from working properly (N.B. the bad
> magic is not limited to those three tdbs). At the moment I'm running
> Samba 3.6.6 on FreeBSD 9.0, but I've seen exactly the same behaviour
> with 3.6.9 and 3.6.13, and also the same behaviour on FreeBSD 9.1 as
> well. I also currently have the tdb-1.2.9,1 FreeBSD port installed at
> present, but have seen the same problem with tdb-1.2.11,1
>
> I found a few forum posts that suggested setting "use mmap=no" - I have
> tried that, but saw no change in behaviour.
>
> Restarting samba invariably clears the problem for a while: sometimes
> it's just a few hours before we get further bad magic messages,
> sometimes it's continued working fine for ~10 days or so, and pretty
> much everything in between. There is no obvious pattern of which tdbs
> are corrupting; I've seen pretty much all of them become corrupt over
> the last couple of months.
>
> The server has multiple IP addresses which samba listens on; first of
> all we just start smbd with
>
> [global]
> include = /data/config/samba/servers/%i
>
> and I've attached the result from running testparm on one of those
> included files. It's very very slightly redacted to hide IP addresses
> and group names. We have another similarly-configured server (FreeBSD
> 9.0, Samba 3.6.6) with the same pattern of "include a config file
> dependent on the IP address the client connects to", and that has been
> running smoothly with no problems at all for over a year.
>
> I don't think (but have not absolutely conclusively ruled out) that it's
> a hardware problem on the server itself; the samba service (and the
> associated IP addresses) is managed by heartbeat, so I've tried running
> samba on the two nominally identical servers in the HA cluster - I see
> the same problematic behaviour on both nodes.
>
> I've also attached the output of "smbd -b", in case that is informative.
>
> I'm kind of running out of ideas of what to try next; any and all advice
> will thus be gratefully received! It's been especially hard to diagnose
> because the corruption happens seemingly at random, and I've not been
> able to identify a definite action that leads to the errors. (Also,
> because it's a production server, I'm not keen to try to deliberately
> provoke errors..)
>
> Adam
>

What type of filesystem are you using? Do you have barriers enabled? I know in Linux that you should set barrier=1 on the ext3/ext4 filesystem in order to prevent corruption of sam.ldb in cases of power loss.
On Tue, 2013-05-14 at 10:59 +0100, Adam Thorn wrote:
> Hi,
>
> I'm seeing regular tdb corruption; typical log messages are:
>
> tdb(/var/db/samba/sessionid.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=672032
>
> tdb(/var/db/samba/connections.tdb): tdb_rec_read bad magic 0x0 at
> offset=1111638594
>
> tdb(/var/db/samba/locking.tdb): tdb_rec_read bad magic 0x42424242 at
> offset=1034396
>
> which then prevents fileserving from working properly (N.B. the bad
> magic is not limited to those three tdbs). At the moment I'm running
> Samba 3.6.6 on FreeBSD 9.0, but I've seen exactly the same behaviour
> with 3.6.9 and 3.6.13, and also the same behaviour on FreeBSD 9.1 as
> well. I also currently have the tdb-1.2.9,1 FreeBSD port installed at
> present, but have seen the same problem with tdb-1.2.11,1

> I don't think (but have not absolutely conclusively ruled out) that it's
> a hardware problem on the server itself; the samba service (and the
> associated IP addresses) is managed by heartbeat, so I've tried running
> samba on the two nominally identical servers in the HA cluster - I see
> the same problematic behaviour on both nodes.

Can you please clarify:
 - Is the filesystem on this disk in any way shared?
 - Is the block device involved in any way shared?
 - Has the server ever had an unexpected poweroff?
 - Do Samba processes ever crash?

If the answer is no to all these, then I would strongly suspect a hardware or OS/kernel issue. TDB is in general a very reliable database, and these issues are certainly unexpected.

Similarly, if there has been no unexpected poweroff, then the barrier=1 thing is a red herring; that is about data safety on unexpected poweroff.

Could you put your TDB files on a different file system, to rule in or out ZFS (or the glue between FreeBSD and ZFS)?

Thanks,

Andrew Bartlett

-- 
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
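If the TDB files are moved as suggested, it is worth confirming that the new location really is on a different filesystem and not just a different directory on the same one. A minimal sketch, assuming a POSIX system: two paths are on the same mounted filesystem when os.stat() reports the same st_dev for both. The function names and the /tmp candidate are hypothetical; /var/db/samba and /var/run/samba are taken from the smbd -b output above. This says nothing about whether a block device is shared between hosts, which has to be checked separately.

```python
import os
from collections import defaultdict


def same_filesystem(path_a, path_b):
    """True if both paths are on the same mounted filesystem (equal st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def group_by_device(paths):
    """Group existing paths by the device ID of the filesystem that holds them."""
    groups = defaultdict(list)
    for path in paths:
        try:
            groups[os.stat(path).st_dev].append(path)
        except OSError:
            # Missing or unreadable paths are skipped rather than treated as errors.
            continue
    return dict(groups)


if __name__ == "__main__":
    # /var/db/samba is LOCKDIR/STATEDIR/CACHEDIR in the build output above;
    # /tmp is only a hypothetical alternative location.
    candidates = ["/var/db/samba", "/var/run/samba", "/tmp"]
    for device, members in group_by_device(candidates).items():
        print(device, members)
```

Paths that end up in different groups are on different filesystems. In smb.conf the relevant locations are set with the lock directory, state directory and cache directory parameters, which correspond to the LOCKDIR, STATEDIR and CACHEDIR defaults listed in the build output.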