Pete Batard
2016-Feb-26 12:51 UTC
[syslinux] [PATCH 1/5] fat: fix minfatsize for large FAT32
Hi Ady,

I won't comment on the reasons why the original computation was wrong, but thanks for the detailed analysis.

On 2016.02.26 08:05, Ady via Syslinux wrote:
>> Thus we can finally get a formula for Fs that satisfies the above:
>>
>> Fs = (To - Rs + Nf * Cs) / ((Ss * Cs / Fe) + Nf) + 1
>
> I believe such formula is slightly inaccurate too.
>
> My resulting inequation is
>
>        ( To - Rs ) + ( 2 * Cs )
> Fs >= __________________________
>        ( Ss * Cs / Fe ) + Nf

I believe you're right. I assumed that the 2 that eventually ends up in the numerator was the number of FATs, but looking further, and especially at Wikipedia [1], I see that it is the number of special clusters, which is fixed to 2 regardless of the number of FATs:

"The first two entries in a FAT store special values: The first entry (cluster 0 in the FAT) holds the FAT ID (...) The second entry (cluster 1 in the FAT) nominally stores the end-of-cluster-chain marker (...)"

So, as you point out, this should be hardcoded to 2. Thanks for picking this up. It wouldn't have been a dramatic mistake, since I don't see any possibility of using a number of FATs other than 2 in the context of Rufus, but we should indeed get our variables right.

> instead of _always_ adding "+1" (which would be
> incorrect and inefficient from the point of view of the resulting
> allocatable size).

I carefully considered this, and I dispute the claim that it is incorrect. The problem is that we're dealing with a fraction, which will be limited by the number of bits the computer uses to store the data *and might be rounded down behind the scenes*. E.g. unless you use a crazy number of bits to store your numbers, something like (10^100 + 1) / (10^100) will produce 1, and the modulo (10^100 + 1) % (10^100) will produce 0. So if you use the modulo to figure out whether you need to round up, you're going to miss some cases.

And while I'd like to believe that all of "roundup", "ceiling" and friends are smart enough to know the precision they're dealing with and compensate accordingly, or, more realistically here, that the numbers we're dealing with in this case will never be large enough to test the limits of our precision (especially since our numerator and denominator are set to 64 bit, and our disks will never be larger than 2TB anyway, because FAT32), I'd rather not take any risks here. Even more so as I am fixing code that was missing addressable sectors, and the last thing I'd want is to find out that, because of a dodgy rounding or a wrong assumption, we might still end up missing some sectors after all.

And yeah, we could add the actual roundup in the equations themselves (I actually did that when I was trying to figure out whether the original wrong computation wasn't due to the roundup), but since the "only going to be *very slightly* wasteful in an exceedingly limited set of cases" +1 is a safe and sure way to address the rounding "issue", I have to admit that I'm not that interested in trying to come up with a more mathematically satisfying solution. Working and relatively efficient code is what I am after, and as far as I'm concerned, using +1 is both a "correct" and an "efficient" way of achieving that.

Still, I won't prevent you (or anybody else interested) from providing a proper formula if you want. ;)

Regards,

/Pete

[1] https://en.wikipedia.org/wiki/Design_of_the_FAT_file_system#Special_entries
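For illustration, the computation discussed above might look like the following C sketch. This is hypothetical code, not the actual Rufus implementation (see the format.c link later in the thread for that); the function name min_fat_size is made up, and 64-bit unsigned operands are assumed, as per Pete's remark about the numerator and denominator:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch of the minimum FAT size computation discussed
 * above, using the thread's variable names: To = total sectors,
 * Rs = reserved sectors, Cs = sectors per cluster, Ss = bytes per
 * sector, Nf = number of FATs, Fe = bytes per FAT entry (4 for FAT32).
 * The unconditional "+ 1" rounds the integer division up, at the cost
 * of occasionally allocating one extra FAT sector. */
static uint64_t min_fat_size(uint64_t To, uint64_t Rs, uint64_t Cs,
                             uint64_t Ss, uint64_t Nf)
{
    const uint64_t Fe = 4;  /* FAT32 entry size, in bytes */
    return ((To - Rs) + 2 * Cs) / ((Ss * Cs / Fe) + Nf) + 1;
}

int main(void)
{
    /* Example: a 2TB volume (0xFFFFFFFF sectors of 512 bytes),
     * with 32 reserved sectors, 64 sectors per cluster, 2 FATs. */
    printf("%llu\n", (unsigned long long)min_fat_size(0xFFFFFFFFULL, 32, 64, 512, 2));
    return 0;
}

Note that with pure 64-bit unsigned integers, the classic ceiling idiom q = a / b + (a % b != 0) would also be exact; the unconditional "+ 1" simply trades an occasional extra FAT sector for the robustness against rounding surprises that Pete argues for.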
Ady
2016-Feb-26 15:32 UTC
[syslinux] [PATCH 1/5] fat: fix minfatsize for large FAT32
> > instead of _always_ adding "+1" (which would be
> > incorrect and inefficient from the point of view of the resulting
> > allocatable size).
>
> I carefully considered this, and I dispute the claim that it is incorrect.

(snip)

> Still, I won't prevent you (or anybody else interested) from providing
> a proper formula if you want. ;)
>
> Regards,
>
> /Pete

You know I am not a developer, so I cannot comment on the difficulties of achieving a more accurate calculation using computer code. As I mentioned in my previous email, I am not saying that some / any code is "wrong"; I am only presenting the math as accurately as I can.

Regarding the "+1", when talking about the math (not the computer code to achieve a result as accurate as it can be), I already expressed the difference between using math operators such as "integer", "quotient", "modulo", "remainder"... as opposed to just using "+1". Clearly, if the result is already an integer (in the math, not in whichever computer code), then adding "+1" to that value is incorrect. In fact, the same goes for a fraction: adding "+1" to such a value will also result in a non-integer value. Once again (sorry), I am talking math, not computer code.

As for the "big" numbers and accuracy, again, I cannot comment on computer code. I do want to remind you that, besides the "generic" determination of an "adequate" minfatsize in Syslinux's code, we should at least test the values that are around or close to the limits between FAT12/16/32. For simplicity, I'll give basic examples (which I have not truly analyzed; developers could / should test the relevant code).

Example A:
Volume_Total_Sectors: 66'463
FAT type: FAT32
Bytes_per_sector: 512
Sectors_per_Cluster: 1
Sectors_per_FAT: 511 ---> this value should _not_ be accepted as a valid minfatsize; 512 should be OK.

Example B:
Volume_Total_Sectors: 4'187'104
FAT type: FAT32
Bytes_per_sector: 512
Sectors_per_Cluster: 64
Sectors_per_FAT: 511 ---> this value should _not_ be accepted as a valid minfatsize; 512 should be OK.

I have not actually searched for potentially conflicting values; the above 2 examples are just "around" the lower size limit of FAT32.

Let me put the example(s) in other words. Whichever calculation of minfatsize you use, values such as a Volume_Total_Sectors around:

65'518 ... 66'463

should be tested (among others; I could provide more ranges for testing). Such values are close to the lower limit of FAT32, and, depending on additional fields, they can be problematic (and at least some of them should be rejected). I am not saying that these are typical values for FAT32 volumes that users would see / use in real life (but they might); this range is just a source for testing either the acceptance or rejection of a FAT32 volume (in Syslinux's code). BTW, please note that these values are not really "big" (although, for computer code, there might still be an "accuracy" problem; I wouldn't know).

I have not provided more details (nor performed actual analysis) regarding these values (or some others), because at this time I fear I could just be wasting my time on math (since I am not a developer who can actually translate it into useful code). If there were to be any practical development in the code, I would be willing to invest my time too (but I would like to know it before I decide to use my time in such an endeavour :).

At any rate, I certainly appreciate the people that have already dedicated their time, knowledge, resources, skills... to this email thread and to this set of patches.

Regards,
Ady.
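For what it's worth, Ady's two examples can be checked with a small test harness like the following sketch. This is hypothetical code, not Syslinux's; since the examples do not specify them, Rs = 32 reserved sectors and Nf = 2 FATs are assumed here:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical check of whether a candidate Sectors_per_FAT value (Fs)
 * is large enough, following the sizing logic discussed in this thread:
 * one FAT must hold one entry per data cluster, plus the 2 special
 * entries. To = total sectors, Rs = reserved sectors, Cs = sectors per
 * cluster, Ss = bytes per sector, Nf = number of FATs. */
static int fat_size_is_sufficient(uint64_t To, uint64_t Rs, uint64_t Cs,
                                  uint64_t Ss, uint64_t Nf, uint64_t Fs)
{
    const uint64_t Fe = 4;                         /* FAT32 entry size */
    uint64_t clusters = (To - Rs - Nf * Fs) / Cs;  /* data clusters */
    uint64_t entries  = Fs * Ss / Fe;              /* entries one FAT holds */
    return entries >= clusters + 2;                /* +2 special entries */
}

int main(void)
{
    /* Example A: To = 66463, Cs = 1 */
    printf("A, Fs=511: %d\n", fat_size_is_sufficient(66463, 32, 1, 512, 2, 511));    /* 0 */
    printf("A, Fs=512: %d\n", fat_size_is_sufficient(66463, 32, 1, 512, 2, 512));    /* 1 */
    /* Example B: To = 4187104, Cs = 64 */
    printf("B, Fs=511: %d\n", fat_size_is_sufficient(4187104, 32, 64, 512, 2, 511)); /* 0 */
    printf("B, Fs=512: %d\n", fat_size_is_sufficient(4187104, 32, 64, 512, 2, 512)); /* 1 */
    return 0;
}

Under those assumptions, both examples come out as Ady predicts: 511 sectors per FAT leaves the FAT short of entries, while 512 is sufficient.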
Pete Batard
2016-Feb-26 16:37 UTC
[syslinux] [PATCH 1/5] fat: fix minfatsize for large FAT32
On 2016.02.26 15:32, Ady via Syslinux wrote:
> Regarding the "+1", when talking about the math (not the computer code
> to achieve a result as accurate as it can be)

Well, sorry, but I will not dissociate the context of application from the formula itself. I thought this was implied in what I wrote, which was in the context of fixing a computation algorithm bug. The sole interest I (and, I posit, others should) have in the formula is how it applies to computer code, as we are not trying to write a mathematical proof in the absolute here, but instead code that meets the requirement of using all potentially addressable sectors. *This* is the only context of correctness that should matter. Anything else, as interesting as it may be, is sidetracking and not something I personally want to participate in.

> we should
> at least test the values that are around or close to the limits between
> FAT12/16/32.

There I'm going to reduce the context even further: in the context where this algorithm is applied in Rufus, we don't care about non-FAT32 computations, or about ones performed for disks that are less than 32 GB in size (for which the sectors per cluster will never be 1, even with 4K sectors, as our minimal cluster size is 8K as soon as we reach 31GB).

That's not to say that I wouldn't pick up improvements to the new computation algorithm [1] when low values are used, if such improvements are actually needed, and if *somebody else* kindly wants to figure them out, as there is always the possibility that other developers might re-use the code from Rufus and try to apply it to <31GB drives. But, in the context I am concerned with, I don't see any potential issue on the horizon from not performing that homework.

So, there again, I will simply say thanks for bringing attention to these matters, while at the same time mentioning that I'm simply gonna pass on those, since they appear irrelevant with regard to the context that matters in my application...

Regards,

/Pete

[1] https://github.com/pbatard/rufus/blob/b9caf8b6058de12bf028f907471561a6aa50f7e9/src/format.c#L349-L366
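For illustration, the scope restriction Pete describes might be expressed as a guard like the following sketch. The function name and the exact form of the check are made up for this example; Rufus' actual cluster-size selection lives in the format.c link above:

#include <stdint.h>

/* Illustrative sketch only: in the context Pete describes, the FAT32
 * sizing code is only reached for drives of 31GB or more, where the
 * minimal cluster size is 8K. So even with 4K sectors, sectors per
 * cluster is at least 2, never 1. */
static int fat32_sizing_applies(uint64_t drive_bytes, uint64_t cluster_bytes)
{
    const uint64_t GB = 1000ULL * 1000ULL * 1000ULL;
    return drive_bytes >= 31 * GB && cluster_bytes >= 8192;
}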