thr3ads.net - flac dev - [flac-dev] C API: How to get a seektable for very long files? [Oct 2024]

Alistair Buxton

2024-Oct-15 19:26 UTC

[flac-dev] C API: How to get a seektable for very long files?

Another SDR user here. It was me who reported the bug where total samples
wraps around on overflow.

FLAC performs extremely well on SDR samples, both speed and compression
ratio. In my testing it outperforms any other free lossless codec by a
large margin, being 20% smaller and 10% faster than the next best (which
was ffv1). The problem is the metadata, and not just total samples. We also
can't put true values in the sample rate field because it doesn't have
enough bits (I have files with a nominal sample rate of 35468950 Hz, for
example), and there is no way to record that samples have been padded, e.g.
from 10 bits to 16 bits, which seems to be very common in SDR applications.
These are just two examples off the top of my head - there are probably
more.

The problems around total samples and seek table allocation could be
alleviated by using multiple files as mentioned previously, but that
introduces a new problem: how do you know if you have the full set of
files? How do you know which file contains the nth sample? This would still
require extra metadata somewhere.

I would like to see this kind of thing put into a secondary metadata block
aimed specifically at SDR. This could be completely ignored by regular
audio players - these files are not meant to be listened to anyway. I could
probably figure out how to implement that, I even started looking into it
once, but I realised that 1. nobody would adopt it if it is just me behind
it and 2. I don't know enough to make it suitable for all use cases. So I
cannot and should not do this alone, as it would just be yet another
half-baked adhoc "standard" that causes more problems than it solves. For
example I had not even considered the idea of using multiple files.

On the topic of seeking, it is also a problem for my specific use case. I
am interested in digital signals (teletext) hidden in analog video (CVBS
samples), and they just aren't sequential in the same way that audio and
video usually are. I need to seek to an arbitrary video line as fast as
possible, preferably in constant time. A line is usually 2048 samples but
can vary between 1000 and 3000 samples long (always fixed size within a
given file, but can vary depending on the hardware used to record). A
typical recording will have 32 lines per frame (because the picture has
already been discarded), 25 frames per second, and be up to 12 hours long:
about 35 million lines and 70 billion samples. That would result in a ~700MB seek
table. And making the block size = 1 line also has a negative effect on
compression ratio. Finally since the files are huge I am storing them on
cheap USB HDDs which have terrible seek and read times to begin with. Note
though that very high resolution seeking is not really necessary for most
SDR uses and is mainly a quirk of the fact that teletext was one of the
earliest attempts at digital data transmission and has very small and fixed
size data packets that map directly to video lines. As such this would
probably be best handled with a custom, app-specific seek table in a
separate file, built on-demand and possibly stored on a faster disk.
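
The arithmetic behind these figures can be sketched as follows. This is only an illustration, assuming one seek point per video line and the 18-byte seek point layout of the FLAC SEEKTABLE block (8-byte sample number, 8-byte offset, 2-byte frame sample count); the function names are hypothetical and not part of any FLAC API.

```c
#include <assert.h>
#include <stdint.h>

/* One SEEKTABLE seek point in the FLAC format:
 * 8 bytes sample number + 8 bytes byte offset + 2 bytes samples in frame. */
#define SEEKPOINT_BYTES UINT64_C(18)

/* Number of video lines in a recording (hypothetical helper). */
uint64_t total_lines(uint64_t lines_per_frame, uint64_t frames_per_sec,
                     uint64_t seconds)
{
    return lines_per_frame * frames_per_sec * seconds;
}

/* Bytes needed for a seek table holding `seek_points` entries. */
uint64_t seektable_bytes(uint64_t seek_points)
{
    /* Guard against overflow of the multiplication below. */
    assert(seek_points <= UINT64_MAX / SEEKPOINT_BYTES);
    return seek_points * SEEKPOINT_BYTES;
}
```

For 32 lines per frame, 25 frames per second and 12 hours, `total_lines(32, 25, 12 * 3600)` gives 34,560,000 lines; at 2048 samples per line that is about 70.8 billion samples, and `seektable_bytes` gives roughly 622 MB, the same order of magnitude as the ~700MB estimate.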

-- 
Alistair Buxton
a.j.buxton at gmail.com

Stefan Oltmanns

2024-Oct-15 23:18 UTC


On 15.10.24 at 21:26, Alistair Buxton wrote:
> Another SDR user here. It was me who reported the bug where total samples
> wraps around on overflow.
That's a bug in the flac application. I think the correct behavior is
to set it to 0 (unknown) if total samples does not fit in the 36-bit
field, i.e. if it is 2^36 or more.
>
> FLAC performs extremely well on SDR samples, both speed and compression
> ratio. In my testing it outperforms any other free lossless codec by a
> large margin, being 20% smaller and 10% faster than the next best (which
> was ffv1). The problem is the metadata, and not just total samples. We also
> can't put true values in the sample rate field because it doesn't have
> enough bits (I have files with a nominal sample rate of 35468950 Hz, for
> example), and there is no way to record that samples have been padded, e.g.
> from 10 bits to 16 bits, which seems to be very common in SDR applications.
> These are just two examples off the top of my head - there are probably
> more.
>
> The problems around total samples and seek table allocation could be
> alleviated by using multiple files as mentioned previously, but that
> introduces a new problem: how do you know if you have the full set of
> files? How do you know which file contains the nth sample? This would still
> require extra metadata somewhere.
>
> I would like to see this kind of thing put into a secondary metadata block
> aimed specifically at SDR. This could be completely ignored by regular
> audio players - these files are not meant to be listened to anyway. I could
> probably figure out how to implement that, I even started looking into it
> once, but I realised that 1. nobody would adopt it if it is just me behind
> it and 2. I don't know enough to make it suitable for all use cases. So I
> cannot and should not do this alone, as it would just be yet another
> half-baked adhoc "standard" that causes more problems than it solves. For
> example I had not even considered the idea of using multiple files.
A new metadata block could solve the issue. But for bit depth it is not
needed: libFLAC allows 4 to 32 bits per sample, including every value in
between. Total samples and sample rate are the only fields I can think of
that are too small.
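
A minimal sketch of the two field limits, assuming the STREAMINFO layout of the FLAC format (20-bit sample rate in Hz, 36-bit total samples, with 0 meaning unknown). The function names are made up for illustration and are not libFLAC calls.

```c
#include <stdbool.h>
#include <stdint.h>

#define STREAMINFO_SAMPLE_RATE_BITS   20
#define STREAMINFO_TOTAL_SAMPLES_BITS 36

/* True if a sample rate in Hz fits the 20-bit STREAMINFO field. */
bool sample_rate_fits(uint64_t hz)
{
    return hz < (UINT64_C(1) << STREAMINFO_SAMPLE_RATE_BITS);
}

/* Value to store in the total samples field: the real count if it fits
 * in 36 bits, otherwise 0, which the format defines as "unknown". */
uint64_t total_samples_field(uint64_t total_samples)
{
    const uint64_t limit = UINT64_C(1) << STREAMINFO_TOTAL_SAMPLES_BITS;
    return total_samples < limit ? total_samples : 0;
}
```

With the numbers from this thread, a 35468950 Hz rate does not fit the sample rate field, and a 12-hour recording of about 70.8 billion samples exceeds 2^36 (about 68.7 billion), so it would be stored as 0.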
>
> On the topic of seeking, it is also a problem for my specific use case. I
> am interested in digital signals (teletext) hidden in analog video (CVBS
> samples), and they just aren't sequential in the same way that audio and
> video usually are. I need to seek to an arbitrary video line as fast as
> possible, preferably in constant time. A line is usually 2048 samples but
> can vary between 1000 and 3000 samples long (always fixed size within a
> given file, but can vary depending on the hardware used to record). A
> typical recording will have 32 lines per frame (because the picture has
> already been discarded), 25 frames per second, and be up to 12 hours long:
> about 35 million lines and 70 billion samples. That would result in a ~700MB seek
> table. And making the block size = 1 line also has a negative effect on
> compression ratio. Finally since the files are huge I am storing them on
> cheap USB HDDs which have terrible seek and read times to begin with. Note
> though that very high resolution seeking is not really necessary for most
> SDR uses and is mainly a quirk of the fact that teletext was one of the
> earliest attempts at digital data transmission and has very small and fixed
> size data packets that map directly to video lines. As such this would
> probably be best handled with a custom, app-specific seek table in a
> separate file, built on-demand and possibly stored on a faster disk.
>
As far as I understand the format, such a massive seek table is not
needed for seeking. I haven't looked at the code in full detail, but I
think it works like this:

If you want to seek to sample 1200, it looks for the seek points before
and after that value, let's say sample 1000 at offset 50000 and sample
2000 at offset 70000. It will then calculate the theoretical position by
interpolating between the seektable offsets, 50000 + 0.2 * 20000 = 54000,
and look for a frame header there. The frame header contains the frame or
sample number, so the decoder knows whether it needs to scan forward or
backward from there.
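
The interpolation step described above could look roughly like this. It is a sketch of the idea only, not the actual libFLAC seek code; the `seek_point` type and `estimate_offset` function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical seek point: first sample of a frame and its byte offset. */
typedef struct {
    uint64_t sample;
    uint64_t offset;
} seek_point;

/* Linearly interpolate a byte offset for `target`, which must lie between
 * the two seek points. The result is only a guess: the decoder would still
 * have to find a frame header near it and scan forward or backward. */
uint64_t estimate_offset(seek_point lo, seek_point hi, uint64_t target)
{
    assert(lo.sample < hi.sample && lo.offset <= hi.offset);
    assert(target >= lo.sample && target <= hi.sample);

    uint64_t delta = target - lo.sample;          /* samples past lo */
    uint64_t span_samples = hi.sample - lo.sample;
    uint64_t span_bytes = hi.offset - lo.offset;

    /* Assumption: the product fits in 64 bits. Very sparse seek tables on
     * huge files would need wider arithmetic here. */
    assert(delta == 0 || span_bytes <= UINT64_MAX / delta);
    return lo.offset + delta * span_bytes / span_samples;
}
```

With the example values, `estimate_offset((seek_point){1000, 50000}, (seek_point){2000, 70000}, 1200)` yields 54000.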

I would assume that the bitrate for SDR applications is quite constant
compared to normal music, as the signal level doesn't change that much
and the modulation doesn't change (of course only if you don't have
dropouts in the signal).

Best regards
Stefan

brianw

2024-Oct-16 00:02 UTC


I am not an SDR user, but I think it would be a great idea to design and
register an application metadata block for FLAC that supports these needs.

The fields in this application-specific metadata block could document how many
files there are in total; which file in the sequence the current file is; what
the actual sample rate is; how many bits in each sample are valid (to a finer
degree than FLAC registers).

One thing to do is start this SDRm block with a revision number, so, if the
community adds more fields later, then supporting software will know which
fields to expect.
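
To make the idea concrete, here is one possible shape for such a payload. Everything in it is an assumption for discussion: the "SDRm" ID is not registered, and the field names, widths and order are invented. The only parts taken from the FLAC format are that an APPLICATION block starts with a 4-byte application ID and that multi-byte values are big-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SDR metadata; not a registered or agreed layout. */
typedef struct {
    uint8_t  revision;        /* layout revision, so fields can be added later */
    uint32_t file_count;      /* how many files make up the recording */
    uint32_t file_index;      /* position of this file in the sequence */
    uint64_t sample_rate_hz;  /* true sample rate, beyond the 20-bit field */
    uint8_t  valid_bits;      /* significant bits per sample, e.g. 10 of 16 */
} sdr_meta;

/* 4 (ID) + 1 + 4 + 4 + 8 + 1 bytes. */
#define SDR_META_BYTES 22

/* Write the low `nbytes` bytes of `value` in big-endian order. */
static void put_be(uint8_t *dst, uint64_t value, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)(value >> (8 * (nbytes - 1 - i)));
}

/* Serialize the application ID and payload; returns bytes written. */
size_t sdr_meta_pack(const sdr_meta *m, uint8_t out[SDR_META_BYTES])
{
    size_t pos = 0;
    memcpy(out + pos, "SDRm", 4);            pos += 4;
    out[pos++] = m->revision;
    put_be(out + pos, m->file_count, 4);     pos += 4;
    put_be(out + pos, m->file_index, 4);     pos += 4;
    put_be(out + pos, m->sample_rate_hz, 8); pos += 8;
    out[pos++] = m->valid_bits;
    return pos;
}
```

Putting the revision byte first means a reader can decide how to parse the rest before touching any other field.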

I don't think that extra metadata can get around limitations in the total
samples or seek table fields. One topic that I never studied is how FLAC
distinguishes between files and streams. For a file, any sizes or references
need to have limits, otherwise handling an unlimited value can be difficult to
verify on all compatible software. For a stream, though, total samples or seek
information would seem completely unusable. So, I'm wondering what a pure
FLAC stream should even do with such information.

I encourage you to put something together. I would certainly be willing to
review it on general principles. Then, you could at least start using something
and submit it for consideration as an official addition to the FLAC application
metadata standards.

Brian

On Oct 15, 2024, at 12:26 PM, Alistair Buxton wrote:
> FLAC performs extremely well on SDR samples, both speed and compression
> ratio. In my testing it outperforms any other free lossless codec by a large
> margin, being 20% smaller and 10% faster than the next best (which was ffv1).
> The problem is the metadata, and not just total samples. We also can't put
> true values in the sample rate field because it doesn't have enough bits (I
> have files with a nominal sample rate of 35468950 Hz, for example), and there
> is no way to record that samples have been padded, e.g. from 10 bits to 16
> bits, which seems to be very common in SDR applications. These are just two
> examples off the top of my head - there are probably more.
>
> The problems around total samples and seek table allocation could be
> alleviated by using multiple files as mentioned previously, but that introduces
> a new problem: how do you know if you have the full set of files? How do you
> know which file contains the nth sample? This would still require extra metadata
> somewhere.
>
> I would like to see this kind of thing put into a secondary metadata block
> aimed specifically at SDR. This could be completely ignored by regular audio
> players - these files are not meant to be listened to anyway. I could probably
> figure out how to implement that, I even started looking into it once, but I
> realised that 1. nobody would adopt it if it is just me behind it and 2. I
> don't know enough to make it suitable for all use cases. So I cannot and
> should not do this alone, as it would just be yet another half-baked adhoc
> "standard" that causes more problems than it solves. For example I had
> not even considered the idea of using multiple files.
