thr3ads.net - flac dev - [flac-dev] FLAC as a format for archiving non-audio (SDR) sample data? [Mar 2021]

Alistair Buxton

2021-Mar-31 15:51 UTC

[flac-dev] FLAC as a format for archiving non-audio (SDR) sample data?

Hi,

There are several projects devoted to preserving analog video media such as
LaserDiscs, VHS tapes, etc. These projects use raw sampling and SDR
techniques to recover higher quality versions than is possible using
normal players and capture cards. In the process of this work we end up
with huge files of raw analog sample data. These files range from tens to
hundreds of gigabytes of samples, with a typical sample rate of 25 MHz to
40 MHz. The rate and format vary depending on the hardware used for capture.

We've found that FLAC compresses these much better than gzip, lzma, etc.,
shrinking files to about 50% of their original size vs about 80% for
general data compression algorithms, and in my testing it seems fast enough
to encode in real time.

Currently most people are using their own ad-hoc solutions for archiving
data, but as the author of one of the tools people are using, I'd like to
make it a bit more standardized and automatic. Compatibility with existing
playback hardware is not required of course. So I have some questions about
the internals of the FLAC format, suitability, and how it can be stretched
to our needs.

Is there some way I could store a sample rate with 32 bit precision? It
seems like the sample rate doesn't actually make any difference for raw
data and people are just using 48000 as a placeholder, but it would be
really useful if the true sample rate could be stored as it is required by
the decoding tools. It does need more than 16 bits of precision.

What about other metadata? Can I store arbitrary information? I don't need
to store things like artist/title that you would expect for audio tracks. I
need to store things like the format of the capture, hardware used, number
of samples per line and number of lines per field.

I also need a way to mark sections of the file. Think of this like having a
FLAC file of a whole album, and marking in the metadata where each track
begins and ends within the file. In my case, these are the start and end of
different recordings on one VHS tape. These sections will also need their
own metadata. Basically I need to store structured data, not just flat
key=value records.

A lot of this information is not available at capture time and can only be
found by decoding the samples - which cannot be done in real-time. So I
need to be able to insert it after the decoding process, without rewriting
the whole file. I gather that it is possible to pad the metadata block to
allow it to grow later. Are there any limits on how much padding I can
insert?

Would there be any advantage to using an Ogg (OGA) container instead of
straight FLAC files for this?

Finally, the programming language of choice for this type of work is
Python. Can you suggest a good binding that supports encoding, decoding,
metadata manipulation, and fast, sample-accurate seeking?

I'm interested to hear any thoughts you have about this.

Background information about our projects:

https://zxnet.co.uk/teletext/recovery/
https://github.com/ali1234/vhs-teletext
https://www.domesday86.com/?page_id=978
https://github.com/happycube/ld-decode/

Thanks,

-- 
Alistair Buxton
a.j.buxton at gmail.com

Martijn van Beurden

2021-Apr-01 13:06 UTC

[flac-dev] FLAC as a format for archiving non-audio (SDR) sample data?

Hi,

Regarding the sample rate, I'm pretty sure you'll have to resort to
metadata to store that information. The FLAC format can't be extended to
store such high sample rates in the frame headers. However, the normal
Vorbis tags accept any key=value pair you want to use, no restrictions.
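
As a rough illustration of the key=value approach, here is a minimal sketch
in plain Python that serializes a few tags into a FLAC VORBIS_COMMENT
metadata block and reads them back. The tag names, their values, and the
vendor string are made up for this example and are not an agreed
convention. The byte layout follows the published format as I understand
it (block type 4, 24-bit big-endian block length, 32-bit little-endian
lengths inside the comment), so treat it as a sketch to check against the
spec rather than a reference implementation.

```python
import struct

VORBIS_COMMENT = 4  # FLAC metadata block type for Vorbis comments


def vorbis_comment_block(tags, vendor="sdr-archive sketch", is_last=False):
    """Serialize a dict of tags into a complete VORBIS_COMMENT block."""
    vendor_bytes = vendor.encode("utf-8")
    # Lengths inside a Vorbis comment are 32-bit little-endian.
    body = struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    body += struct.pack("<I", len(tags))
    for key, value in tags.items():
        entry = f"{key}={value}".encode("utf-8")
        body += struct.pack("<I", len(entry)) + entry
    # Block header: 1-bit "last block" flag, 7-bit type, 24-bit big-endian length.
    flag = 0x80 if is_last else 0
    return bytes([flag | VORBIS_COMMENT]) + len(body).to_bytes(3, "big") + body


def read_tags(block):
    """Parse a block produced by vorbis_comment_block() back into a dict."""
    body = block[4:]  # skip the 4-byte block header
    (vendor_len,) = struct.unpack_from("<I", body, 0)
    pos = 4 + vendor_len
    (count,) = struct.unpack_from("<I", body, pos)
    pos += 4
    tags = {}
    for _ in range(count):
        (entry_len,) = struct.unpack_from("<I", body, pos)
        pos += 4
        key, _sep, value = body[pos:pos + entry_len].decode("utf-8").partition("=")
        tags[key] = value
        pos += entry_len
    return tags


# Hypothetical tag names and values; the real sample rate lives here as text,
# so it is not limited by the width of the sample rate field in the stream.
example_tags = {
    "CAPTURE_SAMPLE_RATE_HZ": 40000000,
    "SAMPLES_PER_LINE": 2560,
    "LINES_PER_FIELD": 312,
}
example_block = vorbis_comment_block(example_tags)
```

Values come back as strings, so a decoding tool would convert
CAPTURE_SAMPLE_RATE_HZ with int() after reading it.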

A cuesheet is the standard way to insert cue marks for audio. However,
because cuesheets use CDDA frames for locations, this might not be the best
approach here. Fortunately, you can define your own metadata format as an
'application' metadata block. See METADATA_BLOCK_APPLICATION at
https://xiph.org/flac/format.html
for more information. I don't think using an Ogg container will add
anything not already possible with FLAC, except when using multiple
streams, which I don't think is the case here.
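
To make the 'application' block idea more concrete, here is a minimal
Python sketch that packs structured per-recording data as JSON into an
APPLICATION block. Everything specific in it is an assumption for
illustration: the application ID b"xSDR" is hypothetical and unregistered,
and the JSON layout (sections with start/end sample offsets and a nested
info dict) is just one possible design. The framing (block type 2, 24-bit
length, 4-byte application ID followed by opaque data) is how I read the
format description.

```python
import json

APPLICATION = 2                  # FLAC metadata block type for application data
MAX_BLOCK_BODY = (1 << 24) - 1   # block length is a 24-bit field
APP_ID = b"xSDR"                 # hypothetical, unregistered application ID


def application_block(app_id, payload, is_last=False):
    """Wrap an opaque payload in an APPLICATION metadata block."""
    if len(app_id) != 4:
        raise ValueError("application ID must be exactly 4 bytes")
    body = app_id + payload
    if len(body) > MAX_BLOCK_BODY:
        raise ValueError("payload too large for a single metadata block")
    flag = 0x80 if is_last else 0
    return bytes([flag | APPLICATION]) + len(body).to_bytes(3, "big") + body


def sections_block(sections):
    """Encode a list of section records as compact JSON inside the block."""
    payload = json.dumps({"sections": sections}, separators=(",", ":"))
    return application_block(APP_ID, payload.encode("utf-8"))


def read_sections(block):
    """Decode a block produced by sections_block()."""
    if block[0] & 0x7F != APPLICATION:
        raise ValueError("not an APPLICATION block")
    length = int.from_bytes(block[1:4], "big")
    body = block[4:4 + length]
    if body[:4] != APP_ID:
        raise ValueError("unknown application ID")
    return json.loads(body[4:].decode("utf-8"))["sections"]


# Two recordings on one tape, located by sample offset rather than CDDA frame.
example_sections = [
    {"start": 0, "end": 1200000000, "info": {"standard": "PAL"}},
    {"start": 1200000000, "end": 2900000000, "info": {"standard": "PAL"}},
]
example_app_block = sections_block(example_sections)
```

Sample offsets are stored as plain integers, so section boundaries can be
as precise as the capture itself.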

To avoid rewriting the whole file, one can add a padding metadata block at
capture time, and use this padding to write metadata afterwards.
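
The bookkeeping for reusing padding can be sketched as follows. This is an
illustration in plain Python, not code from any FLAC library, and the
function names are made up. It assumes the usual block framing (a 4-byte
header with a 24-bit length, which caps a single block body at 16,777,215
bytes) and shows how a new block plus a smaller padding block can occupy
exactly the bytes of the original padding, so nothing after it has to move.

```python
PADDING = 1                      # FLAC metadata block type for padding
HEADER_SIZE = 4                  # 1 byte flag/type + 3 bytes length
MAX_BLOCK_BODY = (1 << 24) - 1   # largest body a 24-bit length can describe


def padding_block(n_bytes, is_last=False):
    """Build a PADDING block whose body is n_bytes zero bytes."""
    if not 0 <= n_bytes <= MAX_BLOCK_BODY:
        raise ValueError("padding size does not fit in a 24-bit length")
    flag = 0x80 if is_last else 0
    return bytes([flag | PADDING]) + n_bytes.to_bytes(3, "big") + bytes(n_bytes)


def fill_padding(padding, new_block):
    """Return bytes of the same total size: new_block followed by less padding."""
    if padding[0] & 0x7F != PADDING:
        raise ValueError("first argument is not a PADDING block")
    was_last = bool(padding[0] & 0x80)
    spare = len(padding) - len(new_block)
    if spare == 0:
        # Exact fit: the new block inherits the "last block" flag.
        flag = 0x80 if was_last else 0
        return bytes([(new_block[0] & 0x7F) | flag]) + new_block[1:]
    if spare < HEADER_SIZE:
        # No room for the header of the leftover padding block.
        raise ValueError("not enough padding; the file would need rewriting")
    # The new block is no longer last; the leftover padding carries the flag.
    head = bytes([new_block[0] & 0x7F]) + new_block[1:]
    return head + padding_block(spare - HEADER_SIZE, was_last)


# Reserve 8 KiB at capture time, then later drop in a 10-byte block of type 4.
reserved = padding_block(8192, is_last=True)
late_block = bytes([4]) + (10).to_bytes(3, "big") + bytes(10)
updated = fill_padding(reserved, late_block)
```

Because the result has the same length as the reserved block, it can be
written over the old padding in place.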

As for Python, there are quite a few options. I've been using
https://pypi.org/project/SoundFile/, but I think it is a good idea to try a
few and see which one suits your needs best. I have only used it for reading
FLAC files, so I can't really comment on how it handles the rest.

Kind regards,

Martijn van Beurden