Some features of a format that could replace SEGY might include:
So now for the question – what do we have to do to move beyond the extremely frustrating SEGY standard? Is there anything out there already that we should just start evangelizing? If not, what do you think a replacement format needs to look like?
Thanks for reading. I feel much better after this rant and will get back to debugging why byte number 3603 is wrong.
EDIT: After chilling out over the weekend here are some more thoughts. SEGY is complicated for a number of reasons:
So SEGY is hard to use because the seismic problem domain is an inherently difficult one to model. We are lucky to have the benefit of the wisdom of those who came up with the format.
Even so, a lot of cruft has accumulated in the format over the years. This creates a non-trivial cognitive burden that I could do without but will realistically just have to deal with.
The last point, though, is interesting. We already have SEGD for storing the raw field data. That format is a lot more challenging than SEGY, but it is also a good place to bury much of the inherent complexity of real-world seismic data.
When people suggest HDF as a replacement I guess that (maybe) they are viewing SEGY as a 'processing' format?
Last week I was thinking of SEGY as a vehicle for sharing final seismic volumes with interpreters. I read at the weekend that 80% of SEGY headers are not populated in real world data. So, this is speculation on my part, but perhaps what we need is
To load seismic data into interpretation systems accurately and reproducibly, we don't need much metadata, but what we do need absolutely has to be present. The hand-off from processors to interpreters via SEGY is where I think we could use a new approach.
The issue is that SEGY was designed for tape, so it's a single file. In this day and age, that may not be a good thing. It might be better to allow extended 'metadata' to be housed in a separate file: basically, the catalog record that you could review before buying the data. And you should be able to add this metadata back into the file.
If this were done as exchange profiles (basically information groups rather than one large information block), then a different format might get more buy-in, and better compliance, over time.
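To make the separate-metadata idea a bit more concrete, here is a minimal sketch that pulls the file-level headers out of a SEGY stream into a JSON 'catalog record' and turns the textual header back into bytes. It assumes the SEG-Y rev 1 layout (3200-byte textual header, 400-byte binary header, big-endian two-byte integers) and an EBCDIC textual header. The function names and JSON keys are made up for illustration and are not part of any standard.

```python
import io
import json
import struct

TEXT_HEADER_BYTES = 3200    # SEG-Y textual file header
BINARY_HEADER_BYTES = 400   # SEG-Y binary file header


def extract_sidecar(segy_stream, text_encoding="cp037"):
    """Read the two file-level headers and return a JSON-ready dict.

    Assumes big-endian fields (SEG-Y rev 1) and an EBCDIC (cp037) text header.
    """
    text = segy_stream.read(TEXT_HEADER_BYTES)
    binary = segy_stream.read(BINARY_HEADER_BYTES)
    if len(text) != TEXT_HEADER_BYTES or len(binary) != BINARY_HEADER_BYTES:
        raise ValueError("stream too short to hold the SEG-Y file headers")

    # File bytes 3217-3218, 3221-3222 and 3225-3226 (1-based) are offsets
    # 16, 20 and 24 inside the 400-byte binary header.
    interval, n_samples, fmt = (
        struct.unpack(">h", binary[off:off + 2])[0] for off in (16, 20, 24)
    )
    return {
        "textual_header": text.decode(text_encoding),
        "sample_interval_us": interval,
        "samples_per_trace": n_samples,
        "data_sample_format_code": fmt,
    }


def restore_text_header(sidecar, text_encoding="cp037"):
    """Turn the sidecar's textual header back into the 3200 bytes for the file."""
    raw = sidecar["textual_header"].encode(text_encoding)
    if len(raw) != TEXT_HEADER_BYTES:
        raise ValueError("textual header must encode to exactly 3200 bytes")
    return raw


# Synthetic headers only; no real survey data is involved.
demo_text = "C 1 CLIENT: EXAMPLE".ljust(TEXT_HEADER_BYTES).encode("cp037")
demo_binary = bytearray(BINARY_HEADER_BYTES)
struct.pack_into(">h", demo_binary, 16, 4000)   # 4 ms sample interval
struct.pack_into(">h", demo_binary, 20, 1501)   # samples per trace
struct.pack_into(">h", demo_binary, 24, 5)      # format code 5: IEEE float

sidecar = extract_sidecar(io.BytesIO(demo_text + bytes(demo_binary)))
sidecar_json = json.dumps(sidecar, indent=2)    # what you would publish separately
```

The JSON could sit next to the data as the reviewable record, and `restore_text_header` is the first step toward writing it back into the file.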
As for HDF and other formats, it's about how the data structures are laid out. There can be performance issues depending on how time-series access (roughly, traces) is organized in NetCDF files.
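A rough way to see why layout matters is to count how many storage chunks a read has to touch. The sketch below is plain arithmetic, not a benchmark of any real library. The volume size and the two chunk shapes are hypothetical, chosen only to contrast a trace-oriented layout with a bricked one.

```python
def chunks_touched(chunk_shape, selection):
    """Count the chunks intersected by a rectangular selection.

    chunk_shape: chunk size along each axis.
    selection:   one (start, stop) pair per axis, stop exclusive.
    """
    count = 1
    for size, (start, stop) in zip(chunk_shape, selection):
        first = start // size          # index of the first chunk hit on this axis
        last = (stop - 1) // size      # index of the last chunk hit on this axis
        count *= last - first + 1
    return count


# Hypothetical volume: 1000 inlines x 1000 crosslines x 1500 samples.
one_trace = [(10, 11), (20, 21), (0, 1500)]
one_time_slice = [(0, 1000), (0, 1000), (700, 701)]

trace_major = (1, 1, 1500)   # one chunk per trace, similar in spirit to SEGY
brick = (64, 64, 64)         # cubic bricks

for name, chunk in (("trace_major", trace_major), ("brick", brick)):
    print(name,
          "trace:", chunks_touched(chunk, one_trace),
          "time slice:", chunks_touched(chunk, one_time_slice))
```

With these assumed numbers, the trace-oriented layout reads a trace from a single chunk but needs a million chunks for one time slice, while the bricked layout needs 24 chunks for the trace and 256 for the slice. Neither is right for every workload, which is the point.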
Some features of a format that could replace SEGY might include:
Bits are cheap. Include the definitions of the dictionary items used in the file itself. In this day and age, we don't need a translation book external to the file. But we should use standard reference terms that are published online as Linked Open Data.
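As a sketch of what "carry the dictionary in the file" could look like, the record below stores each value alongside its definition, unit and a term link. Everything here is hypothetical: the field names, the layout, and the `example.org` URIs are placeholders, not a real vocabulary.

```python
import json

record = {
    "dictionary": {
        "sample_interval": {
            "definition": "Time between consecutive samples in a trace",
            "unit": "microsecond",
            # Placeholder URI; a real file would point at a published term.
            "term_uri": "https://example.org/terms/sample_interval",
        },
        "samples_per_trace": {
            "definition": "Number of samples in each trace",
            "unit": "count",
            "term_uri": "https://example.org/terms/samples_per_trace",
        },
    },
    "values": {"sample_interval": 4000, "samples_per_trace": 1501},
}


def undefined_fields(rec):
    """Return the value names that have no dictionary entry in the same file."""
    return sorted(set(rec["values"]) - set(rec["dictionary"]))


def describe(rec, name):
    """Render one value together with the definition carried in the file."""
    entry = rec["dictionary"][name]
    return f'{name} = {rec["values"][name]} {entry["unit"]} ({entry["definition"]})'


print(describe(record, "sample_interval"))
print(json.dumps(record, indent=2))   # the self-describing block as it would be stored
```

A loader could refuse any file where `undefined_fields` returns a non-empty list, which is one cheap way to enforce that nothing depends on an external translation book.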