MSEED: accumulating record-level micro gaps/overlaps over long time spans #3447
Comments
I'm a little bit confused. From what I can see, the sampling interval is stated in the file as exactly 10 s, but the individual records' start times don't line up, i.e. there are gaps and overlaps on a subsample scale between every record. Is that correct, or am I missing something?

We are merging together records that have gaps/overlaps on a subsample scale; otherwise you would end up with 420 individual traces.

Am I correct that you want obspy to assume the file is without gaps, compute an odd sampling rate, and treat it as a gapless trace? I'm pretty sure that would mean touching our C code wrapper for libmseed around here, maybe passing in a flag from Python to change behavior in the wrapper. I'm no C wizard, so definitely somebody else would need to do that.
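To illustrate the situation, here is a rough sketch of how record start times can drift on a subsample scale while every individual record boundary stays within the tolerance (all numbers here are hypothetical, not taken from the reported file):

```python
# Hypothetical illustration of subsample gaps between miniSEED records.
# All values below are made-up assumptions, not data from the real file.
delta = 10.0              # nominal sampling interval (seconds)
npts_per_record = 12      # assumed samples per record
drift_per_record = 0.09   # assumed subsample drift per record (seconds)

gaps = []                 # offset of each record start vs. the nominal grid
t = 0.0                   # start time of the current record
for i in range(5):
    expected = i * npts_per_record * delta
    gaps.append(t - expected)
    t += npts_per_record * delta + drift_per_record

# Each boundary is off by only ~0.09 s, well below half a sample period,
# so a per-boundary check merges everything into one trace even though
# the total offset keeps growing record after record.
print(gaps)
```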
I am doubtful of that statement. Note how the single-digit seconds of the individual records' start times jump around "7", but always with a ".000000" decimal part?!?

msi version: 0.9.6
It could make sense as a switch, but this would mean adding a lot of logic to our libmseed wrapper code, and somebody else would have to do it.
Compare the code from mseed2sac:
The basic issue is that the datalogger is supposed to make one measurement every 10 s. However, due to the poor quality of its implementation, this is not stable, so there is a slow drift. From one miniSEED record to the next it is always within the tolerance, but over a whole day it adds up to about seven minutes, which means that the timestamps are completely wrong after reading the file and writing it back to disk. A less surprising behaviour would be to declare a gap or overlap whenever the accumulated time difference is larger than the tolerance.
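The proposed accumulated-tolerance check could be sketched like this (hypothetical data and variable names; this is not ObsPy's actual merging code):

```python
# Sketch of gap detection with an *accumulated* time-difference check.
# All names and numbers are hypothetical assumptions, not ObsPy internals.
delta = 10.0                  # nominal sampling interval (seconds)
npts_per_record = 12          # assumed samples per record
tolerance = delta / 2.0       # assumed tolerance: half a sample interval

# Hypothetical record header start times, drifting 0.09 s per record.
headers = [i * (npts_per_record * delta + 0.09) for i in range(100)]

breaks = []                   # record indices where a gap is declared
anchor = headers[0]           # projection anchor, reset after each break
n = 0                         # samples seen since the anchor
for i, t in enumerate(headers):
    projected = anchor + n * delta
    if abs(t - projected) > tolerance:
        breaks.append(i)      # accumulated drift exceeded the tolerance
        anchor, n = t, 0      # start a new segment at this record
    n += npts_per_record

print(breaks)
```

With a per-boundary check, the 0.09 s offsets would never trigger; with the accumulated check, a break is declared once the total drift passes half a sample interval.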
I agree that makes sense, but I might not be able to work on it anytime soon; I'm trying to finish up some loose ends, and more of those appear than get closed anyway. I'd still say, though, that the initial problem comes from a quite bad file and not from us, and that it's an edge case.
What you can do in the meantime is this:

```python
from obspy import read
from obspy.io.mseed.util import get_start_and_end_time

path = '/tmp/proof.mseed'
st = read(path)
assert len(st) == 1
tr = st[0]

# Recompute the sampling rate from the actual record header times.
start, end = get_start_and_end_time(path)
tr.stats.sampling_rate = len(tr) / (end - start)
```

This isn't safe though, and relies heavily on lots of assumptions (ordering of records, no multiplexing, constant record size, ..., see docs for that helper routine).
Bug Summary
The file linked here contains data from 2024-05-14T23:52:07 to 2024-05-16T00:04:33. However, when using
this is shown:
The problem is that the delta is slightly higher than 10.0 due to the behaviour of the datalogger. However, the difference is within obspy's default tolerance, so the difference between the time used in obspy and the actual timestamp grows to up to ten minutes over the course of the day. The time record in each individual miniSEED record header is correct, and other software such as qmerge handles this differently.

Wouldn't it make more sense to accumulate the time difference between the time indicated in the header and the time obspy computes (time stamp of the first header + delta * number of data points seen so far), and introduce a break once this gets bigger than the tolerance, instead of checking each boundary between records individually without accumulating?
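Back-of-the-envelope arithmetic shows how a small per-sample excess adds up over a day. Only the start and end times below come from this report; the sample count is an assumption chosen purely for illustration:

```python
from datetime import datetime

# Start and end times as stated in the bug report.
start = datetime(2024, 5, 14, 23, 52, 7)
end = datetime(2024, 5, 16, 0, 4, 33)
span = (end - start).total_seconds()   # 87146 s of wall-clock time

nominal_delta = 10.0
npts = 8671                            # assumed sample count (illustrative)

# True average sampling interval implied by the header times:
true_delta = span / (npts - 1)         # slightly above 10 s

# Duration obspy reconstructs if it treats the trace as gapless at 10 s:
reconstructed = (npts - 1) * nominal_delta
error = span - reconstructed           # accumulated timestamp error (s)
print(true_delta, error / 60.0)        # ~10.05 s, several minutes of drift
```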
Code to Reproduce
No response
Error Traceback
No response
ObsPy Version?
1.4.1
Operating System?
No response
Python Version?
3.10.12
Installation Method?
pip