-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
panstarrs_get_lightcurve using lsdb #344
Conversation
MAST helpdesk responded that they will fix the file problem on S3 before Oct 7, so this PR will wait for that fix to move forward, so that testing can be completed. |
Now fully functional!
Notes
This will close #165 |
It's a bit long, but for fornax it's totally acceptable unless we hit a VM limit. We currently skip this notebook due to the ZTF private bucket access. (and I don't know why we see the error with the classifier notebook, that should not be affected by this PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. Mostly minor, but I picked up on some things to make the API nicer (e.g. quantity inputs, or being explicit with parameters)
# num_normal_QSO = 5000 | ||
# zmin, zmax = 0, 10 | ||
# randomize_z = False | ||
#num_normal_QSO = 30 | ||
#zmin, zmax = 0, 10 | ||
#randomize_z = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these changes intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, it doesn't matter what the values of these are since they are commented out. I do change them when I am working on the code and testing different sample sizes, and then most often forget what they originally were and leave them at whatever the last value I tried. Does that cause a problem? Or is it the spaces between the # and the commands that is concerning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the adding unrelated and unnecessary changes. I see why it happens, but it would be nice to figure out a convenient way to change the workflow so only intended and necessary changes get added.
I added this topic to my list for the ipac visit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes! I don't know how to do this, so perfect to discuss later this month.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen a new (to me anyway) "git" tab on the LHS of the Fornax console that gives GUI access to changed and staged files, etc. Maybe there's a way to select and commit individual lines/changes from there? I haven't played around with it at all but seems worth checking out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap, I put on the list for my visit to experiment something out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking it over and the comments, I think I got them all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanup changes to be committed and then this is good to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, glad it's working! I left a mixture of comments below ranging from minor naming suggestions to code structure questions that will take a little more work to figure out. Anything you can't or don't want to address here can go into an issue and be assigned to me.
Co-authored-by: Troy Raen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have addressed all the comments and made all the requested changes. Let me know if I missed anything.
dict(flux=pd.to_numeric(flux_panstarrs, errors='coerce').astype(np.float64), | ||
err=pd.to_numeric(err_panstarrs, errors='coerce').astype(np.float64), | ||
time=pd.to_numeric(t_panstarrs, errors='coerce').astype(np.float64), | ||
objectid=pd.to_numeric(objectid, errors='coerce').astype(np.int64), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This represents a day of my life that I will never get back. It turns out that lsdb returns some interesting data types (eg. double[pyarrow]
), which pandas seamlessly handles, and hides, unless you think to ask about data types. Unfortunately, other codes do not handle them well, namely our plotting functions.... This is the way I found of getting the data types converted into data types in the data frames that can be handled by plotting. Maybe there is another, simpler, way of doing it, but this is functional, and not all combinations of these things are functional, ie., I believe .astype
alone doesn't work. The coerce
I believe was an attempt to handle nan
.
Your comments and experience around working with the dtypes would be a super useful feedback to the lincc team I think. |
LGTM. Thanks! |
message received, I put this in as feedback to lsdb in issue #441 over at astronomy-commons/lsdb |
@bsipocz when you get a chance, can you please merge this PR? I don't know why it is failing ci. |
It seems to be something unrelated, but I'm not sure I can dive into the details before I get back home. |
I'm not in a hurry, thanks. |
We only need distributed extras here, but having any dask triggers the need for dataframe in the classifier notebook
OK, my latest commit should fix the failure. Basically adding some dask would trigger |
Hmm, apparently this is in fact an |
26dd3be
to
fedbd7e
Compare
I'm not sure why the classifier notebook run into VM limits now, I'm doing some more debugs and see if it is still OK with GHA. If yes, then I'll bring over the configs from the IRSA notebooks to we can easily opt in GHA build here, too. |
|
GHA: GitHub Actions. We use GHA for doing the HTML rendering and hosting the pages with github. But for the PRs we use a different system, and their limitations differ a bit. It was already close to the limit but from below, I suspect adding the new dependencies may changed it worse the worse a bit, but we can't do much about it. TL;DR, I may need to tweak the configs but I don't think there is much here to do about it. So I would merge this now and figure out issues with circleCI if/when they arise on main. |
panstarrs_get_lightcurve using lsdb ee9f4f1
having some trouble catching exceptions for FileNotFound in the call to lsdb's crossmatch with the Panstarrs catalog. It would be nice if we could somehow catch this error so that the crossmatch doesn't fail if a file is missing. Instead, it would be ideal if one of the objects fails the crossmatch, that the crossmatch still finishes with the other objects complete, and the final result just looks like no match was found for that trouble-causing object/file.
Helpdesk request submitted to MAST 9/19 about the missing file.