Discussion: merging Survival packages #1
Hey Pietro, that sounds like an awesome idea. If you'd be willing to contribute that functionality to this package, that would be great!
Cool! I have Cox working in your package. I'll polish it and make a PR in the next few days. One question though: how do you want to integrate your package with DataFrames (in order to use formulas in Cox regression)? The way I had done it was to simply add a column …
I hadn't intended to add a dependency on DataFrames, but it seems reasonable to add one on DataArrays. An even better solution may be to make everything here as generic as possible so as to allow arbitrary arrays (including …
I see. I got it to work without a dependency on DataFrames, except for one silly issue I need your help with. Cox regression doesn't estimate an intercept, so the intercept should be excluded from the formula (otherwise this step fails). How can I specify in the call …
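To see why the intercept has to go, here is a small sketch in Python (purely illustrative, unrelated to the Survival.jl API). The Cox partial likelihood compares each event against its risk set, so adding a constant c to every linear predictor multiplies both numerator and denominator by e^c and cancels: the intercept is simply not identified. The function name and the toy data are hypothetical, and the sketch assumes no tied event times.

```python
import math


def cox_partial_loglik(times, events, X, beta):
    """Cox partial log-likelihood, assuming no tied event times.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    X:      one covariate row per subject
    beta:   coefficient vector (same length as each row of X)
    """
    # Linear predictor eta_j = x_j . beta for every subject.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects only enter through risk sets
        # Risk set: subjects still under observation at time t_i.
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        ll += eta[i] - math.log(sum(math.exp(e) for e in risk))
    return ll


# Made-up toy data; the first column of X is an all-ones intercept.
times = [2.0, 3.0, 5.0, 7.0, 11.0]
events = [1, 0, 1, 1, 0]
X = [[1.0, xi] for xi in [0.5, -1.2, 0.3, 1.8, -0.4]]

# Changing only the intercept coefficient leaves the likelihood unchanged.
ll_a = cox_partial_loglik(times, events, X, [0.0, 0.7])
ll_b = cox_partial_loglik(times, events, X, [5.0, 0.7])
print(math.isclose(ll_a, ll_b))  # True
```

Since the likelihood is flat in the intercept direction, a fitter that keeps the all-ones column would face a singular information matrix, which is one plausible reason a fitting step fails when the column is left in.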
Ah right, formulas are an important part of things like this... That functionality will eventually be split out of DataFrames, but in the meantime that's where it lives, so I guess a dependency on DataFrames makes sense. (Though IIRC GLM gets by without it somehow; I haven't looked into that.) There was talk about requiring the intercept in formulas to be explicit, but in the meantime an intercept is added implicitly. I believe the way to get around that is to add …
I see, so then with the hack that the formula needs to be called with the …
Cox models are indeed an interesting case where implicit intercepts do not make sense. Cf. JuliaData/DataFrames.jl#574. That said, would it be possible to handle this in the Cox model code by dropping the intercept column? I guess that's what R does.
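A minimal sketch of dropping the intercept column inside the model code, written in Python for illustration only. It assumes the model matrix arrives together with its column names and that the intercept column is labelled "(Intercept)" (R's naming convention); `drop_intercept` is a hypothetical helper, not a DataFrames or StatsModels function.

```python
def drop_intercept(X, names, intercept_name="(Intercept)"):
    """Return (X, names) with the intercept column removed, if present.

    X is a row-major model matrix (list of rows); names labels its columns.
    The "(Intercept)" label is an assumption borrowed from R's convention.
    """
    if intercept_name not in names:
        return X, names  # nothing to drop
    k = names.index(intercept_name)
    X_new = [row[:k] + row[k + 1:] for row in X]
    names_new = names[:k] + names[k + 1:]
    return X_new, names_new


X = [[1.0, 0.5, 2.0],
     [1.0, -1.2, 3.0]]
names = ["(Intercept)", "age", "dose"]
X_cox, names_cox = drop_intercept(X, names)
print(names_cox)  # ['age', 'dose']
print(X_cox)      # [[0.5, 2.0], [-1.2, 3.0]]
```

Matching on the name rather than on "a column of all ones" avoids accidentally dropping a genuine covariate that happens to be constant in a small sample.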
Almost. The issue is that I have to compute the coeftable for the CoxModel (which lives in my package and is independent of DataFrames) without access to the formula/ModelFrame, so the coeftable is generated with placeholders as regressor names (see for example here). After that, the CoxModel gets embedded in a DataFrameModels (here), and finally, when the outer layer, i.e. the DataFrameModels, is displayed, the coeftable is filled in with the correct labels (this happens here). So I got it to work with normal formulas by taking out one column, but there are two issues:
I couldn't figure out how either can be fixed without a dependency on DataFrames. I'd recommend that for now we require an explicit …
At least for now, I'll try to make a PR with this design so it's easier for everybody to see what the issues are and whether there's a smarter solution I'm missing.
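The two-stage labelling described above can be sketched roughly as follows (Python, illustration only; `CoefTable`, `model_coeftable` and `label_coeftable` are hypothetical names, not the actual Survival.jl, StatsBase or DataFrames types). The inner model fills in placeholder names because it never sees the formula, and the outer wrapper swaps in the real coefficient names once it knows them.

```python
from dataclasses import dataclass


@dataclass
class CoefTable:
    names: list
    estimates: list
    std_errors: list


def model_coeftable(estimates, std_errors):
    # Model side: no access to the formula, so use placeholder names.
    names = [f"x{i + 1}" for i in range(len(estimates))]
    return CoefTable(names, list(estimates), list(std_errors))


def label_coeftable(table, coefnames):
    # Wrapper side: knows the formula and replaces the placeholders.
    if len(coefnames) != len(table.names):
        raise ValueError(
            f"{len(coefnames)} labels for {len(table.names)} coefficients"
        )
    return CoefTable(list(coefnames), table.estimates, table.std_errors)


inner = model_coeftable([0.7, -0.1], [0.2, 0.05])
print(inner.names)                                    # ['x1', 'x2']
print(label_coeftable(inner, ["age", "dose"]).names)  # ['age', 'dose']

# If the formula still carries an implicit intercept that the model dropped,
# the wrapper's label count no longer matches the coefficient count.
try:
    label_coeftable(inner, ["(Intercept)", "age", "dose"])
except ValueError as err:
    print(err)  # 3 labels for 2 coefficients
```

The final call shows one way an implicit intercept can bite in this design: the labels come from the formula, so once the model silently removes a column, the wrapper has to know to skip the matching name.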
I see. Requiring …
100% agreed. I think this is only an issue because a dependency on DataFrames is much heavier than a dependency on a lightweight package like StatsModels. On a separate note, when the formula formalism gets its own package, it will be important to document it: the only way for me to produce some …
I believe StatsModels has some docs, and I thought there was a section on formulas in the DataFrames docs as well.
😕 Sorry about that, that kinda sucks. I really appreciate the effort though.
Hi!
A few months back I started developing a package for survival analysis in Julia (it's here).
So far it mainly has Kaplan-Meier estimation, the Cox proportional hazards model, and Accelerated Failure Time models.
The Cox model is well optimized (the last time I benchmarked it, it was about 3x faster than MATLAB's version on a test dataset). The Accelerated Failure Time models are not as polished (and are way less common, I guess).
I don't think I'll have enough time in the future to build a fully polished survival package on my own (plus, it makes little sense to have two separate survival packages). If you think it would be valuable, I can polish my version a bit (update it for Julia v0.6 and so on), make it compatible with your type system/formalism, and make a PR to your package to start unifying things. I guess the first thing to contribute would be Cox, as it's more polished and it's clearer how to integrate it. Accelerated Failure Time models can wait until they are cleaner; it also depends on whether you are interested in them.
Let me know what you think!
Pietro