Function to analyze history branch/data #50

Closed
llrs opened this issue Feb 11, 2022 · 6 comments · Fixed by #56

Comments

@llrs
Contributor

llrs commented Feb 11, 2022

I know cransays is not really meant to deliver code, but I have some code to merge all the CSV files of the history branch, and I think it would be helpful to others (and myself) if it were documented here.
The code efficiently merges files that have different headers (previous iterations of the code took 30 minutes, and now it runs in just 1).
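
For readers who want to experiment with the idea, here is a minimal Python sketch of merging CSV files whose headers differ. It is not the R code mentioned in this comment; the function name `merge_csv_files` and the example column names are hypothetical. It collects the union of all headers in first-seen order and fills any column a file lacks with a placeholder.

```python
import csv


def merge_csv_files(paths, missing=""):
    """Merge CSV files with differing headers into one list of row dicts.

    Hypothetical helper for illustration; returns (columns, rows).
    """
    columns = []  # union of all headers, in first-seen order
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            rows.extend(reader)
    # Give every row the full set of columns, filling the gaps.
    merged = [{name: row.get(name, missing) for name in columns} for row in rows]
    return columns, merged
```

Each file is read exactly once and the rows are normalized in a single pass at the end, so the cost grows linearly with the total number of rows.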

I don't think it has dependencies, and it wouldn't need to be run or tested, but it could help others who want to analyze the data.

Let me know if it would be helpful/appropriate and I will create a PR with the code.

@Bisaloo
Member

Bisaloo commented Feb 11, 2022

I think it's definitely interesting to have this code somewhere. But I'm not 100% sure where.

We have been talking with @maelle about changing the location of the historical data, possibly to a separate repo. If this happens, then I think your code would be better there than in the main repo(?). Not sure... What do you think?

@llrs
Contributor Author

llrs commented Feb 11, 2022

Currently it is in a file alongside my code for some presentations, so it is public (but probably hard to find :)

Yes, she mentioned something on #36 (comment). With this code I haven't found a problem dealing with this many files; the code I previously used was highly inefficient. (Ultimately too many files might become a problem, but I'm not sure of the OS or R limits on this.)
I think it would be better to have the data in a database somewhere: a branch with a single SQLite database, or a server somewhere? I think the choice comes down to maintenance costs and the usage of the data you want to promote.
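
To make the single-SQLite-database option concrete, here is a minimal Python sketch that writes rows into one table using only the standard library. Everything in it is an assumption for illustration: the function name `store_rows`, the table name `history`, and the example columns are hypothetical and are not taken from cransays.

```python
import sqlite3
from contextlib import closing


def store_rows(db_path, columns, rows, table="history"):
    """Append row dicts to a single SQLite table, creating it if needed.

    Hypothetical helper; every column is stored as TEXT for simplicity.
    Column and table names are interpolated into SQL, so they must come
    from a trusted source.
    """
    column_sql = ", ".join(f'"{name}" TEXT' for name in columns)
    names_sql = ", ".join(f'"{name}"' for name in columns)
    marks_sql = ", ".join("?" for _ in columns)
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
            conn.executemany(
                f'INSERT INTO "{table}" ({names_sql}) VALUES ({marks_sql})',
                [tuple(row.get(name) for name in columns) for row in rows],
            )
```

A single database file like this replaces many small CSV files with one artifact that can be queried directly, at the cost of having to keep its schema in sync whenever a new column is added to the snapshots.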

@maelle
Member

maelle commented Feb 14, 2022

A package to consult historical data could be called {cransaid} 😁

If the data were in a separate repo, shouldn't the package be in a third repo?

@llrs
Contributor Author

llrs commented Feb 14, 2022

If the data is in a different repo, is there really a need for a new package? Would we split the functionality between recording data {cransays}, storing data {cranwas} and analyzing data {cransaid}?

@Bisaloo
Member

Bisaloo commented Feb 14, 2022

I thought this over again and I think it actually makes sense to have the function to load the historical data inside cransays.

I think having a short analysis of historical data on the cransays website would be useful to give users of the dashboard an idea of a typical path and what they can expect for their submission.

In particular, we could partially address #29 and #40 by dynamically generating a flow diagram with igraph based on historical data.

@llrs
Contributor Author

llrs commented Feb 14, 2022

Note that the problem reported in #40, the archive directory not showing up, has not been solved.

Updates of packages already on CRAN are sometimes very fast (<15 minutes), so they aren't captured by the dashboard.
Only new packages that take time to be processed could actually be represented.
However, I think such a report should be careful with time estimates for the process: it could encourage negativity towards the CRAN reviewers if packages don't go through within expectations, and I don't think this would be productive for either party.

I will create a PR with the code I used to combine all the files (I may need to modify it so that it can parse the recently added column). There is also a .R file on the history branch: https://github.com/r-hub/cransays/blob/history/analysis.R

@Bisaloo Bisaloo linked a pull request Apr 24, 2022 that will close this issue

3 participants