AI automation for newsletter #1384
Comments
I would love to see this experimented with. The AI-assistance track is well worth exploring, but it's worth noting that this type of automation could also work with a much more basic feed aggregator that simply asks projects to list their update feeds (blog rss, mastodon, github releases etc.) and it'd create summaries for projects by simply linking out to their updates for the past month. |
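A minimal sketch of the "basic feed aggregator" idea above, assuming each project's feeds have already been fetched and parsed into simple dictionaries. The entry fields (`title`, `url`, `published`), the function names, and the sample data are all hypothetical; a real aggregator would fill them from RSS, Mastodon, or GitHub release feeds.

```python
from datetime import datetime, timedelta, timezone


def recent_updates(entries, now, days=30):
    """Return entries published within the last `days` days, newest first."""
    cutoff = now - timedelta(days=days)
    recent = [e for e in entries if cutoff <= e["published"] <= now]
    return sorted(recent, key=lambda e: e["published"], reverse=True)


def render_section(project, entries):
    """Render one project's updates as a markdown heading plus a list of links."""
    lines = [f"### {project}", ""]
    lines += [f"- [{e['title']}]({e['url']})" for e in entries]
    return "\n".join(lines)


# Hypothetical, hard-coded entries standing in for parsed feed items.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
entries = [
    {
        "title": "v0.2.0 released",
        "url": "https://example.com/v0.2.0",
        "published": datetime(2024, 2, 20, tzinfo=timezone.utc),
    },
    {
        "title": "Old devlog",
        "url": "https://example.com/devlog-1",
        "published": datetime(2023, 11, 5, tzinfo=timezone.utc),
    },
]

print(render_section("example-engine", recent_updates(entries, now)))
```

Only the February entry survives the 30-day cutoff, so the section links out to that single update and nothing is summarised by hand or by a model.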
@erlend-sh that is actually a really really awesome idea!!! |
This week in rust has an interesting bot which might be worth investigating : https://github.com/extrawurst/twir-bot |
@ElhamAryanpur if you're still up for implementing something like this, I'd be very up for reviewing it and getting it merged :) The feature I'd like to see the most would be a short automated summary for content no one has written anything for yet. Maybe it's already enough to feed the raw HTML to GPT and ask it for a summary? I also know that there are services that do this kind of thing for you using GPT like https://notegpt.io/web-summary, idk if they're better than just entering our own prompt though. |
@janhohenheim absolutely, since then I've invested a lot of time in my own side of LLM based software, and can say it's even better than ever to do something like this. We can have four approaches:
Personally, I think the second option would be the best to start with. RAG helps with automatic search of changelogs and with summary writing. Pull requests too, though that is perhaps a bit more difficult to automate locally than with GitHub Actions 😅. Let me know which option you think is nicer and I can begin. |
I also recommend checking out https://spiderwebai.xyz/ by @j-mendez |
@ElhamAryanpur great to hear! Since the newsletter has historically struggled with maintainer burden, I am more inclined to option 4. You know this stuff better than me though, so if you think that option 2 would be really really good for us, I'm ready to rent a cheap server on DigitalOcean and give you access. |
Yeah, they're using RAG too, most likely with langchain, I assume |
That is very true. I have written a section about my work there in the past, and it shocked me how much work the maintainers did every month... Yeah, we can start locally for development and get some early testing on the newsletter; if the results are great, we can then move to hosting or keep it local. I just don't wish to burden you with paying for the servers or API 😅 I'm trying to get a solution that anyone can use and contribute to instead of hurting your wallet, especially at this stage |
@ElhamAryanpur alright then! Do you need anything from me to start? How do you want to organize yourself? If you create a repo with a readme on how to run the model, I can ensure it runs on my machine in the background (or on a machine I rented anyway to host a Minecraft server, hehe) |
For sure, it'll probably be a repo. I'm not sure about much else yet, will keep you updated here. Thank you! |
Hi! If the bandwidth is minimal and simply a page or two (it would take a lot of requests to get to $1), we also do not pad the cost for GPT from OpenAI. The dashboard is very early stage and being actively improved. The service is more fleshed out from an API perspective atm. I recommend testing a basic prompt on the GPT playground and, once it works on a small set of HTML, using the GPT configuration to extract what is needed etc. Lmk if you have any questions. Thanks @erlend-sh! |
Hm? We did talk about it in the options listed |
If you create an account I can add a dollar to the account to experiment. The service goal is pretty much putting this project on a server to scale https://github.com/spider-rs/spider. |
Oh, the issue isn't that it can't be done through OpenAI, we're just exploring different options. I'm potentially looking into making it run locally so as to keep the charges from ever occurring. Because it won't just be a page or two of review, it'll also be crawling the changelogs and releases of different projects and compiling them too, so we're looking at a lot of tokens being used. But yeah, the code should be able to be used by any service, including OpenAI in the future. But for now I'm keeping things simple during development. |
Hey folks 👋 I noticed last weekend we have not been publishing any newsletters recently, stumbled upon this and the other discussion about maintenance burden, and I wanted to try out an experiment to see if we can improve this. I sort of reached very similar conclusions to the ideas in this thread that more automation is needed to scan stuff, some AI to summarise stuff (or this needs to be done by a human in the meantime), and in general something that can ease the maintenance burden, for example having a basic script that can prepare a draft that needs to be edited, rather than fully created. Take a look at my experiment here - https://github.com/iolivia/newsletter-bot Current things it can do:
There is an example output of the local script
And an example of the markdown file it produces here. Let me know what you think about this, maybe this is a good starting point 😄 |
@iolivia wooooah, that's cool! I'll take a closer look once I have time :) |
@iolivia amazing work! I can help with the AI part for summary text, will open a PR |
@iolivia I checked out the repo, and it looks really nice! Good work! One thing of note is that right now, the bot is a bit too good. Many of the news items provided are, in my opinion, not significant enough to be included in the newsletter. Removing them by hand is trivial though :) Other than that, we could ignore all posts below a certain amount of upvotes / hearts / retweets etc. and all crate updates that only change the patch version. Another thing I'm wondering is how to use the bot in practice. Running it at the beginning of the newsletter (the 3rd of the month) seems useless, since it would only aggregate news of the last 3 days. Running it in the middle is a bit arbitrary and will miss quite a few cool updates. Maybe we could add a GitHub Action to run it right at the freeze period to add all news no one has written about yet? If we want this completely automated, we should add the output of the bot to the newsletter only if the newsletter does not already include that content. Another nice thing would be Discord integration like in TWIR, but that's very much optional. For the moment, the bot is definitely good enough to be used manually. Again, great work! |
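The filtering suggested above (drop low-engagement posts and patch-only crate updates) could look roughly like the sketch below. It is not taken from newsletter-bot: the item fields (`kind`, `score`, `old_version`, `new_version`), the function names, and the threshold of 10 are assumptions, and versions are assumed to be plain `MAJOR.MINOR.PATCH` strings without pre-release suffixes.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into a tuple of ints."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_patch_only(old, new):
    """True when major and minor are unchanged, i.e. at most the patch number moved."""
    return parse_version(old)[:2] == parse_version(new)[:2]


def keep_item(item, min_score=10):
    """Decide whether a gathered item is significant enough for the draft."""
    if item["kind"] == "release":
        return not is_patch_only(item["old_version"], item["new_version"])
    # Posts: `score` stands in for upvotes / hearts / retweets.
    return item["score"] >= min_score


# Hypothetical gathered items.
items = [
    {"kind": "release", "title": "example-crate 0.4.1", "old_version": "0.4.0", "new_version": "0.4.1"},
    {"kind": "release", "title": "example-crate 0.5.0", "old_version": "0.4.1", "new_version": "0.5.0"},
    {"kind": "post", "title": "Devlog #12", "score": 42},
    {"kind": "post", "title": "Quick question", "score": 2},
]

kept = [item["title"] for item in items if keep_item(item)]
print(kept)
```

With these inputs the patch release and the low-score post are dropped, leaving the 0.5.0 release and the devlog for a human to review.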
The discord integration can be added through webhooks, I have done a few projects with it so I can assist with that. And a solution for the news can be:
I have pushed a PR using the LLaMa library for summaries, and if we use the model I recommend, dolphin mistral 7B v0.2 GGUF, we should be fine with pretty much any size of gathered release notes as the model supports up to 32k token context length (for comparison, ChatGPT at launch had only 2k context length). The model needs ~4.1GB VRAM so pretty much anyone can run it too. Hence gathering the news periodically and summarizing them at the end should be OK. What do you guys think? |
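For the Discord webhook integration mentioned in this comment, a rough sketch using only the Python standard library might look like this. Discord webhooks accept a JSON body with a `content` field of at most 2000 characters; the function name, the item fields, and the `User-Agent` string are made up for illustration, and nothing is sent until `urlopen` is called.

```python
import json
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # Discord rejects message content longer than this


def build_webhook_request(webhook_url, items):
    """Build (but do not send) a POST request announcing gathered news items."""
    # Wrapping a URL in <...> asks Discord not to render a link preview for it.
    lines = [f"- {item['title']}: <{item['url']}>" for item in items]
    # Naive truncation; this may cut the last line short on very long lists.
    content = "\n".join(lines)[:DISCORD_CONTENT_LIMIT]
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical identifier; some services reject urllib's default agent.
            "User-Agent": "newsletter-bot-sketch/0.1",
        },
        method="POST",
    )


# To actually post, pass the real webhook URL and then call:
#   urllib.request.urlopen(build_webhook_request(url, items))
```

Keeping request construction separate from sending makes the payload easy to inspect in a dry run before the bot is pointed at a real channel.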
@ElhamAryanpur sounds great! Couldn't we summarize them at the point they get gathered? For example, the bot could aggregate news every 3 days, add them to the GH issue and write a generated summary into the current newsletter markdown file. @iolivia would you be available for implementing that part? Or would you like some help? |
it is possible, just that the model could be too large for github actions to run, and locally it has no batching support yet. So it could take some time to summarize everything. also having it summarized at the end can help the bot have a complete picture of all the development and make a better summary. But we absolutely can do the every 3 days too. can set the bot on a cron job in a server somewhere. |
@ElhamAryanpur I've got a fedora server ready to run it :) |
hell yeah! |
So happy to see all the progress, thanks so much everyone for the awesome contributions already! 🔥
Agreed, this was my observation as well! I tried to experiment with removing releases with notes less than x characters, but then you miss all the major releases that have a link to a blog post for notes. Maybe another idea is to create a mini-section for minor releases at the end of each section with mostly a link to the repo and the release version and a one liner, this could help discover repos that are active.
No strong feelings on this tbh, but trying to keep it simple the options I see:
|
@iolivia since the newsletter died last time because of maintainer burden I'm wary of adding anything that adds any friction to the process, so I'll be automating as much as possible. I'll try to add a script for all of this after this newsletter so your bot is integrated in the next cycle 🚀 |
@janhohenheim typesafe 🔥 blazingly fast 🔥 automation to the moon. |
If future newsletter entries are going to be AI edited or generated, I'd like to request that no content that I work on for the Rust community be included in the newsletter. I appreciate the good intentions of everyone involved in this effort. The point raised by @17cupsofcoffee in #1417 (comment) is very salient and captures my sentiment well. I like the format from This Week in Graphics. The author is funded via Patreon and writes very short bullet point summaries. I'm interested in helping push for a grant from the Rust Foundation to fund a writer or editor to help with the newsletter as an alternative to involving AI! |
I think @LPGhatguy's comment raises a good point that hasn't been made in these threads already - using AI in the production of the newsletter will discourage some people from reading/contributing (there are a lot of people who aren't massive fans of this tech - in creative spaces, especially!), and I hope this is weighed up versus any potential benefits of using it.
|
That's fair. It cuts both ways though. Conversely, I got to a point where I dreaded having to make PRs for the newsletters because I was already writing posts for my project's blog, mastodon, discord etc., which meant my marketing-energy was already spent. As much as I loved the newsletter it also felt like a burden sometimes, since I knew I had all these updates that I should share but didn't, as I simply didn't have Yet Another Post left in me. Also, due to the immense workload of manual curation, the alternative we've implicitly opted for during the past several months has been no newsletter. I'm on the record as an AI critic, but that doesn't mean I think it should be unilaterally shunned as a technology, especially not for one of the very few things it's actually good for, namely text summarization/consolidation. I could get behind an objection against proprietary, cloud-based AI, but I really don't have many qualms about the self-hosted variety, in particular when the final publishing is still subject to human review. The |
Definitely valid points! @LPGhatguy @17cupsofcoffee I appreciate the tone of both of your feedbacks; it's obvious that your criticisms were written in good faith :) |
PS: as @erlend-sh notes, the current plan is to have everything self-hosted and fully under our control. The bot using the tech is open sourced as well. |
I'd like to add that the summary in my opinion is not the only thing it can be helpful with. Originally the idea was to help the maintainers with the already submitted entries. Such as spell checks, grammar check, rephrasing things, better title, following guideline, ... Those were things almost completely done by the maintainers, which is a huge burden especially when there are a lot of entries. Automations of such things aren't there to replace the original and final human authors, but rather to assist in cleanups. |
Automation and a self-hosted bot is fine with me. If the newsletter is going to have any AI-generated content though, I'd prefer that my content not be present at all. If the group is interested in paying a part time editor to be involved and do this work, I think that is an acceptable alternative to me. |
Hey folks! Just to chime in here as the author of the newsletter bot, my intent is to automate the gathering of release notes, social media posts etc to help with the content gathering, very similar to TWIR. After this initial gathering there needs to be an editing process, and the AI summarisation feature was just an idea mentioned in this thread on how to help with shortening lengthy release notes when they are not relevant. For me the whole point of the newsletter is to help build the community around Rust game development and showcase the amazing work of so many people. I don't think it's worth sacrificing this mission for shortening some text which a human can do within minutes. So while I understand the appeal of using AI and the fun factor, given the feedback I would focus the project on gathering content and making it clear a human-centric editing process still needs to happen. Thoughts? |
@iolivia creating a summary only takes a few minutes in a vacuum. As @erlend-sh mentioned, most of us do this in the spare few hours we have and the mental burden of "this one more thing I need to do on time" is not to be underestimated. I don't want to reveal personal information, but I know of at least one maintainer that greatly struggled mentally with this. As @ElhamAryanpur noted, getting rid of formatting and spelling stuff is just as great for making sure that maintainer burden stays low, which is historically the one thing the newsletter struggled most with. Based on the feedback, I think the most productive way forward is this:
Would that be okay for you, @iolivia and @ElhamAryanpur? Since this process leaves 95% of the content to humans and we never automatically include anything a bot writes automatically, I'd happily count it as a human-centric workflow with some tools to help the humans do their thing. |
I don't think this is a fair characterization. There are lots of people who are paid full time to be editors and I don't buy into the idea that AI tools should replace their labor. In a time where many people, especially in the games industry, are concerned about companies buying into AI at the expense of humans, AI is unarguably the wrong direction. If no one is invested enough to summarize a piece of content to include in the newsletter, is it worth including that content at all, or can it be a footnote? Does every small piece of news that happens in the community need a featured section, or can those be editorialized to just a few to reduce maintainer burden? To be clear, I'm not excited about pulling my content or contributions from the newsletter. Including AI generated summaries in a gamedev-focused newsletter would be very disappointing and I'm certain I'm not alone in that sentiment. |
It is a good idea to mark AI-generated content so users know it is not from a human. There are lots of opportunities to use AI without it being tied to curation. If it is used to curate, have it send a PR that can be linted and then merged manually, or auto-merged if the checks and balances are good. |
@j-mendez the line there is very fuzzy. When I'm editing stuff I have GitHub copilot running in the background helping me out. An AI might go over the entire document later to fix typos or weird sentences. Generated summaries are reviewed and, if needed, changed by humans. But as a minimum step we could add a little asterisk next to summaries that have initially been generated. What do you think? @LPGhatguy I definitely agree with the sentiment that we should be very careful about automating people's jobs away. I still sense that you are coming from a position of good faith, and I appreciate how you phrase your concerns, so I hope I don't sound too harsh when I say this. Again, I hope I am not overly harsh (please tell me if I come off otherwise). I appreciate your input on this and understand your reasoning. We will also respect your wish to not be included in the newsletter from now on. |
I am a (somewhat) prominent member of the Rust gamedev community. I'm building a commercial game using Rust. I'm both personally and professionally invested in the perception of our community. The newsletter is one of the main communication channels that goes out from our community and into the public sphere. It's also published under an officially-labeled working group of the Rust project. If there is a "hill to die on," I think the involvement of AI generated content in this context is a great cue to step in and make sure my voice and the sentiment of my peers in the games industry is heard. If there is a better public space to raise these issues, please let me know.
Frankly, I do not think that you or others in this thread have seriously entertained the proposed alternatives. The options are not "AI" or "no content" and calling this labor "busywork" is reductionist. This comes across to me as a human problem that the group is trying to solve with technology. How many hours of work is this per newsletter? Alternatively, why die on the hill of involving AI in a project struggling with maintainer burden and community engagement? The risk of creating bad press is fairly high and the upside is low. I'd like to reiterate that I am personally volunteering to negotiate with The Rust Foundation or potential corporate sponsors like Embark to secure funding for this role. If those avenues aren't successful, I am also willing to hire and manage a writer or community manager that could assist with this work. |
@LPGhatguy I indeed didn't realize you volunteered for this, thanks. I'm completely fine with someone being paid to do this, I just don't personally think it is worth doing so in this case. |
I am up for it, we can implement those in the newsletter-bot |
Hello yes: If LLMs are used to generate content for this newsletter, I will boycott it and encourage others to do the same. I definitely won't ever be submitting to the newsletter again if this goes through. |
@junkmail22 can we please have a discussion without resorting to these tactics? I am open to talking everything through, but to say "Don't use this technology or I will try to sabotage the project" is just not an okay way to converse in an open source environment. Neither would it be if I said "Accept AI or I will quit being a maintainer". |
@janhohenheim I've voiced my disapproval on the survey, as well as in other places. Please understand - this isn't an underhanded tactic or anything. It is simply the case that I don't want to have anything to do with a project that uses LLM technology. My disapproval to this is very strong, and I am voicing that disapproval. Even if I didn't voice my feelings in this thread, I would still boycott the newsletter if LLMs were used to generate articles. Frankly, I think this is a very good place to voice my disapproval, instead of just an anonymous survey. |
@junkmail22 that's all fine, please do voice your disapproval, I want this exchange to happen. |
Hello!
After some conversation with Ozkriff and looking at how much work goes into editing the newsletter each month, I was wondering if it'd be a good idea to start using some automation tools.
For editorial roles, something like chatgpt/gpt4 can assist a lot. What I had in mind was the bot running on each pull request, checking the content that was added, feeding it to the AI for editing and auditing, and returning either a fixed version or a list of things to fix or update.
For example, one of my PRs had too much repetition and extra information, and the title wasn't good. In such a case it can do all of that given the newsletter guidelines, or notify me to do them the way required.
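As a rough illustration of the per-pull-request flow described above, the sketch below pulls the added lines out of a unified diff and wraps them in a review prompt. It stops short of calling any model; the function names and the prompt wording are hypothetical, and the diff is assumed to be the plain text that `git diff` produces.

```python
def added_lines(diff_text):
    """Return the lines a unified diff adds, without the leading '+'."""
    return [
        line[1:]
        for line in diff_text.splitlines()
        # '+++' marks the new-file header, not content. (An added line whose
        # own text starts with '++' would also be skipped by this simple check.)
        if line.startswith("+") and not line.startswith("+++")
    ]


def build_review_prompt(guidelines, diff_text):
    """Combine the newsletter guidelines and the newly added text into one prompt."""
    added = "\n".join(added_lines(diff_text))
    return (
        "Review the newsletter section below against these guidelines.\n"
        "Reply with a list of suggested fixes; do not rewrite the text.\n\n"
        f"Guidelines:\n{guidelines}\n\nSection:\n{added}"
    )


# Hypothetical diff of a newsletter pull request.
diff = (
    "--- a/content/news/example.md\n"
    "+++ b/content/news/example.md\n"
    "@@ -1 +1,3 @@\n"
    " ## Game Updates\n"
    "+### My Game\n"
    "+My Game got a new release!\n"
)

print(build_review_prompt("Keep sections short; avoid repetition.", diff))
```

Asking for a list of fixes instead of a rewritten section keeps the contributor as the author, which matches the "notify me to do them" variant of the proposal.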
Cost wise, since the newsletter is a monthly release, I don't think we can exceed a dollar at the busiest month given how cheap the api is, and since each section is small, the context of gpt3.5 won't be a problem either.
One other thing I remembered is that it can also assist in writing for projects that were announced but had no one to write for them, for example we had Rusty Jam #3. It can also write a section for it without taking time off the editors, and it can just be reviewed for fixes and added.
These are just my suggestions so I'm not sure if it can be appealing to use, I have some experience with OpenAI so I can assist in implementing it.