Misidentification of relevant coverage report causes tests to be thrown away #251
@j-ro I am working on a fix. Right now, as you analyzed, cover_agent looks at the end of the file name, which can be ambiguous, as in this case. I am working on using the name field instead of the filename field. I could also use the full path as you suggested, but it is possible that multiple classes live in a single source file. If I have a branch, would you be able to test it and provide feedback on whether it works for your case?
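For illustration, here is a minimal sketch of what I mean (made-up report contents and names, not the actual cover_agent code). In a Cobertura report each `<class>` element carries both a `filename` path and a dotted `name`; keying by the `name` stays unambiguous even when several classes share one source file:

```python
import xml.etree.ElementTree as ET

# Hypothetical Cobertura fragment: two classes defined in one source file.
# The dotted `name` attribute is unique per class, while suffix-matching
# on `filename` cannot tell related entries apart.
REPORT = """
<coverage><packages><package><classes>
  <class name="billing.Invoice" filename="src/billing/invoice.py" line-rate="0.80"/>
  <class name="billing.InvoiceLine" filename="src/billing/invoice.py" line-rate="0.50"/>
</classes></package></packages></coverage>
"""

root = ET.fromstring(REPORT)

# Key coverage by the fully qualified name instead of a filename suffix.
coverage_by_name = {
    cls.get("name"): float(cls.get("line-rate")) for cls in root.iter("class")
}
print(coverage_by_name["billing.Invoice"])      # 0.8
print(coverage_by_name["billing.InvoiceLine"])  # 0.5
```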
Absolutely happy to test, thanks!
* While the earlier PR [qodo-ai#230] broke the processing code down into a class hierarchy, it made no changes to the behavior. This PR brings enhancements to coverage processing, where coverage data is stored by entity (class or file).
* Coverage data is stored under a FQDN so that conflicts are taken care of (see the sketch below). This closes [qodo-ai#251].
* The earlier PR broke the agent's behavior of considering only target-file coverage when the global coverage flag is not set by the user; this PR restores the original behavior.
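Roughly, the storage idea looks like this (illustrative names only, not the PR's actual classes): coverage is stored per entity and keyed by a fully qualified name, so entries whose short names collide can coexist.

```python
from dataclasses import dataclass
from enum import Enum


class EntityKind(Enum):
    CLASS = "class"
    FILE = "file"


@dataclass
class EntityCoverage:
    kind: EntityKind
    fqdn: str    # e.g. a dotted class name, or a full file path for files
    covered: int
    total: int

    @property
    def rate(self) -> float:
        return self.covered / self.total if self.total else 0.0


# Keyed by FQDN, the two "report_invite" entries coexist instead of
# clobbering each other the way short-name keys would.
store: dict[str, EntityCoverage] = {}
for entity in (
    EntityCoverage(EntityKind.FILE, "app/models/report_invite.rb", 9, 25),
    EntityCoverage(EntityKind.FILE, "app/models/concerns/has_report_invite.rb", 8, 25),
):
    store[entity.fqdn] = entity

print(f"{store['app/models/report_invite.rb'].rate:.0%}")  # 36%
```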
@coderustic awesome, thanks! I'll give it a try this week!
@coderustic any tips for getting it to run from source? I'm working on poetry install. It failed on my machine (M2, macOS Sequoia) on the litellm package. I could get it to complete by subbing out the Codium pin for the standard one (https://github.com/BerriAI/litellm), but trying to run it errors:
I'm not a Python dev, so I may not be doing something obvious. However, as far as I can tell, the README doesn't show any examples of how to actually run the CLI if you install it from source.
@j-ro Looks like main is broken, and we use a fork of litellm. Will let you know as soon as the build is fixed.
The CI build was fixed, but we had some failures in the regression tests when I merged this, so I had to revert it. Apologies.
No worries, let me know when I should give it a try (and what branch to use!)
@coderustic wonder if you have an update here? Right now, because I'm not able to run with poetry run, I'm unable to use the repo at all, since I removed the pipx install I had. Even if this PR isn't ready, do you have a path to getting the direct repo running? Is main still broken?
@j-ro main has been fixed; the earlier issue with litellm is resolved. You can give it a try by installing the new version, which is …
Thanks! Forgive me for my Python ignorance, but how do I run with the latest main? My git repo is up to date, but poetry run still errors:
I am sorry, but it looks like everyone on GitHub is merging fixes that break things everywhere; see python-poetry/poetry#9961.
Btw, why don't you go back and try using the pipx-installed version?
I'm using poetry version 2.0.0. I can go back to pipx, I just wouldn't be able to try this branch, right? Or can I install branches with pipx?
Thanks for being supportive and offering to help. I don't think this branch is ready yet, but I will try a different approach for you to test next time, when my branch is ready.
Ok! I downgraded poetry, and I have a different error now, but maybe I should just go back to pipx instead:
This is due to the pydantic version. We currently have a deprecation warning, which is probably removed in Pydantic version 2. I suggest you try using Python 3.12 (or use the pipx-installed version so you don't have to deal with setting up a dev environment).
Ok great, poetry seems to be working, at least somewhat. Still more work to do on my end, but I may be able to get back up and running that way. If you have a branch to test at some point, or anything else, do let me know!
I'm also realizing, as I test the main branch with poetry, that the new (and fairly undocumented) use-report-coverage-feature-flag option probably obviates this issue if you use it, since it considers the entire coverage report, and not just a part of it, when deciding whether to keep tests or not.
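Roughly how I understand the difference (a sketch with made-up names; the real implementation may differ): with the flag, the keep/discard decision aggregates the whole report, while without it only the target file's entry counts.

```python
def coverage_rate(entries: dict[str, tuple[int, int]], target: str | None) -> float:
    """entries maps file path -> (covered_lines, total_lines).

    With a target, only that file's entry counts (the default behavior);
    with target=None, the whole report is aggregated (the flag's behavior).
    """
    if target is not None:
        covered, total = entries[target]
    else:
        covered = sum(c for c, _ in entries.values())
        total = sum(t for _, t in entries.values())
    return covered / total if total else 0.0


report = {
    "app/models/report_invite.rb": (9, 25),
    "app/models/concerns/has_report_invite.rb": (8, 25),
}
print(coverage_rate(report, "app/models/report_invite.rb"))  # 0.36
print(coverage_rate(report, None))                           # 0.34 (whole report)
```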
Yes, you are spot on (though I would name the flag as …).
I will try it when I can, interesting!
You might see that on main, file 1's data is overwritten by file 2's data (which is being fixed by #264), so it would be good to wait a few days; once I see all the in-progress work merged, I can let you know.
Sounds good!
Though I did find kind of a weird behavior with the new option: when we hit 100% coverage for the target file, the loop keeps going until it runs out of iterations. That seems a bit unexpected? I can open another issue for that if you want.
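What I'd expect is something like this (a sketch with invented names, not the real agent loop): exit when the target file's coverage reaches the goal, not only when the iteration budget runs out.

```python
def generation_loop(max_iterations: int, desired: float, measure, generate) -> None:
    """Hypothetical control flow, not the real agent loop.

    measure() returns the target file's current coverage; generate() adds
    tests. Stop as soon as the desired coverage is reached instead of
    always exhausting the iteration budget.
    """
    for iteration in range(max_iterations):
        if measure() >= desired:  # e.g. 1.0 once the file is fully covered
            print(f"target coverage reached after {iteration} iterations")
            return
        generate()
    print("iteration budget exhausted before reaching desired coverage")


# Toy run: coverage climbs to 100% on the third measurement, so the loop
# stops early instead of burning the remaining iterations.
coverage_readings = iter([0.36, 0.72, 1.0])
generation_loop(10, 1.0,
                measure=lambda: next(coverage_readings),
                generate=lambda: None)
```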
@j-ro thanks for pointing that out. @coderustic is going to address that in the next PR. Really, really appreciate all the thorough feedback.
FYI, I've been working on trying to get Python > 3.12 working, but there are some libraries that haven't been playing nicely in the …
Any update on this one? Without stopping at 100% coverage, the tool is way less automatic (I have to monitor it and stop it manually, unless I want to waste a lot of money on LLMs until we hit the iteration limit).
Have you tested it recently? We've done some releases since then and have had some refactoring "challenges" of late...
@EmbeddedDevops1 yeah, I'm up to date with main, and with the use-report-coverage-feature-flag option it is not stopping when the specified source file reaches 100% coverage.
Okay, let me see if I can reproduce the issue and get on it. Sorry for the delay.
No worries, thanks!
Reading the code a bit, my guess is the issue is somewhere here: cover_agent/UnitTestValidator.py, line 566 (at commit 22959a9).
It seems like this might be a similar issue to my earlier report. My log always talks about a "non-source file" whose coverage is increasing, but that non-source file is actually my source file. So perhaps the lookup by path here has a similar bug to what I reported above.
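If that lookup is keyed loosely too (just a guess on my part), normalizing both sides to a canonical path before comparing would avoid the suffix overlap. The helper names below are invented, not the actual UnitTestValidator code:

```python
import os


def find_coverage(entries: dict[str, float], source_path: str) -> float | None:
    """Look up a file's coverage by comparing normalized paths, not suffixes.

    entries maps report paths to line rates (hypothetical shape).
    """
    wanted = os.path.normpath(source_path)
    for path, rate in entries.items():
        if os.path.normpath(path) == wanted:
            return rate
    return None


entries = {
    "app/models/concerns/has_report_invite.rb": 0.32,
    "app/models/report_invite.rb": 1.0,
}
# "./app/..." and "app/..." normalize to the same key, so the lookup hits
# the intended file rather than the first suffix match.
print(find_coverage(entries, "./app/models/report_invite.rb"))  # 1.0
```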
Hey @j-ro, nice to meet you. I was taking a look at this issue the other day and wanted to reach out. I think I have a working solution for the infinite continuation of tests when running in the … mode.

I modified the Python template test locally, with similar if not the same directory names as you posted above, but was unable to reproduce the behavior you're seeing in your Ruby project. My testing indicates that the file path should be the key associated with the new coverage percentages, so there shouldn't be any crossover between values. What I did notice is that in my testing the percentages are printed in the XML file in alphabetical order based on the key (file path), and in your posted example here, that isn't the case. I was wondering if maybe there is some coverage version mismatch that we aren't testing for.

Can you provide a little more detail about what you're seeing? Coverage version, maybe some more insight into how you're structuring your app? I am not a Ruby developer, so apologies in advance for not being too knowledgeable. I've attached my test logs and coverage XML for you to review as well, to see how things line up. Command run:
Sample logs:
@kelper-hub huh, interesting! Thanks for helping me dig in here. Can you say more about what you mean by coverage version? This is a Ruby on Rails app. The coverage.xml file is being generated by the simplecov-cobertura gem, version 1.4.2 (https://github.com/dashingrocket/simplecov-cobertura). It doesn't seem to have a lot of options, as far as I can tell. In terms of your log, mine looks similar. Here's an example of a run that I manually stopped when the target file reached 100% but didn't stop automatically. My command was this:
My starting coverage.xml file (with just Rails test scaffolds, no actual tests) was this: https://drive.google.com/file/d/11_wjLskwoCRwSGkCyUQahl7rzu9mBd_y/view?usp=sharing

My reading of that file is that my target file has a 67% coverage rate to start (it's a very small file with a lot of boilerplate, so this is expected). Overall there is a coverage rate of 14% across every file touched by this spec run. Rails pulls in lots of extra files, so again, not unexpected.

Here's the log of my run:
Printing results from LLM model...
language: Ruby
testing_framework: RSpec
number_of_tests: 0
relevant_line_number_to_insert_tests_after: 4
relevant_line_number_to_insert_imports_after: 1
2025-02-19 06:40:22,610 - cover_agent.UnitTestValidator - INFO - Running build/test command to generate coverage report: "docker compose exec -e RAILS_ENV=test -e SPEC_RUN_OK=true web rspec spec/workers/ladder_stats_worker_spec.rb"
language: ruby
existing_test_function_signature: |
RSpec.describe LadderStatsWorker, type: :worker do
new_tests:
- test_behavior: |
Ensure that LadderStatsWorker logs the correct message and enqueues a job to Redis with the proper JSON payload.
test_name: |
test_enqueue_job_to_redis
test_code: |
# generated with AI by cover-agent (https://github.com/Codium-ai/cover-agent)
describe '#perform' do
let(:redis_double) { double('Redis') }
let(:ladder_id) { 42 }
let(:uuid) { 'user-uuid-123' }
let(:stat) { 'exit_count' }
let(:amount) { 10 }
before do
$redis_stats = redis_double
allow(Rails.logger).to receive(:info)
allow(SecureRandom).to receive(:uuid).and_return('static-generated-uuid')
end
it 'logs the correct message and calls Redis sadd with proper JSON payload' do
expected_payload = { ladder_id: ladder_id, uuid: uuid, stat: stat, amount: amount, identifier: 'static-generated-uuid' }.to_json
expect(redis_double).to receive(:sadd).with('ladder_stats', expected_payload)
LadderStatsWorker.new.perform(ladder_id, uuid, stat, amount)
expect(Rails.logger).to have_received(:info).with("LadderStatsWorker stat: #{ladder_id},#{uuid},#{stat},#{amount}")
end
end
new_imports_code: |
""
test_tags: |
happy path
- test_behavior: |
Ensure that LadderStatsWorker operates correctly when given a negative amount, still logging and enqueuing the job.
test_name: |
test_with_negative_amount
test_code: |
# generated with AI by cover-agent (https://github.com/Codium-ai/cover-agent)
describe '#perform with negative amount' do
let(:redis_double) { double('Redis') }
let(:ladder_id) { 100 }
let(:uuid) { 'negative-test-uuid' }
let(:stat) { 'score' }
let(:amount) { -5 }
before do
$redis_stats = redis_double
allow(Rails.logger).to receive(:info)
allow(SecureRandom).to receive(:uuid).and_return('neg-static-uuid')
end
it 'logs the message and enqueues the job to Redis even with negative amount' do
expected_payload = { ladder_id: ladder_id, uuid: uuid, stat: stat, amount: amount, identifier: 'neg-static-uuid' }.to_json
expect(redis_double).to receive(:sadd).with('ladder_stats', expected_payload)
LadderStatsWorker.new.perform(ladder_id, uuid, stat, amount)
expect(Rails.logger).to have_received(:info).with("LadderStatsWorker stat: #{ladder_id},#{uuid},#{stat},#{amount}")
end
end
new_imports_code: |
""
test_tags: |
edge case
2025-02-19 06:40:55,401 - cover_agent.UnitTestValidator - INFO - Running test with the following command: "docker compose exec -e RAILS_ENV=test -e SPEC_RUN_OK=true web rspec spec/workers/ladder_stats_worker_spec.rb"
language: ruby
existing_test_function_signature: |
RSpec.describe LadderStatsWorker, type: :worker do
new_tests:
- test_behavior: |
Tests that perform method logs the correct message using Rails.logger.info.
test_name: |
logs_correct_message
test_code: |
# generated with AI by cover-agent (https://github.com/Codium-ai/cover-agent)
describe '#perform logging' do
let(:ladder_id) { 1 }
let(:uuid) { 'test-uuid' }
let(:stat) { 'win_count' }
let(:amount) { 3 }
it 'logs the correct message' do
expect(Rails.logger).to receive(:info).with("LadderStatsWorker stat: #{ladder_id},#{uuid},#{stat},#{amount}")
allow($redis_stats).to receive(:sadd) # Stub redis to avoid actual call.
LadderStatsWorker.new.perform(ladder_id, uuid, stat, amount)
end
end
new_imports_code: |
""
test_tags: happy path
- test_behavior: |
Tests that the perform method enqueues a correctly formatted JSON payload to $redis_stats using sadd.
test_name: |
enqueues_correct_json_payload
test_code: |
# generated with AI by cover-agent (https://github.com/Codium-ai/cover-agent)
describe '#perform redis enqueue' do
let(:ladder_id) { 2 }
let(:uuid) { 'user-123' }
let(:stat) { 'exit_count' }
let(:amount) { 5 }
let(:fake_identifier) { 'fixed-uuid' }
it 'calls $redis_stats.sadd with the correct JSON payload' do
allow(SecureRandom).to receive(:uuid).and_return(fake_identifier)
expected_payload = { ladder_id: ladder_id, uuid: uuid, stat: stat, amount: amount, identifier: fake_identifier }.to_json
expect($redis_stats).to receive(:sadd).with("ladder_stats", expected_payload)
LadderStatsWorker.new.perform(ladder_id, uuid, stat, amount)
end
end
new_imports_code: |
""
test_tags: happy path
- test_behavior: |
Tests that the Sidekiq worker queue option is set to ladder_stats.
test_name: |
worker_queue_option_set
test_code: |
# generated with AI by cover-agent (https://github.com/Codium-ai/cover-agent)
describe 'Sidekiq configuration' do
it 'has the correct queue option set' do
# Access the sidekiq options via get_sidekiq_options hash.
expect(LadderStatsWorker.get_sidekiq_options['queue']).to eq('ladder_stats')
end
end
new_imports_code: |
""
test_tags: happy path
2025-02-19 06:42:01,957 - cover_agent.UnitTestValidator - INFO - Running test with the following command: "docker compose exec -e RAILS_ENV=test -e SPEC_RUN_OK=true web rspec spec/workers/ladder_stats_worker_spec.rb"
Recommended Fixes:
My reading of this log is:
The coverage.xml file at the end looks like this: https://drive.google.com/file/d/1fchHl53a9_3FxEBmkVXOC_gsYOzTAjOD/view?usp=sharing, where you can see full coverage of my target file.
There is a bug in matching a test run with the targeted file's coverage report: if more than one file has the same name ending (under different paths), the matched coverage entry may not be the file we are targeting. This produces an erroneous report that coverage did not increase, which in turn causes the generated tests to be thrown out.
For example, my coverage.xml file might look like this (this is a Ruby/Rails project):
As you can see, there are two files in this report whose names end in "report_invite". Since they are related to each other, both are exercised when the one target test file (in this case, spec/models/report_invite_spec.rb) is run, so they both show up in this report.
Here is the command I'm using:
As you can see from the above, the target file is app/models/report_invite.rb. If you look at the coverage report manually, you can see that it is 36% covered right now.
However, cover-agent reports that it is only 32% covered:
Digging in, you can see it is matching the app/models/concerns/has_report_invite.rb file, not the actual target; that file is the one with 32% coverage.
Because cover-agent is reading the wrong entry in its coverage report, it throws out the tests it generates as not increasing coverage, wasting a lot of time and expensive tokens.
I'm guessing the issue is with this line: https://github.com/qodo-ai/qodo-cover/blob/main/cover_agent/CoverageProcessor.py#L131
The fix is probably to match on the path rather than just the trailing part of the file name. The current approach assumes that all file names in a coverage report are unique (or at least don't share endings), when that is only true within a given path.
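Here's a minimal reproduction of the suspected behavior and the path-keyed fix (the XML is illustrative, not my actual report, and the matching code is a sketch rather than CoverageProcessor's real logic):

```python
import xml.etree.ElementTree as ET

# Hypothetical Cobertura fragment: two files whose names share the ending
# "report_invite.rb", listed alphabetically by path as coverage tools do.
REPORT = """
<coverage><packages><package><classes>
  <class name="HasReportInvite" filename="app/models/concerns/has_report_invite.rb" line-rate="0.32"/>
  <class name="ReportInvite" filename="app/models/report_invite.rb" line-rate="0.36"/>
</classes></package></packages></coverage>
"""
root = ET.fromstring(REPORT)

# Suffix matching is ambiguous: both filenames end with "report_invite.rb",
# so the first entry wins even though it is not the targeted file.
match = next(
    cls for cls in root.iter("class")
    if cls.get("filename").endswith("report_invite.rb")
)
print(match.get("filename"))  # app/models/concerns/has_report_invite.rb

# Keying by the full filename (or the dotted `name` field) disambiguates.
by_path = {cls.get("filename"): cls for cls in root.iter("class")}
print(by_path["app/models/report_invite.rb"].get("line-rate"))  # 0.36
```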
If I get some time I'll try to make a PR here, but I wanted to post this bug first to see if folks had feedback on how to approach it.