This repo contains the source code and results of FlakyDoctor, a neuro-symbolic approach to fixing Implementation-Dependent (ID) and Order-Dependent (OD) flaky tests.
The repository is structured as follows; please refer to the README.md in each directory for more details:
- datasets: Datasets of flaky tests in the evaluation.
- patches: Successful patches generated.
- results: Detailed results for successfully fixed flaky tests in the evaluation.
- src: Source code and scripts to run FlakyDoctor.
This section provides a quick demo using GPT-4 to reproduce sample results in ~40 minutes.
0. Before starting:
- FlakyDoctor works on Linux with the following environment:
  - Python 3.10.12
  - Java 8 and Java 11
  - Maven 3.6.3
- The current version of FlakyDoctor supports GPT-4 and Magicoder. To use GPT-4, prepare an OpenAI API key; to run Magicoder, download its checkpoints to a local path. We used three NVIDIA GeForce RTX 3090 GPUs in our experiments.
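As an optional pre-flight check, a small Bash sketch like the following could confirm that the required tools are on the PATH before running src/setup.sh. The functions check_tool and preflight are hypothetical and not part of FlakyDoctor; the sketch only prints the version each tool reports and does not enforce the exact versions above (for Java it shows only the default JDK, not whether both Java 8 and Java 11 are installed).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the FlakyDoctor scripts.

# Return 0 if the given command is available on PATH, non-zero otherwise.
check_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Print the version reported by each required tool and warn about missing ones.
# Returns non-zero if any tool is missing. Exact versions are NOT enforced.
preflight() {
  local missing=0 tool
  for tool in python3 java mvn; do
    if ! check_tool "$tool"; then
      echo "missing: $tool" >&2
      missing=1
      continue
    fi
    case "$tool" in
      python3) python3 --version ;;
      java) java -version 2>&1 | head -n 1 ;;  # java prints its version to stderr
      mvn) mvn -v | head -n 1 ;;
    esac
  done
  return "$missing"
}
```

To use it, source the file and run preflight; a non-zero return status means at least one tool is missing.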
1. Set up requirements:
git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh |& tee setup.log
2. Create a .env file containing the local path of your Magicoder checkpoints (you can skip this step if you only run GPT-4):
echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env
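A minimal sketch for double-checking the .env file, assuming it contains a single Magicoder_LOAD_PATH=... line as written by the command above. The helpers magicoder_path and check_magicoder_path are hypothetical, not part of FlakyDoctor, and only check that the configured path exists; they do not validate the checkpoint contents.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for the .env file; not part of the FlakyDoctor scripts.

# Print the value of Magicoder_LOAD_PATH from the given env file (default: .env).
magicoder_path() {
  local env_file="${1:-.env}"
  # Keep only the text after the first '=' on the first matching line.
  grep -m 1 '^Magicoder_LOAD_PATH=' "$env_file" 2>/dev/null | cut -d '=' -f 2-
}

# Succeed only if the path is set and exists on disk.
check_magicoder_path() {
  local path
  path="$(magicoder_path "${1:-.env}")"
  if [[ -z "$path" ]]; then
    echo "Magicoder_LOAD_PATH is not set" >&2
    return 1
  fi
  if [[ ! -e "$path" ]]; then
    echo "Magicoder_LOAD_PATH does not exist: $path" >&2
    return 1
  fi
  echo "Magicoder checkpoints: $path"
}
```

For example, check_magicoder_path .env prints the configured path or reports why it looks wrong.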
3. Run the following commands to fix the demo tests with GPT-4, replacing [openai_key] with your OpenAI API key:
# install Java projects
bash -x src/install.sh datasets/demo_projects.csv projects outputs install_summary.csv
# fix flaky tests
bash -x src/run_FlakyDoctor.sh projects [openai_key] GPT-4 outputs datasets/demo.csv ID
To check the output of building the projects, see the logs of each round, which are saved in a directory named [Unique SHA] inside outputs. A summary of the build results is also written to install_summary.csv, with the columns project,sha,module,build_result,java_version.
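The build summary can be tallied quickly with awk. This sketch assumes install_summary.csv is plain comma-separated with the five columns listed above (no quoted commas) and possibly a header row; summarize_builds is a hypothetical helper, and the actual build_result values depend on FlakyDoctor's install script.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of FlakyDoctor: count rows of a build summary
# (project,sha,module,build_result,java_version) by their build_result value.
# Assumes plain comma-separated fields without quoted commas.
summarize_builds() {
  local summary_csv="${1:-install_summary.csv}"
  awk -F',' '
    NR == 1 && $1 == "project" { next }   # skip an optional header row
    NF >= 5 { count[$4]++ }               # column 4 is build_result
    END { for (r in count) print r ": " count[r] }
  ' "$summary_csv" | sort
}
```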
To check the results of flakiness repair: in each round, a directory named ID_Results_GPT-4_projects_[Unique SHA] is generated inside outputs, where:
- you can follow the real-time logs in ID_Results_GPT-4_projects_[Unique SHA]/[Unique SHA].log;
- you can see a summary of all results in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_results_[Unique SHA].csv, or more details in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_test_Details_[Unique SHA].json;
- if any successful patches are generated, they are saved in ID_Results_GPT-4_projects_[Unique SHA]/GoodPatches.
Please note that the results may vary across runs due to the non-determinism of LLMs.
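To locate the newest results directory and its patches from the shell, something like the following sketch could help. It assumes the ID_Results_GPT-4_projects_[Unique SHA] naming described above (other models or test types would presumably use a different prefix), and the helpers latest_results_dir and list_good_patches are hypothetical, not part of FlakyDoctor.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of FlakyDoctor.

# Print the most recently modified ID_Results_GPT-4_projects_* directory.
latest_results_dir() {
  local out_dir="${1:-outputs}"
  # ls -td sorts by modification time, newest first; errors (no match) are hidden.
  ls -td "$out_dir"/ID_Results_GPT-4_projects_*/ 2>/dev/null | head -n 1
}

# List the files in the newest round's GoodPatches directory, if there is one.
list_good_patches() {
  local dir
  dir="$(latest_results_dir "${1:-outputs}")"
  dir="${dir%/}"   # drop the trailing slash added by the glob
  if [[ -z "$dir" ]]; then
    echo "no results directory found" >&2
    return 1
  fi
  if [[ -d "$dir/GoodPatches" ]]; then
    find "$dir/GoodPatches" -type f
  else
    echo "no successful patches in $dir"
  fi
}
```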
To reproduce the results from scratch, one should run the following commands:
0. Before starting:
- FlakyDoctor works on Linux with the following environment:
  - Python 3.10.12
  - Java 8 and Java 11
  - Maven 3.6.3
- Please also prepare an OpenAI API key and local Magicoder checkpoints.
1. Set up requirements:
git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh
2. Create a .env file containing the local path of your Magicoder checkpoints:
echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env
3. Clone and build all Java projects by running:
bash -x src/install.sh [input_csv] [clone_dir] [output_dir] [save_csv]
- input_csv: A CSV file listing the Java projects to set up; each line has the format Project URL, SHA, Module. More details in datasets.
- clone_dir: A directory into which all the Java projects are cloned.
- output_dir: A directory for outputs and logs produced while building the projects.
- save_csv: A CSV file to which a summary of the build results is written.
For example, run
bash -x src/install.sh datasets/ID_projects.csv projects outputs ID_summary.csv
to build all Java projects for ID tests (~15 hours), and
bash -x src/install.sh datasets/OD_projects.csv projects outputs OD_summary.csv
to build all Java projects for OD tests (~10 hours).
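Before a multi-hour build, it may be worth checking the input CSV format. This hypothetical check_projects_csv sketch only verifies that each non-empty line has three comma-separated fields (Project URL, SHA, Module) with a non-empty URL and SHA; it assumes no quoted commas and is not part of FlakyDoctor.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check, not part of FlakyDoctor: verify that every non-empty
# line of an install.sh input CSV has three comma-separated fields
# (Project URL, SHA, Module) and that the URL and SHA are non-empty.
# Assumes no quoted commas inside fields.
check_projects_csv() {
  local input_csv="$1"
  awk -F',' '
    NF == 0 { next }                       # ignore blank lines
    NF != 3 || $1 == "" || $2 == "" {
      printf "line %d: expected Project URL,SHA,Module but got: %s\n", NR, $0
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$input_csv"
}
```

For example, check_projects_csv datasets/ID_projects.csv prints any malformed lines and returns non-zero if it finds one.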
4. Run FlakyDoctor to fix flaky tests by running:
bash -x src/run_FlakyDoctor.sh [clone_dir] [openai_key] [model] [output_dir] [input_csv] [test_type]
- clone_dir: The directory where all the Java projects are cloned.
- openai_key: Your OpenAI API key.
- model: GPT-4 or MagiCoder.
- output_dir: A directory to save all the results.
- input_csv: An input .csv file that lists all the flaky tests. More details in datasets.
- test_type: The type of flakiness to fix, ID or OD.
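Since model and test_type accept only a few values, a thin wrapper could catch typos before a long run. This is a hypothetical sketch (validate_args and run_flakydoctor are not part of FlakyDoctor) that passes the arguments to src/run_FlakyDoctor.sh in the documented order and assumes it is run from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper, not part of FlakyDoctor: reject obviously invalid
# arguments before calling src/run_FlakyDoctor.sh with the documented order
# [clone_dir] [openai_key] [model] [output_dir] [input_csv] [test_type].

# Succeed only if model and test_type are among the documented values.
validate_args() {
  local model="$1" test_type="$2"
  case "$model" in
    GPT-4|MagiCoder) ;;
    *) echo "model must be GPT-4 or MagiCoder, got: $model" >&2; return 1 ;;
  esac
  case "$test_type" in
    ID|OD) ;;
    *) echo "test_type must be ID or OD, got: $test_type" >&2; return 1 ;;
  esac
}

run_flakydoctor() {
  local clone_dir="$1" openai_key="$2" model="$3" output_dir="$4" input_csv="$5" test_type="$6"
  validate_args "$model" "$test_type" || return 1
  if [[ ! -f "$input_csv" ]]; then
    echo "input_csv not found: $input_csv" >&2
    return 1
  fi
  bash -x src/run_FlakyDoctor.sh "$clone_dir" "$openai_key" "$model" "$output_dir" "$input_csv" "$test_type"
}
```

For example: run_flakydoctor projects [openai_key] GPT-4 outputs datasets/demo.csv ID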
Fixes for 19 tests have been accepted (one PR may include fixes for multiple tests):
Accepted PRs:
- funkygao/cp-ddd-framework#65
- apache/pinot#11771
- dropwizard/dropwizard#7629
- opengoofy/hippo4j#1495
- moquette-io/moquette#781
- jnr/jnr-posix#185
- FasterXML/jackson-jakarta-rs-providers#22
- yangfuhai/jboot#117
Opened PRs:
- perwendel/spark#1285
- dyc87112/SpringBoot-Learning#98
- graphhopper/graphhopper#2899
- BroadleafCommerce/BroadleafCommerce#2901
- dianping/cat#2320
- hellokaton/30-seconds-of-java8#8
- AmadeusITGroup/workflow-cps-global-lib-http-plugin#68
- wro4j/wro4j#1167
- kevinsawicki/http-request#177
- apache/flink#23648
We are waiting for developers to approve our requests to create an issue for the following PRs:
Why PRs could not be opened for the other tests:
Tests were deleted in the latest version of the project:
- org.apache.dubbo.registry.client.metadata.ServiceInstanceMetadataUtilsTest.testMetadataServiceURLParameters
- org.apache.cayenne.CayenneContextClientChannelEventsIT.testSyncToOneRelationship
- org.apache.shardingsphere.elasticjob.cloud.scheduler.env.BootstrapEnvironmentTest.assertWithoutEventTraceRdbConfiguration
- org.apache.shardingsphere.elasticjob.cloud.scheduler.mesos.AppConstraintEvaluatorTest.assertExistExecutorOnS0
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testParametrizedConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testSequenceListener
- com.willwinder.universalgcodesender.GrblControllerTest.testGetGrblVersion
- com.willwinder.universalgcodesender.GrblControllerTest.testIsReadyToStreamFile
Tests were already fixed by developers in the latest version of the project:
- io.elasticjob.lite.lifecycle.internal.settings.JobSettingsAPIImplTest.assertUpdateJobSettings
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testBasicListenerWithUnexpectedMessage
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testGenericsListener
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testOnMessageWithExpectedMessage
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerOnErrorWithNoSentCommandsShouldSendMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithKnownErrorShouldWriteMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithUnknownErrorShouldWriteGenericMessageToConsole
- com.graphhopper.isochrone.algorithm.IsochroneTest.testSearch
Tests turned out to have a different type of flakiness after inspection:
- com.baidu.jprotobuf.pbrpc.EchoServiceTest.testDynamiceTalkTimeout
Repositories are archived:
- io.searchbox.indices.RolloverTest.testBasicUriGeneration
- com.netflix.exhibitor.core.config.zookeeper.TestZookeeperConfigProvider.testConcurrentModification
- org.springframework.security.oauth2.provider.client.JdbcClientDetailsServiceTests.testUpdateClientRedirectURI