Artifact repository for the paper "Neurosymbolic Repair of Test Flakiness", In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024), Vienna, Austria, September 2024.

Intelligent-CAT-Lab/FlakyDoctor

FlakyDoctor

ACM Artifacts Evaluated - Functional v1.1 | ACM Artifacts Available v1.1

This repo contains the source code and results of FlakyDoctor, a neuro-symbolic approach to fixing Implementation-Dependent (ID) and Order-Dependent (OD) tests.

🌟 File structure

This repository is organized as follows. Please refer to the README.md in each directory for more details:

  • datasets: Datasets of flaky tests in the evaluation.
  • patches: Successful patches generated.
  • results: Detailed results for successfully fixed flaky tests in the evaluation.
  • src: Source code and scripts to run FlakyDoctor.

🌟 A quick demo to reproduce sample results

This section provides a quick demo using GPT-4 to reproduce sample results in ~40 minutes.

0. Before starting:

  • FlakyDoctor works on Linux with the following environment:
      • Python 3.10.12
      • Java 8 and Java 11
      • Maven 3.6.3
  • FlakyDoctor currently supports GPT-4 and Magicoder. Prepare an OpenAI API key to use GPT-4; to run Magicoder, download its checkpoints to a local path. We used three NVIDIA GeForce RTX 3090 GPUs in our experiments.

1. Set up requirements:

git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh |& tee setup.log

2. Create a .env file containing the local path of your Magicoder checkpoints (you can skip this step if you only run GPT-4):

echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env
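Before launching a Magicoder run, it can help to confirm that the .env file really defines Magicoder_LOAD_PATH and that the path exists. The following is a minimal sketch of such a check, not FlakyDoctor's own loading code (which this README does not show). The helper names read_env_value and check_magicoder_path are hypothetical, and the simple KEY=VALUE parsing is an assumption based on the echo command above.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value for `key` from KEY=VALUE lines, or None if absent."""
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Tolerate optional surrounding quotes around the value.
            return value.strip().strip('"').strip("'")
    return None


def check_magicoder_path(env_file: str = ".env") -> bool:
    """True if env_file defines Magicoder_LOAD_PATH and it is an existing directory."""
    path = Path(env_file)
    if not path.is_file():
        return False
    value = read_env_value(path.read_text(), "Magicoder_LOAD_PATH")
    return bool(value) and Path(value).is_dir()


if __name__ == "__main__":
    print("Magicoder checkpoints found:", check_magicoder_path())
```

Run it from the repository root, where the .env file is created.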

3. Run the following commands to fix the demo tests with GPT-4, replacing [openai_key] with your OpenAI API key:

# install Java projects
bash -x src/install.sh datasets/demo_projects.csv projects outputs install_summary.csv
# fix flaky tests
bash -x src/run_FlakyDoctor.sh projects [openai_key] GPT-4 outputs datasets/demo.csv ID 

To check the build outputs, see the logs of each round, which are saved in a directory named [unique SHA] inside outputs. A summary of the build results is also written to install_summary.csv, with the columns project, sha, module, build_result, java_version.
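To get a quick overview of the build summary, one can count the rows per build_result value. The sketch below assumes the five columns listed above appear in that order, and it tolerates an optional header row, since the README does not say whether one is written. The function name summarize_builds is hypothetical, and the actual build_result values depend on what src/install.sh records.

```python
import csv
import io
from collections import Counter

# Column order as listed for install_summary.csv.
COLUMNS = ["project", "sha", "module", "build_result", "java_version"]


def summarize_builds(csv_text: str) -> Counter:
    """Count how many rows carry each build_result value."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Assumption: a header row may or may not be present; skip it if it is.
    if rows and [cell.strip() for cell in rows[0]] == COLUMNS:
        rows = rows[1:]
    counts = Counter()
    for row in rows:
        if len(row) != len(COLUMNS):
            continue  # ignore blank or malformed lines
        record = dict(zip(COLUMNS, (cell.strip() for cell in row)))
        counts[record["build_result"]] += 1
    return counts


if __name__ == "__main__":
    with open("install_summary.csv", newline="") as handle:
        for result, count in summarize_builds(handle.read()).most_common():
            print(result, count)
```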

To check the results of flakiness repair, look inside outputs: each round generates a directory named ID_Results_GPT-4_projects_[Unique SHA]:

  • Live logs are written to ID_Results_GPT-4_projects_[Unique SHA]/[Unique SHA].log.
  • A summary of all results is in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_results_[Unique SHA].csv, with more details in ID_Results_GPT-4_projects_[Unique SHA]/GPT-4_test_Details_[Unique SHA].json.
  • Any successful patches are saved in ID_Results_GPT-4_projects_[Unique SHA]/GoodPatches. Please note that results may vary across runs due to the non-determinism of LLMs.
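The file names above follow a regular pattern, so the expected locations for a given round can be computed in one place. The sketch below generalizes the demo names to other test types, models, and clone directories. That generalization is an assumption inferred from the GPT-4/ID demo, as is the premise that the same [Unique SHA] appears in the directory name and in the file names. The function name result_paths is hypothetical.

```python
from pathlib import Path


def result_paths(output_dir: str, test_type: str, model: str,
                 clone_dir: str, run_id: str) -> dict:
    """Build the expected result locations for one repair round.

    Assumed pattern (inferred from the demo):
    [output_dir]/[test_type]_Results_[model]_[clone_dir]_[run_id]/...
    """
    root = Path(output_dir) / f"{test_type}_Results_{model}_{clone_dir}_{run_id}"
    return {
        "log": root / f"{run_id}.log",
        "summary_csv": root / f"{model}_results_{run_id}.csv",
        "details_json": root / f"{model}_test_Details_{run_id}.json",
        "good_patches": root / "GoodPatches",
    }


if __name__ == "__main__":
    # "abc123" is a placeholder for the unique SHA of a round.
    for name, path in result_paths("outputs", "ID", "GPT-4", "projects", "abc123").items():
        print(f"{name}: {path} (exists: {path.exists()})")
```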

🌟 Reproduce the results from scratch

To reproduce the results from scratch, follow these steps:

0. Before starting:

  • FlakyDoctor works on Linux with the following environment:
      • Python 3.10.12
      • Java 8 and Java 11
      • Maven 3.6.3

1. Set up requirements:

git clone https://github.com/Intelligent-CAT-Lab/FlakyDoctor
cd FlakyDoctor
bash -x src/setup.sh

2. Create a .env file containing the local path of your Magicoder checkpoints:

echo "Magicoder_LOAD_PATH=[Your local path of Magicoder checkpoints]" > .env

3. Clone and build all Java projects with the following command:

bash -x src/install.sh [input_csv] [clone_dir] [output_dir] [save_csv]
  • input_csv: Input CSV of the Java projects to set up; each line has the format Project URL, SHA, Module. More details in datasets.
  • clone_dir: A directory into which all the Java projects are cloned.
  • output_dir: A directory for outputs and logs when building the projects.
  • save_csv: A summary of the build results.

For example, one can run:

  • bash -x src/install.sh datasets/ID_projects.csv projects outputs ID_summary.csv to build all Java projects for ID tests (~15 hours)
  • bash -x src/install.sh datasets/OD_projects.csv projects outputs OD_summary.csv to build all Java projects for OD tests (~10 hours)
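Since each line of the input CSV in step 3 has the format Project URL, SHA, Module, one project revision can appear on several lines with different modules. The sketch below groups modules by (URL, SHA) to show how many distinct checkouts an input file implies. It assumes that project URLs start with "http" and skips any other line (such as a possible header). The function name group_modules is hypothetical.

```python
import csv
import io
from collections import defaultdict


def group_modules(csv_text: str) -> dict:
    """Map each (project URL, SHA) pair to the list of modules listed for it."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        fields = [field.strip() for field in row]
        # Assumption: valid lines have exactly three fields and an http(s) URL.
        if len(fields) != 3 or not fields[0].startswith("http"):
            continue
        url, sha, module = fields
        groups[(url, sha)].append(module)
    return dict(groups)


if __name__ == "__main__":
    with open("datasets/ID_projects.csv", newline="") as handle:
        groups = group_modules(handle.read())
    print(f"{len(groups)} distinct project revisions")
```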

4. Run FlakyDoctor to fix flaky tests with the following command:

bash -x src/run_FlakyDoctor.sh [clone_dir] [openai_key] [model] [output_dir] [input_csv] [test_type]
  • clone_dir: The directory where all the Java projects are cloned.
  • openai_key: Your OpenAI API key.
  • model: GPT-4 or MagiCoder
  • output_dir: A directory to save all the results.
  • input_csv: An input .csv file that includes all the flaky tests. More details in datasets.
  • test_type: The type of flakiness to fix, ID or OD.
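Because run_FlakyDoctor.sh takes six positional arguments, it is easy to swap two of them by mistake. The sketch below assembles the command in the documented order and rejects unknown model or test_type values before anything is launched. The accepted values are taken from the list above. The wrapper build_command is hypothetical, and reading the key from an OPENAI_API_KEY environment variable is only a convention assumed for this example.

```python
import os
import subprocess

VALID_MODELS = {"GPT-4", "MagiCoder"}
VALID_TEST_TYPES = {"ID", "OD"}


def build_command(clone_dir: str, openai_key: str, model: str,
                  output_dir: str, input_csv: str, test_type: str) -> list:
    """Return the argument list for src/run_FlakyDoctor.sh in documented order."""
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {sorted(VALID_MODELS)}")
    if test_type not in VALID_TEST_TYPES:
        raise ValueError(f"test_type must be one of {sorted(VALID_TEST_TYPES)}")
    return ["bash", "-x", "src/run_FlakyDoctor.sh",
            clone_dir, openai_key, model, output_dir, input_csv, test_type]


if __name__ == "__main__":
    # Assumption: the key is exported as OPENAI_API_KEY in the current shell.
    key = os.environ.get("OPENAI_API_KEY", "")
    cmd = build_command("projects", key, "GPT-4", "outputs", "datasets/demo.csv", "ID")
    subprocess.run(cmd, check=True)  # run from the repository root
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.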

🌟 Pull requests

19 tests have been accepted (one PR may include fixes for multiple tests):

Accepted PRs:

Opened PRs:

We are waiting for developers to approve our requests to create an issue for the following PRs:

Why PRs cannot be opened for the other tests:

Tests are deleted in the latest version of the project:
- org.apache.dubbo.registry.client.metadata.ServiceInstanceMetadataUtilsTest.testMetadataServiceURLParameters
- org.apache.cayenne.CayenneContextClientChannelEventsIT.testSyncToOneRelationship
- org.apache.shardingsphere.elasticjob.cloud.scheduler.env.BootstrapEnvironmentTest.assertWithoutEventTraceRdbConfiguration
- org.apache.shardingsphere.elasticjob.cloud.scheduler.mesos.AppConstraintEvaluatorTest.assertExistExecutorOnS0
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testParametrizedConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testSequenceListener
- com.willwinder.universalgcodesender.GrblControllerTest.testGetGrblVersion
- com.willwinder.universalgcodesender.GrblControllerTest.testIsReadyToStreamFile

Tests are fixed by developers in the latest version of the project:
- io.elasticjob.lite.lifecycle.internal.settings.JobSettingsAPIImplTest.assertUpdateJobSettings
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testBasicListenerWithUnexpectedMessage
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testConstructor
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testGenericsListener
- net.sf.marineapi.ais.event.AbstractAISMessageListenerTest.testOnMessageWithExpectedMessage
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerOnErrorWithNoSentCommandsShouldSendMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithKnownErrorShouldWriteMessageToConsole
- com.willwinder.universalgcodesender.GrblControllerTest.rawResponseHandlerWithUnknownErrorShouldWriteGenericMessageToConsole
- com.graphhopper.isochrone.algorithm.IsochroneTest.testSearch

Tests turn out to have a different type of flakiness after inspection:
- com.baidu.jprotobuf.pbrpc.EchoServiceTest.testDynamiceTalkTimeout

Repository is archived:
- io.searchbox.indices.RolloverTest.testBasicUriGeneration
- com.netflix.exhibitor.core.config.zookeeper.TestZookeeperConfigProvider.testConcurrentModification
- org.springframework.security.oauth2.provider.client.JdbcClientDetailsServiceTests.testUpdateClientRedirectURI
