As of #769, our test suite has been overhauled. This issue tracks the progress toward passing all of those tests.
Current status:
Total test count: 183
Passed tests: 128
Partial fail: 23
Complete fail: 27
Untested: 5
To run a test, first install the dev requirements:
pip install -r requirements-dev.txt
Then run:
pytest tests/your_test_module.py
Tests that currently pass will be marked with a ✅, tests that do not pass will be marked with ❌, and untested ones will be left blank (default).
Note
A failed test does not mean that the associated feature does not work. A test may contain many items (sometimes hundreds), and the number of passed items is logged for each test. Some feature tests may fail completely yet still work end-to-end.
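As an illustration, the per-test "(passed/total)" fractions in the lists below can be tallied with a few lines of Python. This is a throwaway sketch for reading this issue, not part of the test suite; the sample lines are copied from the Endpoints section:

```python
import re

def item_totals(lines):
    """Sum the '(passed/total)' item counts from status lines like
    'test_chat ❌ (25/33)'. Lines without a fraction are skipped."""
    passed = total = 0
    for line in lines:
        m = re.search(r"\((\d+)/(\d+)\)", line)
        if m:
            passed += int(m.group(1))
            total += int(m.group(2))
    return passed, total

status = [
    "test_chat ❌ (25/33)",
    "test_completion ❌ (77/112)",
    "test_embedding",
]
print(item_totals(status))  # (102, 145)
```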
The following features are known to be currently broken:
Llava-based Vision Model Loading
Out-Of-Tree model registration
General Tests
test_cache_block_hashing
test_config
test_embedded_commit
test_inputs
test_logits_processor
test_regression ❌ (3/4) -- the VRAM release test fails
test_sampling_params
test_scalartype
test_sequence
test_sharded_state_loader
test_utils
Async Aphrodite
test_api_server_async_aphrodite ❌
test_async_aphrodite
test_chat_template
test_openapi_server_ray ❌ (2/3)
test_request_tracker
Basic Correctness
test_basic_correctness
test_chunked_prefill ❌ (30/36)
test_cpu_offload
test_preemption ❌ (3/5)
Compilation
test_full_graph
Core
test_block_manager
test_chunked_prefill_scheduler
test_scheduler_encoder_decoder
test_scheduler
Distributed
test_basic_distributed_correctness
test_basic_distributed_correctness_enc_dec
test_chunked_prefill_distributed
test_comm_ops
test_custom_all_reduce
test_distributed_oot ❌ (0/1)
test_multimodal_broadcast ❌ (0/6)
test_pipeline_parallel ❌ (1/10)
test_pipeline_partition ❌ (0/1)
test_pp_cudagraph
test_pynccl
test_same_node (run with APHRODITE_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 tests/distributed/test_same_node.py)
test_shm_broadcast
Endpoints
OpenAI
test_audio ❌ (1/4)
test_basic
test_chat ❌ (25/33)
test_completion ❌ (77/112)
test_embedding
test_encoder_decoder
test_guided_processors
test_metrics
test_models
test_mp_api_server (takes too long to run, investigate)
test_oot_registeration ❌ (0/1)
test_return_tokens_as_ids
test_run_batch
test_serving_chat
test_shutdown ❌ (0/1)
test_tokenization
test_vision ❌ (0/16) -- appears to be an issue with fetching the images
LLM
test_encode
test_generate_multiple_loras ❌ (0/1)
test_generate
test_guided_generate
Engine
test_args
test_computed_prefix_block
test_custom_executor
test_detokenization
test_multiproc_workers
test_skip_tokenizer_init
test_stop_reason ❌ (0/1)
test_stop_string
Output Processor
test_multi_step
test_stop_checker
Kernels
test_activation
test_attention_selector ❌ (0/22)
test_attention
test_blocksparse_attention
test_cache
test_cutlass
test_encoder_decoder_attn
test_flash_attn
test_flashinfer
test_fp8_quant
test_int8_quant
test_layernorm
test_marlin_gemm
test_moe
test_pos_encoding
test_prefix_prefill
test_rand
test_sampler ❌ (16/197) -- Triton sampler, unused
LoRA
test_baichuan ❌ (0/3) -- issues with loading the LoRA config file
test_chatglm3 ❌ (0/3)
test_gemma ❌ (0/3)
test_layers
test_llama
test_long_context
test_lora_checkpoints
test_lora_huggingface
test_lora_manager
test_mixtral
test_phi
test_punica_sizes
test_punica_variation
test_quant_model
test_tokenizer_group ❌ (2/3) -- issues with the LoRA config file
test_utils
test_worker
Metrics
test_metrics ❌ (11/16)
Modeling
weight_utils
Models
test_aqlm ❌ (0/1)
test_bart ❌ (8/12)
test_big_models ❌ (2/4)
test_blip2
test_chameleon
test_danube3_4b
test_embedding
test_fp8 ❌ (0/4)
test_fuyu ❌ (0/4)
test_gguf ❌ (2/4)
test_gptq_marlin_24 ❌ (2/4)
test_gptq_marlin
test_internvl ❌ (0/8)
test_jamba
test_llava_image_embeds ❌ (0/3)
test_llava_next ❌ (0/4)
test_llava ❌ (0/4)
test_marlin
test_minicpmv ❌ (0/8)
test_mistral
test_models
test_oot_registration ❌ (1/2)
test_paligemma ❌ (0/8)
test_phi3v
test_qwen ❌ (0/1) -- Qwen-VL
test_registry
Multimodal
test_mapper
test_utils ❌ (29/32) -- image fetch failure from url
Prefix Caching
test_disable_sliding_window
test_prefix_caching
Prompt Adapter
test_bloom
test_multi_adapter_inference
test_pa_lora
Quantization
test_bitsandbytes
test_compressed_tensors
test_configs
test_cpu_offload ❌ (3/4)
test_experts_int8
test_fp8 ❌ (12/14)
test_lm_head
Samplers
test_beam_search ❌ (0/1)
test_ignore_eos
test_logits_processor
test_logprobs ❌ (0/25) -- Triton compile issues with chunked prefill
test_ranks
test_rejection_sampling
test_sampler
test_seeded_generate
test_typical_acceptance_sampler
Spec Decode
test_batch_expansion
test_dynamic_spec_decode
test_metrics
test_multi_step_worker ❌ (22/24)
test_ngram_worker
test_spec_decode_worker
test_utils
End-to-end spec decode tests
test_compatibilty
test_integration_dist_tp2
test_integration_dist_tp4
test_integration
test_logprobs
test_medusa_correctness ❌ (0/10) -- seems to be an issue with getting head_size for the Medusa model
test_mlp_correctness ❌ (0/13)
test_multistep_correctness ❌ (33/35)
test_ngram_correctness
test_seed
Tensorizer Loader
test_tensorizer ❌ (8/9)
Tokenization
test_cached_tokenizer
test_detokenize ❌ (212/215)
test_get_eos
test_tokenizer_group
test_tokenizer
Weight Loading
test_weight_loading
Worker
test_model_runner
test_encoder_decoder_model_runner
test_model_input
test_swap