bug: output streaming rail chunk formatting improvements #1197

Open
@andompesta

Description

Did you check docs and existing issues?

  • I have read all the NeMo-Guardrails docs
  • I have updated the package to the latest version before submitting this issue
  • (optional) I have used the develop branch
  • I have searched the existing issues of NeMo-Guardrails

Python version (python --version)

python 3.12.3

Operating system/version

Linux 24.04

NeMo-Guardrails version (if you must use a specific version and not the latest)

0.13

Describe the bug

When tokens are popped from the buffer to generate a chunk for executing the output rails, an additional space is introduced between them.
This is a problem when the LLM uses a sub-word tokenizer such as Byte Pair Encoding (BPE), WordPiece, or SentencePiece. If a word is composed of multiple tokens, the word gets decomposed into its sub-tokens. For example, with gpt-4.1-mini the word "assisting" is composed of the tokens ["Ass", "isting"], which in the output-rail prompt becomes "Ass isting", triggering a policy violation.
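A minimal illustration of the failure mode (the token list below is illustrative; real streams deliver tokens via the model client, with leading spaces already encoded inside the tokens):

```python
# BPE-style sub-word tokens as they might arrive on the stream.
# "Assisting" is split into two tokens that must be concatenated
# without a separator to reconstruct the word.
tokens = ["Ass", "isting", " with", " writing"]

# Correct detokenization: plain concatenation (word boundaries are
# already encoded inside the tokens themselves).
correct = "".join(tokens)

# Buggy chunk formatting: joining popped tokens with an extra space
# splits every multi-token word apart and doubles existing spaces.
buggy = " ".join(tokens)

print(correct)  # Assisting with writing
print(buggy)    # Ass isting  with  writing
```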

Steps To Reproduce

import asyncio

from nemoguardrails import LLMRails, RailsConfig


async def main():
    # Path to the guardrails config directory (assumed location; the
    # original snippet called from_path() without an argument).
    config = RailsConfig.from_path("./config")
    rails = LLMRails(config, verbose=True)

    history = [{"role": "user", "content": "what can you do for me ?"}]

    async def stream_chat():
        async for chunk in rails.stream_async(messages=history):
            print(f"CHUNK: {chunk}")

    await stream_chat()
    rails.explain().print_llm_calls_summary()


if __name__ == "__main__":
    asyncio.run(main())

Config

models:
  - type: main
    engine: azure_openai
    model: gpt-4.1-mini
    parameters:
      azure_deployment: gpt-4.1-mini
      api_version: 2024-12-01-preview
      presence_penalty: 0.0
      frequency_penalty: 0.0
      max_tokens: 1024
      streaming: True
      stream_usage: True

passthrough: True
lowest_temperature: 0.

rails:
  input:
    flows:
      - self check input
  output:
    flows:
      - self check output
    streaming:
      chunk_size: 15
      context_size: 10
      stream_first: True
      enabled: True

Expected Behavior

The stream completes without errors and no output rail blocks the response:

1. Task `self_check_input` took 0.51 seconds and used 133 tokens.
2. Task `self_check_output` took 0.47 seconds and used 167 tokens.
3. Task `self_check_output` took 0.46 seconds and used 168 tokens.
4. Task `self_check_output` took 0.49 seconds and used 167 tokens.
5. Task `self_check_output` took 0.46 seconds and used 168 tokens.
6. Task `self_check_output` took 0.49 seconds and used 168 tokens.
7. Task `self_check_output` took 0.48 seconds and used 167 tokens.
8. Task `self_check_output` took 0.54 seconds and used 168 tokens.
9. Task `self_check_output` took 1.18 seconds and used 167 tokens.
10. Task `self_check_output` took 0.49 seconds and used 168 tokens.
11. Task `self_check_output` took 0.70 seconds and used 168 tokens.
12. Task `self_check_output` took 0.48 seconds and used 168 tokens.
13. Task `self_check_output` took 0.50 seconds and used 168 tokens.
14. Task `self_check_output` took 0.51 seconds and used 168 tokens.
15. Task `self_check_output` took 0.47 seconds and used 168 tokens.
16. Task `self_check_output` took 0.50 seconds and used 168 tokens.
17. Task `self_check_output` took 0.47 seconds and used 168 tokens.
18. Task `self_check_output` took 0.45 seconds and used 168 tokens.
19. Task `self_check_output` took 0.47 seconds and used 168 tokens.
20. Task `self_check_output` took 0.50 seconds and used 167 tokens.
21. Task `self_check_output` took 0.54 seconds and used 167 tokens.
22. Task `self_check_output` took 0.51 seconds and used 167 tokens.
23. Task `self_check_output` took 0.61 seconds and used 167 tokens.
24. Task `self_check_output` took 0.46 seconds and used 168 tokens.
25. Task `self_check_output` took 0.80 seconds and used 168 tokens.
26. Task `self_check_output` took 0.61 seconds and used 167 tokens.
27. Task `self_check_output` took 0.44 seconds and used 168 tokens.
28. Task `self_check_output` took 0.45 seconds and used 168 tokens.
29. Task `self_check_output` took 0.48 seconds and used 167 tokens.
30. Task `self_check_output` took 0.49 seconds and used 167 tokens.
31. Task `self_check_output` took 0.51 seconds and used 167 tokens.
32. Task `self_check_output` took 0.47 seconds and used 168 tokens.
33. Task `self_check_output` took 0.51 seconds and used 167 tokens.
34. Task `self_check_output` took 0.48 seconds and used 162 tokens.

Actual Behavior

The output rails block the generation:

LLM Prompt (86e3a..) - self_check_output

User
Your task is to check if the bot message below complies with the company policy.

Company policy for the bot:
- messages should not contain any explicit content, even if just a few words
- messages should not contain abusive language or offensive content, even if just a few words
- messages should not contain any harmful content
- messages should not contain racially insensitive content
- messages should not contain any word that can be considered offensive
- if a message is a refusal, should be polite
- it's ok to give instructions to employees on how to protect the company's interests

Bot message: "ing  questions  and  providing  explanations  on  various  topics
 -  Ass isting  with  writing ,"

Question: Should the message be blocked (Yes or No)? You must respond with `Yes` or `No`.


LLM Completion (86e3a..)
Yes

09:12:37.423 | Output Stats None
09:12:37.423 | LLM call took 0.44 seconds
Traceback (most recent call last):
  File "/nemoguardrails/nemoguardrails-configs/src/amdocs/amdocs_input_output_stream_concurrent_run.py", line 142, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/nemoguardrails/nemoguardrails-configs/src/amdocs/amdocs_input_output_stream_concurrent_run.py", line 133, in main
    await generate_with_rails(
  File "/nemoguardrails/nemoguardrails-configs/src/amdocs/amdocs_input_output_stream_concurrent_run.py", line 95, in generate_with_rails
    await consumer_task
  File "/nemoguardrails/nemoguardrails-configs/src/amdocs/amdocs_input_output_stream_concurrent_run.py", line 49, in consume_stream
    raise Exception(data["error"])
Exception: {'message': 'Blocked by self check output rails.', 'type': 'guardrails_violation', 'param': 'self check output', 'code': 'content_blocked'}


    Labels

    bug (Something isn't working), status: needs triage (New issues that have not yet been reviewed or categorized)
