feat(qwen): enhance agent configurations with thinking control parameters

Updated the Qwen agent configuration to include `extra_body` parameters for thinking control across various models. Added `enable_thinking` and `preserve_thinking` options for reasoning agents, while utility agents have `enable_thinking` set to false. Adjusted the Qwen client initialization to support these new configurations. Updated test report to reflect changes in success rates and latencies.
This commit is contained in:
Dmitry Ng
2026-05-29 00:06:47 +03:00
parent 643ff7d218
commit 83c263e98e
3 changed files with 374 additions and 313 deletions
+50
View File
@@ -1,8 +1,32 @@
# Qwen (Alibaba Cloud DashScope) agent configuration.
#
# Strategy: qwen3.7-max (flagship) for critical reasoning, qwen3.6-plus (mid-tier
# multimodal Plus) for orchestration, qwen3.5-flash for cheap utility, qwen3-coder-*
# for code-specialized tasks (installer/coder).
#
# CRITICAL thinking control via extra_body (DashScope-specific, not OpenAI standard):
# - enable_thinking=false: utility agents (simple, simple_json, reflector, searcher,
# enricher) MUST set this explicitly. Qwen3.5/3.6/3.7 hybrid models have thinking
# ENABLED by default. Without disabling, reasoning_content leaks into content and
# corrupts short deterministic outputs (e.g. docker image selector returning the
# full chain-of-thought instead of just "vxcontrol/kali-linux").
# - enable_thinking=true: reasoning agents (primary, assistant, generator, refiner,
# adviser, pentester) explicitly enable thinking. Redundant with hybrid defaults
# but defensive against future provider changes.
# - preserve_thinking=true: keeps reasoning_content from previous assistant turns
# in subsequent requests. Supported ONLY by qwen3.7-max and qwen3.6-plus families.
# Required for agent loops with tool calls to preserve reasoning continuity.
# Works together with WithPreserveReasoningContent() in qwen.go.
# - qwen3-coder-* (coder, installer) are NOT hybrid thinking models — no thinking
# control needed.
simple:
model: "qwen3.5-flash"
temperature: 0.6
n: 1
max_tokens: 8192
extra_body:
enable_thinking: false
price:
input: 0.1
output: 0.4
@@ -14,6 +38,8 @@ simple_json:
n: 1
max_tokens: 4096
json: true
extra_body:
enable_thinking: false
price:
input: 0.1
output: 0.4
@@ -24,6 +50,9 @@ primary_agent:
temperature: 1.0
n: 1
max_tokens: 16384
extra_body:
enable_thinking: true
preserve_thinking: true
price:
input: 0.5
output: 3.0
@@ -34,6 +63,9 @@ assistant:
temperature: 1.0
n: 1
max_tokens: 16384
extra_body:
enable_thinking: true
preserve_thinking: true
price:
input: 0.5
output: 3.0
@@ -44,6 +76,9 @@ generator:
temperature: 1.0
n: 1
max_tokens: 32768
extra_body:
enable_thinking: true
preserve_thinking: true
price:
input: 2.5
output: 7.5
@@ -54,6 +89,9 @@ refiner:
temperature: 1.0
n: 1
max_tokens: 20480
extra_body:
enable_thinking: true
preserve_thinking: true
price:
input: 2.5
output: 7.5
@@ -64,6 +102,9 @@ adviser:
temperature: 1.0
n: 1
max_tokens: 8192
extra_body:
enable_thinking: true
preserve_thinking: true
price:
input: 2.5
output: 7.5
@@ -74,6 +115,8 @@ reflector:
temperature: 0.7
n: 1
max_tokens: 4096
extra_body:
enable_thinking: false
price:
input: 0.1
output: 0.4
@@ -84,6 +127,8 @@ searcher:
temperature: 0.7
n: 1
max_tokens: 4096
extra_body:
enable_thinking: false
price:
input: 0.1
output: 0.4
@@ -94,6 +139,8 @@ enricher:
temperature: 0.7
n: 1
max_tokens: 4096
extra_body:
enable_thinking: false
price:
input: 0.1
output: 0.4
@@ -124,6 +171,9 @@ pentester:
temperature: 0.8
n: 1
max_tokens: 16384
extra_body:
enable_thinking: true
preserve_thinking: true
price:
input: 0.5
output: 3.0
+11
View File
@@ -83,11 +83,22 @@ func New(
return nil, err
}
// Alibaba Cloud DashScope OpenAI-compatible API. Thinking is controlled via
// extra_body.enable_thinking (true/false) in the per-agent config — this is a
// DashScope-specific parameter, not OpenAI standard. Qwen3.5/3.6/3.7 hybrid
// models have thinking ENABLED by default, so utility agents must explicitly
// set enable_thinking=false or reasoning_content will be returned inline as
// part of content (corrupting outputs that expect short deterministic answers
// like docker image selection or descriptors).
// WithPreserveReasoningContent() is required for multi-turn with tool calls
// when preserve_thinking=true is set in extra_body for qwen3.7-max/qwen3.6-plus
// (other Qwen3 models do not support preserve_thinking).
client, err := openai.New(
openai.WithToken(cfg.QwenAPIKey),
openai.WithModel(QwenAgentModel),
openai.WithBaseURL(cfg.QwenServerURL),
openai.WithHTTPClient(httpClient),
openai.WithPreserveReasoningContent(),
)
if err != nil {
return nil, err
+313 -313
View File
@@ -1,27 +1,27 @@
# LLM Agent Testing Report
Generated: Wed, 27 May 2026 23:05:00 UTC
Generated: Thu, 28 May 2026 16:06:51 UTC
## Overall Results
| Agent | Model | Reasoning | Success Rate | Average Latency |
|-------|-------|-----------|--------------|-----------------|
| simple | qwen3.5-flash | true | 23/23 (100.00%) | 3.014s |
| simple_json | qwen3.5-flash | true | 5/5 (100.00%) | 7.088s |
| primary_agent | qwen3.6-plus | true | 23/23 (100.00%) | 5.366s |
| assistant | qwen3.6-plus | true | 23/23 (100.00%) | 5.758s |
| generator | qwen3.7-max | true | 23/23 (100.00%) | 3.473s |
| refiner | qwen3.7-max | true | 23/23 (100.00%) | 3.352s |
| adviser | qwen3.7-max | true | 22/23 (95.65%) | 2.941s |
| reflector | qwen3.5-flash | true | 23/23 (100.00%) | 3.377s |
| searcher | qwen3.5-flash | true | 23/23 (100.00%) | 4.025s |
| enricher | qwen3.5-flash | true | 23/23 (100.00%) | 2.857s |
| coder | qwen3-coder-plus | true | 23/23 (100.00%) | 1.556s |
| installer | qwen3-coder-flash | true | 20/23 (86.96%) | 1.060s |
| pentester | qwen3.6-plus | true | 23/23 (100.00%) | 5.504s |
| simple | qwen3.5-flash | false | 23/23 (100.00%) | 1.194s |
| simple_json | qwen3.5-flash | false | 4/5 (80.00%) | 0.945s |
| primary_agent | qwen3.6-plus | true | 23/23 (100.00%) | 6.079s |
| assistant | qwen3.6-plus | true | 23/23 (100.00%) | 5.512s |
| generator | qwen3.7-max | true | 23/23 (100.00%) | 5.172s |
| refiner | qwen3.7-max | true | 23/23 (100.00%) | 5.455s |
| adviser | qwen3.7-max | true | 23/23 (100.00%) | 4.750s |
| reflector | qwen3.5-flash | true | 23/23 (100.00%) | 1.222s |
| searcher | qwen3.5-flash | true | 23/23 (100.00%) | 1.083s |
| enricher | qwen3.5-flash | true | 23/23 (100.00%) | 0.972s |
| coder | qwen3-coder-plus | true | 23/23 (100.00%) | 1.702s |
| installer | qwen3-coder-flash | true | 23/23 (100.00%) | 1.124s |
| pentester | qwen3.6-plus | true | 23/23 (100.00%) | 9.207s |
**Total**: 277/281 (98.58%) successful tests
**Overall average latency**: 3.587s
**Total**: 280/281 (99.64%) successful tests
**Overall average latency**: 3.575s
## Detailed Results
@@ -31,38 +31,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 2.746s | |
| Text Transform Uppercase | ✅ Pass | 2.043s | |
| Count from 1 to 5 | ✅ Pass | 3.729s | |
| Math Calculation | ✅ Pass | 1.686s | |
| Basic Echo Function | ✅ Pass | 1.282s | |
| Streaming Simple Math Streaming | ✅ Pass | 1.606s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 2.200s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.928s | |
| Simple Math | ✅ Pass | 1.515s | |
| Text Transform Uppercase | ✅ Pass | 1.075s | |
| Count from 1 to 5 | ✅ Pass | 1.066s | |
| Math Calculation | ✅ Pass | 1.122s | |
| Basic Echo Function | ✅ Pass | 1.252s | |
| Streaming Simple Math Streaming | ✅ Pass | 0.603s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 0.569s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.800s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 1.163s | |
| Search Query Function | ✅ Pass | 1.264s | |
| Ask Advice Function | ✅ Pass | 1.232s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.884s | |
| Basic Context Memory Test | ✅ Pass | 2.803s | |
| Function Argument Memory Test | ✅ Pass | 0.750s | |
| Function Response Memory Test | ✅ Pass | 0.951s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.953s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.255s | |
| Penetration Testing Methodology | ✅ Pass | 7.005s | |
| Vulnerability Assessment Tools | ✅ Pass | 19.557s | |
| SQL Injection Attack Type | ✅ Pass | 2.703s | |
| Penetration Testing Framework | ✅ Pass | 5.521s | |
| Web Application Security Scanner | ✅ Pass | 4.520s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.533s | |
| JSON Response Function | ✅ Pass | 1.795s | |
| Search Query Function | ✅ Pass | 1.148s | |
| Ask Advice Function | ✅ Pass | 1.347s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.834s | |
| Basic Context Memory Test | ✅ Pass | 0.655s | |
| Function Argument Memory Test | ✅ Pass | 1.019s | |
| Function Response Memory Test | ✅ Pass | 1.023s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.142s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 0.996s | |
| Penetration Testing Methodology | ✅ Pass | 2.497s | |
| Vulnerability Assessment Tools | ✅ Pass | 1.849s | |
| SQL Injection Attack Type | ✅ Pass | 0.552s | |
| Penetration Testing Framework | ✅ Pass | 1.971s | |
| Web Application Security Scanner | ✅ Pass | 1.461s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.152s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 3.014s
**Average latency**: 1.194s
---
@@ -72,15 +72,15 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Person Information JSON | ✅ Pass | 4.643s | |
| User Profile JSON | ✅ Pass | 5.759s | |
| Streaming Person Information JSON Streaming | ✅ Pass | 6.514s | |
| Project Information JSON | ✅ Pass | 7.046s | |
| Vulnerability Report Memory Test | ✅ Pass | 11.475s | |
| Project Information JSON | ✅ Pass | 0.751s | |
| User Profile JSON | ✅ Pass | 0.767s | |
| Person Information JSON | ✅ Pass | 1.137s | |
| Vulnerability Report Memory Test | ❌ Fail | 1.318s | got map\[string\]interface \{\}\{"$schema":"http://json\-schema\.org/draft\-07/schema\#", "properties":map\[string\]interface \{\}\{"open\_ports":\... |
| Streaming Person Information JSON Streaming | ✅ Pass | 0.747s | |
**Summary**: 5/5 (100.00%) successful tests
**Summary**: 4/5 (80.00%) successful tests
**Average latency**: 7.088s
**Average latency**: 0.945s
---
@@ -90,38 +90,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 5.300s | |
| Text Transform Uppercase | ✅ Pass | 3.870s | |
| Math Calculation | ✅ Pass | 2.999s | |
| Count from 1 to 5 | ✅ Pass | 12.409s | |
| Basic Echo Function | ✅ Pass | 3.121s | |
| Streaming Simple Math Streaming | ✅ Pass | 4.737s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 4.527s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 4.412s | |
| Simple Math | ✅ Pass | 5.447s | |
| Text Transform Uppercase | ✅ Pass | 5.747s | |
| Count from 1 to 5 | ✅ Pass | 6.848s | |
| Math Calculation | ✅ Pass | 3.912s | |
| Basic Echo Function | ✅ Pass | 2.791s | |
| Streaming Simple Math Streaming | ✅ Pass | 4.355s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 5.205s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 2.995s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 2.506s | |
| Search Query Function | ✅ Pass | 2.580s | |
| Ask Advice Function | ✅ Pass | 3.013s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.157s | |
| Basic Context Memory Test | ✅ Pass | 4.461s | |
| Function Argument Memory Test | ✅ Pass | 4.159s | |
| Function Response Memory Test | ✅ Pass | 6.988s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 5.212s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 3.572s | |
| Penetration Testing Methodology | ✅ Pass | 11.000s | |
| Vulnerability Assessment Tools | ✅ Pass | 11.996s | |
| SQL Injection Attack Type | ✅ Pass | 5.396s | |
| Penetration Testing Framework | ✅ Pass | 11.164s | |
| Web Application Security Scanner | ✅ Pass | 4.543s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.287s | |
| JSON Response Function | ✅ Pass | 4.066s | |
| Search Query Function | ✅ Pass | 2.232s | |
| Ask Advice Function | ✅ Pass | 2.860s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.094s | |
| Basic Context Memory Test | ✅ Pass | 4.240s | |
| Function Argument Memory Test | ✅ Pass | 2.635s | |
| Function Response Memory Test | ✅ Pass | 9.775s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 7.318s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 4.524s | |
| Penetration Testing Methodology | ✅ Pass | 13.685s | |
| Vulnerability Assessment Tools | ✅ Pass | 12.372s | |
| SQL Injection Attack Type | ✅ Pass | 11.677s | |
| Penetration Testing Framework | ✅ Pass | 12.500s | |
| Web Application Security Scanner | ✅ Pass | 9.014s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.509s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 5.366s
**Average latency**: 6.079s
---
@@ -131,38 +131,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 5.339s | |
| Text Transform Uppercase | ✅ Pass | 5.104s | |
| Count from 1 to 5 | ✅ Pass | 5.229s | |
| Math Calculation | ✅ Pass | 4.509s | |
| Basic Echo Function | ✅ Pass | 3.068s | |
| Streaming Simple Math Streaming | ✅ Pass | 4.267s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 4.656s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 2.860s | |
| Simple Math | ✅ Pass | 4.896s | |
| Text Transform Uppercase | ✅ Pass | 4.750s | |
| Count from 1 to 5 | ✅ Pass | 5.467s | |
| Math Calculation | ✅ Pass | 3.192s | |
| Basic Echo Function | ✅ Pass | 2.627s | |
| Streaming Simple Math Streaming | ✅ Pass | 3.657s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 4.401s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 5.018s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 3.200s | |
| Search Query Function | ✅ Pass | 1.780s | |
| Ask Advice Function | ✅ Pass | 2.841s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.020s | |
| Basic Context Memory Test | ✅ Pass | 5.116s | |
| Function Argument Memory Test | ✅ Pass | 3.908s | |
| Function Response Memory Test | ✅ Pass | 4.534s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 8.970s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 7.041s | |
| Penetration Testing Methodology | ✅ Pass | 11.914s | |
| Vulnerability Assessment Tools | ✅ Pass | 17.492s | |
| SQL Injection Attack Type | ✅ Pass | 5.510s | |
| Penetration Testing Framework | ✅ Pass | 9.591s | |
| Penetration Testing Tool Selection | ✅ Pass | 2.796s | |
| Web Application Security Scanner | ✅ Pass | 10.669s | |
| JSON Response Function | ✅ Pass | 2.385s | |
| Search Query Function | ✅ Pass | 2.864s | |
| Ask Advice Function | ✅ Pass | 3.599s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.866s | |
| Basic Context Memory Test | ✅ Pass | 4.909s | |
| Function Argument Memory Test | ✅ Pass | 3.381s | |
| Function Response Memory Test | ✅ Pass | 4.146s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 5.778s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 5.842s | |
| Penetration Testing Methodology | ✅ Pass | 9.568s | |
| SQL Injection Attack Type | ✅ Pass | 6.860s | |
| Vulnerability Assessment Tools | ✅ Pass | 18.097s | |
| Penetration Testing Framework | ✅ Pass | 10.065s | |
| Web Application Security Scanner | ✅ Pass | 9.724s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.665s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 5.758s
**Average latency**: 5.512s
---
@@ -172,38 +172,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 3.289s | |
| Text Transform Uppercase | ✅ Pass | 2.103s | |
| Count from 1 to 5 | ✅ Pass | 4.641s | |
| Math Calculation | ✅ Pass | 1.319s | |
| Basic Echo Function | ✅ Pass | 1.852s | |
| Streaming Simple Math Streaming | ✅ Pass | 1.998s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 3.591s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.928s | |
| Simple Math | ✅ Pass | 5.569s | |
| Text Transform Uppercase | ✅ Pass | 4.279s | |
| Math Calculation | ✅ Pass | 2.508s | |
| Basic Echo Function | ✅ Pass | 2.450s | |
| Count from 1 to 5 | ✅ Pass | 13.152s | |
| Streaming Simple Math Streaming | ✅ Pass | 3.691s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 4.566s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 2.392s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 1.388s | |
| Search Query Function | ✅ Pass | 2.131s | |
| Ask Advice Function | ✅ Pass | 1.544s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.977s | |
| Basic Context Memory Test | ✅ Pass | 4.750s | |
| Function Argument Memory Test | ✅ Pass | 3.913s | |
| Function Response Memory Test | ✅ Pass | 4.594s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 3.141s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 3.417s | |
| Penetration Testing Methodology | ✅ Pass | 3.566s | |
| Vulnerability Assessment Tools | ✅ Pass | 15.497s | |
| SQL Injection Attack Type | ✅ Pass | 4.163s | |
| Penetration Testing Framework | ✅ Pass | 2.677s | |
| Web Application Security Scanner | ✅ Pass | 4.448s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.928s | |
| JSON Response Function | ✅ Pass | 2.320s | |
| Search Query Function | ✅ Pass | 3.027s | |
| Ask Advice Function | ✅ Pass | 4.213s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.036s | |
| Basic Context Memory Test | ✅ Pass | 7.051s | |
| Function Argument Memory Test | ✅ Pass | 3.966s | |
| Function Response Memory Test | ✅ Pass | 7.543s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 3.481s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 4.203s | |
| Penetration Testing Methodology | ✅ Pass | 6.768s | |
| SQL Injection Attack Type | ✅ Pass | 4.673s | |
| Vulnerability Assessment Tools | ✅ Pass | 18.743s | |
| Penetration Testing Framework | ✅ Pass | 5.278s | |
| Web Application Security Scanner | ✅ Pass | 3.018s | |
| Penetration Testing Tool Selection | ✅ Pass | 4.024s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 3.473s
**Average latency**: 5.172s
---
@@ -213,38 +213,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 4.088s | |
| Text Transform Uppercase | ✅ Pass | 3.518s | |
| Count from 1 to 5 | ✅ Pass | 4.139s | |
| Math Calculation | ✅ Pass | 1.941s | |
| Basic Echo Function | ✅ Pass | 1.601s | |
| Streaming Simple Math Streaming | ✅ Pass | 2.624s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 3.881s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 2.028s | |
| Simple Math | ✅ Pass | 4.724s | |
| Text Transform Uppercase | ✅ Pass | 4.732s | |
| Count from 1 to 5 | ✅ Pass | 8.230s | |
| Math Calculation | ✅ Pass | 2.684s | |
| Basic Echo Function | ✅ Pass | 2.505s | |
| Streaming Simple Math Streaming | ✅ Pass | 5.229s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 6.093s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.982s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 2.901s | |
| Search Query Function | ✅ Pass | 2.251s | |
| Ask Advice Function | ✅ Pass | 1.898s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.260s | |
| Basic Context Memory Test | ✅ Pass | 2.202s | |
| Function Argument Memory Test | ✅ Pass | 2.319s | |
| Function Response Memory Test | ✅ Pass | 4.750s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.590s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 3.847s | |
| Penetration Testing Methodology | ✅ Pass | 4.130s | |
| Vulnerability Assessment Tools | ✅ Pass | 12.315s | |
| SQL Injection Attack Type | ✅ Pass | 4.225s | |
| Penetration Testing Framework | ✅ Pass | 2.234s | |
| Web Application Security Scanner | ✅ Pass | 2.434s | |
| Penetration Testing Tool Selection | ✅ Pass | 2.917s | |
| JSON Response Function | ✅ Pass | 2.442s | |
| Search Query Function | ✅ Pass | 2.913s | |
| Ask Advice Function | ✅ Pass | 3.594s | |
| Streaming Search Query Function Streaming | ✅ Pass | 3.382s | |
| Basic Context Memory Test | ✅ Pass | 4.152s | |
| Function Argument Memory Test | ✅ Pass | 3.718s | |
| Function Response Memory Test | ✅ Pass | 5.162s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 4.590s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 4.049s | |
| Penetration Testing Methodology | ✅ Pass | 5.504s | |
| SQL Injection Attack Type | ✅ Pass | 5.041s | |
| Vulnerability Assessment Tools | ✅ Pass | 20.343s | |
| Penetration Testing Framework | ✅ Pass | 10.747s | |
| Web Application Security Scanner | ✅ Pass | 10.091s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.551s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 3.352s
**Average latency**: 5.455s
---
@@ -254,38 +254,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 3.094s | |
| Text Transform Uppercase | ✅ Pass | 3.196s | |
| Count from 1 to 5 | ✅ Pass | 3.210s | |
| Math Calculation | ✅ Pass | 2.542s | |
| Basic Echo Function | ✅ Pass | 1.431s | |
| Streaming Simple Math Streaming | ✅ Pass | 2.476s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 2.746s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.526s | |
| Simple Math | ✅ Pass | 3.761s | |
| Text Transform Uppercase | ✅ Pass | 6.663s | |
| Count from 1 to 5 | ✅ Pass | 8.046s | |
| Math Calculation | ✅ Pass | 2.987s | |
| Basic Echo Function | ✅ Pass | 2.306s | |
| Streaming Simple Math Streaming | ✅ Pass | 3.074s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 2.033s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 7.517s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 2.530s | |
| Search Query Function | ✅ Pass | 1.782s | |
| Ask Advice Function | ✅ Pass | 1.862s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.915s | |
| Basic Context Memory Test | ✅ Pass | 3.090s | |
| Function Argument Memory Test | ❌ Fail | 3.063s | expected text 'Go programming language' not found |
| Function Response Memory Test | ✅ Pass | 4.037s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.505s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 3.618s | |
| Penetration Testing Methodology | ✅ Pass | 2.493s | |
| Vulnerability Assessment Tools | ✅ Pass | 6.921s | |
| SQL Injection Attack Type | ✅ Pass | 3.418s | |
| Penetration Testing Framework | ✅ Pass | 4.994s | |
| Web Application Security Scanner | ✅ Pass | 1.890s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.285s | |
| JSON Response Function | ✅ Pass | 1.833s | |
| Search Query Function | ✅ Pass | 2.622s | |
| Ask Advice Function | ✅ Pass | 3.735s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.859s | |
| Basic Context Memory Test | ✅ Pass | 4.274s | |
| Function Argument Memory Test | ✅ Pass | 5.488s | |
| Function Response Memory Test | ✅ Pass | 7.424s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.711s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 4.780s | |
| Penetration Testing Methodology | ✅ Pass | 6.058s | |
| Vulnerability Assessment Tools | ✅ Pass | 13.545s | |
| SQL Injection Attack Type | ✅ Pass | 3.451s | |
| Penetration Testing Framework | ✅ Pass | 5.348s | |
| Web Application Security Scanner | ✅ Pass | 6.648s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.077s | |
**Summary**: 22/23 (95.65%) successful tests
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 2.941s
**Average latency**: 4.750s
---
@@ -295,38 +295,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 3.446s | |
| Text Transform Uppercase | ✅ Pass | 2.472s | |
| Count from 1 to 5 | ✅ Pass | 3.109s | |
| Math Calculation | ✅ Pass | 2.244s | |
| Basic Echo Function | ✅ Pass | 1.191s | |
| Streaming Simple Math Streaming | ✅ Pass | 2.377s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 2.817s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.952s | |
| Simple Math | ✅ Pass | 1.664s | |
| Text Transform Uppercase | ✅ Pass | 0.999s | |
| Count from 1 to 5 | ✅ Pass | 0.706s | |
| Math Calculation | ✅ Pass | 0.808s | |
| Basic Echo Function | ✅ Pass | 1.256s | |
| Streaming Simple Math Streaming | ✅ Pass | 0.597s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 0.636s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.226s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 1.376s | |
| Search Query Function | ✅ Pass | 1.034s | |
| Ask Advice Function | ✅ Pass | 1.177s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.399s | |
| Basic Context Memory Test | ✅ Pass | 2.150s | |
| Function Argument Memory Test | ✅ Pass | 0.884s | |
| Function Response Memory Test | ✅ Pass | 0.900s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.927s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.201s | |
| Penetration Testing Methodology | ✅ Pass | 8.294s | |
| SQL Injection Attack Type | ✅ Pass | 2.431s | |
| Vulnerability Assessment Tools | ✅ Pass | 24.680s | |
| Penetration Testing Framework | ✅ Pass | 4.850s | |
| Web Application Security Scanner | ✅ Pass | 4.195s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.546s | |
| JSON Response Function | ✅ Pass | 0.959s | |
| Search Query Function | ✅ Pass | 1.138s | |
| Ask Advice Function | ✅ Pass | 1.422s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.120s | |
| Basic Context Memory Test | ✅ Pass | 0.691s | |
| Function Argument Memory Test | ✅ Pass | 0.528s | |
| Function Response Memory Test | ✅ Pass | 0.553s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.283s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.122s | |
| Penetration Testing Methodology | ✅ Pass | 2.639s | |
| Vulnerability Assessment Tools | ✅ Pass | 2.019s | |
| SQL Injection Attack Type | ✅ Pass | 1.036s | |
| Penetration Testing Framework | ✅ Pass | 1.655s | |
| Web Application Security Scanner | ✅ Pass | 3.147s | |
| Penetration Testing Tool Selection | ✅ Pass | 0.902s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 3.377s
**Average latency**: 1.222s
---
@@ -336,38 +336,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 3.959s | |
| Text Transform Uppercase | ✅ Pass | 1.535s | |
| Count from 1 to 5 | ✅ Pass | 3.085s | |
| Math Calculation | ✅ Pass | 1.883s | |
| Basic Echo Function | ✅ Pass | 1.491s | |
| Streaming Simple Math Streaming | ✅ Pass | 1.364s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 1.694s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.338s | |
| Simple Math | ✅ Pass | 1.541s | |
| Text Transform Uppercase | ✅ Pass | 0.541s | |
| Count from 1 to 5 | ✅ Pass | 0.660s | |
| Math Calculation | ✅ Pass | 0.556s | |
| Basic Echo Function | ✅ Pass | 0.701s | |
| Streaming Simple Math Streaming | ✅ Pass | 0.537s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 1.012s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.886s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 1.277s | |
| Search Query Function | ✅ Pass | 1.249s | |
| Ask Advice Function | ✅ Pass | 1.653s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.925s | |
| Basic Context Memory Test | ✅ Pass | 3.426s | |
| Function Argument Memory Test | ✅ Pass | 1.552s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.090s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.792s | |
| Function Response Memory Test | ✅ Pass | 22.405s | |
| Penetration Testing Methodology | ✅ Pass | 6.872s | |
| SQL Injection Attack Type | ✅ Pass | 2.512s | |
| Vulnerability Assessment Tools | ✅ Pass | 19.206s | |
| Penetration Testing Framework | ✅ Pass | 4.480s | |
| Web Application Security Scanner | ✅ Pass | 4.976s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.793s | |
| JSON Response Function | ✅ Pass | 0.744s | |
| Search Query Function | ✅ Pass | 1.350s | |
| Ask Advice Function | ✅ Pass | 1.986s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.726s | |
| Basic Context Memory Test | ✅ Pass | 1.089s | |
| Function Argument Memory Test | ✅ Pass | 0.558s | |
| Function Response Memory Test | ✅ Pass | 0.620s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.284s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 0.607s | |
| Penetration Testing Methodology | ✅ Pass | 2.167s | |
| Vulnerability Assessment Tools | ✅ Pass | 1.673s | |
| SQL Injection Attack Type | ✅ Pass | 1.089s | |
| Penetration Testing Framework | ✅ Pass | 1.859s | |
| Web Application Security Scanner | ✅ Pass | 1.590s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.127s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 4.025s
**Average latency**: 1.083s
---
@@ -377,38 +377,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 1.626s | |
| Text Transform Uppercase | ✅ Pass | 2.623s | |
| Count from 1 to 5 | ✅ Pass | 2.794s | |
| Math Calculation | ✅ Pass | 1.657s | |
| Basic Echo Function | ✅ Pass | 1.406s | |
| Streaming Simple Math Streaming | ✅ Pass | 2.086s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 2.390s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.279s | |
| Simple Math | ✅ Pass | 0.582s | |
| Text Transform Uppercase | ✅ Pass | 1.107s | |
| Count from 1 to 5 | ✅ Pass | 0.693s | |
| Math Calculation | ✅ Pass | 0.516s | |
| Basic Echo Function | ✅ Pass | 0.700s | |
| Streaming Simple Math Streaming | ✅ Pass | 0.523s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 0.544s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.772s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 1.314s | |
| Search Query Function | ✅ Pass | 1.160s | |
| Ask Advice Function | ✅ Pass | 1.645s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.948s | |
| Basic Context Memory Test | ✅ Pass | 3.003s | |
| Function Argument Memory Test | ✅ Pass | 2.254s | |
| Function Response Memory Test | ✅ Pass | 4.521s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.225s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.494s | |
| Penetration Testing Methodology | ✅ Pass | 5.008s | |
| Vulnerability Assessment Tools | ✅ Pass | 11.717s | |
| SQL Injection Attack Type | ✅ Pass | 4.587s | |
| Penetration Testing Framework | ✅ Pass | 4.199s | |
| Web Application Security Scanner | ✅ Pass | 4.499s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.261s | |
| JSON Response Function | ✅ Pass | 0.830s | |
| Search Query Function | ✅ Pass | 1.114s | |
| Ask Advice Function | ✅ Pass | 0.989s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.729s | |
| Basic Context Memory Test | ✅ Pass | 0.571s | |
| Function Argument Memory Test | ✅ Pass | 0.820s | |
| Function Response Memory Test | ✅ Pass | 0.683s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.762s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.029s | |
| Penetration Testing Methodology | ✅ Pass | 2.050s | |
| Vulnerability Assessment Tools | ✅ Pass | 1.990s | |
| SQL Injection Attack Type | ✅ Pass | 0.651s | |
| Penetration Testing Framework | ✅ Pass | 1.320s | |
| Web Application Security Scanner | ✅ Pass | 1.317s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.060s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 2.857s
**Average latency**: 0.972s
---
@@ -418,38 +418,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 0.873s | |
| Text Transform Uppercase | ✅ Pass | 0.824s | |
| Count from 1 to 5 | ✅ Pass | 0.953s | |
| Math Calculation | ✅ Pass | 0.842s | |
| Basic Echo Function | ✅ Pass | 1.514s | |
| Streaming Simple Math Streaming | ✅ Pass | 1.386s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 1.346s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 1.009s | |
| Simple Math | ✅ Pass | 1.889s | |
| Text Transform Uppercase | ✅ Pass | 1.323s | |
| Count from 1 to 5 | ✅ Pass | 0.900s | |
| Math Calculation | ✅ Pass | 1.026s | |
| Basic Echo Function | ✅ Pass | 1.016s | |
| Streaming Simple Math Streaming | ✅ Pass | 1.920s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 1.298s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.987s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 1.687s | |
| Search Query Function | ✅ Pass | 0.994s | |
| Ask Advice Function | ✅ Pass | 1.286s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.041s | |
| Basic Context Memory Test | ✅ Pass | 0.905s | |
| Function Argument Memory Test | ✅ Pass | 0.845s | |
| Function Response Memory Test | ✅ Pass | 0.806s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 2.240s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 0.862s | |
| Penetration Testing Methodology | ✅ Pass | 4.261s | |
| Vulnerability Assessment Tools | ✅ Pass | 3.398s | |
| SQL Injection Attack Type | ✅ Pass | 0.869s | |
| Penetration Testing Framework | ✅ Pass | 3.691s | |
| Web Application Security Scanner | ✅ Pass | 2.796s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.349s | |
| JSON Response Function | ✅ Pass | 1.585s | |
| Search Query Function | ✅ Pass | 1.567s | |
| Ask Advice Function | ✅ Pass | 1.272s | |
| Streaming Search Query Function Streaming | ✅ Pass | 0.972s | |
| Basic Context Memory Test | ✅ Pass | 1.434s | |
| Function Argument Memory Test | ✅ Pass | 0.764s | |
| Function Response Memory Test | ✅ Pass | 1.068s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.771s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 0.781s | |
| Penetration Testing Methodology | ✅ Pass | 3.191s | |
| Vulnerability Assessment Tools | ✅ Pass | 3.556s | |
| SQL Injection Attack Type | ✅ Pass | 0.965s | |
| Penetration Testing Framework | ✅ Pass | 5.216s | |
| Web Application Security Scanner | ✅ Pass | 3.193s | |
| Penetration Testing Tool Selection | ✅ Pass | 1.441s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 1.556s
**Average latency**: 1.702s
---
@@ -459,38 +459,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 1.007s | |
| Text Transform Uppercase | ✅ Pass | 0.566s | |
| Count from 1 to 5 | ✅ Pass | 1.100s | |
| Math Calculation | ✅ Pass | 1.011s | |
| Basic Echo Function | ❌ Fail | 0.727s | no tool calls found, expected at least 1 |
| Streaming Simple Math Streaming | ✅ Pass | 0.542s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 0.560s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.774s | |
| Simple Math | ✅ Pass | 0.558s | |
| Text Transform Uppercase | ✅ Pass | 0.527s | |
| Count from 1 to 5 | ✅ Pass | 0.651s | |
| Math Calculation | ✅ Pass | 0.965s | |
| Basic Echo Function | ✅ Pass | 1.248s | |
| Streaming Simple Math Streaming | ✅ Pass | 0.919s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 0.591s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 0.761s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 0.830s | |
| Search Query Function | ❌ Fail | 1.222s | no tool calls found, expected at least 1 |
| Ask Advice Function | ✅ Pass | 0.877s | |
| Streaming Search Query Function Streaming | ❌ Fail | 1.146s | no tool calls found, expected at least 1 |
| Basic Context Memory Test | ✅ Pass | 0.973s | |
| Function Argument Memory Test | ✅ Pass | 0.613s | |
| Function Response Memory Test | ✅ Pass | 0.583s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.826s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 0.625s | |
| Penetration Testing Methodology | ✅ Pass | 1.940s | |
| Vulnerability Assessment Tools | ✅ Pass | 3.151s | |
| SQL Injection Attack Type | ✅ Pass | 0.588s | |
| Penetration Testing Framework | ✅ Pass | 1.175s | |
| Web Application Security Scanner | ✅ Pass | 1.550s | |
| Penetration Testing Tool Selection | ✅ Pass | 0.973s | |
| JSON Response Function | ✅ Pass | 0.869s | |
| Search Query Function | ✅ Pass | 0.856s | |
| Ask Advice Function | ✅ Pass | 1.283s | |
| Streaming Search Query Function Streaming | ✅ Pass | 1.259s | |
| Basic Context Memory Test | ✅ Pass | 0.981s | |
| Function Argument Memory Test | ✅ Pass | 0.649s | |
| Function Response Memory Test | ✅ Pass | 1.007s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 1.940s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 1.070s | |
| Penetration Testing Methodology | ✅ Pass | 2.822s | |
| Vulnerability Assessment Tools | ✅ Pass | 2.937s | |
| SQL Injection Attack Type | ✅ Pass | 0.620s | |
| Penetration Testing Framework | ✅ Pass | 1.331s | |
| Web Application Security Scanner | ✅ Pass | 1.018s | |
| Penetration Testing Tool Selection | ✅ Pass | 0.972s | |
**Summary**: 20/23 (86.96%) successful tests
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 1.060s
**Average latency**: 1.124s
---
@@ -500,38 +500,38 @@ Generated: Wed, 27 May 2026 23:05:00 UTC
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| Simple Math | ✅ Pass | 4.527s | |
| Text Transform Uppercase | ✅ Pass | 4.599s | |
| Count from 1 to 5 | ✅ Pass | 5.743s | |
| Math Calculation | ✅ Pass | 3.509s | |
| Basic Echo Function | ✅ Pass | 4.851s | |
| Streaming Simple Math Streaming | ✅ Pass | 3.707s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 4.781s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 5.300s | |
| Simple Math | ✅ Pass | 3.761s | |
| Text Transform Uppercase | ✅ Pass | 4.980s | |
| Count from 1 to 5 | ✅ Pass | 6.066s | |
| Math Calculation | ✅ Pass | 5.125s | |
| Basic Echo Function | ✅ Pass | 4.145s | |
| Streaming Simple Math Streaming | ✅ Pass | 4.829s | |
| Streaming Count from 1 to 3 Streaming | ✅ Pass | 5.837s | |
| Streaming Basic Echo Function Streaming | ✅ Pass | 3.264s | |
#### Advanced Tests
| Test | Result | Latency | Error |
|------|--------|---------|-------|
| JSON Response Function | ✅ Pass | 2.456s | |
| Search Query Function | ✅ Pass | 3.989s | |
| Ask Advice Function | ✅ Pass | 3.255s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.430s | |
| Basic Context Memory Test | ✅ Pass | 4.780s | |
| Function Argument Memory Test | ✅ Pass | 5.338s | |
| Function Response Memory Test | ✅ Pass | 2.911s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 5.094s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 6.163s | |
| Penetration Testing Methodology | ✅ Pass | 12.475s | |
| Vulnerability Assessment Tools | ✅ Pass | 11.406s | |
| SQL Injection Attack Type | ✅ Pass | 6.216s | |
| Penetration Testing Framework | ✅ Pass | 11.490s | |
| Web Application Security Scanner | ✅ Pass | 8.229s | |
| Penetration Testing Tool Selection | ✅ Pass | 3.336s | |
| JSON Response Function | ✅ Pass | 2.894s | |
| Search Query Function | ✅ Pass | 3.922s | |
| Ask Advice Function | ✅ Pass | 3.311s | |
| Streaming Search Query Function Streaming | ✅ Pass | 2.728s | |
| Basic Context Memory Test | ✅ Pass | 4.075s | |
| Function Argument Memory Test | ✅ Pass | 5.545s | |
| Function Response Memory Test | ✅ Pass | 10.448s | |
| Penetration Testing Memory with Tool Call | ✅ Pass | 5.993s | |
| Cybersecurity Workflow Memory Test | ✅ Pass | 3.695s | |
| Penetration Testing Methodology | ✅ Pass | 9.881s | |
| Vulnerability Assessment Tools | ✅ Pass | 10.935s | |
| Penetration Testing Framework | ✅ Pass | 12.705s | |
| Web Application Security Scanner | ✅ Pass | 8.109s | |
| Penetration Testing Tool Selection | ✅ Pass | 2.886s | |
| SQL Injection Attack Type | ✅ Pass | 86.622s | |
**Summary**: 23/23 (100.00%) successful tests
**Average latency**: 5.504s
**Average latency**: 9.207s
---