# deepseek/deepseek-v4-flash

> Designed for responsiveness and cost efficiency, DeepSeek V4 Flash is a 284B-parameter (13B activated) Mixture-of-Experts model from DeepSeek. It uses hybrid attention to streamline long-context processing while maintaining strong reasoning and coding quality under high-throughput workloads. Built-in support for `high` and `xhigh` (maximum) reasoning efforts provides scalable reasoning depth, making the model well suited to demanding integrations such as agent workflows, coding assistants, and real-time chat systems.

## Overview

- **Endpoint**: `https://api.shortapi.ai/v1/chat/completions`
- **Model ID**: `deepseek/deepseek-v4-flash`
- **Category**: llm
- **Kind**: text-generation

## Pricing

Input: $0.0028/M tokens (cache hit) · $0.14/M tokens (cache miss) · Output: $0.28/M tokens

For more details, please check our pricing page.

## API Information

This model can be used via our HTTP API or, more conveniently, via our client libraries. See the input and output schemas below, as well as the usage examples.
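To budget requests against the rates above, a minimal cost estimator can be derived directly from the pricing table and the `usage` block of a response. This is a sketch with the rates hard-coded from this page, not an official SDK helper:

```python
# Sketch: estimate the dollar cost of one request from its `usage` block,
# using the per-million-token rates quoted in the Pricing section above.

RATE_INPUT_CACHE_HIT = 0.0028 / 1_000_000   # $/token, cached prompt tokens
RATE_INPUT_CACHE_MISS = 0.14 / 1_000_000    # $/token, uncached prompt tokens
RATE_OUTPUT = 0.28 / 1_000_000              # $/token, completion tokens

def estimate_cost(prompt_tokens: int, cached_tokens: int, completion_tokens: int) -> float:
    """Return the estimated request cost in USD.

    `cached_tokens` corresponds to `usage.prompt_tokens_details.cached_tokens`
    in the response; the remaining prompt tokens are billed as cache misses.
    """
    missed = prompt_tokens - cached_tokens
    return (cached_tokens * RATE_INPUT_CACHE_HIT
            + missed * RATE_INPUT_CACHE_MISS
            + completion_tokens * RATE_OUTPUT)

# Example: 10,000 prompt tokens (8,000 of them cached) and 2,000 output tokens.
print(f"${estimate_cost(10_000, 8_000, 2_000):.6f}")
```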
### Input Schema

The API accepts the following input parameters (standard OpenAI format):

- **`model`** (`string`, _required_): Model ID
- **`messages`** (`array`, _required_): List of conversation messages
  - **`role`** (`string`, _required_): Message role
    - Options: `"system"`, `"user"`, `"assistant"`, `"tool"`, `"developer"`
  - **`content`** (`string | array`, _required_): Message content
    - **`type`** (`string`, _optional_):
      - Options: `"text"`, `"image_url"`, `"input_audio"`, `"file"`, `"video_url"`
    - **`text`** (`string`, _optional_):
    - **`image_url`** (`object`, _optional_):
      - **`url`** (`string`, _required_): Image URL or base64
      - **`detail`** (`string`, _optional_):
        - Options: `"auto"`, `"low"`, `"high"`
    - **`input_audio`** (`object`, _optional_):
      - **`data`** (`string`, _optional_): Base64-encoded audio data
      - **`format`** (`string`, _optional_):
        - Options: `"wav"`, `"mp3"`
    - **`file`** (`object`, _optional_):
      - **`file_id`** (`string`, _required_): File ID
      - **`filename`** (`string`, _optional_):
      - **`file_data`** (`string`, _optional_):
    - **`video_url`** (`object`, _optional_):
      - **`url`** (`string`, _optional_):
  - **`name`** (`string`, _optional_): Sender's name
  - **`tool_calls`** (`array`, _optional_):
    - **`id`** (`string`, _required_):
    - **`type`** (`string`, _required_):
    - **`function`** (`object`, _optional_):
      - **`name`** (`string`, _optional_):
      - **`arguments`** (`string`, _optional_):
  - **`tool_call_id`** (`string`, _optional_): Tool invocation ID (used for messages with the `tool` role)
  - **`reasoning_content`** (`string`, _optional_): Reasoning content
- **`temperature`** (`number`, _optional_): Sampling temperature
  - Default: `1`
  - Range: `0` to `2`
- **`top_p`** (`number`, _optional_): Nucleus sampling parameter
  - Default: `1`
  - Range: `0` to `1`
- **`stream`** (`boolean`, _optional_): Whether to stream the response
  - Default: `false`
- **`stream_options`** (`object`, _optional_):
  - **`include_usage`** (`boolean`, _optional_):
- **`stop`** (`string | array`, _optional_): Stop sequence(s)
- **`max_tokens`** (`integer`, _optional_): Maximum number of generated tokens
- **`max_completion_tokens`** (`integer`, _optional_): Maximum number of completion tokens
- **`presence_penalty`** (`number`, _optional_):
  - Default: `0`
  - Range: `-2` to `2`
- **`frequency_penalty`** (`number`, _optional_):
  - Default: `0`
  - Range: `-2` to `2`
- **`logit_bias`** (`object`, _optional_):
- **`user`** (`string`, _optional_):
- **`tools`** (`array`, _optional_):
  - **`type`** (`string`, _required_):
  - **`function`** (`object`, _required_): The function definition
    - **`name`** (`string`, _required_):
    - **`description`** (`string`, _optional_):
    - **`parameters`** (`object`, _optional_): Parameter definitions in JSON Schema format
- **`tool_choice`** (`string | object`, _optional_):
  - **`type`** (`string`, _optional_):
  - **`function`** (`object`, _optional_): The function definition
    - **`name`** (`string`, _optional_):
- **`response_format`** (`object`, _optional_):
  - **`type`** (`string`, _optional_):
    - Options: `"text"`, `"json_object"`, `"json_schema"`
  - **`schema`** (`object`, _optional_): JSON Schema definition
- **`seed`** (`integer`, _optional_):
- **`reasoning_effort`** (`string`, _optional_): Reasoning effort (for models that support reasoning)
  - Options: `"low"`, `"medium"`, `"high"`
- **`modalities`** (`array`, _optional_):
- **`audio`** (`object`, _optional_):
  - **`format`** (`string`, _optional_):
  - **`voice`** (`string`, _optional_):

### Output Schema

The API returns a standard OpenAI-style JSON response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "deepseek/deepseek-v4-flash",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "text_tokens": 10,
      "audio_tokens": 0,
      "image_tokens": 0
    },
    "completion_tokens_details": {
      "text_tokens": 25,
      "audio_tokens": 0,
      "reasoning_tokens": 0
    }
  },
  "system_fingerprint": "fp_44709d6fcb"
}
```

## Use Example

### Bash (cURL)

```bash
curl --request POST \
  --url https://api.shortapi.ai/v1/chat/completions \
  --header "Authorization: Bearer $SHORTAPI_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello, how are you?" }
    ],
    "temperature": 0.7,
    "max_tokens": 384000
  }'
```

### JavaScript (Fetch API)

```javascript
const response = await fetch("https://api.shortapi.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${SHORTAPI_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "deepseek/deepseek-v4-flash",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello, how are you?" }
    ],
    temperature: 0.7,
    max_tokens: 384000
  })
});
const data = await response.json();
```

### Python (Requests)

```python
import requests

url = "https://api.shortapi.ai/v1/chat/completions"

payload = {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    "temperature": 0.7,
    "max_tokens": 384000,
}
headers = {
    "Authorization": f"Bearer {SHORTAPI_KEY}",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)
data = response.json()
```
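### Python (Streaming)

Setting `stream: true` switches the endpoint to a streamed response. Assuming the standard OpenAI server-sent-events framing (`data: <json>` lines, terminated by `data: [DONE]`), a sketch of consuming the stream with `requests` might look like:

```python
import json
import requests

def parse_sse_line(line: str):
    """Extract the delta text from one SSE line, or None if there is none.

    Assumes the standard OpenAI streaming framing: each event arrives as a
    `data: <json>` line, and the stream ends with `data: [DONE]`.
    """
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    delta = json.loads(payload)["choices"][0].get("delta", {})
    return delta.get("content")

def stream_chat(api_key: str, user_message: str):
    """Yield content chunks as the model produces them."""
    response = requests.post(
        "https://api.shortapi.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "deepseek/deepseek-v4-flash",
            "messages": [{"role": "user", "content": user_message}],
            "stream": True,
        },
        stream=True,  # tell requests not to buffer the whole body
    )
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        chunk = parse_sse_line(line or "")
        if chunk is not None:
            yield chunk

# Usage (requires a valid key):
# for chunk in stream_chat(SHORTAPI_KEY, "Hello!"):
#     print(chunk, end="", flush=True)
```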
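### Python (Tool Calling)

The `tools` and `tool_calls` parameters follow the standard OpenAI function-calling loop: the model returns `tool_calls`, you execute each one locally, then send the result back as a `tool`-role message that echoes the call's `id` as `tool_call_id`. A sketch of the local dispatch half of that loop (the `get_weather` tool is hypothetical):

```python
import json

# Hypothetical local tool; any function you expose to the model works the same way.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Tool schema passed in the `tools` request parameter
# (argument types described in JSON Schema format).
TOOL_SPECS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_tool_call(tool_call: dict) -> dict:
    """Execute one `tool_calls` entry and build the `tool`-role reply message.

    `function.arguments` arrives as a JSON *string*, so it must be decoded
    before dispatch; the reply echoes the call's `id` as `tool_call_id`.
    """
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": str(fn(**args)),
    }
```

In a full loop you would append the returned message to `messages` and call the endpoint again so the model can produce its final answer from the tool result.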
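### Python (JSON Output)

The `response_format` parameter can force syntactically valid JSON output. A sketch of the request body and of decoding the reply, assuming the standard behavior of `"json_object"` mode (the prompt itself should also ask for JSON); sending the payload works exactly like the `requests` example above:

```python
import json

# Request body forcing a JSON object reply via `response_format`.
payload = {
    "model": "deepseek/deepseek-v4-flash",
    "messages": [
        {
            "role": "system",
            "content": 'Reply with a JSON object of the form {"sentiment": "positive" | "negative"}.',
        },
        {"role": "user", "content": "I love this product!"},
    ],
    "response_format": {"type": "json_object"},
}

def parse_json_reply(response_body: dict) -> dict:
    """Decode the assistant's JSON content from a completed response."""
    return json.loads(response_body["choices"][0]["message"]["content"])
```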