# Check specific model
```
#### `server` - Start API Server
```bash
mlxk server # Start on localhost:8000
mlxk server --port 8001 # Custom port
mlxk server --host 0.0.0.0 --port 8000 # Allow external access
mlxk server --max-tokens 4000 # Set default max tokens (default: 2000)
mlxk server --reload # Development mode with auto-reload
```
### Command Aliases
After installation, these commands are equivalent:
- `mlxk` (recommended)
- `mlx-knife`
- `mlx_knife`
## Configuration
### Cache Location
By default, models are stored in `~/.cache/huggingface/hub`. Configure with:
```bash
# Set custom cache location
export HF_HOME="/path/to/your/cache"
# Example: External SSD
export HF_HOME="/Volumes/ExternalSSD/models"
```
### Model Name Expansion
Short names are automatically expanded for MLX models:
- `Phi-3-mini-4k-instruct-4bit` â `mlx-community/Phi-3-mini-4k-instruct-4bit`
- Models already containing `/` are used as-is
## Advanced Usage
### Generation Parameters
```bash
# Creative writing (high temperature, diverse output)
mlxk run Mistral-7B "Write a story" --temperature 0.9 --top-p 0.95
# Precise tasks (low temperature, focused output)
mlxk run Phi-3-mini "Extract key points" --temperature 0.3 --top-p 0.9
# Long-form generation
mlxk run Mixtral-8x7B "Explain quantum computing" --max-tokens 2000
# Reduce repetition
mlxk run model "prompt" --repetition-penalty 1.2
```
### Working with Specific Commits
```bash
# Use specific model version
mlxk show model@commit_hash
mlxk run model@commit_hash "prompt"
```
### Non-MLX Model Handling
The tool automatically detects framework compatibility:
```bash
# Attempting to run PyTorch model
mlxk run bert-base-uncased
# Error: Model bert-base-uncased is not MLX-compatible (Framework: PyTorch)!
# Use MLX-Community models: https://huggingface.co/mlx-community
```
## Troubleshooting
### Model Not Found
```bash
# If model isn't found, try full path
mlxk pull mlx-community/Model-Name-4bit
# List available models
mlxk list --all
```
### Performance Issues
- Ensure sufficient RAM for model size
- Close other applications to free memory
- Use smaller quantized models (4-bit recommended)
### Streaming Issues
- Some models may have spacing issues - this is handled automatically
- Use `--no-stream` for batch output if needed
## Contributing
Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.
## Security
For security concerns, please see [SECURITY.md](SECURITY.md) or contact us at broke@gmx.eu.
MLX Knife runs entirely locally - no data is sent to external servers except when downloading models from HuggingFace.
## License
MIT License - see [LICENSE](LICENSE) file for details
Copyright (c) 2025 The BROKE team đĻĢ
## Acknowledgments
- Built for Apple Silicon using the [MLX framework](https://github.com/ml-explore/mlx)
- Models hosted by the [MLX Community](https://huggingface.co/mlx-community) on HuggingFace
- Inspired by [ollama](https://ollama.ai)'s user experience
---
Made with â¤ī¸ by The BROKE team 
Version 1.1.0-beta3 | August 2025
đŽ Next: BROKE Cluster for multi-node deployments