ollama/ml
Daniel Hiltgen dba1e27fa8
llama: enable FA on CUDA CC 6.x GPUs (#16994)
Recent upstream Pascal kernel fixes let us compile native SM60/SM61 kernels again instead of relying on PTX JIT compilation, so allow Flash Attention to be auto-enabled at runtime on CC 6.x devices.

Fixes #16591

Fixes #16754
2026-07-02 17:11:39 -07:00
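The commit above widens the set of CUDA devices for which Flash Attention "auto" may turn on, from newer GPUs down to compute capability 6.x (Pascal). As a rough illustration only, the gate can be pictured as a minimum compute-capability check applied to every device. This is a minimal sketch under stated assumptions, not the code in `device.go`. The type `cudaDevice`, the constant `minFlashAttentionMajor`, and the function `flashAttentionAuto` are hypothetical names, and the rule that every device must pass is an assumed design choice.

```go
package main

import "fmt"

// cudaDevice is a hypothetical, simplified description of a CUDA GPU.
// It is not the device type used in ollama/ml.
type cudaDevice struct {
	Name         string
	ComputeMajor int
	ComputeMinor int
}

// minFlashAttentionMajor is an assumed threshold: with native SM60/SM61
// kernels compiled in, compute capability 6.x and newer is treated as eligible.
const minFlashAttentionMajor = 6

// flashAttentionAuto reports whether "auto" Flash Attention would be enabled.
// Assumption: every device must meet the threshold, and an empty device list
// disables it, since there is no CUDA device to run on.
func flashAttentionAuto(devices []cudaDevice) bool {
	if len(devices) == 0 {
		return false
	}
	for _, d := range devices {
		if d.ComputeMajor < minFlashAttentionMajor {
			return false
		}
	}
	return true
}

func main() {
	pascal := cudaDevice{Name: "GTX 1080", ComputeMajor: 6, ComputeMinor: 1}
	maxwell := cudaDevice{Name: "GTX 980", ComputeMajor: 5, ComputeMinor: 2}

	fmt.Println(flashAttentionAuto([]cudaDevice{pascal}))          // true
	fmt.Println(flashAttentionAuto([]cudaDevice{pascal, maxwell})) // false
}
```

Requiring every device to pass keeps a mixed Pascal/Maxwell system from enabling a feature that one of its GPUs cannot run. A real implementation may instead decide per device or per backend.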
backend/ggml/ggml/src/ggml-cuda/template-instances runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
backend.go Add support for gemma4 (#15214) 2026-04-02 11:33:33 -07:00
device.go llama: enable FA on CUDA CC 6.x GPUs (#16994) 2026-07-02 17:11:39 -07:00
device_test.go llama: enable FA on CUDA CC 6.x GPUs (#16994) 2026-07-02 17:11:39 -07:00
path.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00
path_test.go runner: Remove CGO engines, use llama-server exclusively for GGML models (#16031) 2026-05-29 13:35:47 -07:00