Automated llama.cpp CUDA Installation
This guide explains how to use the automated setup script to install pita with llama.cpp and NVIDIA CUDA support on Linux systems.
Prerequisites
- Linux OS: This script is designed for Linux bash environments.
- NVIDIA GPU: A CUDA-capable GPU with the NVIDIA driver installed.
- Conda: Miniconda or Anaconda installed and initialized.
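Before running the setup script, it can help to confirm that the tools behind these prerequisites are on your PATH. The following is a minimal sketch, not part of the setup script: check_commands is a hypothetical helper name, and it assumes that nvidia-smi (which ships with the NVIDIA driver) and conda are the commands worth checking.

```shell
#!/bin/bash
# Hypothetical helper (not part of the setup script).
# Prints one "missing: <name>" line for each command that is not on PATH
# and returns non-zero if any command was missing.
check_commands() {
    local missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

After defining the function, you could call it as check_commands nvidia-smi conda and stop if it returns a non-zero status. A passing check only shows that the commands exist; it does not prove the driver is working or that Conda has been initialized for your shell.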
Usage
You can create the script locally by saving the following content as setup_llamacpp_cuda.sh:
#!/bin/bash
# Script to set up pita_llamacpp_cuda environment with CUDA support
# This builds llama-cpp-python from source to ensure CUDA compatibility
set -e # Exit on any error
ENV_NAME="pita_llamacpp_cuda"
# 1. Create a temporary YAML for the base environment
# (the heredoc is unquoted, so $ENV_NAME is expanded when the file is written)
cat <<EOF > llamacpp_cuda.yml
name: $ENV_NAME
channels:
  - defaults
  - nvidia
  - conda-forge
dependencies:
  - python=3.12
  - pip
  - cuda-cudart=12.4.127
  - cuda-toolkit=12.4.1
  - cmake
EOF
echo "=========================================="
echo "Setting up $ENV_NAME environment"
echo "=========================================="
# Check if environment already exists
if conda info --envs | grep -q "^$ENV_NAME "; then
    echo "Environment $ENV_NAME already exists."
    read -p "Do you want to remove and recreate it? (y/n): " choice
    if [[ "$choice" == "y" || "$choice" == "Y" ]]; then
        echo "Removing existing environment..."
        conda env remove -n "$ENV_NAME" -y
    else
        echo "Keeping existing environment. Will attempt to install llama-cpp-python..."
    fi
fi
# Create environment if it doesn't exist
if ! conda info --envs | grep -q "^$ENV_NAME "; then
    echo "Creating conda environment from llamacpp_cuda.yml..."
    conda env create -f llamacpp_cuda.yml
fi
echo ""
echo "=========================================="
echo "Installing llama-cpp-python with CUDA"
echo "=========================================="
# Get the conda prefix for this environment (last column of the matching line)
CONDA_PREFIX_PATH=$(conda info --envs | grep "^$ENV_NAME " | awk '{print $NF}')
if [[ -z "$CONDA_PREFIX_PATH" ]]; then
    CONDA_PREFIX_PATH=$(conda info --envs | grep "^$ENV_NAME$" | awk '{print $NF}')
fi
echo "Environment path: $CONDA_PREFIX_PATH"
# Set up environment variables for CUDA build
export CUDACXX="$CONDA_PREFIX_PATH/bin/nvcc"
export CPATH="$CONDA_PREFIX_PATH/targets/x86_64-linux/include:$CPATH"
export LD_LIBRARY_PATH="$CONDA_PREFIX_PATH/lib:$LD_LIBRARY_PATH"
export CMAKE_ARGS="-DGGML_CUDA=on -DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler"
echo "CUDACXX: $CUDACXX"
echo "CMAKE_ARGS: $CMAKE_ARGS"
echo ""
# Install llama-cpp-python with CUDA support
# (the variables are re-exported inside the conda run shell so the build sees them)
echo "Building and installing llama-cpp-python (this may take a few minutes)..."
conda run -n "$ENV_NAME" bash -c "export CUDACXX='$CONDA_PREFIX_PATH/bin/nvcc'; export CPATH='$CONDA_PREFIX_PATH/targets/x86_64-linux/include:$CPATH'; export LD_LIBRARY_PATH='$CONDA_PREFIX_PATH/lib:$LD_LIBRARY_PATH'; export CMAKE_ARGS='-DGGML_CUDA=on -DCMAKE_CUDA_FLAGS=-allow-unsupported-compiler'; pip install llama-cpp-python --no-cache-dir"
# Verify installation
echo ""
echo "=========================================="
echo "Verifying CUDA backend installation"
echo "=========================================="
# List the shared libraries bundled with llama_cpp and look for a CUDA one
CUDA_CHECK=$(conda run -n "$ENV_NAME" python -c "
import os
lib_dir = os.path.dirname(__import__('llama_cpp').__file__) + '/lib'
libs = os.listdir(lib_dir)
cuda_libs = [l for l in libs if 'cuda' in l.lower()]
if cuda_libs:
    print('SUCCESS: CUDA backend found:', cuda_libs)
else:
    print('WARNING: No CUDA backend found. Available libs:', libs)
")
echo "$CUDA_CHECK"
echo ""
echo "=========================================="
echo "Setup complete!"
echo "=========================================="
echo ""
echo "To activate the environment:"
echo " conda activate $ENV_NAME"
echo ""
Execution Steps
- Save the script as setup_llamacpp_cuda.sh.
- Make the script executable:
    chmod +x setup_llamacpp_cuda.sh
- Run the script:
    ./setup_llamacpp_cuda.sh
- Follow the prompts: The script will check if an environment named pita_llamacpp_cuda already exists and ask if you want to recreate it.
What the Script Does
The script performs the following steps automatically:
- Environment Creation: Creates a Conda environment named pita_llamacpp_cuda using the correct Python and dependency versions.
- CUDA Build Setup: Configures essential environment variables (CUDACXX, CPATH, LD_LIBRARY_PATH) to ensure llama-cpp-python can find your CUDA toolkit.
- Compilation: Builds llama-cpp-python from source with -DGGML_CUDA=on to enable GPU acceleration.
- Verification: Runs a small Python check to confirm that the CUDA backend (libggml-cuda) was correctly included in the build.
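The CUDA build variables all depend on the environment's install path, which the script reads from the last column of the conda info --envs listing. The sketch below isolates that lookup so it can be tried on sample text. The function name env_prefix and the sample paths are illustrative assumptions, and it matches the environment name exactly with awk rather than using the script's grep pattern.

```shell
#!/bin/bash
# Hypothetical helper: read `conda info --envs`-style output on stdin and
# print the last column (the prefix path) of the line whose first column
# is exactly the environment name given as $1.
env_prefix() {
    awk -v name="$1" '$1 == name { print $NF }'
}

# Sample listing with made-up paths; the active environment carries a "*".
sample_listing='# conda environments:
#
base                  *  /opt/conda
pita_llamacpp_cuda       /opt/conda/envs/pita_llamacpp_cuda'

printf '%s\n' "$sample_listing" | env_prefix pita_llamacpp_cuda
```

Because the path is taken from the last field, the lookup gives the same result whether or not the environment is marked active with an asterisk. A path containing spaces would be truncated by this approach.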
Activating the Environment
Once the script completes successfully, activate your new environment:
conda activate pita_llamacpp_cuda
Then, from the root of your pita source checkout, install pita in editable mode:
pip install -e .
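With the environment active, you may also want to ask the installed library itself whether it was built with GPU offload. This is a hedged sketch: check_gpu_offload is a hypothetical helper name, and it assumes your llama-cpp-python version exposes llama_supports_gpu_offload() at the top level of the llama_cpp module, which may not hold for older releases.

```shell
#!/bin/bash
# Hypothetical helper: run inside the activated pita_llamacpp_cuda environment.
# Assumes llama_cpp exposes llama_supports_gpu_offload(); prints the value
# it returns, or fails with a Python error if the call is unavailable.
check_gpu_offload() {
    python -c "import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())"
}
```

Calling check_gpu_offload after conda activate pita_llamacpp_cuda complements the library-file check in the setup script: that check looks for a CUDA library on disk, while this one queries the loaded build.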