Warhol Installation

Warhol

Warhol is a Proxmox node that runs on the GPU server. In essence, Warhol is the GPU machine.

Warhol is being set up through Proxmox.

To reach Warhol, first open an SSH connection with dynamic port forwarding to olimp or home, then start a local browser instance (Chromium) proxied over that SSH connection:

ssh -D 8082 home
# in a different terminal
chromium --proxy-server="socks5://127.0.0.1:8082"
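The two commands above can be wrapped so that the tunnel and the browser start together. This is only a minimal sketch, not part of the original setup: the function names are made up for illustration, and it assumes that home is a working host alias in ~/.ssh/config and that port 8082 is free.

```shell
# Hypothetical helper names; host alias, port and URL are the ones used on this page.
SOCKS_PORT="${SOCKS_PORT:-8082}"
JUMP_HOST="${JUMP_HOST:-home}"

# Print the proxy URL that Chromium expects for a local SOCKS5 tunnel.
socks_url() {
    printf 'socks5://127.0.0.1:%s\n' "$1"
}

# Start the tunnel in the background (-f) without a remote command (-N),
# then launch Chromium through it.
open_proxmox_ui() {
    ssh -f -N -D "$SOCKS_PORT" "$JUMP_HOST" || return 1
    chromium --proxy-server="$(socks_url "$SOCKS_PORT")" "https://172.16.10.215:8006/"
}
```

Calling open_proxmox_ui does both steps from one terminal. The background ssh process keeps running after the browser is closed and has to be killed by hand.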

Navigate to https://172.16.10.215:8006/ in Chromium and log in with the username and password provided by Črt.
Warhol is one of the two nodes in the cluster; the other, kobilica, will be used as storage.

Switching to non-enterprise version

In /etc/apt/sources.list.d, rename the existing enterprise sources so that apt ignores them:

cd /etc/apt/sources.list.d
mv pve-enterprise.list pve-enterprise.list.disabled
mv ceph.list ceph.list.disabled

Add sources that point to the no-subscription (community) Proxmox repositories:

echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" >> pve-no-subscription.list
echo "deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription" >> ceph-no-subscription.list
apt-get update
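Because the commands above append with >>, running them twice would leave duplicate repository lines. A rerunnable variant could look like the sketch below. The function names are hypothetical, and the directory is a parameter only so that the script can be tried on a scratch directory first.

```shell
# Rename a sources file to *.disabled if it is present and not yet disabled.
disable_list() {
    dir="$1"; name="$2"
    if [ -f "$dir/$name" ]; then
        mv "$dir/$name" "$dir/$name.disabled"
    fi
}

# Write (not append) a single repository line, so reruns do not duplicate it.
write_list() {
    dir="$1"; name="$2"; line="$3"
    printf '%s\n' "$line" > "$dir/$name"
}

# Apply the whole switch to the given directory (default: the real apt directory).
switch_to_no_subscription() {
    dir="${1:-/etc/apt/sources.list.d}"
    disable_list "$dir" pve-enterprise.list
    disable_list "$dir" ceph.list
    write_list "$dir" pve-no-subscription.list \
        "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription"
    write_list "$dir" ceph-no-subscription.list \
        "deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription"
}
```

On the node this would be run as root with no argument, followed by apt-get update.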

Setting up NVIDIA GPU

vGPU (Abandoned)

Assuming more than one VM instance would use the GPU, a vGPU strategy was tried first.

To that end, the nouveau driver was blacklisted:

echo "blacklist nouveau" >> /etc/modprobe.d/nouveau-blacklist.conf
update-initramfs -u -k all
shutdown -r now
# after the reboot, which took a while
lsmod | grep nouveau
# no output means nouveau is not loaded
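A loose grep pattern can also match unrelated module names, so a stricter check compares the first column of the lsmod output exactly. The helper below is a sketch with a made-up name; it reads lsmod-style text from stdin so it can be tried without touching the node.

```shell
# Succeeds if the named module appears in lsmod-style text read from stdin.
# Comparing the first column exactly avoids false hits from a loose pattern.
module_listed() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
```

Usage on the node: lsmod | module_listed nouveau && echo "nouveau still loaded" || echo "nouveau not loaded"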

Install DKMS:

apt update
apt install dkms libc6-dev proxmox-default-headers --no-install-recommends

It seems vGPU requires a paid license. The alternative is a pass-through setup, where the GPU is assigned to a single VM.

Pass-through

Following the pass-through manual, the nvidia* modules were also added to the blacklist.

Following the official PCIe passthrough documentation, or a forum tutorial, the vfio* modules should be added to /etc/modules so that they load at boot:

echo "vfio" >> /etc/modules
echo "vfio_iommu_type1" >> /etc/modules
echo "vfio_pci" >> /etc/modules
update-initramfs -u -k all
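Running the echo lines above a second time would list each module twice. A sketch of a rerunnable version follows. The function names are hypothetical, and the target file is a parameter so that it can be tested on a temporary file.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line="$1"; file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Register the three vfio modules; the file argument defaults to /etc/modules.
register_vfio_modules() {
    file="${1:-/etc/modules}"
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        append_once "$mod" "$file"
    done
}
```

After register_vfio_modules has run as root, update-initramfs -u -k all is still needed, as above.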

After a reboot, the new modules should be loaded. Check with:

lsmod | grep vfio

At this point vfio is loaded, but the NVIDIA card is not yet bound to it, as checked with:

lspci -k

Output: see the attached lspci.png.

First, find the PCI ID of the card:

lspci -nn
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)

The ID is the vendor:device pair in the last set of square brackets, [10de:26b1].
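Picking the ID out by eye is easy to get wrong, since the same line also contains the class code [0300] and the model name in brackets. The helper below is a small sketch (the function name is made up) that prints only the vendor:device pair from lspci -nn lines read on stdin.

```shell
# Print the last [vvvv:dddd] pair of each lspci -nn line read from stdin.
# The class code ([0300]) and the bracketed model name do not match the pattern.
pci_id_from_lspci() {
    sed -nE 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p'
}
```

Usage on the node: lspci -nn | grep -i nvidia | pci_id_from_lspci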

Then, the tutorial suggests telling modprobe which device IDs the vfio-pci driver should claim:

echo "options vfio-pci ids=10de:26b1" >> /etc/modprobe.d/vfio.conf

After reboot:

lspci -k

Output: see the attached lspci1.png.
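The point of this check is the "Kernel driver in use" line for the GPU, which should now name vfio-pci. The real output is only available as a screenshot, so the sketch below assumes the usual lspci -k layout (an indented "Kernel driver in use: NAME" line under each device). The function names are hypothetical, and c1:00.0 is the slot from the lspci -nn line above.

```shell
# Print the driver named on each "Kernel driver in use:" line of lspci -k text from stdin.
driver_in_use() {
    awk -F': ' '/Kernel driver in use:/ { print $2 }'
}

# Succeeds only if the device at the given slot is bound to vfio-pci.
# Runs lspci itself, so this part only works on the node.
bound_to_vfio() {
    [ "$(lspci -k -s "$1" | driver_in_use | head -n 1)" = "vfio-pci" ]
}
```

Usage on the node: bound_to_vfio c1:00.0 && echo "GPU is bound to vfio-pci"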

Final checks:
Output: see the attached final.png.

The PCI device then has to be added to the VM in the Hardware section of the Proxmox VE setup: click Add -> PCI Device -> Raw Device and select the GPU. I also ticked the All Functions checkbox. After that, the GPU showed up inside the VM in the output of lspci.
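The same assignment can probably also be made from the node's shell with qm set and a hostpci0 entry. This was not tried here, so treat it as an untested sketch. The function names are made up, VMID 100 is a placeholder, and the assumption is that leaving off the function suffix (c1:00 instead of c1:00.0) corresponds to the All Functions checkbox.

```shell
# Drop the function suffix (".0") so that all functions of the device are passed.
all_functions_slot() {
    printf '%s\n' "${1%.*}"
}

# Print the qm command for review instead of running it.
# The VMID passed in is a placeholder, not the real VM.
hostpci_command() {
    vmid="$1"; slot="$2"
    printf 'qm set %s -hostpci0 %s\n' "$vmid" "$(all_functions_slot "$slot")"
}
```

Usage: hostpci_command 100 c1:00.0 prints the command, which can be checked and then run by hand as root on the node.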

Attached Files

lspci.png
lspci1.png
final.png
