Sheogorath — Posted April 20

26 minutes earlier, daemonix said:
"Btw, NVLink is not required; the crosstalk over PCIe is more than OK for LLMs. The model layers are split between the GPUs and there isn't that much data flowing between them. I have an 8-way system that only has NVLink per pair of GPUs (4 pairs of 2), and there is almost zero difference with NVLink off. EDIT: you can run a 3090 at a 90% power limit with almost no loss; if you google it, there are posts from people running 50-60 W lower with about a 5% speed loss for LLMs."

What about PCI Express bandwidth requirements? Is PCIe Gen3 x4 enough for 4060 Ti / 5060 Ti class GPUs?
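[Editor's note] On the power-limit trick quoted above, a minimal sketch of one way to cap every board programmatically, assuming the nvidia-ml-py (pynvml) NVML bindings; the ~90% figure mirrors the quote, and `sudo nvidia-smi -i <idx> -pl <watts>` does the same thing per GPU. Setting the limit requires root:

```python
# Hedged sketch: cap each GPU at ~90% of its default board power via NVML.
# pip install nvidia-ml-py — run as root, since setting limits is privileged.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(h)  # milliwatts
        target_mw = int(default_mw * 0.90)  # ~90% of stock, per the quoted post
        pynvml.nvmlDeviceSetPowerManagementLimit(h, target_mw)
        print(f"GPU{i}: {default_mw // 1000} W -> {target_mw // 1000} W")
finally:
    pynvml.nvmlShutdown()
```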
daemonix — Posted April 20 (edited)

41 minutes ago, Sheogorath said:
"What about PCI Express bandwidth requirements? Is PCIe Gen3 x4 enough for 4060 Ti / 5060 Ti class GPUs?"

I must confess I don't remember the exact numbers, but:
1) I know that only people on x1 "mining risers" complain about speed (I have never used any mining or ex-mining gear, so I can't speak from experience).
2) On my 8-GPU system I don't remember ever seeing more than 1-1.2 GByte/s of PCIe traffic in nvtop; only loading the model, once, pushes the link to its maximum speed. Some people run the x16 slot bifurcated as x4/x4/x4/x4 (as we do with multi-NVMe NAS builds), and a Gen3 x4 link (~3.9 GByte/s each way) is well above that 1-2 GByte/s.

EDIT: maybe it's too late and I'm brain-dead, but I just tested llama3.3:70b-instruct-q8_0, i.e. 70B at 8-bit quantisation (most local users run 4-bit, which halves the memory usage). Max PCIe TX/RX in nvtop was 40 MiB/s per GPU, so on the order of tens of megabytes per second.

Edited April 20 by daemonix
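[Editor's note] To reproduce the nvtop numbers above, a minimal sketch that samples per-GPU PCIe TX/RX through NVML, again assuming the nvidia-ml-py (pynvml) package; run it while the model is generating tokens:

```python
# Hedged sketch: sample per-GPU PCIe throughput, roughly what nvtop reports.
# pip install nvidia-ml-py
import time
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    for _ in range(10):  # ten one-second samples
        for i in range(count):
            h = pynvml.nvmlDeviceGetHandleByIndex(i)
            # nvmlDeviceGetPcieThroughput returns KB/s over a short sampling window
            tx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_RX_BYTES)
            print(f"GPU{i}: TX {tx / 1024:.1f} MiB/s  RX {rx / 1024:.1f} MiB/s")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

If the steady-state figures stay in the tens of MiB/s, as reported in the post above, even a Gen3 x4 link leaves ample headroom during inference.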