Sheogorath — Posted April 20

26 minutes earlier, daemonix said:
"Btw, NVLink is not required; the crosstalk over PCIe is more than OK for LLMs. The model layers are split between the GPUs and there isn't that much data flowing between them. I have an 8-way system that only has NVLink per pair of GPUs (4 pairs of 2), and there is almost zero difference with NVLink off. EDIT: you can run a 3090 at a 90% power limit with almost no loss; if you google it, there are posts from people running 50-60 W lower with about a 5% speed loss for LLMs."

What about PCI Express bandwidth requirements? Is PCIe Gen3 x4 enough for 4060 Ti / 5060 Ti class GPUs?
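[Editor's note] On the power-limit trick quoted above, a minimal sketch of one way to cap every board programmatically, assuming the nvidia-ml-py (pynvml) NVML bindings; the ~90% figure mirrors the quote, and `sudo nvidia-smi -i <idx> -pl <watts>` does the same thing per GPU. Setting the limit requires root:

```python
# Hedged sketch: cap each GPU at ~90% of its default board power via NVML.
# pip install nvidia-ml-py — run as root, since setting limits is privileged.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(h)  # milliwatts
        target_mw = int(default_mw * 0.90)  # ~90% of stock, per the quoted post
        pynvml.nvmlDeviceSetPowerManagementLimit(h, target_mw)
        print(f"GPU{i}: {default_mw // 1000} W -> {target_mw // 1000} W")
finally:
    pynvml.nvmlShutdown()
```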
daemonix — Posted April 20 (edited)

41 minutes ago, Sheogorath said:
"What about PCI Express bandwidth requirements? Is PCIe Gen3 x4 enough for 4060 Ti / 5060 Ti class GPUs?"

I must confess I don't remember the exact numbers, but:
1) I know that only people on x1 "mining risers" complain about speed (I have never used any mining or ex-mining gear, so I can't speak from experience).
2) On my 8-GPU system I don't remember ever seeing more than 1-1.2 GByte/s of PCIe traffic in nvtop; only loading the model, once, pushes the link to its maximum speed. Some people run the x16 slot bifurcated as x4/x4/x4/x4 (as we do with multi-NVMe NAS builds), and a Gen3 x4 link (~3.9 GByte/s each way) is well above that 1-2 GByte/s.

EDIT: maybe it's too late and I'm brain-dead, but I just tested llama3.3:70b-instruct-q8_0, i.e. 70B at 8-bit quantisation (most local users run 4-bit, which halves the memory usage). Max PCIe TX/RX in nvtop was 40 MiB/s per GPU, so on the order of tens of megabytes per second.

Edited April 20 by daemonix
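[Editor's note] To reproduce the nvtop numbers above, a minimal sketch that samples per-GPU PCIe TX/RX through NVML, again assuming the nvidia-ml-py (pynvml) package; run it while the model is generating tokens:

```python
# Hedged sketch: sample per-GPU PCIe throughput, roughly what nvtop reports.
# pip install nvidia-ml-py
import time
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    for _ in range(10):  # ten one-second samples
        for i in range(count):
            h = pynvml.nvmlDeviceGetHandleByIndex(i)
            # nvmlDeviceGetPcieThroughput returns KB/s over a short sampling window
            tx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_RX_BYTES)
            print(f"GPU{i}: TX {tx / 1024:.1f} MiB/s  RX {rx / 1024:.1f} MiB/s")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

If the steady-state figures stay in the tens of MiB/s, as reported in the post above, even a Gen3 x4 link leaves ample headroom during inference.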