Gonna need a bigger boat. And by boat I mean RAM, GPU and CPUs. Probably a dedicated power circuit.
-- the process wanted 42 GB of memory and the system has 32, all told.
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 18578.44 MiB on device 0: cudaMalloc failed: out of memory
ggml_backend_alloc_ctx_tensors_from_buft: failed to allocate buffer
llama_model_load: error loading model: failed to allocate buffer
llama_load_model_from_file: failed to load model
999768 Aborted (core dumped)
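For the curious, the arithmetic behind that failure is easy to sketch. A rough rule of thumb for a quantized GGUF model is parameters × bits-per-weight ÷ 8, plus some headroom for the KV cache and scratch buffers. The numbers below (a ~70B model at roughly 4.5 bits/weight, 20% overhead) are illustrative assumptions, not measurements from my setup:

```python
def model_mem_gib(n_params_billion, bits_per_weight, overhead=1.2):
    """Back-of-the-envelope memory estimate for loading a quantized model.

    parameters * bits/8 gives raw weight bytes; the ~20% overhead factor
    (an assumption) covers KV cache and runtime buffers.
    """
    raw_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return raw_bytes / 2**30 * overhead

# A ~70B model at ~4.5 bits/weight against 32 GiB of system memory:
need = model_mem_gib(70, 4.5)
print(f"~{need:.0f} GiB needed, 32 GiB available")
```

Run with those hypothetical numbers, the estimate lands comfortably above 32 GiB, which matches the general shape of the log above: the loader asks for more than the box has, cudaMalloc says no, and the whole thing aborts.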