Python deep learning: a log of errors encountered, part 10

2024-01-02 16:07:39

This post continues from "Python deep learning: fixing errors encountered, part 9_module 'd2l.torch' has no attribute 'train_ch3" (CSDN blog).

1. CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

A warning was printed first, and then model training failed with the error in the heading above.

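Cause: The warning is worth reading because warnings often carry useful information, and this one must not be ignored. It says that the NVIDIA RTX A4000, with CUDA capability sm_86, is not compatible with the current PyTorch installation, and that the current PyTorch installation supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. Put simply, the installed PyTorch build was not compiled for this GPU's architecture, so the CUDA version of the PyTorch build and the GPU do not match.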

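Solution: replace the installed torch with a GPU (CUDA) build that supports this card. The architecture list in the warning points to a build compiled against an older CUDA toolkit; sm_86 needs a PyTorch build for CUDA 11.1 or later, chosen to match the installed NVIDIA driver.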

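Verify that the CUDA device is usable: call torch.cuda.is_available() to check whether CUDA is available, and torch.cuda.device_count() to check how many CUDA devices are visible. Make sure the code selects and uses an available CUDA device.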
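The checks above can be collected into one small diagnostic. This is a minimal sketch, not part of the original post: the helper name describe_cuda_setup is made up for illustration, and it only relies on torch.version.cuda, torch.cuda.get_arch_list() and torch.cuda.get_device_capability(). If the GPU's capability (for example sm_86) is missing from the build's architecture list, the "no kernel image" error is to be expected.

```python
import torch


def describe_cuda_setup():
    """Collect the facts needed to compare the PyTorch build with the GPU.

    Hypothetical helper for illustration; it is not a PyTorch API.
    """
    available = torch.cuda.is_available()
    info = {
        "torch_version": torch.__version__,
        # CUDA toolkit the wheel was built with; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device_count": torch.cuda.device_count(),
        # Architectures this build ships kernels for, e.g. ['sm_37', 'sm_50'].
        "arch_list": torch.cuda.get_arch_list() if available else [],
        "devices": [],
    }
    if available:
        for i in range(info["device_count"]):
            major, minor = torch.cuda.get_device_capability(i)
            info["devices"].append(
                (torch.cuda.get_device_name(i), f"sm_{major}{minor}")
            )
    return info


setup = describe_cuda_setup()
for key, value in setup.items():
    print(f"{key}: {value}")
```

A CPU-only build reports built_with_cuda as None, which is a quick way to tell a CPU build apart from a CUDA build that is merely too old for the card.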

2. torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.86 GiB. GPU 0 has a total capacty of 15.73 GiB of which 3.04 GiB is free. Including non-PyTorch memory, this process has 12.67 GiB memory in use. Of the allocated memory 11.10 GiB is all

Error: the message in the heading above.

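Cause: the message means the PyTorch program ran out of CUDA memory while trying to allocate additional GPU memory. For reasons I could not work out, prefixing the command with CUDA_VISIBLE_DEVICES=0,1,2,3 still produced the same out-of-memory error.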

Solution: unresolved.
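One likely reason the environment variable made no difference: CUDA_VISIBLE_DEVICES only controls which GPUs the process can see. It does not pool their memory, so a script that places everything on one device is still limited by that device. The sketch below is not a fix from the original post. It only reports free and total memory per visible GPU using torch.cuda.mem_get_info(), which helps show whether another process is already holding memory; the helper name gpu_memory_report is hypothetical.

```python
import torch


def gpu_memory_report():
    """Return a list of (index, free_GiB, total_GiB), one per visible GPU.

    Hypothetical helper for illustration; returns [] when CUDA is unavailable.
    """
    report = []
    if not torch.cuda.is_available():
        return report
    gib = 1024 ** 3
    for i in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for the device.
        free_bytes, total_bytes = torch.cuda.mem_get_info(i)
        report.append((i, free_bytes / gib, total_bytes / gib))
    return report


memory_report = gpu_memory_report()
for index, free_gib, total_gib in memory_report:
    print(f"GPU {index}: {free_gib:.2f} GiB free of {total_gib:.2f} GiB")
```

If the report shows little free memory before training even starts, the usual things to try are a smaller batch size or freeing the GPU from other processes.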

3. RuntimeError: CUDA error: invalid device ordinal. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Error: the message in the heading above.

Solution: unresolved.
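As a hedged note rather than the author's fix: "invalid device ordinal" generally means the code asked for a GPU index that does not exist among the visible devices, for example cuda:1 on a machine with one GPU. Note also that CUDA_VISIBLE_DEVICES renumbers the visible GPUs from 0, so with CUDA_VISIBLE_DEVICES=2,3 the valid indices are 0 and 1. The sketch below validates the index before use; pick_device is a hypothetical helper, not a PyTorch function.

```python
import torch


def pick_device(index=0):
    """Return a torch.device for GPU `index`, checking that it exists.

    Hypothetical helper: falls back to the CPU when CUDA is unavailable and
    raises a readable ValueError when the index is out of range.
    """
    if not torch.cuda.is_available():
        return torch.device("cpu")
    count = torch.cuda.device_count()
    if not 0 <= index < count:
        raise ValueError(
            f"GPU index {index} requested, but only {count} device(s) are "
            f"visible (valid indices: 0 to {count - 1})"
        )
    return torch.device(f"cuda:{index}")


device = pick_device(0)
print(f"using device: {device}")
```

Checking the index up front turns an asynchronous CUDA error into an immediate, readable one at the line that chose the device.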

4. RuntimeError: Expected floating point type for target with class probabilities, got Long

Error: raised when computing the loss with loss = loss_fn(outputs, targets).

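Cause: the message says the target is being treated as class probabilities, which must have a floating-point type, but it has type Long. In other words, the loss function expected floating-point label values here and received Long values instead.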

Solution:

Re-run the code.

OK, problem solved.
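The exact change is not shown in the text, so the following is only a sketch under assumptions: loss_fn is taken to be nn.CrossEntropyLoss, and the targets are taken to be one-hot Long tensors with the same shape as outputs (the shapes and values below are made up). In that situation the loss treats the target as class probabilities and requires floats, so either cast the target to float or convert it to class indices.

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Made-up batch: 4 samples, 3 classes.
outputs = torch.randn(4, 3)
# One-hot labels stored as Long, same shape as outputs: this triggers the error.
onehot_targets = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])

# Option A: keep the class-probability format but cast it to float.
loss_from_probs = loss_fn(outputs, onehot_targets.float())

# Option B: convert one-hot rows to class indices of shape (N,), dtype Long.
class_indices = onehot_targets.argmax(dim=1)
loss_from_indices = loss_fn(outputs, class_indices)

print(loss_from_probs.item(), loss_from_indices.item())
```

For strictly one-hot labels the two options give the same loss value; option B is the more common format when each sample has exactly one class.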

5. ValueError: invalid literal for int() with base 10

Error: the message in the heading above.

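Cause: this is a type conversion error. int() can convert a string to an integer, but only if the string is an integer literal. If the string contains a decimal point, such as "1.1" or "3.14", then calling int("1.1") raises the error above. (Passing the float itself, int(1.1), is fine and returns 1; the problem is specific to strings.)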

Solution: convert the string to a float first, then convert the float to an integer.

int(float(i))
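A small runnable illustration of the conversion above. The helper name to_int is made up for this example. Note that int(float(s)) truncates toward zero rather than rounding, so "3.99" becomes 3 and "-1.9" becomes -1.

```python
def to_int(text):
    """Convert a numeric string such as "42" or "3.14" to an int.

    Hypothetical helper: parses via float first, so decimal strings are
    accepted; the fractional part is truncated toward zero.
    """
    return int(float(text))


values = [to_int(s) for s in ["42", "3.14", "3.99", "-1.9"]]
print(values)

# Calling int() directly on a decimal string reproduces the original error.
try:
    int("3.14")
except ValueError as err:
    print(f"ValueError: {err}")
```

If rounding is wanted instead of truncation, round(float(text)) is the usual substitute.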