I’m using pxelinux to deploy an in-RAM version of TinyCore (Linux version 4.19.10-tinycore). It’s running on a Z270-A motherboard with the latest BIOS (as of today). PXE is booted in legacy mode.
I wrote a Java application that deploys SSD images over the network, writing them to the device with RandomAccessFile. While it writes, the kernel logs errors like these:
print_req_error: I/O error, dev sda, sector 42319888
Buffer I/O error on dev sda, logical block 5289986, async page read
ata1: EH complete
ata1.00: Enabling discard_zeroes_data
ata1.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:90:10:c0:85/00:00:02:00:00/40 tag 18 ncq dma 4096 in
res 41/40:00:10:c0:85/00:00:02:00:00/40 Emask 0x409 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda] tag#18 Sense Key : 0x3 [current]
sd 0:0:0:0: [sda] tag#18 ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x28 28 00 02 85 c0 10 00 00 08 00
print_req_error: I/O error, dev sda, sector 42319888
Buffer I/O error on dev sda, logical block 5289986, async page read
ata1: EH complete
ata1.00: Enabling discard_zeroes_data
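To make the numbers in the log easier to cross-check, here is a small Java sketch that relates the reported sector, the logical block, and the LBA in the CDB. The class and method names are made up for illustration. It assumes 512-byte sectors and a 4096-byte block size, which is what the ratio between sector 42319888 and logical block 5289986 implies. Note that the failing command is a read (opcode 0x28, READ FPDMA QUEUED), not a write.

```java
// Hypothetical helper, not part of the deployment program.
final class SectorMath {
    static final int SECTOR_SIZE = 512;  // assumed: kernel sector unit
    static final int BLOCK_SIZE = 4096;  // assumed: inferred from the log ratio

    // Sector number from "print_req_error" -> block number from "Buffer I/O error".
    static long logicalBlock(long sector) {
        return sector * SECTOR_SIZE / BLOCK_SIZE;
    }

    // Byte offset on the device where the failing sector starts.
    static long byteOffset(long sector) {
        return sector * SECTOR_SIZE;
    }

    // In a READ(10) CDB, bytes 2..5 hold the LBA in big-endian order.
    static long lbaFromRead10(int[] cdb) {
        return ((long) cdb[2] << 24) | ((long) cdb[3] << 16)
             | ((long) cdb[4] << 8) | (long) cdb[5];
    }

    // In a READ(10) CDB, bytes 7..8 hold the transfer length in sectors.
    static int transferLengthFromRead10(int[] cdb) {
        return (cdb[7] << 8) | cdb[8];
    }

    public static void main(String[] args) {
        // CDB copied from the log: 28 00 02 85 c0 10 00 00 08 00
        int[] cdb = {0x28, 0x00, 0x02, 0x85, 0xc0, 0x10, 0x00, 0x00, 0x08, 0x00};
        System.out.println(lbaFromRead10(cdb));            // 42319888
        System.out.println(logicalBlock(42319888L));       // 5289986
        System.out.println(transferLengthFromRead10(cdb)); // 8
    }
}
```

So the CDB, the sector in print_req_error, and the logical block all point at the same 4096-byte region (8 sectors, matching "dma 4096 in").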
I’ve tried disabling NCQ using libata.force=noncq, but to no avail.
The weird thing is that no such errors occur if I first wipe the device using dd if=/dev/zero of=/dev/sda bs=1M and then write the data again with my program. Filling the drive with zeros seems to fix the issue, but it takes a really long time and adds unnecessary write wear to the SSD.
For this reason, I changed the program so that it writes zeros to the device before writing the actual image data, to mimic the dd command above. Even so, the error still happens.
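A minimal sketch of that zero-then-write approach is shown below. It is not the actual program: the class name, method names, and the 1 MiB chunk size (chosen to mirror bs=1M) are assumptions, and the target size is passed in explicitly. The demo in main runs against a temporary file rather than /dev/sda.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration of "zero first, then write the image".
final class ZeroThenWrite {
    static final int CHUNK = 1 << 20; // 1 MiB, mirrors dd bs=1M (assumption)

    // Overwrite the first `length` bytes of the target with zeros.
    static void zeroFill(RandomAccessFile target, long length) throws IOException {
        byte[] zeros = new byte[CHUNK];
        target.seek(0);
        long remaining = length;
        while (remaining > 0) {
            int n = (int) Math.min(CHUNK, remaining);
            target.write(zeros, 0, n);
            remaining -= n;
        }
        target.getFD().sync(); // ask the OS to flush buffered data to the device
    }

    // Write image bytes at the given offset and flush them.
    static void writeImage(RandomAccessFile target, byte[] image, long offset)
            throws IOException {
        target.seek(offset);
        target.write(image);
        target.getFD().sync();
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the block device in this demo.
        Path tmp = Files.createTempFile("fake-disk", ".img");
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw")) {
            zeroFill(raf, 4L * CHUNK);
            byte[] image = new byte[4096];
            Arrays.fill(image, (byte) 0xAB);
            writeImage(raf, image, 8192); // 4096-byte aligned offset
            System.out.println("bytes on target: " + raf.length());
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Whether this is really equivalent to the dd run depends on how the writes reach the device (chunk size, alignment, caching), which this sketch does not attempt to reproduce.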
smartctl -a /dev/sda does not show any bad signs. I’ve seen this happen with multiple devices, such as the Silicon Power S55 and the Micron 1100. It only happens in this setup. It never happened with an installed version of Ubuntu 18.04 (run from a disk, not from RAM).
The RAM is not faulty (tested with memtest). All cables are good, and the system is powered by a Corsair RM1000i.
Here is the output of dmesg. Also, here is the smartctl output. I cannot find a way to fix this, and I’m lost at this point.
EDIT: It does not always happen at the same sector. Sometimes it happens at a sector that worked fine in the past, so it looks random.