I would recommend changing the checksum polynomial.
For the range of page sizes possible in transogg, the CRC-32C (Castagnoli) polynomial offers better error detection than the 802.3 CRC-32 that we currently specify. This polynomial is also used by SCTP and iSCSI, so I could wave my arms and suggest that it might be more likely to get hardware assistance (Ethernet CRC is usually done on the adaptor, but the SCTP and iSCSI CRC can't be easily offloaded) ... but no arm-waving is required: the i7 has an instruction for the Castagnoli polynomial (the SSE4.2 CRC32 instruction). There are better polynomials than the iSCSI one, at least for some sizes relevant to us, but they lack obvious prospects of hardware assistance.
Upsides to switching to Castagnoli CRC:
- Faster CRC on some hardware
- Increased error detection
Upsides to switching to another improved CRC:
- (Possibly) Increased error detection over Castagnoli
Upsides to staying with current CRC:
- Smaller, simpler implementation for anything that also supports the Ogg format
...though supporting multiple generator polynomials in a typical software implementation isn't hard. --Gmaxwell 06:56, 29 May 2010 (UTC)
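To illustrate the point about multiple generator polynomials, here is a minimal table-driven sketch in Python. It assumes the common reflected (LSB-first) formulation with an initial value and final XOR of 0xFFFFFFFF; the bit order and initial value transogg would actually specify are not stated above, so treat those parameters, and the names `make_table` and `crc32_generic`, as illustrative only. The only thing that changes between the two CRCs is the 256-entry table.

```python
def make_table(poly_reflected):
    """Build a 256-entry lookup table for a reflected 32-bit CRC."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ poly_reflected if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc32_generic(data, table, init=0xFFFFFFFF, xor_out=0xFFFFFFFF):
    """Byte-at-a-time CRC over `data` using whichever table is passed in."""
    crc = init
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ xor_out


# Reflected forms of the two generator polynomials discussed above.
IEEE_TABLE = make_table(0xEDB88320)        # 802.3 CRC-32 (0x04C11DB7)
CASTAGNOLI_TABLE = make_table(0x82F63B78)  # CRC-32C (0x1EDC6F41)

if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc32_generic(msg, IEEE_TABLE)))
    print(hex(crc32_generic(msg, CASTAGNOLI_TABLE)))
```

Supporting both costs one extra 1 KiB table (or a table built at start-up); the inner loop is shared.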
Arrange checksum order of essential data for fast page edits
IP allows fast editing of packets by using a checksum that isn't order-sensitive, so you can subtract the old value and add the new value of any given field directly to the checksum without having to re-sum the whole packet. CRC allows the same thing using EOR (exclusive-or), but the modification to the checksum depends on both the data and its distance to the end of the checked block. If data which is likely to be edited in a repetitive way over many pages is stored at a fixed distance from the end of the checksummed block, then the checksum delta can be pre-computed and applied directly without recalculation.
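The fixed-distance property can be checked with a short Python sketch. It uses `zlib.crc32` (the 802.3 polynomial) purely as a stand-in for whatever CRC is chosen, and the page layout (a 4-byte field followed by an 8-byte tail) and the names `crc_delta` and `patch` are hypothetical. The delta is computed once and then EORed into the checksum of two pages of different lengths.

```python
import zlib


def crc_delta(old_field, new_field, tail_len):
    """CRC change caused by replacing old_field with new_field when the
    field is followed by tail_len bytes before the end of the block."""
    diff = bytes(a ^ b for a, b in zip(old_field, new_field))
    padded = diff + bytes(tail_len)
    # EOR with the CRC of an all-zero block of the same length cancels the
    # init/final-XOR terms and leaves only the linear part of the CRC.
    return zlib.crc32(padded) ^ zlib.crc32(bytes(len(padded)))


def patch(page, field_len, tail_len, new_field):
    """Return a copy of page with the field before the tail replaced."""
    start = len(page) - tail_len - field_len
    return page[:start] + new_field + page[start + field_len:]


old = (7).to_bytes(4, "little")
new = (8).to_bytes(4, "little")
TAIL = 8

# Two pages of different lengths; the field sits 8 bytes from the end in both.
page_a = b"payload A" + old + b"trailer!"
page_b = b"a much longer payload B" + old + b"TRAILER?"

delta = crc_delta(old, new, TAIL)  # pre-computed once, reused for every page

for page in (page_a, page_b):
    edited = patch(page, 4, TAIL, new)
    # The patched checksum matches a full recalculation.
    assert zlib.crc32(edited) == zlib.crc32(page) ^ delta
```

The delta does not depend on the bytes before the field or on the contents of the tail, only on the old and new field values and the tail length, which is why a fixed distance to the end of the block makes it reusable.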
At least two ideas have been discussed so far:
- checksum payload, then variable-length header content, then fixed-length header
- checksum different components separately and EOR them together in an order-insensitive way, or EOR them with length-agnostic perturbations which preserve the useful error-detection features of a CRC.
Note that some CRC libraries and accelerators will have overheads that restrict their ability to accumulate data in an arbitrary order. They're likely to prefer well-aligned data and long runs of conventionally ordered data. An ideal solution wouldn't interfere with their operation unnecessarily. --Gumboot 12:14, 4 June 2010 (UTC)