Email from Jim Quinlan, Multimedia & Intel(r) Xscale(tm), Intel Corporation

 

Hi Tom,

Okay, here's a quick core dump of what I know. If possible, it would help if you could use your notoriety to engage the FlaskMPEG developers and ask them some questions as well, so that you can get more information, perhaps even the source code for their MMX integer IEEE-compliant IDCT. I will even try to imagine why one might use FP here, but I think more research is needed to support your conclusions. I'll be available all weekend if you need further elaboration.

First, let's talk about the video encode process. Most of these codecs (MPEG1, MPEG2, MPEG4) are similar in that they are built on 8x8 (I)DCT blocks with motion compensation. MPEG4 has new, innovative techniques, bells, and whistles, but quite often they aren't enabled because they require aggressive computation and an equally sophisticated decoding infrastructure. Now, it's possible that FlaskMPEG is enabling one of these MPEG4 features and that using FP is somehow a visual quality win, but that is not how things look from your post of Toby's email.
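For anyone following along who has not seen one, here is a minimal floating-point sketch of the 8x8 forward and inverse DCT these codecs are built on. It is an illustrative textbook version (an orthonormal DCT-II applied to rows, then columns), not FlaskMPEG's code or any MMX or integer implementation; the names `fdct_8x8` and `idct_8x8` are hypothetical.

```python
import math

N = 8  # block size used by MPEG-1/2/4

# BASIS[k][x] = c(k) * cos((2x + 1) * k * pi / 16), with c(0) = sqrt(1/8), c(k>0) = sqrt(2/8).
# This normalization makes the 8-point DCT matrix orthonormal, so its transpose is its inverse.
BASIS = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
          * math.cos((2 * x + 1) * k * math.pi / (2 * N)) for x in range(N)]
         for k in range(N)]


def _separable(transform_1d, block):
    """Apply a 1-D transform to each row of an 8x8 block, then to each column."""
    rows = [transform_1d(r) for r in block]
    cols = [transform_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]


def _dct_1d(v):
    return [sum(BASIS[k][x] * v[x] for x in range(N)) for k in range(N)]


def _idct_1d(v):
    return [sum(BASIS[k][x] * v[k] for k in range(N)) for x in range(N)]


def fdct_8x8(block):
    """Forward 2-D DCT: 8x8 pixels -> 8x8 frequency coefficients."""
    return _separable(_dct_1d, block)


def idct_8x8(coeffs):
    """Inverse 2-D DCT in full floating point (no rounding to integers)."""
    return _separable(_idct_1d, coeffs)


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]
    print(round(fdct_8x8(flat)[0][0], 6))  # DC term of a flat block is 8 * 100
```

In exact arithmetic `idct_8x8(fdct_8x8(b))` gives back `b`; everything that matters in the discussion of precision happens once the coefficients are quantized and the IDCT output is rounded to integers.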

The MPEG4 spec (in front of me now) specifies in Annex A that following the IEEE 1180-1990 IDCT accuracy standard is 'necessary' for MPEG4, but not sufficient. It says that you have to follow this, and that "the precision shall be sufficient so that significant errors do not occur in the final integer values". The actual test for IEEE IDCT compliance is unequivocal; I can dig it up if you want. The scuttlebutt I once heard about the IEEE (I)DCT standard is that they wanted to make it deterministic and precise (i.e., "bit-exact"), but compromised on a less precise solution so as to avoid obsoleting existing commercial approaches. I have no idea if this is true, but it makes good sense, because keeping the decoder's IDCT calculations deterministic (or at least somewhat predictable) has great value: the encoder can compute *exactly* the decoder's error term, and can compensate for and control this term in future frames.
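Since the IEEE 1180 test keeps coming up, here is a simplified sketch of the error statistics it measures, as I understand them: random integer 8x8 blocks are forward-transformed in double precision, the coefficients are rounded and clipped to [-2048, 2047], and the candidate IDCT's output is compared against a double-precision reference IDCT that is rounded and clipped to [-256, 255]. The limits in `LIMITS` are the commonly quoted ones and should be checked against the standard itself. The real test also fixes the random-number generator, the number of blocks, and several input ranges (including sign-flipped data), and adds a zero-in/zero-out check; this sketch does none of that, so it is not a compliance test. All names, including the deliberately sloppy `truncating_idct`, are hypothetical.

```python
import math
import random

N = 8
BASIS = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
          * math.cos((2 * x + 1) * k * math.pi / (2 * N)) for x in range(N)]
         for k in range(N)]

# Limits as commonly quoted for IEEE 1180-1990 (assumption: verify against the standard).
LIMITS = {"peak": 1, "pmse": 0.06, "omse": 0.02, "pme": 0.015, "ome": 0.0015}


def _separable(f, block):
    """Apply a 1-D transform f to each row of an 8x8 block, then to each column."""
    rows = [f(r) for r in block]
    cols = [f([rows[i][j] for i in range(N)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]


def _float_idct(coeffs):
    return _separable(lambda v: [sum(BASIS[k][x] * v[k] for k in range(N)) for x in range(N)], coeffs)


def _clip(v, lo, hi):
    return max(lo, min(hi, v))


def reference_idct(coeffs):
    """Double-precision IDCT, rounded to nearest and clipped to [-256, 255]."""
    return [[_clip(math.floor(s + 0.5), -256, 255) for s in row] for row in _float_idct(coeffs)]


def truncating_idct(coeffs):
    """A deliberately sloppy candidate: truncates toward zero instead of rounding."""
    return [[_clip(int(s), -256, 255) for s in row] for row in _float_idct(coeffs)]


def ieee1180_style_stats(candidate_idct, num_blocks=200, lo=-256, hi=255, seed=1):
    """Compare candidate_idct with reference_idct on random blocks (simplified procedure)."""
    rng = random.Random(seed)
    err_sum = [[0] * N for _ in range(N)]
    sq_sum = [[0] * N for _ in range(N)]
    peak = 0
    for _ in range(num_blocks):
        block = [[rng.randint(lo, hi) for _ in range(N)] for _ in range(N)]
        fwd = _separable(lambda v: [sum(BASIS[k][x] * v[x] for x in range(N)) for k in range(N)], block)
        coeffs = [[_clip(math.floor(c + 0.5), -2048, 2047) for c in row] for row in fwd]
        ref, out = reference_idct(coeffs), candidate_idct(coeffs)
        for i in range(N):
            for j in range(N):
                e = out[i][j] - ref[i][j]
                err_sum[i][j] += e
                sq_sum[i][j] += e * e
                peak = max(peak, abs(e))
    cells = [(i, j) for i in range(N) for j in range(N)]
    total = num_blocks * N * N
    return {
        "peak": peak,
        "pmse": max(sq_sum[i][j] for i, j in cells) / num_blocks,       # worst pixel's mean square error
        "omse": sum(sq_sum[i][j] for i, j in cells) / total,            # mean square error over all pixels
        "pme": max(abs(err_sum[i][j]) for i, j in cells) / num_blocks,  # worst pixel's |mean error|
        "ome": abs(sum(err_sum[i][j] for i, j in cells)) / total,       # |mean error| over all pixels
    }


def passes(stats):
    return all(stats[name] <= limit for name, limit in LIMITS.items())


if __name__ == "__main__":
    print(passes(ieee1180_style_stats(reference_idct, num_blocks=100)))
    print(passes(ieee1180_style_stats(truncating_idct, num_blocks=100)))
```

Note that truncation never exceeds the peak-error limit of 1, yet it fails badly on mean square error: the peak bound alone says little about how often, or how systematically, an IDCT is off.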

Now, the visual encoding process actually contains a decoder that it uses as feedback. The purpose of this decoder is to model the actual assumed decoder, including its inexactness, so that the encoder can carry a known error term that it will compensate for in the encoding of a subsequent predictive frame (a frame built from a motion-compensated prediction, using motion vectors into the previous frame, plus residual data from the IDCT). You can find the standard block diagram of this model in any introductory MPEG text. The point is that the encoder is aware of the magnitude of the error in the decoder's calculations, and can compensate as needed.

In more detail: the encoder takes an 8x8 pixel block of content and runs it through a DCT. Then it quantizes the resulting frequency coefficients. The amount of quantization is based on quality and bandwidth decisions that the encoder makes. This act of quantization inherently chops off bits as part of the compression process.
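To make "chops off bits" concrete, here is a sketch using a plain uniform quantizer with a single step size. That quantizer is an assumption for illustration: MPEG actually uses per-coefficient weighting matrices and a quantizer scale, with different rules for intra and non-intra blocks. The coefficient values are made up, and `quantize`/`dequantize` are hypothetical names.

```python
import math


def quantize(coeff: float, step: int) -> int:
    """Map a DCT coefficient to an integer level (round to nearest)."""
    return math.floor(coeff / step + 0.5)


def dequantize(level: int, step: int) -> int:
    """Reconstruct the coefficient value the decoder will actually see."""
    return level * step


# One row of made-up DCT coefficients, ordered from low to high frequency.
coeffs = [812.0, -47.3, 13.9, 3.2, -1.7, 0.6, 0.2, -0.1]
step = 16

levels = [quantize(c, step) for c in coeffs]      # [51, -3, 1, 0, 0, 0, 0, 0]
restored = [dequantize(q, step) for q in levels]  # [816, -48, 16, 0, 0, 0, 0, 0]
errors = [r - c for r, c in zip(restored, coeffs)]
```

The small high-frequency terms become zero and every coefficient moves by up to `step / 2`. That information is gone for good: a decoder can only ever reproduce `restored`, never `coeffs`.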

Now this is where the encoder's internal decoder comes into play. *Before* it sends these coefficients to the Huffman-like compressor (VLC), it first takes these same quantized coefficients, de-quantizes them, and runs them through an IDCT. The only thing assumed about the decoder's IDCT is that it follows the IEEE standard. This is where my puzzlement about the FP IDCT options comes in: how can one extract extra precision in the IDCT decoding process when the encoder has already assumed that the decoder only follows the IEEE precision standard? Now, it's possible that the DVD people are making some extra assumptions here, but if so, they go beyond the MPEG2 standard.

By calculating a priori what the decoder is going to calculate, the encoder can (1) decide whether the error is acceptable for the current frame and (2) carry an error term to be used and compensated for in the encoding of the subsequent frame. If (1) fails, the encoder tries again with less quantization until it is happy with the error of the current frame's compression. If the cumulative error term of (2) cannot be compensated for, grows out of control, or moves beyond what the encoder can deterministically predict, the encoder must instead generate an "I" (a.k.a. key; no motion compensation allowed) block, macroblock, or frame for the next frame to reset the error term.
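The loop described above might be sketched like this, under several simplifying assumptions: a single 8x8 block, a uniform quantizer step instead of MPEG's matrices, one specific IDCT (float math rounded to nearest) standing in for "the IEEE-compliant decoder", and a maximum-absolute-error test standing in for whatever quality/bandwidth decision a real encoder makes. The key design point is that `encode_block` calls the very same `decode_block` the decoder runs, so the reconstruction it returns, and uses as the reference for the next frame, is exactly what the decoder will have rather than the original pixels. All names are hypothetical.

```python
import math

N = 8
BASIS = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
          * math.cos((2 * x + 1) * k * math.pi / (2 * N)) for x in range(N)]
         for k in range(N)]


def _separable(f, block):
    """Apply a 1-D transform f to each row of an 8x8 block, then to each column."""
    rows = [f(r) for r in block]
    cols = [f([rows[i][j] for i in range(N)]) for j in range(N)]
    return [[cols[j][i] for j in range(N)] for i in range(N)]


def fdct(block):
    return _separable(lambda v: [sum(BASIS[k][x] * v[x] for x in range(N)) for k in range(N)], block)


def assumed_decoder_idct(coeffs):
    """The IDCT the encoder assumes the decoder uses: float math rounded to integers."""
    out = _separable(lambda v: [sum(BASIS[k][x] * v[k] for k in range(N)) for x in range(N)], coeffs)
    return [[math.floor(s + 0.5) for s in row] for row in out]


def decode_block(levels, prediction, step):
    """Decoder: de-quantize, IDCT, add to the motion-compensated prediction, clip to 8 bits."""
    residual = assumed_decoder_idct([[q * step for q in row] for row in levels])
    return [[max(0, min(255, prediction[i][j] + residual[i][j])) for j in range(N)]
            for i in range(N)]


def encode_block(target, prediction, step, max_error):
    """Encoder with its internal decoder; returns (levels, reconstruction, step_used)."""
    residual = [[target[i][j] - prediction[i][j] for j in range(N)] for i in range(N)]
    coeffs = fdct(residual)
    while True:
        levels = [[math.floor(c / step + 0.5) for c in row] for row in coeffs]
        # Internal decoder: compute exactly what the real decoder will reconstruct.
        recon = decode_block(levels, prediction, step)
        error = max(abs(recon[i][j] - target[i][j]) for i in range(N) for j in range(N))
        if error <= max_error or step == 1:
            return levels, recon, step  # recon, not target, becomes the next reference
        step = max(1, step // 2)  # error not acceptable: retry with finer quantization


# Usage: a smooth block whose prediction from the previous frame is off by 3 everywhere.
target = [[40 + 10 * i + 5 * j for j in range(N)] for i in range(N)]
prediction = [[p - 3 for p in row] for row in target]
levels, recon, used_step = encode_block(target, prediction, step=64, max_error=1)
```

Here the first attempt at step 64 quantizes the whole residual away (error 3), so the encoder retries at step 32 and accepts an error of 1. Because both sides share `decode_block`, the encoder's reference and the decoder's reference stay identical, which is exactly the property that breaks if the real decoder uses a different IDCT.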

Now, I am admittedly not too familiar with the DVD ripping process. But here is my guess as to what is going on:

  1. Decryption of DVD content.
  2. Selection and extraction of the desired video and audio streams.
  3. Full MPEG2 decode.
  4. MPEG[124] encode of (3).

I'm guessing that one has to do (3) and (4) because the user wants to reduce the bitrate (i.e., increase compression) or change the format of (2) so that it fits on a PC or whatever, just like MP3. Now, I am unsure whether FlaskMPEG is just doing the full standard (3) and (4), or whether they have found a more sophisticated way to do the conversion that benefits from FP. It's possible, but I wouldn't think they'd be at this stage yet, since this is a recent endeavor and it's not a simple thing to do. This is why I think it is important for you (Tom) to engage FlaskMPEG in your analysis. Only they know what's going on in their ripper (I do not believe they are open source; if I'm mistaken, let me know and I'll look at their code).

Now the FlaskMPEG document on the web says this about the IDCT:

"Right now, FlasK MPEG has three algorithms to perform the iDCT, all IEEE-1180 compliant. A MMX one, an integer based one and one using floating point numbers. Even when all are IEEE compliant, the floating point one is more accurate but it takes a lot more CPU time. The integer one should be enough for almost everybody without MMX and the MMX iDCT should be the default option for almost everyone."

This blurb is probably why both of us are up late tonight. It claims that FlasK MPEG has an FP version that is "more accurate". Now, it is possible that the FP version is "more accurate", but this doesn't mean much if the people who did the original DVD encoding assumed that decoding would be done with an integer IDCT (again, I do not know what hardware lies in a typical DVD player, nor whether there are any de facto DVD rules of thumb).
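Here is a deliberately tiny model of why the encoder's assumption matters. It tracks a single pixel of the reference frame and supposes, hypothetically, that the player's IDCT output differs from the IDCT the encoder modeled by `mismatches[k]` on frame k (each within the peak error of 1 that IEEE 1180 allows). The encoder never sees these differences, so every P-frame builds on the decoder's already-off reference and the difference accumulates until the next I-frame resynchronizes the two. It ignores motion, quantization, clipping, and B-frames, so read it as an illustration of the mechanism, not a measurement; `reference_drift` is a hypothetical name.

```python
def reference_drift(mismatches, gop_length):
    """Decoder-minus-encoder reference difference for one pixel, frame by frame.

    mismatches[k]: (decoder's IDCT output) - (encoder's assumed IDCT output) on frame k.
    Frame k is treated as an I-frame when k % gop_length == 0, otherwise as a P-frame.
    """
    drift = 0
    history = []
    for k, e in enumerate(mismatches):
        if k % gop_length == 0:
            drift = e   # I-frame: no prediction, so only this frame's own mismatch remains
        else:
            drift += e  # P-frame: decoder adds its residual to its own, already-drifted reference
        history.append(drift)
    return history


# Worst case: a decoder whose rounding at this pixel is consistently off by +1.
print(reference_drift([1] * 12, gop_length=6))  # [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
```

Mismatches that average out (for example alternating +1 and -1) stay bounded, while a small systematic bias compounds over the group of pictures; that is presumably why the 1180 test bounds the mean error and not only the peak. It also illustrates the point above: a "more accurate" decoder IDCT is not automatically better if it differs from what the encoder modeled.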

From a FlasKMPEG screen snapshot (again, right from their website), they offer the user three choices: "MMx IDCT (fastest), non-MMx fast IDCT, IEEE-1180 reference quality IDCT (Slowest)". These choices seem odd: the quote above says all three algorithms are IEEE-1180 compliant, whereas the GUI labels only the slowest one as "IEEE-1180 reference quality". Then again, it's not uncommon for one's GUI to be out of date with one's code.

It could be that Flask MPEG has empirical and qualitative data showing that this FP IDCT is actually a big win. Or it could be akin to the difference between ripping MP3 @192 versus @168. Or it could make no difference at all.

Now, Tom, if you have actually ripped an integer version and an FP version, conducted a visual quality test using some reasonable amount of scientific method (e.g., a bunch of friends viewing identical monitors with synced display), and found that there is indeed a significant difference in visual quality, then you and Toby may have made some good points here. But again, I have not yet seen the homework done to support this.

If I were you, I would be really anal about this and request that FlasK give you the MMx IDCT source so that you can verify that they are using the IEEE version and not the non-IEEE-compliant "AAN" (Arai, Agui, and Nakajima) version. I say this because I've seen some people conclude that one converts the MMx-AAN version into the MMx-IEEE version simply by transposing coefficients (I can explain if you want); this is completely in error. Although I expect the FlasK guys are way too smart to make an error like this, it never hurts to confirm these things when one is getting more than 1 million page hits, eh?

If you wish, I can inspect the MMX IDCT code and pass judgement on its IEEEness.

Whew!

Glad I could contribute, and keep up the good work!

Jim Quinlan,
Multimedia & Intel(r) Xscale(tm)
Intel Corporation
M/S: HD2-230
77 Reed Road,
Hudson, MA 01749

 
