Talk:Mean opinion score

Latest comment: 7 years ago by Slhck

Don't quantise individual scores

The article implies that individual listeners must report their opinions as integers 1 - 2 - 3 - 4 - 5. A visible consequence is that MOS scores collected from small groups of listeners and listening sessions are coarsely quantised. For example, where 8 opinion scores are averaged together, the possible results in the middle range are MOS = 3, 3-1/8, 3-1/4, 3-3/8, 3-1/2, 3-5/8, etc. However, an important use of MOS listening tests is to evaluate differences in MOS, possibly small ones, between different audio processes such as codecs, telephone links, etc. After going to the expense of organising human listening tests, it is unskilful to restrict each listener to only 4 steps of difference between the audio sequences (s)he hears. During work evaluating the subjective impact of transmission errors in digital telephone links, I found that individual listeners can usually report their order of preference of 8 or more differently impaired sentences with confidence (i.e. repeatability). I contend therefore that listeners should be encouraged to report their scores to any precision they wish. In practice, steps of 0.1 (i.e. 41 possible scores = [1.0 - 1.1 - 1.2 ... 4.9 - 5.0]) can be allowed. Cuddlyable3 (talk) 12:00, 20 February 2008 (UTC)
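To make the quantisation argument above concrete, here is a minimal Python sketch. It is only an illustration: the function name `mos_grid` and its parameters are made up for this example and do not come from any ITU recommendation. It assumes every listener scores on the same 1 to 5 scale with a fixed step, and it lists every mean that a panel of that size could possibly produce.

```python
from fractions import Fraction


def mos_grid(n_listeners, step=Fraction(1)):
    """Return every mean score a panel can produce.

    Assumes (for illustration only) that each listener picks a score
    from 1 to 5 in increments of `step`, and that MOS is the plain
    arithmetic mean of the individual scores.
    """
    # Number of increments between the lowest (1) and highest (5) score.
    levels = int(Fraction(4) / step)
    # Each listener contributes 0..levels increments above 1, so the panel
    # total can be any integer from 0 to n_listeners * levels increments.
    return [
        1 + Fraction(k) * step / n_listeners
        for k in range(n_listeners * levels + 1)
    ]


# Eight listeners, integer scores: means move in steps of 1/8.
coarse = mos_grid(8)
print(len(coarse), coarse[1] - coarse[0])

# Eight listeners, scores in steps of 0.1: means move in steps of 1/80.
fine = mos_grid(8, Fraction(1, 10))
print(len(fine), fine[1] - fine[0])
```

With a single listener and a 0.1 step, `mos_grid(1, Fraction(1, 10))` has 41 entries, from 1.0 to 5.0 inclusive. The sketch only shows how fine the grid of attainable means is; it says nothing about whether listeners can actually use the extra precision reliably.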

MOS values of different codecs table

The values in the MOS table are incorrect, or at least confusing. The MOS of iLBC or AMR cannot be higher than that of G.711, simply because G.711 transfers raw data without compression, whereas iLBC and AMR are lossy codecs that compress that raw data. (It would be like saying that an MP3 has better perceptual quality than the WAV it was made from.) The discrepancy probably arises because results measured under ideal/normal conditions for iLBC and AMR are mixed with results measured under stress conditions for G.711. The G.711 MOS under ideal conditions is 4.3 or even 4.45; various sources differ. See http://en.wikipedia.org/wiki/G.711 and http://www.vocal.com/speech-coders/associated-psqm-values/. Please decide whether you want to list data for ideal conditions or for stress conditions, and make the table consistent. Xtonda (talk) 13:52, 21 June 2014 (UTC)

Complete re-write of article

I edited this article, giving it a complete makeover. This edit was done in the scope of the QoE-Net project, which is funded by the European Commission. It has been reviewed by Prof. Sebastian Möller, a senior scientist and leading expert in the field of Quality of Experience. I clarified terminology and added proper references to ITU standards. I also added a section on the mathematical definition and the biases present in MOS ratings. The table with DCR scores was removed, as those are not "MOS" but "DMOS" ratings. Audio-only parts that dealt with modelling the impact of packet loss rather than with subjective MOS have been removed, as they cited one specific paper only. The reason for doing so is as follows: due to the abundance of MOS models and the ever-changing nature of multimedia codecs and applications, a statement of "Codec X equals Y MOS" will never be completely accurate. The purpose of this article therefore should not be a listing of MOS models or predictions. So, the MOS listings without context were also removed. For any questions, feel free to leave a message. Slhck (talk) 12:28, 6 January 2017 (UTC)