Talk:Floating-point arithmetic/Archive 1


Is 9/7 a floating point number?

9/7 is not a floating point number!? -WeißNix 15:48, 5 Aug 2003 (UTC)

Not quite sure what you're saying there, WeißNix, but if you're saying 9/7 isn't a floating point number I'd agree. 9/7 would seem to represent nine sevenths perfectly, whereas any floating point representation would have a finite accuracy - this is what I believe, anyway. Tompagenet 15:54, 5 Aug 2003 (UTC)

nit pick - the floating point representation of 9/7 in base 7 is 1.2e+0 8-) AndrewKepert
or in base 49, 9/7 is 1.e+0 (reading "e" as the base-49 digit fourteen, since 9/7 = 1 + 14/49)... But perhaps the clearer statement is "With base 2 or 10 the number 9/7 is not exactly representable as a floating point number". The one hiccup is that you might like to think of real numbers as "floating point numbers with infinitely long significands". But since that doesn't really add anything (other than a representation) over the notion of a real number, it isn't clear that this is a useful way to think about the subject. Jake 20:56, 3 October 2006 (UTC)
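A quick Python check of the representability point, using the standard fractions module (an illustration in today's terms, with IEEE binary64 doubles):

 from fractions import Fraction

 # Fraction(float) recovers the exact rational value stored in a
 # double, so this compares the stored "9/7" with the true rational.
 stored = Fraction(9 / 7)
 print(stored == Fraction(9, 7))   # False: 9/7 is not a binary float
 print(stored.denominator)         # a power of two, never divisible by 7

 # A rational p/q in lowest terms terminates in base b exactly when
 # every prime factor of q divides b: 7 divides 7 and 49, but not 2 or 10.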

What is Absorption?

I do not understand the sentence "Absorption : 1.10^15 + 1 = 1.10^15" in the section Problems with floating point. I guess that absorption (with a p) means that two different real numbers are represented by the same floating point number, but 1.10^15 is approximately 4.177 so I cannot see how that is the case here. Jitse Niesen 22:42, 8 Aug 2003 (UTC)

The author meant 1.10^15 to be interpreted as 1 * 10^15 (= 1,000,000,000,000,000). The point they're making is that adding a relatively small number to a larger stored number can result in the stored number not actually changing. A dot is often used as a symbol for multiplication in (especially pure) maths; the article might be clearer with a different representation, though. Hope that's some help -- Ams80 21:52, 13 Aug 2003 (UTC)
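For readers of the archive, a short Python illustration of the absorption effect described above may help. One caveat: IEEE binary64 doubles hold integers exactly up to 2^53 ≈ 9.0·10^15, so the thread's 10^15 example is actually still exact in that format, and the effect first appears one decimal order higher.

 # Absorption: adding 1 to a large enough double leaves it unchanged.
 print(1e15 + 1 == 1e15)   # False: 10^15 + 1 still fits in 53 bits
 print(1e16 + 1 == 1e16)   # True: the +1 is absorbed by rounding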

Ah, now I understand, thanks. I changed 1.10^15 to 1·10^15, as this is how multiplication is customarily denoted. Indeed, another representation (10^15 and 1E15 come to mind) might be clearer, but I am undecided. -- Jitse Niesen 23:29, 13 Aug 2003 (UTC)

Seems to me that × should be used, following the precedent in the first line of the Representation section of the article. I'll make that change (and elsewhere in the article, as needed)... Mfc


Numbers don't represent themselves?

Shouldn't the talk page be used for discussion of the content of the article, and not the article itself? There is a request for improvement at the bottom (which I agree with, but am not in a position to help with) and the recent addition by User:Stevenj, which is really a meta-comment, somewhat indignant, and factually incorrect:

Many a textbook asserts that a floating-point number represents the set of all numbers that differ from it by no more than a fraction of the difference between it and its neighbors with the same floating-point format. This figment of the author’s imagination may influence programmers who read it but cannot otherwise affect computers that do not read minds. A number can represent only itself, and does that perfectly. (Kahan, 2001)

The problem is that numbers don't represent themselves. Numbers are represented by numerals, either present as marks on a bit of paper, glowing phosphor, or electrons hanging around on a transistor. In the present case I think the "many a textbook" is actually correct, as the point of doing a floating point calculation on a computer is to get an approximation to the answer. The operands are real numbers, and putting these values into computer memory is the "representation" step. It is here that a range of values gets condensed to a single approximating value, as described in the hypothetical textbook. The operations of real arithmetic correspond to the algorithms the chip uses to manipulate the representations, and the output is a representation of the answer. It may not be a very good representation (in that it may say 0.333333333 where the answer is 1/3) but for most people it is good enough. So anyway, I am removing it, and if others feel something similar belongs, that is their call. --AndrewKepert 07:42, 22 Sep 2003 (UTC)

Try reading Kahan's articles (please tell me you know who he is!). The point is that the computer stores an exact representation of certain rational numbers; the idea that "every floating point number is a little bit imprecise", that all floating point calculations are necessarily an "approximation", indicates muddy thinking on the part of the programmer (or textbook author), not what the computer actually does, and is an old and common source of confusion that Wikipedia should resist. Arguing about the physical storage of the number is beside the point; the issue here is the logical representation (a certain rational value), not the physical one. —Steven G. Johnson
Strictly speaking, a number can possess neither precision nor accuracy. A number possesses only its value. (Kahan)
Hi Steven, no, I am not familiar with Kahan. Let me explain my viewpoint on this as a mathematician. As you point out, we are not talking about numbers here, we are talking about representations of numbers - i.e. numerals. They are numerals no matter what matter is used to store and/or display them. Yes, a numeral represents a unique number precisely (in the case of floating point representation, a unique rational number). The twist is that it is also used to represent a whole stack of other numbers imprecisely. The imprecision may come from the input step (e.g. A/D conversion), it may come from the algorithms implemented in hardware or software, or it may come from inherent limitations in the set of numbers that are represented precisely. This is not a criticism of any representation - it is a feature. We would be well and truly stuck if we insisted on representing every number precisely in any of these steps (you want to use π? Sorry Dave, I can't give you π precisely, so you can't use it).
My original statement was that the self-righteous quote from Kahan was muddying the waters between numeral and number. While it is a valid viewpoint from a particular point of view (e.g. if you restrict "real number" to mean "the real numbers that I can represent with a 64-bit float") and useful if you are focussing on what is going on inside the computer (which is presumably why Kahan objects to it being the starting point for programmers), it is really out of place in an introductory statement on what floating point representation is about. The content of our current discussion bears this out - there are many subtle issues involved with the simple act of "writing down a number", of which we are scratching the surface. These are not useful to the layperson who is wondering what floating point representation is and how it works. Maybe the issue is worth raising (maybe even in a balanced fashion) further down the article, but not at the top.
...and I agree with your final quote from Kahan. A number possesses only its value. However, a representation of a number has accuracy. A system that uses numerals to represent numbers (aka a numeral system) has precision.
Is my point of view clear from this?
Work calls ... --AndrewKepert 06:32, 23 Sep 2003 (UTC)
First of all, Kahan is one of the world's most respected numerical analysts and the prime mover behind the IEEE 754 floating-point standard; he's not someone whose opinion on floating-point you dismiss lightly. Also, he is an excellent and entertaining writer, so I highly recommend that you go to his web page and read a few of his articles. I think that your final statement gets to the core of your fallacy: you say that a representation of a number has accuracy, but accuracy is meaningless out of context. "3.0" is a 3-ASCII-character representation of the number 3. Is it accurate? If it's being used in place of π it's pretty poor accuracy. If it's being used to represent the (integer) number of wheels on a tricycle, it has perfect accuracy. Floating-point is the same way: it stores rational numbers, and whether these are "accurate" is a statement about the context in which they are used, not about the numbers or the format in which they are stored. Saying that they are used to represent a whole stack of other numbers imprecisely is describing the programmer's confusion, not the computer, nor even a particular problem (in which, at most, the floating-point number is an approximation for a single correct number, the difference between the two being the context-dependent accuracy). —Steven G. Johnson
The relevant context is very simple. Can the binary representation "accurately" reproduce the numeric literal given in the programming language? For example, when you do X=0.5, you will get a 100% accurate floating point number because the conversion between the decimal notation and the binary notation is exact. However, if you do X=0.3, you can never get decimal 0.3 exactly represented in binary floating point notation. So my interpretation of "accurate" is the preservation of the same value between the programmer's specification and the machine representation. 67.117.82.5 20:29, 23 Sep 2003 (UTC)
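The 0.5 versus 0.3 point is easy to check in Python, whose standard decimal module can display the exact value a binary double carries:

 from decimal import Decimal

 # Decimal(x) shows the exact rational value stored in the double x.
 print(Decimal(0.5))   # 0.5 exactly: 1/2 is a power of two
 print(Decimal(0.3))   # 0.2999999999999999888977697537... (not 0.3)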
You're not contradicting anyone. The point is not that there is no such thing as accuracy, but that it is a property of the program/context and not just of the floating-point number. (Although determining accuracy is usually more difficult than for the trivial case of literal constants.) It's nonsense, therefore, to say that a floating-point number has a certain inherent inaccuracy, and this is the common fallacy that the original quote lampoons (among other things). It's not just semantics: it leads to all sorts of misconceptions - for example, the idea that the accuracy is the same as the precision (the machine epsilon), when for some algorithms the error may be much larger, and for other algorithms the error may be smaller. —Steven G. Johnson
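A small Python example of that precision/accuracy gap: both inputs below are correct to machine precision, yet cancellation destroys the accuracy of the naive formula, while an algebraically equivalent rearrangement is fine.

 import math

 # (1 - cos x)/x^2 -> 1/2 as x -> 0. For x = 1e-8, cos x rounds to
 # exactly 1.0, so the naive form returns 0.0: a 100% relative error,
 # vastly larger than the machine epsilon of ~2.2e-16.
 x = 1e-8
 naive = (1 - math.cos(x)) / x**2
 stable = 2 * math.sin(x / 2)**2 / x**2   # same function, no cancellation
 print(naive, stable)                     # 0.0 versus ~0.5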
Now we're getting somewhere. In my point of view (a pure mathematician, not a numerical analyst) there is no such thing as a floating point number. There is only a floating point representation of a number. I know there is confusion about accuracy vs precision. Obviously I am confused by your point of view and you by mine.
Let me pull apart your example from my viewpoint. The number 3 (as in this many: ***) can be represented by the numeral "3.0" with very good accuracy (error = 0, etc.). The number π (as in the ratio of circumference to diameter) can also be represented by the numeral "3.0", with bad accuracy (error = 0.14159...). The numeration system whereby real numbers are represented by ASCII strings of the form m.n has precision 0.05 (or 0.1, depending on how you define precision). If you were looking for the best representation of π in this numeral system, you could use "3.1", which has an error smaller (better) than the precision of the numeration system. All of this follows from my statement about accuracy and precision. Please point out why you consider this point of view fallacious - I want to understand!
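The π example is easy to make concrete in Python:

 import math

 # The numeral "3.1" represents pi with error ~0.0416, inside the
 # 0.05 precision of the one-decimal-place numeration system above.
 print(abs(math.pi - 3.1))   # 0.04159265358979...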
I have no doubt about Kahan's credentials; I am just not a numerical analyst, so I haven't read anything on numerical analysis since I was an undergrad. I know very little about lots of things, and a lot about a few things. My point is that hairy-chested statements that are debatable when viewed from a different angle should not be up front in Wikipedia.
-- AndrewKepert 01:36, 24 Sep 2003 (UTC)
P.S. Broadly speaking, I agree with the quotes you gave from Kahan. Textbooks are incorrect to speak of a "floating point number having accuracy", as once the number is seen only as the exact value represented by the numeral stored in the bytes, it has lost its original context. If you take my point of view (very literal, admittedly) then the principal error is to say "floating point number". His last statement in that first quote, about numbers only representing themselves, is what I consider to be incorrect. Numbers don't represent numbers (well, they can, but not in the context of floating point representation). If you take the (fictional, but practical) viewpoint that the floating point representation is the number, then you can't talk about numbers you care about such as π. However, when you take this viewpoint to its extreme, you conclude that there is no such thing as accuracy or precision! Your final quote from Kahan I agree with totally, for reasons I have already stated (accuracy: property of a representation; precision: property of a method of representation). -- AndrewKepert 02:22, 24 Sep 2003 (UTC)

(Restarting paragraph nesting.)

I think the confusion in the language between us is due to the fact that there are two things going on that could be called "representation". First of all, the binary floating-point description, the "numeral" to use your term, has an unambiguous injective mapping to the rationals (plus NaN, Inf, etc.): a floating-point numeral represents a rational number (the "floating-point number"). This representation has a precision, given by the number of digits that are available. Moreover, it is defined by the relevant floating-point standard; it is not context-dependent. Second, in a particular application, this rational number may be used to approximate some real number, such as π, with an accuracy that may be better or (more commonly) worse than the precision. That is, the rational number may represent some real number, but this mapping is highly context dependent and may not even be known.
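This first, context-free mapping is directly observable in Python: every finite double denotes exactly one rational number with a power-of-two denominator.

 from fractions import Fraction

 # The literal 0.1 is rounded once, to the nearest representable
 # rational; from then on the numeral denotes that rational exactly.
 print(Fraction(0.1))              # 3602879701896397/36028797018963968
 print((0.1).as_integer_ratio())   # the same pair, as a tuple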

It is this second mapping that engenders so much confusion in general, and the common fallacy of textbook authors is to conflate it with the first: to say that the finite precision implies that a floating-point numeral intrinsically represents a corresponding range of real numbers. Kahan was being somewhat glib (he wasn't giving a formal definition, and being pedantic about the numeral/number distinction is beside the point), but I think his basic point is valid. The misconceptions he is trying to combat are the ideas that floating-point numbers are intrinsically inaccurate, or that they somehow represent a "range" of real numbers simultaneously (at most, one floating-point number represents one real number, with a certain accuracy, in a context-dependent fashion), or that they can only represent real numbers in the range given by their precision.

Moreover, the computer doesn't "know" about this second putative representation; as far as it's concerned, it only works with rational numbers (which "represent only themselves"), in the sense that its rules are based on rational arithmetic, and it's misleading to imagine otherwise. (There are alternative systems called interval arithmetic that do have rules explicitly based on representing continuous ranges of real numbers, but these behave quite differently from floating-point.)
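A minimal sketch of the interval-arithmetic idea just mentioned (not a real library's API; it fakes directed rounding by widening each endpoint one ulp with math.nextafter, which requires Python 3.9+):

 import math

 def interval_add(a, b):
     # Add intervals (lo, hi) so the true real sum is always enclosed.
     lo = math.nextafter(a[0] + b[0], -math.inf)   # push endpoint down
     hi = math.nextafter(a[1] + b[1], math.inf)    # push endpoint up
     return (lo, hi)

 # The result encloses the exact sum of the two stored doubles
 # (and, here, also the real number 3/10).
 print(interval_add((0.1, 0.1), (0.2, 0.2)))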

In any case, upon reflection I agree that the quote is probably too provocative (and, perhaps, easily misunderstood out of context) to put up front. But I still think it is good to actively combat such common mistakes.

In the same PDF file, he gives a list of "prevalent misconceptions". I think it's worthwhile reproducing it here. It was intended to be provocative, I suspect, and the last three points are more understandable in the context of his presentation (a criticism of Java's fp specs), but it is thought-provoking.

Steven G. Johnson

Prevalent Misconceptions about Floating-Point Arithmetic

(by William Kahan, in [1])

Because they are enshrined in textbooks, ancient rules of thumb dating from the era of slide-rules and mechanical desk-top calculators continue to be taught in an era when numbers reside in computers for a billionth as long as it would take for a human mind to notice that those ancient rules don't always work. They never worked reliably.
  • Floating-point numbers are all at least slightly uncertain.
  • In floating-point arithmetic, every number is a "Stand-In" for all numbers that differ from it in digits beyond the last digit stored, so "3" and "3.0E0" and "3.0D0" are all slightly different.
  • Arithmetic much more precise than the data it operates upon is needless, and wasteful.
  • In floating-point arithmetic nothing is ever exactly 0; but if it is, no useful purpose is served by distinguishing +0 from –0.
  • Subtractive cancellation always causes numerical inaccuracy, or is the only cause of it.
  • A singularity always degrades accuracy when data approach it, so "Ill-Conditioned" data or problems deserve inaccurate results.
  • Classical formulas taught in school and found in handbooks and software must have passed the Test of Time, not merely withstood it.
  • Progress is inevitable: When better formulas are found, they supplant the worse.
  • Modern "Backward Error-Analysis" explains all error, or excuses it.
  • Algorithms known to be "Numerically Unstable" should never be used.
  • Bad results are the fault of bad data or bad programmers, never bad programming language design.
  • Most features of IEEE Floating-Point Standard 754 are too arcane to matter to most programmers.
  • "Beauty is truth, truth beauty. — that is all ye know on earth, and all ye need to know." ... from Keats' Ode on a Grecian Urn. (In other words, you needn't sweat over ugly details.)

From AndrewKepert 01:11, 25 Sep 2003 (UTC)

Okay, I think we are getting closer to talking the same language. I think I now understand what you said (thank god - we're both smart, both have a mathematical background ... 8-) ). As I see it, a representation in this context is the mapping (surjection) number -> numeral, whereas your emphasis is on the partial inverse to this mapping numeral -> number. (BTW "numeral" is not my term - it is the standard term for a written (in this case bitted) representation of a number.)

My biggest problem here is your use of the term "the mapping" as if it were uniquely defined. There is a unique surjective mapping from R to numerals that you can define, of course, which is used when you want to represent a known literal like π. (In this mapping, accuracy is ordinarily at least as good as the precision.) But this mapping does not generally reflect the numbers that can be represented by a given numeral in a program; more on this below. —Steven G. Johnson

I have two reasons for this:

1. Representation is the standard way of downgrading from an abstract entity (such as a group, topological algebra, or whatever) to something more tractable (space of functions, etc. -- see for example group representation). So, "representation" means a mapping from the set of things being represented to another set of objects that are used to represent them. "Representation of ..." is also commonly used for elements of the range of this mapping, and is the context in which we have both been using it above.

The difference in the direction of the mapping is superficial; we both agree that the numeral represents a number and not the other way around, and the direction of the mapping just indicates which problem you are solving. The main disagreement is about which number(s) are represented. Or, more to the point, about the meaning of "accuracy" even if we agree that the obvious numeration system is the "representation." See below.

2. As I see it, "numbers" stored in computers are there as representations of quantities. They only have meaning in relation to what they represent. Kahan and his colleagues did not dream up IEEE 754 as an empty exercise in self-amusement. They wanted a usable, finitely-stored representation of R. The only way R can be represented by finite bit-strings is to accept some imprecision, so this is what is done.

But the imprecision does not tell you the error, or even bound it. See below.

Representations don't have to be faithful (one-to-one), and if we were ambitious, we could also allow them to be multi-valued, as would be required for π to have several acceptable approximate representations with accuracy close to the precision of the numeration system. However, I don't see multiple values as the main issue.

So given this, a numeral system (or I prefer the French phrase "System of Numeration": fr:Systèmes de numération) is a representation - a function from a set of numbers to a set of numerals. Then "accuracy" is a property associated with the representation of a number x (the difference between x and the rational number associated with the numeral for x) and "precision" is a property associated with the representation function itself (an upper bound to the accuracy, possibly over a certain interval in the domain, as is required in floating point representation). These definitions do require the function numeral -> number, as you need to have "the rational number associated with the numeral". So in all, I don't think it is pedantry to insist on the distinction between number and numeral, as it is what the concept of accuracy relies on, which was your initial point.

Precision is neither an upper nor a lower bound on the accuracy relative to the number that is actually desired, i.e. the accuracy in any useful sense. You're looking at the accuracy of the wrong mapping. See below.
(Note that the IEEE 754 standard speaks of floating point formats as being representations for only a finite set of real numbers. Not that you are wrong, but you know as well as I that there is no universal, unambiguous, formal meaning of the word "representation", even in mathematics; let's not get hung up on finding the "correct" terminology, especially since I think such a focus misses the point here. English requires context and intelligence to interpret.)

So with "representation" in my sense, the hypothetical text-book authors would be correct to say something like a floating-point numeral/representation represents the set of all numbers that differ from it by no more than a fraction of the difference between it and its neighbors with the same floating-point format, as the real numbers in some interval ( a2n - ε , a2n + ε ) all have the same floating point representation, looking something like +a × 2n. Having said this, I agree with you and Kahan that this misleading. From my point of view the correct statement would be to change the viewpoint a number represented in floating point shares its representation with the set of all numbers ... although it is still clumsy in this form. A better starting point is to ignore the distinction between number and numeral and just get stuck into coding. When accuracy becomes an issue, the idea that we are only using a representation of a number (i.e. a numeral) is the clearest way I see of opening the gap between "number" and "numeral" and thus being able to talk about accuracy and precision.

Your "corrected" version is still wrong; or, at least, it doesn't reflect how floating-point numbers are actually used except for declaring constants. This is the heart of our disagreement. Suppose I have a program F to compute some invertible function f(x), which has a well-defined result y in the limit of arbitrary-precision arithmetic (for exactly specified x). In floating-point arithmetic, I get some rational result Y which is an approximation (a representation of sorts) of the desired result y, but which is not necessarily the same as the rational representation of y that the numeral system by itself would dictate. That is, if the numeral system's representation mapping was N(y), the "representation" mapping of the program answer could be written F(N(f–1(y))). (But I don't think defining the approximation as such a formal representation mapping is always desirable or practical.) Depending upon the accumulation of errors and the cleverness of the programmer, the error |Y-y| may be arbitrarily large; it certainly is not bounded by the machine ε (even if you look at relative error). ε only describes the precision with which a number is specified, not the accuracy; the fallacy is to conflate the two.
I can't read Kahan's mind, nor am I a numerical analyst myself, but as far as I can tell he would say simply that the floating-point numeral represents unambiguously only that single rational number, with some precision. It may be used to approximate some other real number (which may or may not be exactly representable), with some context-dependent accuracy. But I think that the main point is not terminology of "representation", but rather to divorce the concepts of precision and accuracy. —Steven G. Johnson
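A standard concrete case of |Y - y| dwarfing the machine ε (an illustrative sketch, not an example from Kahan or from this thread): summing the Taylor series of exp(x) term by term is a perfectly reasonable-looking program F, yet at x = -20 the intermediate terms reach about 4·10^7, and the rounding noise they leave behind is comparable to or larger than the true answer of about 2.06·10^-9.

 import math

 def exp_taylor(x, terms=150):
     # Sum 1 + x + x^2/2! + ... naively, in double precision.
     s, t = 0.0, 1.0
     for n in range(1, terms):
         s += t
         t *= x / n
     return s

 # Every individual operation is correctly rounded, yet cancellation
 # among huge intermediate terms leaves an absolute error of order
 # eps * 4e7 ~ 1e-8, so the relative error is enormous.
 print(exp_taylor(-20.0))   # dominated by rounding noise
 print(math.exp(-20.0))     # ~2.0611536e-09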

Lots of interesting issues raised here. I think it would make a good paragraph or three in the article. -- AndrewKepert 01:11, 25 Sep 2003 (UTC)

I really strongly recommend that anyone interested takes a few minutes to read a little bit from Kahan's web page, say pages 25-45 of this presentation. It's quite entertaining, light reading, and illuminates some of these issues (although it's not a paper on formal representation theory). —Steven G. Johnson
Quick response: Thanks for the example -- a sketch of it (as a commutative diagram figure, except for the fact that it doesn't quite commute) is basically how I was thinking of algorithms and accuracy, except from the other side of the square. If you are thinking of the algorithms for manipulating floating point numbers, yours is the natural side of the square to work from. If you are thinking of the real numbers that are the reason you wrote the code, the algorithms are approximations to the true arithmetic (etc.) operations on R. I still think my "correction" is correct -- it was poorly worded, as I was trying to paraphrase. Anyway, I will check my reasoning on this when I have the time (not now). I did catch your point about the problem with my referring to "the floating point representation" of a number, as in practice a single numeration system can have many usable "approximate representations" of a given number - thus my earlier comment on multi-valued functions.
I intend to continue this conversation later - I am away for the next week. (No phone, no e-mail, no web - bliss!) -- AndrewKepert 08:53, 25 Sep 2003 (UTC)