Mathematics desk
< April 25	<< Mar \| April \| May >>	April 27 >

Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.

April 26

Convert a computer file to a non-negative Integer number

How do I convert an ordinary computer file into a non-negative Integer number. What I'm interested in is the contents of the file. The name of the file is irrelevant and need not be saved.

For example: a file with the size of 1 byte and the value of 16 can be easily be converted to the binary number 10000 which is decimal value 16.

However, I realizes that this method does not work because what if I have

A file with size of 2 bytes and with the hexadecimal value of "00 10" would not this also convert to the decimal value 16.

Clearly I must somehow also encode the size of the files in bytes as well as the actual value of the file.

What is the best way of converting an ordinary computer file to a non-negative Integer number which uses the least amount of numerical digits? 122.107.207.98 (talk) 00:36, 26 April 2009 (UTC)[reply]

Quick and dirty fix; tack a 1 in front of every file. So the first file would become x110, in decimal 272. The second file becomes x10010, or in decimal 65552. Taemyr (talk) 00:42, 26 April 2009 (UTC)[reply]

Since the total number of files with n or fewer bits is 2ⁿ⁺¹-1, Taemyr's method is exceedingly close to optimal. McKay (talk) 01:37, 26 April 2009 (UTC)[reply]

I think that's wrong. Try it for n=3 or n=4; your constant is off by one. I think

\sum _{i=1}^{n}2^{i}=2*(2^{n}-1)

. .froth. (talk) 04:24, 26 April 2009 (UTC)[reply]

You forget the empty file. I have lots of them on my computer so I know they exist. :) McKay (talk) 08:23, 26 April 2009 (UTC)[reply]

Oh and have a look at arithmetic encoding ~~although it doesn't really help your integer situation~~ curse you tiredness it works fine. I suspect this is the optimal that taemyr's bit of waste approaches. .froth. (talk) 04:25, 26 April 2009 (UTC)[reply]

Taemyr, your method is brilliant.

no file has the decimal representation value of 0
a file with zero bytes has the decimal representation value of 1
a file with 1 byte and hex value of 00 has the decimal representation value of 256
a file with 2 bytes and hex value of 00 00 has the decimal representation value of 65536
a file with 1 byte has the decimal representation value ranging from 256-511
a file with 2 bytes has the decimal representation value ranging from 65536-131071

eh? what kind of file has the decimal representation value of 2 . It does seems to me that the range 2-255 is not being used. And the range 512-65535 is also not being used. 122.107.207.98 (talk) 10:46, 26 April 2009 (UTC)[reply]

Here's a better way: treat the file as a base-256 number, but with the bytes representing digit values of 1–256 instead of 0-255. This maps the empty file to 0, the one-byte files to 1–256, the two-byte files to 257–65792, and so on, and it's easy to compute:

   Integer int_of_file(FILE* f) {
       Integer n = 0, place = 1;
       int c;
       while ((c = getc(f)) != EOF) {
           n += place * (c + 1);
           place *= 256;
       }
       return n;
   }

   void file_of_int(FILE* f, Integer n) {
       while (n) {
           putc((n - 1) % 256, f);
           n = (n - 1) / 256;
       }
   }

That treats the file as little-endian. Big-endian is a bit more tricky as you have to search for the maximum place value in file_of_int. -- BenRG (talk) 11:04, 26 April 2009 (UTC)[reply]

For reference, here are the working python codes, ready to use

file2num.py

#!/usr/bin/python
import sys

if __name__ == "__main__":
  if len(sys.argv) != 2:
    print "Usage: file2num.py inputfile"
    sys.exit()
  num=0
  place=1
  myfile=open(sys.argv[1])
  while 1:
    char=myfile.read(1)
    if char:
      num = num + place * (ord(char) + 1)
      place = place * 256
    else:
      break
  print num

num2file.py

#!/usr/bin/python
from __future__ import division
import sys

if __name__ == "__main__":
  if len(sys.argv) != 2:
    print "Usage: num2file.py numberfile"
    sys.exit()
  numfile=open(sys.argv[1])
  num=int(numfile.readline())
  while num:
    c = (num - 1) % 256
    num = (num - 1) // 256
    sys.stdout.write(chr(c))

122.107.207.98 (talk) 03:14, 2 May 2009 (UTC)[reply]

Birthday paradox

If A doesn't share a birthday with B and B doesn't share a birthday with C, then there's no way A can share a birthday with C. Doesn't this ruin the calculations that "prove" the unintuitive result of the birthday paradox? Also there are circular relationships with 4 people, and 5 people, and n people.. What's the actual graph look like, or at least what's the 50/50 point? .froth. (talk) 03:26, 26 April 2009 (UTC)[reply]

Yes there is - if A and C are born on the 1st January, B is born on the 2nd of January, then A doesn't share a birthday with B who doesn't share a birthday with C, but A shares a birthday with C. 'Not sharing a birthday' does not have transitivity, whereas sharing a birthday does, so in fact sharing a birthday is an equivalence relation - don't expect to see it turning up on exams any time soon though... Otherlobby17 (talk) 03:45, 26 April 2009 (UTC)[reply]

Oh right >_< But sharing is transitive so should the graph actually be higher? .froth. (talk) 03:52, 26 April 2009 (UTC)[reply]

What makes you think that somehow the calculation would ignore the possibility of more than two sharers? It doesn't. (The actual, real-life, 50/50 point is a little higher due to leap days but may also be influenced by systematic biases in when people are born—but I'm pretty sure it's still above 23 and below 24.) —JAO • T • C 09:39, 26 April 2009 (UTC)[reply]

The Gateaux derivative - how is it defined?

Our article on the Gateaux derivative defines it this way:

A function f : U ⊂ V → W is called Gâteaux differentiable at

x\in U

if f has a directional derivative along all directions at x. This means that there exists a function g : V → W such that

g(h)=\lim _{t\to 0}{\frac {f(x+th)-f(x)}{t}}

for any chosen vector h in V, and where t is from the scalar field associated with V (usually, t is real).

My question is whether anyone can confirm that this is correct, because I thought it required that t approach 0 from above, ie. t is always positive. The reason is that the influence function is considered a special Gateaux derivative, and that definitely requires that t approach from above. Thanks in advance, It's been emotional (talk) 08:35, 26 April 2009 (UTC)[reply]

Well, the standard definition (for V,W Banach spaces or also TVS , U open in V) is, f is Gâteaux differentiable at

x\in U

iff what you wrote happens, with

g:V\to W

a linear continuous operator; in this case you may equivalently take t positive in the definition, for g(-h)=-g(h). (By the way, I do not like so much the distinction between G-derivative and G-differential as it is made in the link; if g is not a linear continuous operator, people would just say "f has directional derivatives in all directions h", with no further names for this). --pma (talk) 09:06, 26 April 2009 (UTC)[reply]

Aren't you describing the Frechet derivative? My understanding was that the Gateaux derivative differed from the Frechet derivative in that the derivative did not have to be linear.76.126.116.54 (talk) 20:51, 26 April 2009 (UTC)[reply]

No, both differentials are linear continuous operators, but Fréchet differentiability is a stronger condition, in that it is required that f(x+h)-f(x)-Lh=o(h) as h tends to 0. A standard example of a function on

\mathbb {R} ^{2}

that is differentiable in the origin

(0,0)

in Gâteaux but not in Fréchet sense, is

\textstyle {\big (}{\frac {x^{2}y}{x^{4}+y^{2}}}{\big )}^{2}

, which is not even continuous. --pma (talk) 21:26, 26 April 2009 (UTC)[reply]

Thanks, pma that's much clearer (though I'm still trying to work out if the function you gave really is discontinuous, rather than just not Frechet differentiable). I got the stuff I cut and pasted from our article Frechet derivative, which seems completely wrong. If you can confirm that for me, I'll get to work editing it, either soon or at least I'll make a note of it for when my thesis is done (I'm under the hammer at the moment). I'll add an acknowledgement of you and the ref desk for the help too. Thanks also to 76.126, because I also had the same question. It's been emotional (talk) 08:27, 27 April 2009 (UTC)[reply]

Well, maybe it's not completely wrong, but it uses a definition of Gâteaux derivative that is not the standard one. Also, usually differentiability or derivability are synonymous (both in the F. and in the G. context). At most, some authors distinguish between "differential" and "derivative", preferring the latter for functions of one (real or complex) variable, so that the differential is always the linear map, and the derivative is the usual limit vector, the two being linked by the identities df(x)[h]=f'(x)h and f'(x)=df(x)[1]). Great books on differential calculus in Banach spaces: Cartan; Dieudonné; also, the first chapter of Hörmander has a short but complete and perfect introduction. Going back to the function of the example, I think it is constant on the graph of any parabola (x,cx²), x>0, with a constant depending on c (this shows the discontinuity at the origin). --pma (talk) 11:57, 27 April 2009 (UTC)[reply]

Thx, all clear now! I think I'll at least tag that page, because it does need editing - it's inconsistent with the page on Gateaux derivatives. cheers, It's been emotional (talk) 02:35, 29 April 2009 (UTC)[reply]

Pólya enumeration theorem

I am having trouble in reconciling the Pólya enumeration theorem I am reading from my book, and what is given here on wikipedia.

My book states: Suppose S is a set of n objects and G is a subgroup of the symmetric group S_n. Let P_G(X) be the cycle index of G. Then the pattern inventory for the nonequivalent colorings of S under the action of G using colors y₁, y₂...y_m is

P_{G}(\sum _{i=1}^{m}y_{i},\sum _{i=1}^{m}y_{i}^{2},\cdots ,\sum _{i=1}^{m}y_{i}^{n})

Here a pattern inventory of the colorings of n objects using the m colors is the generating function: $H(y_{1},y_{2},\cdots y_{m})=\sum _{v}a_{v}y_{1}^{n_{1}}y_{2}^{n_{2}}\cdots y_{m}^{n_{m}}$ . The sum here runs over all vectors $v=(n_{1},n_{2}\cdots n_{m})$ of nonnegative integers satisfying $n_{1}+n_{2}\cdots n_{m}=n$ ; $a_{v}$ represents the number of nonequivalent colorings of the n objects where the color $y_{i}$ occurs precisely $n_{i}$ times. For example by looking at the pattern inventory of the colorings of 4 objects (beads) by 3 colors (r,g and b) and taking G=D₄, we can see that as the coefficient r²gb is 2 so there are two necklaces with 4 beads possible using these three colors. I understand this fully.

The wikipedia article however seems to take a more general approach. It starts off with two sets X and Y. I assume that X stands for the 4 beads and Y the colors {r,g,b}. The colors are then accorded some weights. Then the colorings are also assigned weights. Now it defines a generating function c(t) whose coefficients are the number of colors of a particular weight. I am having difficulty in renconciling this with what I have understood from my book. Specifically it would help if someone could clarify these things to me:

~~Does the WP aricle takes the approach that the number of possible colors are infinite?~~
What are the weights in the necklace problem that I have outlined?
What do weights signify in general?
How is my book's PET equivalent (or a special case of) WP's PET?

Thanks--Shahab (talk) 09:44, 26 April 2009 (UTC)[reply]

Example in the article about the Hermite interpolation

I looked at the article about Hermite interpolation [1] and I tried to reproduce the example, however I could not figure out where the 28 comes from (third row, fourth column).

It's also not clear for me what the columns (after the x and f(x) values) contain. The column which starts with -8 contains the first derivative, if the values on the left are equal to the values on the left one row above then it seems to contain the derivative, but otherwise?

It would be great if u could explain this a bit, so we could edit the example and make it easier to understand.

Thx in advance!

--F7uffyBunny (talk) 18:27, 26 April 2009 (UTC)[reply]

--- http://en.wikipedia.org/wiki/Hermite_interpolation

Figured it out and updated the article —Preceding unsigned comment added by F7uffyBunny (talk • contribs) 21:06, 26 April 2009 (UTC)[reply]

While waiting for a more specific answer to your question, notice that you can easily make free links using double square brakets. As to the Hermite interpolation, have also a look to Chinese theorem#Applications. --pma (talk) 21:10, 26 April 2009 (UTC)[reply]

Digit distribution

Periodically in "detective stories" you get a plot involving someone faking a set of accounts or some other list of numbers and they don't meet the normal statistical usage of digits wherein 1 is much more frequent than 9. Two parts to this: (1) does this apply for item sales records where I would expect an excess of ".99" to come up since stores love this price break. (2) If the list is the sort where that analysis applies, how should you fake it from a list of random numbers wherein each digit is equally likely - no I'm not planning anything fraudulent. -- SGBailey (talk) 22:10, 26 April 2009 (UTC)[reply]

Benford's law is about the leading digit. I haven's seen statistics about prices but I would guess it partially applies there. Selection of prices just below a round number could very well cause a deviation from it. PrimeHunter (talk) 23:09, 26 April 2009 (UTC)[reply]

For realistic distribution of the first digit, use 10**random(). Then you might want to round the result to an integer, and subtract .01 or .05. You'll also need to adjust this formula somehow for a distribution of magnitudes (number of digits). —Tamfang (talk) 23:42, 26 April 2009 (UTC)[reply]

Wikipedia:Reference desk/Archives/Mathematics/2009 April 26

Contents