Perhaps you have seen one of those math problems that says, "What's the next number in this series: 2, 6, 12, 20, ...". Or in the real world, scientists and engineers routinely find a set of numbers through experiments and would then like to find a formula that fits these numbers. A few years ago I developed a simple (though admittedly sometimes tedious) technique for finding a formula to fit a set of numbers.
Disclaimer: While I developed this technique myself, the mathematics behind it is simple enough that I would not be in the least surprised to learn that someone else invented it before me. If anyone reading this is aware of this or a similar technique being published elsewhere, I'd be interested to here about it.
A mathematically astute person might immediately object that what I am saying is impossible, that there is no "one formula" that fits any given set of numbers. This is quite true. What this technique finds is the simplest polynomial that fits the numbers. A polynomial is an equation in the form
a_{0} + a_{1}n + a_{2}n^{2} + a_{3}n^{3} + a_{4}n^{4} + ...
To explain my technique, let's work through an example. Suppose you are given the following series of numbers and want to find a formula for them, and then compute the next number in the sequence.
2, 8, 9, 11, 20
The first step is to arrange them in a column. To the left of this column we write an ascending list of counting numbers, like this:
1 | 2 |
2 | 8 |
3 | 9 |
4 | 11 |
5 | 20 |
We number the column with the original values "0". We then create a column 1, where each value is the difference between each pair of values in column 0. Always take the lower number minus the upper number. That is, we first compare the first two numbers: 2 and 8. 8 - 2 = 6, so we write 6 to the right of and somewhat between 2 and 8. Then we compare the third number, 9, to the second number 8. 9 - 8 = 1. We proceed down the column this way, like so:
0 | 1 | |
---|---|---|
1 | 2 | |
6 | ||
2 | 8 | |
1 | ||
3 | 9 | |
2 | ||
4 | 11 | |
9 | ||
5 | 20 |
Now we make a column 2 by taking the differences between the values in column 1, like so:
0 | 1 | 2 | |
---|---|---|---|
1 | 2 | ||
6 | |||
2 | 8 | -5 | |
1 | |||
3 | 9 | 1 | |
2 | |||
4 | 11 | 7 | |
9 | |||
5 | 20 |
Note that negative results -- as in 1 - 6 -- are normal and expected.
We continue making new columns in this way until all the values in a column are the same. In this case, that just takes one more column.
0 | 1 | 2 | 3 | |
---|---|---|---|---|
1 | 2 | |||
6 | ||||
2 | 8 | -5 | ||
1 | 6 | |||
3 | 9 | 1 | ||
2 | 6 | |||
4 | 11 | 7 | ||
9 | ||||
5 | 20 |
Recall that subtracting a negative is the same as adding a positive. Thus 1 minus negative 5 = 1 + 5 = 6.
What happens, you may ask, if we never reach a point where all the numbers in a column are the same? Simple: sooner or later we will get to the point where there is only one number in the column. As this number must always be the same as itself, then we are done. More on this later.
We are now ready to construct the first term of the polynomial. The column number where we stopped is the power of n. In this case we stopped at column 3, so the first term involves n^{3}. We multiply this by the value in the column divided by the factorial of the column number. (A "factorial" is the product of all the integers from 1 to that number. For example, 5 factorial -- which is written "5!" -- is 5 x 4 x 3 x 2 x 1 = 120.) The value in this column is 6 and 3! is 6, so our first term is:
6 / 3! x n^{3}
= 6 / 6 x n^{3}
= n^{3}
Now we evaluate this term for 1, 2, 3, and so on, and subtract each result from the corresponding starting value.
We started with value #1 was 2. So we plug 1 into n^{3} = 1^{3} = 1, then subtract this from 2, 2 - 1 = 1.
Value #2 is 8. So we plug 2 into n^{3} = 2^{3} = 8. Subtract this from 8, 8 - 8 = 0.
We proceed thus through all the values:
Original - n^{3} | |
---|---|
1 | 2 - 1^{3} = 2 - 1 = 1 |
2 | 8 - 2^{3} = 8 - 8 = 0 |
3 | 9 - 3^{3} = 9 - 27 = -18 |
4 | 11 - 4^{3} = 11 - 64 = -53 |
5 | 20 - 5^{3} = 20 - 125 = -105 |
We create a new table from these values, calling these our new column 0. Like so:
0 | |
---|---|
1 | 1 |
2 | 0 |
3 | -18 |
4 | -53 |
5 | -105 |
Now we go through the same process as we did for the first table.
0 | 1 | 2 | |
---|---|---|---|
1 | 1 | ||
-1 | |||
2 | 0 | -17 | |
-18 | |||
3 | -18 | -17 | |
-35 | |||
4 | -53 | -17 | |
-52 | |||
5 | -105 |
Applying the same rules gives our second term:
-17 / 2! x n^{2}
= -17/2 x n^{2}
Note that this table was completed in fewer columns than the first table. That is, the first table ended with column 3, while this table ended with column 2. Every table must end with fewer columns than the previous table. If that doesn't happen, you have made an arithmetic mistake. (Otherwise you could get two n^{3} terms or whatever, which doesn't make sense.)
Again we plug the successive integers into this term and subtract from the starting values. Note we subtract with the values that we started with on this table -- we don't go back to the original values. Note that we are subtracting a negative, which is the same as adding a positive.
Original - -17/2 n^{2} | |
---|---|
1 | 1 + 17/2 1^{2} = 1 + 17/2 = 19/2 |
2 | 0 + 17/2 2^{2} = 0 + 34 = 34 |
3 | -18 + 17/2 3^{2} = -18 + 153/2 = 117/2 |
4 | -53 + 17/2 4^{2} = -53 + 272/2 = 83 |
5 | -105 + 17/2 5^{2} = -105 + 425/2 = 215/2 |
We build a new table with these values and go through the whole process again:
0 | 1 | |
---|---|---|
1 | 19/2 | |
49/2 | ||
2 | 34 | |
49/2 | ||
3 | 117/2 | |
49/2 | ||
4 | 83 | |
49/2 | ||
5 | 215/2 |
Giving our next term as:
49/2 / 1! x n^{1}= 49/2 n
Plug integers into this term to get the next table:
Original - 49/2 n | |
---|---|
1 | 19/2 - 49/2 x 1 = 19/2 - 49/2 = -15 |
2 | 34 - 49/2 x 2 = 34 - 49 = -15 |
3 | 117/2 - 49/2 x 3 = 117/2 - 147/2 = -15 |
4 | 83 - 49/2 x 4 = 83 - 98 = -15 |
5 | 215/2 - 49/2 x 5 = 215/2 - 245/2 = -15 |
Building another table and following through the process yields the simple result:
0 | |
---|---|
1 | -15 |
2 | -15 |
3 | -15 |
4 | -15 |
5 | -15 |
This is simple, of course, because column 0 already contains all the same value, so we are quickly done.
This gives a final term of:
-15 x n^{0}= -15
(Recall that any number to the zero power is simply one, and can be dropped from the expression.)
Once we get a table that is complete in the zero column, we are done. So now we put all these terms together to get the full equation:
n^{3} - 17/2 n^{2} + 49/2 n - 15
To check, try this on each value:
Original | Formula | |
---|---|---|
1 | 2 | 1^{3} - 17/2 x 1^{2} + 49/2 x 1 - 15 = 1 - 17/2 + 49/2 - 15 = 2 |
2 | 8 | 2^{3} - 17/2 x 2^{2} + 49/2 x 2 - 15 = 8 - 34 + 49 - 15 = 8 |
3 | 9 | 3^{3} - 17/2 x 3^{2} + 49/2 x 3 - 15 = 27 - 153/2 + 147/2 - 15 = 9 |
4 | 11 | 4^{3} - 17/2 x 4^{2} + 49/2 x 4 - 15 = 64 - 136 + 98 - 15 = 11 |
5 | 20 | 5^{3} - 17/2 x 5^{2} + 49/2 x 5 - 15 = 125 - 425/2 + 245/2 - 15 = 20 |
To find the next value in the table is now simply a matter of plugging 6 into the formula:
Original | Formula | |
---|---|---|
6 | ? | 6^{3} - 17/2 x 6^{2} + 49/2 x 6 - 15 = 216 - 306 + 147 - 15 = 42 |
And we see that the answer is the near-mythical value, 42.
As I noted earlier, a table is complete when all the values in a column are the same. If you come to a single value in a column, then it is the same as itself, and the table is complete. Mathematically, this is a perfectly valid solution. But if this happens, it should introduce some doubt into your mind. Do the numbers really follow some pattern, or am I just coming up with a formula to fit a set of random numbers? It is, after all, possible to come up with a formula for any set of numbers. The fact that this technique works should demonstrate that: pick any set of numbers, run them through this method, and you will get a formula to describe them. If you get to a column with two or three numbers the same, this is a good indication that there really is a pattern described by this formula. You would be unlikely to get such a result by chance.
Note this also illustrates the fact that any problem asking "what's the next number in this sequence" has an infinite number of possible answers. If someone gives you the sequence, say, "1, 4, 9, 16", you could run them through the above process and get the answer that the person is probably looking for: the rule is n^{2} so the next value is 25. But you could also invent any number as the next number in the sequence, say 42, and come up with a rule for "1, 4, 9, 16, 42". Feel free to work it out. It comes out to:
17/24 n^{4} - 85/12 n^{3} + 619/24 n^{2} - 425/12 n + 17
and the next term is then 121.
So if you want to be obnoxious, the next time you are given a quiz of "find the next number in the series" problems, just pick any number you like and fill it in, and you'll be completely correct. You'll probably get a failing grade on the test, but you can enjoy the smug satisfaction of knowing you were right.
In the above discussion I've been assuming that all numbers were exact. For a math problem, this would normally be the case. But if you've gotten a set of numbers from real-world measurements, there is probably some margin of error on your measurements. In this case, you run into an additional problem: If two numbers look close, are they equal within some margin of error? Or are they really different? And if you do consider them equal, what value do you use when you proceed to the next step?
I've tinkered with setting a margin of error, for example, differences less than .001 are considered insignificant, and then averaging the numbers together to go to the next step. I haven't found any solution better than trial and error for selecting the margin of error. If the next table does not end in fewer columns than the previous, then the margin of error was not correctly chosen. Perhaps there is something more rigorous one could do here; I haven't pursued it.
While I haven't constructed a formal mathematical proof that this works, the principle is simple if you know a little calculus.
When we take the difference between pairs of values, we are, in effect, finding the derivative. You may note that the value you get is the derivative at some value in between the two values where you took the difference. That is, when you take the difference between the Y values corresponding to X values of, say, 2 and 3, you get the derivative at some X between 2 and 3. Not necessarily 2.5, but between 2 and 3.
When you continue taking differences until you get a constant, this is the equivalent of taking derivatives until you get to a constant function.
So for example, suppose the equation is y=2 x^{3} + 3 x - 7. (Of course the whole point is that we don't know that initially, but let's suppose that's what it is.) So as we take successive differences, we get:
Column | Equation |
---|---|
0 | y = 2 x^{3} + 3 x - 7 |
1 | y' = 6 x^{2} + 3 |
2 | y'' = 12 x |
3 | y''' = 12 |
Thinking backwards, then, we see that when we reach a constant, the column number must be the highest power in the original equation. Each time we take a derivative, we multiply by the exponent, so if we take derivatives all the way to a constant, we will have multiplied by all the integers from 1 to the original power, that is, by the power factorial. So when we get to a constant row, we know that the first term in the original equation must have been this constant times x to the column power divided by that power factorial, that is, if k is the constant and p is the column number, the first term must have been (k / p!) x^{p}. In this example, the final column would be column 3 which would contain all 12's. We get (12 / 3!) x^{3} = (12 / 6) x^{3} = 2 x^{3}. Which indeed is the first term.
If we substract this value from each of the original Y's, then we are effectively dropping the first term from the equation. If we run through the whole process again, this should find us the second term. Etc.
Of course, I don't expect you to do all this arithmetic yourself. That's what computers are for.
So I've written a little program to do the work for you.
© 1999 by Jay Johansen
Fiaz Dec 3, 2010
Thanks for your excellent article at http://www.johansens.us/sane/technotes/formula.htm
I couldn't really understand why this works.. but I could follow through your argument and solve for any future such cases.
For your reference, I also happen to see a very concise and sweet method in here..
http://www.teachers.ash.org.au/mikemath/numseqfindiff/note2.pdf
which works nicely and to the point.
The method is certainly similar to users, and again I couldn't really understand the theory behind why it works.
Thanks & Regards
Fiaz
Walter Jan 27, 2011
Take a look at ... http://en.wikipedia.org/wiki/Difference_engine#Method_of_differences
A nobody called Isaac Newton has invented this method (^_^)
Jay Johansen Feb 18, 2011
It would be easy enough to increase the number of input fields on the screen. The logic would be the same. But 15 seems like a very large number.
This program finds the simplest polynomial formula that will generate the set of numbers that you give. That is, formulas in the form a + b x + c x2 + dx3 + e x4 ... (Where x2 means "x squared", x3 means "x cubed", etc. No superscripts in text email.) The highest exponent term it can generate is one less than the number of values you entered. That is, if you enter 10 values, it could come up with a formula going up to x to the 9th power. With 15 values, it can go up to x to the 14th power. If the set of numbers you give really requires a polynomial going up to the 14th power, well, I guess it depends what you're doing, but that's a very complicated formula. If, say, you're using numbers from a physics experiment and trying to find a formula to relate them, the odds are that the real formula doesn't go to the 14th power. This program doesn't allow for experimental error, it expects all numbers to be exact. Maybe some day I'll make a version that lets you enter some sort of margin for experimental error.
Anyway, my point is, try entering the first 15 numbers from your set and see what formula you get. If it really goes up to the 14th power, you probably have a set of numbers that cannot be described by a polynomial. They may be describable by some other sort of formula, a logarithm or trig function or some such. Or they may be just a bunch of random numbers.
Azick Feb 27, 2011
Sir, I am writing from Cameroon. I write to appreciate your effort of developing a very good program, How to Find formula for a set of Numbers. For I have been searching for such a program until this day that I found this one.But Sir, you limited the program for whole numbers and fractions formula to only 15 numbers.That means only the formula for 15 numbers and not more can be obtained after clicking on poly me Sir, I also write to inquire if you have any latest update for the program especially with regard to finding the formula for more than 18 numbers or so or between 18 to 25 numbers.Thanks
Terry Apr 22, 2011
I googled and found your site tonight for solving a sequence. I had a table with 4 sets of numbers, x and y, with x values starting with 1 and ending with 4. Your method helped me find the formula, which was a 3rd degree polynomial, not fun. Well, I think I have seen this method before. Have you been told any other sources for it? I have taken History of Mathematics. Maybe it was in that class. But your method saved the day - it was the only one that showed up in the search engine.
I was helping a poor College Algebra student who was given the table and asked whether it was proportional. We should have just plotted it to see that it wasn't linear (and/or didn't go through the origin). But I got stubborn and had to find the formula. It was a good exercise. Thank you for posting it.
Have a nice weekend.
Columbus State University
Columbus, Georgia
Nick Jul 28, 2011
Very interesting work. Brilliant. Thanks for sharing. Has anyone ever discussed your method ? Has it out there before you discovered it ?
Nick
Cincinnati, Ohio
Jay Johansen Jul 29, 2011
I really need to update my page. It turns out that this technique was invented by a little-known mathematician named Isaac Newton. I guess if you're going to come in second to someone, Isaac Newton isn't a bad person for it to be. Also, he didn't create a web page to implement it. :-)
Nick Aug 7, 2011
Well, I guess you are in good company. You should create an algorithm for data analysis. Just tweak the web page code to analyze data for signals (formulas). That awesome technique has to have some commercial/scientific application. I might write some C++ code for the technique.
Chris Sep 6, 2011
I found your site and have found it immensely helpful with a project i have been working on for a while.
It is essentially the same formula program you have made, except its designed to analyse multiple columns of data to get the equation, similar to BACON1. The version im working on is also designed to calculate it in terms of other mathematical functions (such as sin, mod, log, sqrt, etc). I was wondering if you would be willing to give me a hand with it or any resources i could use for aid.
Jay Johansen Sep 8, 2011
Expanding to other types of equations is an obvious next step. It also seems like a fairly difficult next step. Do you have a plan for how to do it? You could always do it by trial and error, i.e. have a list of functions and try each in turn until you get something that seems to work. Off the top of my head I don't know what you could do that would be better, but, well, there should be something. Trial and error would make it very difficult to discover a formula like y=sin x / x + ln x - x^2.
Curtis Sep 12, 2011
I recently view your website regarding your systematic computation of polynomial formulas. One issue I wanted to mention is that it incorrectly maps all functions to polynomials. For example the function n! which is NP and cannot be mapped to a polynomial, but in your website it is mapped to a polynomial in your program. There should be a quick check in place to see if the sequence grows at a rate faster than polynomial speed (i.e. n^k as k->some constant exponent value). Another example is 2^x...
The idea of subtracting numbers in a sequence to obtain the terms has been done many times, however your website does it quite nicely! However, your algorithm is inefficience and I wanted to help you out but giving you a suggestion to improve your solution. If you can "guess" the highest order term, simply by looking at the terms and seeing how each term relates to index i... you can guess the highest power this way. Then compute f(n)/n^highest power. If this value approaches a constant, you have correctly guessed the highest power. If it approaches 0, the power you have guessed was too high (you should guess again with a lower power), and if it approaches infinite the power you guessed was too low (guess again with a higher power). You can use this information to determine whether you need to revise the power you guessed or not. MOST IMPORTANTLY, when it approaches a constant, the value that it approaches is the constant which is multiplied by the highest power (i.e. f(n)/n^2 (assuming 2 is highest power) if computing this sequence seems to approach 5, then the first term in the function is 5n^2).
This is a much more efficient algorithm. Computing this way would allow you to compute hundreds of terms instead of just 15 on your site, as long as your server speed was fast enough, which in today's standards, is definitely not a problem.
Keep up the good work and enthusiasm!
Jay Johansen Sep 13, 2011
Hmm, I don't know how you can say that my algorithm "INCORRECTLY maps all functions to polynomials". How can you say that my mapping is incorrect? It derives a polynomial that generates the given set of numbers.
Consider the series 1, 2, 6, 24. Yes, this series could be generated by the formula y=x!
This program generates the polynomial y=11/6 x^3 - 19/2 x^2 + 50/3 x -8. The first four terms of this polynomial are, in fact, 1, 2, 6, 24.
11/6 (1)^3 - 19/2 (1)^2 + 50/3 -8 = 11/6 - 19/2 + 50/3 - 8 = (11 - 57 + 100 - 48) / 6 = 1
11/6 (2)^3 - 19/2 (2)^2 + 50/3 (2) - 8 = 88 / 6 - 76/2 + 100/3 - 8 = (88 - 228 + 200 - 48) / 6 = 2
Etc.
Both formulas generate the same set of numbers. Therefore, both are correct solutions to the problem. The fact that you may have come up with the sequence by using the factorial function doesn't make that the "correct answer". There's no way for the computer to know what you were thinking when you typed in the numbers.
Indeed, there are an infinite number of equations that could generate any given sequence. We could also generate this sequence with
y= (-x^4 + 32 x^3 + -149 x^2 + 250 x - 120) / 12
Or
y = 2^x (2-x)(x-3) (x-4) / 12 + 2^x (x-1)(x-3)(x-4) / 4 + 3 cos(x-3) (x-1)(x-2)(4-x) + 24 ln(x) (x-1)(x-2)(x-3) / ln(4)
As I stated on the web page, "What this technique finds is the simplest polynomial that fits the numbers." For any given finite list of numbers, there are an infinite number of possible rules or formulas to produce that list. For every new value added to the list, we could eliminate some of these possible formulas, but there will always be an infinite number.
I absolutely agree that an algorithm that could find trig functions and exponentials and logs and so on would be cool. But to derive this from a finite set of values would not be possible with a determinitive algorithm, i.e. there would not be a single correct answer. Intuitively you would want the "simplest" answer, i.e. "3x sin(x)" should win over a ten-term polynomial. But simplicity in that sense is difficult to even define mathematically, never mind implement with an algorithm.
The 15-value limit is not imposed by the power of the computer, but was simply an arbitrary choice. The intent when I created this web page was that it would be used with empirical data. If it can't find the right equation with 15 terms, the odds are that what you are looking for is not a polynomial, but a trig or log or some other formula.
Anyway, thanks for the comments.
Curtis Sep 14, 2011
That is a great explanation and definitely makes sense.
I was rude and tactless to say "incorrectly," and also the lack in clarity is on my part. My meaning in saying incorrectly, is that such a deterministic approach lends itself only to polynomial solutions. IT IS possible to fix this and it is not difficult. If you compute the solution using the method of dividing by the greatest power, than you will have the speed to compute two functions and compare them to find which one is "simpler" i.e. less terms. The first function is a polynomial guess (which you can compute efficiently using the method in the last email) and the second function is a NP guess of fairly simplistic forms such as a^x or n!. Here is how to write the function for NP guesses.
To find 2^x functions and factorial functions, the trick is to simply try dividing by f(n-1) for all functions which grow faster than polynomial time. That will do the trick. Dividing by f(n-1) maps n! to n and a^x to a allowing you to get a quick clear solution which you can output extremely efficiently. By comparing this solution with the polynomial solution, you can output whichever has least terms with a very quick comparison.
My reason for suggesting that what you have is incorrect is based on the following. If a user inputs 2 4 8 16 32 64 128 256 512.... up to the 15th digit, and you output a 16 term polynomial, I can't find any good argument to say that anyone realistic person would suggest that a 16 term polynomial solution is any good to them. Relative to a simple output of 2^x, it is of no value. I agree completely that you can find infinite functions which fit discrete values, NO ONE will dispute that point. But we all know a basic intuition of why we write programs... we write them to be useful, to aid us, and to solve problems. An extremely long polynomial solves the problem for a machine, but as humans, it is not useful and it does not aid us when a solution such as 2^x exists and can be obtained extremely quickly simply by checking the case when f(n)/f(n-1) and taking the limit as n->infinite. When someone enters 15 digits they are generally suggesting a pattern for a well described function, not simply an exuberant polynomial which somehow fits the digits. Its all about the point of the writing the program in the first place. If its point is just to map all discrete sets to polynomials than it is perfect. You just asked those with suggestions to comment, and I wanted to help you out by showing you a very quick adjustment that would make it cover a much greater range of functions, be more versatile, become more efficient and faster, and in the case of NP problems, give the correct answer.
Your website is awesome. I wouldn't take the time to write this if you didn't have my respect.
Vanderhoof Oct 31, 2011
I am writing you today in reference to an article you wrote, "How to Find a Formula for a Set of Numbers". I was intrigued by the idea of formulating a polynomial from a table of values and was very pleased to see that you had developed a method for this. I noticed, however, that you had not written a mathematical proof at the time (as you stated on the webpage) and I was wondering if you had written one since then or if you had been informed as to whether or not anyone else had written one. I am currently involved in the mathematics education department at the University of Central Florida and, since I plan on sharing this technique with my colleagues, I would love to have a proof to share as well (if available). Thank you for your time and I look forward to hearing from you.
Jay Johansen Nov 1, 2011
I'm told that my technique is essentially the same as one developed by Isaac Newton, which is called his "method of differences" or "method of divided differences". I just haven't had the time to investigate this but frankly I would have been surprised if someone hadn't figured out this basic technique before me: it's pretty obvious. (Bummer! Newton beat me by only 300 years!)
In any case, the concept behind it is this: When you take the difference between "adjacent" values, you are finding delta-y for delta-x = 1 for a series of x values. You are in effect finding the derivative. For example, say the function is actually x^2 + 1. (The whole idea is that we don't know this starting out, but let's assume this for purposes of discussion.) So our initial table would look like this:
x f(x)
1 2
2 5
3 10
4 17
Now we find the differences
x f(x) f'(x) f''(x)
1 2
3
2 5 2
5
3 10 2
7
4 17
The f'(x) of itself isn't much use, because we don't know what the x values are. We know that f'(x) is 3 for some value of x between 1 and 2, but we don't know what value. (In this case it's halfway between, 1.5, but in general it's not that simple.)
But when we get the f''(x) column, the f'' value is constant, so it doesn't matter what x is, it applies to all x.
So now we know that f''(x) = 2. From this we can deduce that f'(x)=2x+C1 and f(x)=x^2 + C1*x + C2.
In general if the constant term ends up as some value n, and the constant is in column p, then the high-order term is n/p! * x^p.
If we subtract x^2 from each f(x) value, we can then run through the same procedure to find C1. Then we do it all again to find C2.
I realize this is a long way from a formal mathematical proof, but that was the concept. I must admit that I'm just not quite sure how to get from where I am to a real proof. I've never really sat down to try to work it out. I'm a computer geek and not a mathematician, so I was approaching this from more of an engineering perspective: I didn't need a formal mathematical proof, just some logic that sounds plausible and that works when I try it with a variety of data. Sure, this kind of thinking often gets non-mathematicians into trouble: without a formal proof, you can't be sure that it works 100% of the time. There may be special cases. Or for that matter the cases you tried may turn out to be the small number of special cases where it works.
Hmm, just thinking about it now, it occurs to me that the casual thinking I've done about it so far has been trying to go from the differences to the polynomial. But it occurs to me that a more productive approach is probably to go from the polynomial to the differences. That is, if the high order term is ax^n, then we can readily demonstrate that the n-th derivative of f is a*n! . So all that remains to prove is that the n-th column of differences are equal to the constant term of the n-th derivative. If I just worked on this a little I might come up with an actual proof.
Chris Nov 13, 2011
I wasn't quite sure. I was thinking of having a set of functions to test in sequence (of all available data) in each possible format. There would be constants that can be adjusted later, but this stage is designed to try and discover the function for the change, then after, the constants can be adjusted to get it to line up.
It sort of is trial and error, but if a list of suitable tester functions for each element could be assembled, id imagine it could get quite close, maybe find multiple equations for the same outcomes. I recon it would be very similar to the keplar program.
Ashley Feb 10, 2012
I like your website, it's informative. I also tried to use the sequence program and got a huge string of numbers. Maybe you could help me decipher them?
Okay so I typed in a string of 'random' numbers and this is what I got ( was just looking for the next in the sequence):
Final Answer
y = -6169/3228825600 x14 +2045782776184849337/1438777253582157824 x13 +47339270785488829/285103675391207424 x12 -66956796112409307/7879453435756460032 x11 -1345146413511675721/3716083674335348736 x10 -1265383237270535869/1768146741225617408 x9 -103110904635775/1246640676310390784 x8 -12351101043369721/920856574319114240 x7 -28875337419760317/7407460916899644928 x6 +11189711754560527/563704541018865536 x5 +31419522614634823/572908877914485888 x4 -128727978105837323/829860257916322176 x3 +126147133537779719/130875669076421728 x2 +2482680522399570425/516821091708297996 x +433999830631005724/155278720400261061
Jay Johansen Feb 11, 2012
If you type in random numbers, you'll get essentially random output. Assuming the program is working this function should generate the sequence of numbers you gave. Verifying it would be a pain.
You might try it with shorter, more carefully chosen set of numbers. Like put in 1, 4, 9, 16 and see if it says this is x^2, etc.
Ashley Feb 14, 2012
Well, I took the sequence of numbers from a random number generator. Seeing how its a computer function i thought it couldn't actually be random. I guessed using your program could maybe show the pattern ? Apparently not. Is it possible to figure out how numbers are selected through a random number generator using a produced string of numbers? Just curious.
Jay Johansen Feb 14, 2012
My "function finder" wouldn't uncover the algorithm behind any computer random number generator that I know. My function finds polynomials, and computer random number functions tend to use recursive functions and twiddle bits, which would obscure any polynomial.
Yes, it is true that computer random numbers are not random in the same sense that a die roll is random. They are often called "pseudo-random".
Are you wondering how computer random number functions work in general, or how one works in particular?
If the former: The random number function used by Java is pretty typical.
Each random number is generated from the previous random number. It takes the previous number, multiplies by 25214903917 and adds 11. Then it calls another function that I don't have the source code for but I think it's just preventing getting the same number twice in a row. Each request for a random number specifies how many binary digits it wants, so it peels off the first however-many digits and that's what it returns. Then this number (the full number, not the peeled-off digits) is saved to use the next time around.
To get the first number, when you have no "previous", the calling program can set a value. Programmers sometimes do this when testing so they can get the same series of numbers for each test. Normally, though, it uses the number of nanoseconds from the current time as the starting value. I just noticed that Java carefully adds 1 to this each time you create a new random-number sequence, so if you create two random number sequences within one nanosecond they'll get different starting values.
rovingrover Mar 23, 2012
I am contacting you regarding your method of calculating differences between polynomial terms as I too have done work on this area, though with a different approach and not as in depth. I, also was not able to find any demonstration of this work in other places, though would be highly interested in finding out if you find some professional work on this topic. I am currently a high school equivalent student to give you some perspective. I think a formal would proof would be showing that the nth derivative of x^n simplifies to x!, try it for yourself.
Interestingly this may, with a little adapation, work for non integer n by means of an infinite series, [note to self gamma function?]. Your blog has just given me a little eureka moment!
Thanks for the entertaining article, and please contact me if you find a more in depth explanation of this, though I fear it may be too simplistic to be considered in great depth.
Atanas Oct 11, 2012
I remember learning this formula from a math club instructor back in high school. Then I forgot it and I needed it to solve a problem in one of my classes. I gave up on finding the formula and then today I stumbled across your website where you said you derived this yourself and you were interested in knowing if anyone else had either so I guess I'm just emailing you to say thanks for putting that up and that I have seen it before and it's probably my favorite mathematical trick so....thanks again!!
Ed Feb 1, 2013
Thanks for posting this.
I am considering the use of your Formula Generator for an certain application where the numbers inputted represent specific incremental bioenergetic microfrequencies-representative of disorder in a chaotic system.
My query:
I am trying to understand how the formula derived might be understood to represent a harmonic relationship between the individual numbers of increasing values. Should such a relationship be conceivable as such.
I am also interested in knowing which process,or variable, would bring the established formula to zero (theoretically neutralizing the entropy represented by the values)
I am obviously attempting to find a numeric value which stabilizes the disorder.
Your thoughts are very much appreciated.
Many thanks for your time,
Jay Johansen Feb 2, 2013
Short answer: This program will find the simplest POLYNOMIAL which generates the given set of data. I don't want to overstate what it does. If the "true" formula is a harmonic or trigonometric or some such, this program won't generate n^-1 or sin n terms -- it will still generate a polynomial, the polynomial that gives the best fit to your data.
People have occasionally suggested that I extend the program to handle a wider variety of functions. I agree that would be cool, but barring a brainstorm on how to do it easily, that would be a much more complicated project, one that I'm not likely to do in my spare time.
I'm happy to chat with you about how to find solutions for the more general case, but that's about all I can say.
Best of luck! :-)
Tony Jun 5, 2013
My greatest of compliments on your program. I have been using your website to introduce students to linear regression and Taylor/Maclaurin as a tutor for years now. I was wondering if you might be selling "polyme" the program as a stand-alone program that wouldn't require Internet access. If you aren't, again, my greatest of compliments and thanks for the instructive resource that I've been using all of these years.
Stephen Jul 28, 2013
I absolutely LOVE your page on a formula for any set of numbers. i've long wondered about that sort of thing and eventually i found your page after attempting to research such a thing on the internet. I know it's been several years since you made the website, look at the page I think it's been 15, but you updated it 5 years ago if I'm not mistaken. However, I'm very interested in computer science now, and I'm going to be a freshman at a university in North Carolina in the fall, so I'm wondering if there's any chance you would mind giving me access to your page, or spreadsheet, whatever it is that you used to create that program which does the math for you. I understand this is a big request, which is why I'm asking you. I've begun to try to emulate the process you describe, using my limited (though rapidly improving) knowledge of excel, but I can't even get the if-then statements to work :(
anyway, I'm hoping to hear from you soon, and have a great day and thank you for your marvelous work!
ali shahzad Oct 24, 2013
Dear Mr. Jay,
Hope you will be fine and enjoying some great work.
I would like to convey my special thanks to you for such a nice program for generating equation for numbers.
For my M.Sc Engineering in mechanical design thesis work, I have established a equation from your written online program.
This program helped me a lot.
Well done.
Jonah Dec 22, 2013
I came across your "How to Find a Formula for a Set of Numbers" webpage and was intrigued by it. I've done the same thing many times myself; in fact, in 6th grade, this is essentially what our teacher taught us to do. I'm sure you've been told this before, but do you realize that you essentially recreated Taylor Series??
It's quite cool.
Thanks for the fun!
satdeep Jan 30, 2014
I found your "How to Find a Formula for a Set of Numbers" webpage very useful. But I have a small confusion. While generating the formula, especially for your data, 2,8,9,11,20,... you came up with a step like this,
6 / 3! x n3
= 6 / 6 x n3
= n3
I don't know what 'x' and 'n' indicates. Because, in this webpage you finally ended up with the formula,
n3 - 17/2 n2 + 49/2 n - 15
But, if I use your computer program 'Poly Me', the formula comes like this,
y = +1 x3 -17/2 x2 +49/2 x -15
Please clear my confusion.
Jay Johansen Feb 2, 2014
Okay, I was inconsistent. "x" and "n" are the input variables. The letter doesn't really matter. Whether you write y=3x+5 or q=3n+5, it's the same formula.
To take that example, if I plug in x=1 I should get out y=8. If I plug in x=2 I get out y=11. Etc.
satdeep Feb 2, 2014
Thanks for clearing my doubt. I appreciate.
lalit bhardwaj Feb 20, 2014
thanx for this formula.......... pls.send all formula my mail ID . THANKU
just click this Mar 7, 2014
This site inspires me everyday, you should update it more often
Totardo Tobing Mar 19, 2014
Wow thanks... Your program works.... I wish that you write your explanation with a simpler example Sir. Just for a guy like me can follow/understand. For instance: 3,5,9,15
But anyway really thankful for this great work, thank you
Jay Johansen Mar 20, 2014
3
5 -> 2
9 -> 4 -> 2
15 -> 6 -> 2
So first term is 2/2! x ^2 = x^2. Subtract that from each term ...
3-1=2
5-4=1
9-9=0
15-16=-1
Then run thru again ...
2
1->-1
0->-1
-1->-1
so second term is -1/1! x ^ 1 = -x. Subtract that from each term -- remember minus a minus is a plus.
2+1=3
1+2=3
0+3=3
-1+4=3
So last term is 3/0! x^0 = 3.
Putting that together gives x^2 - x + 3.
Check it:
f(x)=x^2 - x +3
f(1)=1-1+3=3
f(2)=4-2+3=5
f(3)=9-3+3=9
f(4)=16-4+3=15
Hooray!
Totardo Tobing Mar 22, 2014
Lots of funn,, thank you Sirrrr, God bless
Johnd611 May 19, 2014
This website is mostly a walkby for all the info you wished about this and didnt know who to ask. Glimpse right here, and also youll undoubtedly uncover it. bcdfdeedeged
Andrew May 21, 2014
Wow! Thanks for this in-depth explanation! Really great method.
I've tried implementing this method in Python for my own interest, however I was unable to get it working correctly on the last few steps.
Is there any chance that you would be able to provide the source code to your program?
Pharmf425 May 22, 2014
Very nice site!
michael kors black ion May 25, 2014
Just to let you know, this post looks a little bit odd from my android phone. Who knows maybe it really is just my cell phone. Great post by the way.
Morteza Oct 17, 2014
Hi. I'd like to translate this article into Persian.
Do you allow me to do that?
All your rights will be reserved..
Answer me via Email.
Ehab saber Dec 17, 2014
what is the formula of :
19, 17, 12, 11, 19, 18, 43
in seq 1 : 7
William Bouris Apr 8, 2015
I used Poly Me, the program, and it worked wonderfully. You have a great website. Thanks, again! Bill
Caitlin Apr 28, 2015
I wanted to mention that this would not be a good method to find the formula for a sequence of real world data. Although this method would be able to fit a n^th order polynomial to the curve, and it would go exactly through all the points, it would not be much use at actually predicting other values. But for cases when you are just trying to find the next value in a sequence, this is very interesting and fine.
Jay Johansen May 1, 2015
Caitlin: Absolutely true, especially if we're talking about using my program. If you have data collected from, say, a scientific experiment, there is going to be some measurement error in there, and my program expects all the numbers to be exact. It's useful for working with MATH problems, not SCIENCE problems. This method could work for real-world data if you accepted some fudging of the numbers, like if you see a column of differences comes out to 1.9, 2.1, 1.88, and 2.02, you might way, "hmm, maybe the real number is 2" and see how it works out. I've been thinking for a long time of adding a feature to my program to allow for some error, but I've never gotten around to it.
Also, note this method will find a POLYNOMIAL, period. It wont find a trig function or a log or many other possibilities. It would take entirely different methods to find such functions.
nick crorkz May 30, 2015
rVhKk4 Generally I do not learn post on blogs, however I would like to say that this write-up very pressured me to take a look at and do it! Your writing style has been amazed me. Thanks, very nice article.