CURVE FITTER version 1.1
by David C. Young
For use with MS-DOS computers.
Copyright 1991 David C. Young

PROGRAM FUNCTION:

This program finds both coefficients and exponents for a curve of best fit, using the least squares definition of best fit. The program contains functions for creating and revising its data input files. There is also a function titled "Display any ASCII file", which can be used to display the output files.

Each data field is read from a data file. Each data point in the file is accompanied by a key, a piece of information that uniquely identifies that data point. The program will read in any number of data files and use only those data points for which every field has a value.

STARTING THE PROGRAM:

Start the program by changing to the directory or drive containing the program and typing CURVE. The program can be installed on a hard drive by using the DOS copy command to copy all of the files to the desired location on the hard drive.

USING THE PROGRAM:

The basic process of using this program is:

1. Create data files. The data in the data files is keyed. As an example of using keys, suppose you wanted to find an equation to predict the price of your favorite stock. You might enter various items that could be used as predictors, such as the Gross National Product or the price of tea in China. You would key these values by the date to which they apply. Thus you would have a file with the price of tea in China during various months, with the month as the key value. The price of your stock would also be in a file keyed by month. Once you had several files, all keyed by month, you might guess that the price of your stock varies in the same way as the GNP times the price of tea in China. If you don't have all of the necessary data (due to irregular delivery of the Beijing Times), the program will still work. It will simply take into account only those months for which you have a complete set of data, without bothering you with the details.

2. Type in an equation. For the example above, the equation might be S=A*G*T+B or S=A*G^C+B*T^D+E, where S is the stock price, G is the GNP, T is the price of tea in China and A through E are numbers that you want the computer to calculate.

3. Specify which letters represent known data points to be read from data files. In our example, S, G & T are read from files, C & D are computed exponents, and A, B & E are computed coefficients.

4. Input file names for the summary and analysis files. A summary file contains the information shown on screen at the end of the calculation: the values of the computed numbers and the average & maximum deviations (measures of how well the equation fits the data). An analysis file contains the key values, actual values (for your stock) and computed values. Analysis files can be read into many popular spreadsheet and graphing programs, so that you can graph the data to better see how well the calculation works.

5. For computed exponents, input starting values and the number of decimal places to calculate.

The equation to be input consists of single upper or lower case letters to represent both known pieces of data and numbers to be calculated, along with an equals sign "=" and four mathematical operators:

   "+" - addition
   "*" - multiplication
   "/" - division
   "^" - exponentiation

Some examples of valid equations are:

   a = b * c + d
   a * b^c + d = e
   a = b * C^d * E^e + f * g^h + i
   A=B*C^D+E*F^G+H*I^J+K*L^M+N

DATA FILE FORMAT:

The data file is an ASCII file consisting of a header and up to 65000 key and data values. The numbers identifying what type of key and data values are present are as follows:

   1 - string
   2 - real
   3 - character
   4 - integer

The key field can be of any of these types. The data field can be used for curve fitting only if it is of type real or integer. Integer data fields are treated by Curve Fitter as real values.

The data file format is:

. . .

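As a rough illustration of how keyed data might be combined and then fitted to the first example equation, S=A*G*T+B, here is a short Python sketch. It is not the program's own code and does not read the data file format above: the keyed files are stood in for by plain dictionaries, the month keys and all numbers are made up, and the names complete_records and fit_line are hypothetical. Because this equation is linear in the product G*T, a closed-form straight-line fit is enough here; equations with computed exponents need more than this.

```python
# Illustrative sketch only. Keyed data files are modelled as plain dicts
# (an assumption); the real CURVE FITTER file format is not reproduced here.

def complete_records(*fields):
    """Keep only the keys present in every field, as tuples of values."""
    shared = set(fields[0])
    for field in fields[1:]:
        shared &= set(field)
    return {key: tuple(field[key] for field in fields) for key in sorted(shared)}


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up values keyed by month. "1991-03" has no tea price and "1991-05"
# has no stock price, so both months are left out of the fit.
stock = {"1991-01": 15.0, "1991-02": 27.0, "1991-03": 19.0, "1991-04": 51.0}
gnp = {"1991-01": 2.0, "1991-02": 3.0, "1991-03": 4.0, "1991-04": 4.0, "1991-05": 5.0}
tea = {"1991-01": 3.0, "1991-02": 4.0, "1991-04": 6.0, "1991-05": 7.0}

records = complete_records(stock, gnp, tea)

# S = A*G*T + B is a straight line in x = G*T, so A is the slope and B the intercept.
products = [g * t for (_, g, t) in records.values()]
prices = [s for (s, _, _) in records.values()]
A, B = fit_line(products, prices)

print("months used:", sorted(records))
print("A =", A, " B =", B)
```

The months with incomplete data are dropped silently, which mirrors the behavior described for the program; the remaining three months are the only ones that enter the fit.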
SUMMARY FILE FORMAT:

The summary file is an ASCII file containing exactly the same information that is displayed on the screen at the end of a calculation. The summary file format is:

   Average Deviation =
   Maximum Deviation =
   Known data points
   Known taken from
   (same as line above for each data file used)
   Computed Exponents (if any)
   Exponent =
   (same as line above for each exponent computed)
   Computed Coefficients
   Coefficients =
   (same as line above for each coefficient computed)

ANALYSIS FILE FORMAT:

The analysis file is an ASCII file containing information for comparing the known values to the calculated values for the property being modeled. The analysis file format is:

. . .

LIMITATIONS OF THE PROGRAM:

Some known deficiencies of the program, which I hope to get around to fixing in the future, are:

1. Parentheses are not allowed in the equations.

2. Constants are not allowed, although you can create a data file with the same number for every data point if necessary.

3. The list of functions that are not supported is massive. It starts with the trigonometric functions.

4. The exponents that are found represent local minima only, so pick your initial values wisely, or try a few that you think might be in the right range.

However, to the best of my knowledge, what this program does do, it does correctly.

ABOUT CURVE FITTING:

The program finds the coefficient of best fit for each term in an equation and the exponent of best fit for any variable desired. The exponents are found by a multivariable simplex minimization routine. At every step of that search, the coefficients are found by the matrix algebra least squares method (mathematically equivalent to linear regression).

If your theory shows that an equation should have a particular form, it is best to work with that form. However, if you don't know what form to use and want to fit your data by a brute force method, here are some suggestions:

1. Have the program find a coefficient for every term in the equation.

2. Have the program find the constant term.

3. The most powerful generic fit is one in which every term consists of a fitted coefficient and a single variable with a fitted exponent, with the constant term then added on. This often gives the best fit because it fits the most parameters.

4. Remember that you can fit anything if you are fitting more parameters than there are items in your data set; however, such a fit may be useless when applied to new data points falling between or past the original data points. For best results, always make sure that you have considerably more data points than parameters being fitted; otherwise you may be fooling yourself.

LEGAL STUFF:

Version 1.1 of this program is being offered FREE to the world with no guarantees expressed or implied ... etc, etc, etc. Version 1.1 may be freely distributed to anyone and everyone, as long as these instructions are kept with it.

SALES PITCH:

I have completed version 2.1 of this program. The new features present in version 2.1 are:

1. It allows multiple character identifiers.

2. It allows user inserted parentheses.

3. It allows the use of numerical constants.

4. It has twenty new functions covering trigonometry, logarithms and a few other items.

If you would like to buy version 2.1, send $15.00 to:

   David Young
   4485 Fairlane
   Okemos, MI 48864

Please specify what size and density of disk to send. I will send you one disk containing version 2.1 (or whatever the most recent version is), with the program and documentation files. This price does not include any updates past the version that I send you, but I will try to keep you informed of future versions.

Please send cash, check or money order. I cannot accept credit card orders.

Version 2.1 is sold as is with no guarantees expressed or implied.

If you have any questions, you can also reach me by e-mail at:

   internet: young@slater.cem.msu.edu
   bitnet:   YOUNGDC@MSUCEM