Interpolation is a large topic and there are many techniques. I agree that the GIS or math forum might be better versed in this, but for completeness, I'll post some thoughts here.
The choice of method depends on the type of data you are using, and the implementation depends on your environment. GIS software usually offers a number of methods and parameters to adjust. It's also rather easy to write a script in e.g. Python or MATLAB to test and evaluate interpolation methods (e.g. this).
There are a few questions to answer before you choose your method.
Wikipedia has an overview, and Caruso, C., and F. Quarta, "Interpolation methods comparison" gives a rather good introduction to the techniques; there are also some good web pages on GIS and interpolation.
The most important step is to test the method you choose. Again, there are various ways to do so, but the easiest is to simply leave points out from your data set and interpolate the values without them. Try to make the interpolated values as close to the data values as possible.
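A minimal sketch of that leave-out test in Python (the scattered 2D test data and the use of scipy.interpolate.griddata are my own assumptions, just to show the idea):

```python
import numpy as np
from scipy.interpolate import griddata

# Made-up scattered 2D samples of some field z = f(x, y)
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(200, 2))
vals = np.sin(pts[:, 0]) * np.cos(pts[:, 1])

def loo_rms(method):
    """Leave-one-out test: drop each point, re-interpolate it from the rest."""
    errs = []
    for i in range(len(pts)):
        mask = np.arange(len(pts)) != i
        est = griddata(pts[mask], vals[mask], pts[i:i + 1], method=method)[0]
        if not np.isnan(est):               # skip points outside the convex hull
            errs.append(est - vals[i])
    return np.sqrt(np.mean(np.square(errs)))

for m in ("nearest", "linear", "cubic"):
    print(m, loo_rms(m))
```

The method with the lowest leave-one-out error on your own data is usually the one to prefer.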
Kriging methods are based on the statistics of the actual data rather than on a fitted curve. Kriging is often a good first try for spatial data, but also in other applications like this. Spline methods produce smoother curves. People doing computer graphics are experts on this, but there are also applications where you'd expect smooth changes in a model.
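To illustrate the smoothness point, here is a toy 1D comparison of a cubic spline against plain linear interpolation (the sample values are made up):

```python
import numpy as np
from scipy.interpolate import CubicSpline, interp1d

x = np.arange(6.0)                               # sample positions
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])   # toy sample values

xs = np.linspace(0.0, 5.0, 201)
linear = interp1d(x, y)(xs)      # piecewise linear: continuous, but kinked at the samples
smooth = CubicSpline(x, y)(xs)   # cubic spline: continuous first and second derivatives
```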
Naturally, more complex methods (e.g. kriging) take longer to process, while simpler algorithms (e.g. bilinear or nearest-neighbor) are faster. A lower resolution of your output raster (array) would also speed up the process. You can let the resolution depend on the variance of the data, so that you interpolate fewer points over a flat surface and more points over a surface with higher relief. These techniques are common practice in signal processing, and the experts are here.
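As a rough sketch of that variance-dependent resolution idea (the block size and thresholds here are arbitrary choices of mine, not an established recipe):

```python
import numpy as np

def step_per_block(z, block=10, coarse=4, fine=1):
    """Pick an output sampling step for each block of a 1D profile:
    blocks with below-average variance get the coarse step,
    rougher blocks get the fine step."""
    overall = np.var(z)
    steps = []
    for i in range(0, len(z), block):
        local = np.var(z[i:i + block])
        steps.append(fine if local > overall else coarse)
    return steps

# Flat start, bumpy end: early blocks get the coarse step, later blocks the fine one.
z = np.concatenate([np.zeros(50), np.sin(np.linspace(0.0, 20.0, 50))])
print(step_per_block(z))
```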
Geostatistics was initially developed by Georges Matheron, a French mathematician and geologist - though when I first heard about him in the 1980s he was described as a mining engineer.
Matheron developed geostatistics to find a way to determine ore reserves more accurately. His other aim was to attach a level of accuracy to ore reserve estimates (i.e. x ounces of gold at a confidence of y%). Since the initial development of geostatistics in the 1960s for ore reserves in mining, the technique has been used in other fields where there is a spatial correlation between data samples, such as forestry, pollution contamination around metal smelters, soil science, geochemistry, etc.
One of the key tools of geostatistics is the variogram. The variogram is used to determine the distance at which samples can be regarded as being independent of each other.
The mathematics for variograms can look bewildering, but variograms can be explained simply. Consider a line of 100 equally spaced sample points, say 1 metre apart.
For samples 1 m apart, take every pair of samples with that separation and calculate the variance of the differences between them; then repeat the calculation for samples 2 m apart, 3 m apart, and so on.
The distance between samples is known as the lag distance. On a plot of variance versus lag distance, for low lag values the curve will rise. Where the curve flattens out is the distance at which samples can be regarded as being independent of each other. The variogram has its roots in correlation diagrams.
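A bare-bones sketch of that calculation in Python (a toy correlated profile, not a real data set; the experimental semivariance at each lag is half the mean squared difference between all sample pairs that distance apart):

```python
import numpy as np

# Toy profile: 100 equally spaced samples, 1 m apart, with some spatial correlation
rng = np.random.default_rng(1)
z = np.cumsum(rng.normal(size=100)) * 0.1

def semivariance(z, lag):
    """gamma(h) = 0.5 * mean of (z[i + h] - z[i])**2 over all pairs h apart."""
    d = z[lag:] - z[:-lag]
    return 0.5 * np.mean(d ** 2)

for h in range(1, 21):                  # lag distances 1 m .. 20 m
    print(h, round(semivariance(z, h), 4))
# Plotting these values against lag gives the rising-then-flattening curve described above.
```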
For data on a 2D grid, it is prudent to compute a suite of variograms along the directions north-south, east-west, northwest-southeast and northeast-southwest to obtain an initial variance rosette. Any trend in the data will show up in this, and a better variogram orientation can then be tried.
Where the distance between samples is not strictly uniform, but within accepted tolerances, adjustments will need to be made to the data. Also, if the data is clustered within areas, it will need to be declustered first.
When data points are being considered, it is common for a search ellipse to be used and for that ellipse to be oriented. For example, if the data set is soil contamination samples around a metal smelter, or seed dispersal from trees, the search ellipse would be oriented so that its major axis is aligned with the direction of the prevailing winds.
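A minimal way to express an oriented search ellipse in code (the angle, axis lengths and point data here are purely illustrative assumptions):

```python
import numpy as np

def in_search_ellipse(points, centre, major, minor, angle_deg):
    """Mask of points inside an ellipse centred at `centre`, with semi-axes
    `major` and `minor`, and the major axis rotated `angle_deg` from the
    x-axis (e.g. aligned with the prevailing wind direction)."""
    theta = np.radians(angle_deg)
    d = points - centre
    u = d[:, 0] * np.cos(theta) + d[:, 1] * np.sin(theta)    # along major axis
    v = -d[:, 0] * np.sin(theta) + d[:, 1] * np.cos(theta)   # along minor axis
    return (u / major) ** 2 + (v / minor) ** 2 <= 1.0

pts = np.random.default_rng(2).uniform(-10.0, 10.0, size=(500, 2))
mask = in_search_ellipse(pts, centre=np.array([0.0, 0.0]),
                         major=8.0, minor=3.0, angle_deg=30.0)
neighbours = pts[mask]    # only these samples would enter the estimate at the centre
```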
Because of the number of calculations involved and the size of some data sets, particularly metal grades in geological ore bodies, computers are heavily involved in geostatistical analyses.
An initial reference on geostatistics is Isobel Clark, 1979, Practical Geostatistics, Applied Science Publishers.
If you want to estimate a value between sample points, some form of weighting of the samples will need to be used. Within geostatistics, kriging is usually used, and there are many forms of kriging, to name just a few: ordinary, disjunctive, log-normal, indicator. What some people also do is use other forms of weighting such as inverse distance squared, or inverse distance to another power, usually between 1.5 and 3, though inverse distance squared is the most common of these. The results from kriging and inverse distance squared are then compared. The two techniques will rarely give the same results, but the results should be similar.
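For the inverse distance part, a bare-bones sketch (the power and the toy data are my own choices, not anything canonical):

```python
import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2.0, eps=1e-12):
    """Inverse distance weighting: each sample is weighted by 1 / distance**power."""
    d = np.linalg.norm(sample_xy[None, :, :] - query_xy[:, None, :], axis=2)
    w = 1.0 / (d ** power + eps)          # eps avoids division by zero on a sample point
    return (w @ sample_z) / w.sum(axis=1)

rng = np.random.default_rng(3)
sx = rng.uniform(0.0, 10.0, size=(50, 2))
sz = np.sin(sx[:, 0]) + np.cos(sx[:, 1])
queries = np.array([[5.0, 5.0], [2.5, 7.5]])
print(idw(sx, sz, queries, power=2.0))    # try power=1.5 or 3 as mentioned above
```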