CHAPTER 4

Correlation and Regression

"Regression is not easy, nor is it fool-proof. Consider how many fools it has so far caught. Yet it is one of the most powerful tools we have, almost certainly, when wisely used, the single most powerful tool in observational studies.

Thus we should not be surprised that:
(1) Cochran said 30 years ago, 'Regression is the worst taught part of statistics.'
(2) He was right then.
(3) He is still right today.
(4) We all have a deep obligation to clear up each of our own thinking patterns about regression." (Tukey, 1976)

Tukey's comments on the paper entitled "Does Air Pollution Cause Mortality?" by Lave and Seskin (1976) continue with: "difficulties with causal certainty CANNOT be allowed to keep us from making lots of fits, and from seeking lots of alternative explanations of what they might mean. For most environmental [problems] health questions, the best data we will ever get is going to be unplanned, unrandomized, observational data. Perfect, thoroughly experimental data would make our task easier, but only an eternal, monolithic, infinitely cruel tyranny could obtain such data. We must learn to do the best we can with the sort of data we have ..."

It is not our intent to provide a full treatise on regression techniques. However, we do highlight the basic assumptions required for the appropriate application of linear least squares and point out some of the more common foibles frequently appearing in environmental analyses. The examples employed are "real world" problems from the authors' consulting experience.
The highlighted cautions and limitations likewise stem from problems with regression analyses found in the real world.

Correlation and Regression: Association between Pairs of Variables

In Chapter 2 we introduced the idea of, and the equation for, the variance of a variable x. If we have two variables, x and y, for each of N samples, we can calculate the sample covariance, C_xy, as:

\[
C_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}
\]

2004 CRC Press LLC

This is a measure of .
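The sample covariance of two paired variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_covariance`, the `ddof` parameter, and the data values are hypothetical and do not come from the text. The divisor is taken to be N - 1 (`ddof=1`), the usual sample convention; passing `ddof=0` divides by N instead.

```python
from statistics import fmean


def sample_covariance(x, y, ddof=1):
    """Covariance of paired samples x and y, using divisor N - ddof.

    ddof=1 (assumed default) gives the usual sample covariance;
    ddof=0 divides by N instead.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    n = len(x)
    if n - ddof <= 0:
        raise ValueError("need more than ddof paired samples")
    x_bar, y_bar = fmean(x), fmean(y)
    # Sum of cross-products of deviations from the two sample means.
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - ddof)


# Hypothetical paired measurements, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

c_xy = sample_covariance(x, y)   # covariance of x with y
c_xx = sample_covariance(x, x)   # covariance of x with itself
print(c_xy, c_xx)
```

Calling the function with the same variable twice, as in `sample_covariance(x, x)`, returns the sample variance of x, which shows how covariance extends the variance to a pair of variables.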