Week 6 lecture: multivariate statistical analysis
MANOVA test
H0: the mean vectors of all populations (groups) are equal
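A minimal pure-Python sketch of Wilks' lambda, the first of the four MANOVA statistics Stata reports. It is defined as det(E) / det(E + H), where E is the within-group (residual) sum-of-squares-and-cross-products (SSCP) matrix and H is the between-group SSCP matrix. Values near 0 argue against H0; values near 1 are consistent with it. The two 2x2 matrices below are made-up illustrative numbers, not taken from the class data.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(E, H):
    """Wilks' lambda = det(E) / det(E + H) for two response variables."""
    e_plus_h = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(e_plus_h)


# Hypothetical SSCP matrices, for illustration only
E = [[10.0, 2.0], [2.0, 8.0]]  # within-group (residual)
H = [[6.0, 1.0], [1.0, 4.0]]   # between-group (hypothesis)
print(round(wilks_lambda(E, H), 4))  # 76 / 183, about 0.4153
```

If H is the zero matrix (no between-group differences), the function returns exactly 1.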
Cluster analysis (average linkage)
. cluster averagelinkage price mpg weight length
cluster name: _clus_1
. edit
- preserve
. cluster list _clus_1
_clus_1  (type: hierarchical,  method: average,  dissimilarity: L2)
      vars: _clus_1_id (id variable)
            _clus_1_ord (order variable)
            _clus_1_hgt (height variable)
     other: cmd: cluster averagelinkage price mpg weight length
            varlist: price mpg weight length
            range: 0 .
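. list _clus_1_id _clus_1_ord _clus_1_hgt
Analyze the result with a dendrogram (readable only for small samples, no more than about 50 observations):
. cluster dendrogram _clus_1
. cluster dendrogram _clus_1, horizontal   (horizontal version)
Try another linkage method. If it gives different results, the cluster structure of the data itself is not robust.
. cluster wardslinkage price mpg weight length
cluster name: _clus_2
. cluster wardslinkage price mpg weight length, name(class1)   (name() lets you name the generated cluster variables yourself, here class1)
. cluster dendrogram class1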
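The two hierarchical methods above differ only in how they score the distance between two clusters. The minimal sketch below assumes Euclidean (L2) distances, as reported by cluster list:
- Average linkage takes the mean of all pairwise distances between the two clusters.
- The textbook Ward criterion is the increase in the within-cluster sum of squares caused by a merge: n_a*n_b/(n_a+n_b) times the squared distance between the centroids.

This illustrates the two criteria only. It is not Stata's internal code, and Stata's reported Ward heights may be on a different scale.

```python
import math


def average_linkage(A, B):
    """Mean Euclidean distance over all pairs (a in A, b in B)."""
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))


def centroid(C):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(C) for col in zip(*C)]


def ward_cost(A, B):
    """Increase in within-cluster sum of squares if clusters A and B are merged."""
    na, nb = len(A), len(B)
    return na * nb / (na + nb) * math.dist(centroid(A), centroid(B)) ** 2


A = [[0.0, 0.0], [0.0, 2.0]]
B = [[3.0, 0.0], [3.0, 2.0]]
print(average_linkage(A, B))  # (3 + 3 + 2*sqrt(13)) / 4
print(ward_cost(A, B))        # 9.0
```

The Ward cost can be checked directly. The merged cluster has a sum of squares of 13, A and B have 2 each, and 13 - 2 - 2 = 9.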
Generate the dissimilarity matrix ma1 and run the cluster analysis on the matrix itself, here with single linkage (singlelinkage):
. matrix dissimilarity ma1 = price mpg weight length
. clear
. clustermat singlelinkage ma1
obs was 0, now 74
cluster name: _clus_1
. cluster dendrogram _clus_1
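matrix dissimilarity stores all pairwise distances. clustermat singlelinkage then repeatedly merges the two clusters whose closest members are nearest to each other. A minimal sketch of both steps follows. It assumes Euclidean distance and uses a naive O(n^3) merge loop, which is fine for a few dozen observations and is not how Stata implements it. The four one-variable points are illustrative.

```python
import math


def dissimilarity_matrix(rows):
    """All pairwise Euclidean distances between observations."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def single_linkage(D):
    """Naive single-linkage agglomeration on a distance matrix D.

    Returns the merge steps as (members_a, members_b, height), where height is
    the smallest distance between any member of one cluster and any of the other.
    """
    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                h = min(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or h < best[2]:
                    best = (a, b, h)
        a, b, h = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), h))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return merges


points = [[0.0], [1.0], [5.0], [7.0]]  # illustrative one-variable data
merges = single_linkage(dissimilarity_matrix(points))
for step in merges:
    print(step)
```

The merge heights (1, 2, 4) are exactly the heights at which the branches join in the dendrogram.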
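Building on the dendrogram, split the data into k() groups and name the result with name() (e.g., class2) to see which group each observation belongs to:
. cluster kmeans price mpg weight length, k(4) name(class2)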
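k-means alternates two steps until the assignments stop changing: assign each observation to its nearest group center, then move each center to the mean of its members. Below is a minimal sketch of that (Lloyd's) iteration. Starting from the first k points is an assumption made to keep the example deterministic. By default Stata picks random starting centers, which is also why repeated cluster kmeans runs can produce different groupings.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: returns (labels, centers).

    Starting centers are the first k points (an assumption for a reproducible
    example; Stata picks random starting centers by default).
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


data = [[1.0, 1.0], [8.0, 8.0], [1.5, 2.0], [9.0, 8.5], [1.0, 0.6], [8.5, 9.0]]
labels, centers = kmeans(data, k=2)
print(labels)   # [0, 1, 0, 1, 0, 1]
print(centers)
```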
. list class2
class2 for observations 1-74, in order:
  1-10:  1 4 1 4 3 4 1 4 2 1
 11-20:  2 2 2 1 4 1 4 1 1 1
 21-30:  1 4 4 1 1 2 2 2 1 4
 31-40:  4 1 4 1 3 4 4 4 1 1
 41-50:  2 1 1 1 4 1 4 4 4 4
 51-60:  1 1 3 4 3 4 1 1 3 1
 61-70:  4 1 1 2 1 1 4 1 4 3
 71-74:  4 1 3 2
. list price class2
price (class2) for observations 1-74:
  1-5:    4,099 (1)   4,749 (4)   3,799 (1)   4,816 (4)   7,827 (3)
  6-10:   5,788 (4)   4,453 (1)   5,189 (4)  10,372 (2)   4,082 (1)
 11-15:  11,385 (2)  14,500 (2)  15,906 (2)   3,299 (1)   5,705 (4)
 16-20:   4,504 (1)   5,104 (4)   3,667 (1)   3,955 (1)   3,984 (1)
 21-25:   4,010 (1)   5,886 (4)   6,342 (4)   4,389 (1)   4,187 (1)
 26-30:  11,497 (2)  13,594 (2)  13,466 (2)   3,829 (1)   5,379 (4)
 31-35:   6,165 (4)   4,516 (1)   6,303 (4)   3,291 (1)   8,814 (3)
 36-40:   5,172 (4)   4,733 (4)   4,890 (4)   4,181 (1)   4,195 (1)
 41-45:  10,371 (2)   4,647 (1)   4,425 (1)   4,482 (1)   6,486 (4)
 46-50:   4,060 (1)   5,798 (4)   4,934 (4)   5,222 (4)   4,723 (4)
 51-55:   4,424 (1)   4,172 (1)   9,690 (3)   6,295 (4)   9,735 (3)
 56-60:   6,229 (4)   4,589 (1)   5,079 (1)   8,129 (3)   4,296 (1)
 61-65:   5,799 (4)   4,499 (1)   3,995 (1)  12,990 (2)   3,895 (1)
 66-70:   3,798 (1)   5,899 (4)   3,748 (1)   5,719 (4)   7,140 (3)
 71-74:   5,397 (4)   4,697 (1)   6,850 (3)  11,995 (2)
. tabstat price mpg weight length, by(class2)
Summary statistics: mean
  by categories of: class2

  class2 |     price       mpg    weight    length
---------+----------------------------------------
       1 |  4163.938   24.5625   2581.25  174.7813
       2 |   12607.6        15      4041     209.1
       3 |  8312.143        21  2931.429  188.5714
       4 |   5548.88     19.72    3196.4    196.12
---------+----------------------------------------
   Total |  6165.257   21.2973  3019.459  187.9324
--------------------------------------------------
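Conversely, run a MANOVA on the clustering to check whether the classification is effective, i.e., whether the group means differ. All p-values are far below 0.05 (Prob>F = 0.0000), so the classification succeeds.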
. manova price mpg weight length=class2
Number of obs = 74

W = Wilks' lambda      L = Lawley-Hotelling trace
P = Pillai's trace     R = Roy's largest root

    Source |  Statistic     df   F(df1,    df2) =      F   Prob>F
-----------+------------------------------------------------------
    class2 | W   0.0549      3    12.0   177.6     29.52  0.0000 a
           | P   1.2125           12.0   207.0     11.70  0.0000 a
           | L  12.6572           12.0   197.0     69.26  0.0000 a
           | R  12.3147            4.0    69.0    212.43  0.0000 u
           |------------------------------------------------------
  Residual |            70
-----------+------------------------------------------------------
     Total |            73
------------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F

Re-cluster with k-medians (cluster kmedians); the group assignments change:
. cluster kmedians price mpg weight length, k(4) name(class3)
. list class2 class3
class2/class3 for observations 1-74, in order:
  1-10:  1/2  4/2  1/3  4/2  3/1  4/1  1/3  4/2  2/4  1/2
 11-20:  2/4  2/4  2/4  1/3  4/2  1/2  4/2  1/3  1/2  1/3
 21-30:  1/2  4/1  4/1  1/3  1/3  2/4  2/4  2/4  1/3  4/2
 31-40:  4/1  1/2  4/1  1/3  3/4  4/2  4/2  4/2  1/2  1/3
 41-50:  2/4  1/2  1/3  1/3  4/1  1/2  4/1  4/2  4/2  4/2
 51-60:  1/2  1/3  3/4  4/1  3/4  4/1  1/3  1/3  3/1  1/3
 61-70:  4/1  1/3  1/3  2/4  1/3  1/3  4/1  1/3  4/1  3/1
 71-74:  4/1  1/3  3/1  2/4
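k-medians runs the same iteration as k-means, but each center moves to the coordinate-wise median of its members, so a single extreme observation pulls the center far less. The minimal sketch below makes the same assumptions as a textbook k-means: the first k points are the starting centers and distances are Euclidean. The one-variable data contain one illustrative outlier (60).

```python
import math
import statistics


def kmedians(points, k, max_iter=100):
    """Like k-means, but each center moves to the coordinate-wise median.

    Starting centers are the first k points (assumption for reproducibility).
    """
    centers = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: median of the members, coordinate by coordinate
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [statistics.median(col) for col in zip(*members)]
    return labels, centers


data = [[1.0], [20.0], [2.0], [21.0], [3.0], [60.0]]  # 60 is an outlier
labels, centers = kmedians(data, k=2)
print(labels, centers)  # [0, 1, 0, 1, 0, 1] [[2.0], [21.0]]
# A mean-based center for the second group would be (20 + 21 + 60) / 3, about 33.67.
```

Note that cluster numbers are arbitrary labels. In the listing above, for example, every class2 = 2 car ends up in class3 = 4. A change in label therefore does not by itself mean a change in grouping.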
Decide where to stop the k-group cluster analysis with a stopping rule, here the Calinski-Harabasz pseudo-F statistic:
. cluster stop class3, rule(calinski)

+---------------------------+
|             |  Calinski/  |
|  Number of  |  Harabasz   |
|  clusters   |  pseudo-F   |
|-------------+-------------|
|      4      |    151.37   |
+---------------------------+
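The Calinski-Harabasz pseudo-F is [B/(k-1)] / [W/(n-k)]. Here B is the between-group sum of squares, W the within-group sum of squares, n the number of observations and k the number of groups. Larger values indicate more distinct groups, so the statistic is compared across candidate values of k. A minimal sketch on illustrative data:

```python
def calinski_harabasz(points, labels):
    """Pseudo-F = [B / (k - 1)] / [W / (n - k)] for a given partition."""
    n = len(points)
    groups = sorted(set(labels))
    k = len(groups)
    grand = [sum(col) / n for col in zip(*points)]
    between = within = 0.0
    for g in groups:
        members = [p for p, lab in zip(points, labels) if lab == g]
        center = [sum(col) / len(members) for col in zip(*members)]
        # Between-group SS: group size times squared distance of center to grand mean
        between += len(members) * sum((c - m) ** 2 for c, m in zip(center, grand))
        # Within-group SS: squared distances of members to their own center
        within += sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


pts = [[0.0], [2.0], [10.0], [12.0]]
print(calinski_harabasz(pts, [0, 0, 1, 1]))  # B = 100, W = 4, so 100 / 2 = 50.0
print(calinski_harabasz(pts, [0, 1, 0, 1]))  # a poor split scores far lower
```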
Assign each observation to a group by drawing a cut line across the dendrogram:
. cluster averagelinkage price mpg weight length
cluster name: _clus_1
. cluster generate clus5 = cut(3500), name(_clus_1)
. list clus5
clus5 for observations 1-74, in order:
  1-10:  1 1 1 1 2 1 1 1 2 1
 11-20:  2 3 3 1 1 1 1 1 1 1
 21-30:  1 1 1 1 1 2 3 3 1 1
 31-40:  1 1 1 1 2 1 1 1 1 1
 41-50:  2 1 1 1 1 1 1 1 1 1
 51-60:  1 1 2 1 2 1 1 1 2 1
 61-70:  1 1 1 3 1 1 1 1 1 1
 71-74:  1 1 1 2
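Cutting at 3500 keeps only the merges made at heights up to 3500, so every branch crossing that height becomes a group. The minimal sketch below assumes average linkage on Euclidean distances. Average-linkage merge heights never decrease, so stopping at the first merge above h gives the same groups as cutting the dendrogram at h.

Group numbers in the sketch follow the smallest observation index in each group, which need not match Stata's numbering. The variables are unstandardized, so price (measured in dollars) likely dominates these distances. That would fit the clus5 group means of price, which rise sharply from group 1 to group 3.

```python
import math


def average_linkage_cut(points, h):
    """Average-linkage clustering stopped at height h (same as cutting the tree at h)."""
    clusters = [[i] for i in range(len(points))]

    def link(A, B):
        # Mean Euclidean distance between members of A and members of B
        return sum(math.dist(points[i], points[j]) for i in A for j in B) / (len(A) * len(B))

    while len(clusters) > 1:
        height, a, b = min((link(clusters[a], clusters[b]), a, b)
                           for a in range(len(clusters))
                           for b in range(a + 1, len(clusters)))
        if height > h:
            break  # the next merge would sit above the cut line
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for g, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = g
    return labels


data = [[0.0], [1.0], [10.0], [11.0], [30.0]]  # illustrative data
print(average_linkage_cut(data, 5))   # [1, 1, 2, 2, 3]
print(average_linkage_cut(data, 15))  # [1, 1, 1, 1, 2]
```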
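That concludes cluster analysis. Next comes discriminant analysis: linear, nonlinear and other methods.
First, linear discriminant analysis with discrim lda:
. discrim lda price mpg weight length, group(foreign)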
Linear discriminant analysis
Resubstitution classification summary

+---------+
| Key     |
|---------|
| Number  |
| Percent |
+---------+

             | Classified
True foreign | Domestic   Foreign |   Total
-------------+--------------------+---------
    Domestic |       43         9 |      52
             |    82.69     17.31 |  100.00
             |                    |
     Foreign |        0        22 |      22
             |     0.00    100.00 |  100.00
-------------+--------------------+---------
       Total |       43        31 |      74
             |    58.11     41.89 |  100.00
             |                    |
      Priors |   0.5000    0.5000 |

82.69 is the share of domestic cars classified correctly; 100.00 is the share of foreign cars classified correctly.
Based on these results, continue with post-estimation commands (estat):
. estat classtable
Resubstitution classification table

+---------+
| Key     |
|---------|
| Number  |
| Percent |
+---------+

             | Classified
True foreign | Domestic   Foreign |   Total
-------------+--------------------+---------
    Domestic |       43         9 |      52
             |    82.69     17.31 |  100.00
             |                    |
     Foreign |        0        22 |      22
             |     0.00    100.00 |  100.00
-------------+--------------------+---------
       Total |       43        31 |      74
             |    58.11     41.89 |  100.00
             |                    |
      Priors |   0.5000    0.5000 |
. estat corr
Pooled within-group correlation matrix

             |    price       mpg    weight    length
-------------+----------------------------------------
       price |  1.00000
         mpg | -0.53117   1.00000
      weight |  0.70551  -0.77521   1.00000
      length |  0.56014  -0.75664   0.91898   1.00000
. estat covariance
Pooled within-group covariance matrix

             |     price        mpg     weight     length
-------------+--------------------------------------------
       price |   8799417
         mpg | -8438.941    28.6848
      weight |   1318950  -2616.617   397186.1
      length |  30603.94  -74.63974   10667.35   339.2432
. estat errorrate   (misclassification rates)
Error rate estimated by error count

             |        foreign
             | Domestic   Foreign |    Total
-------------+--------------------+----------
  Error rate | .1730769         0 | .0865385
-------------+--------------------+----------
      Priors |       .5        .5 |
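The Total column is the prior-weighted average of the per-group error rates: 0.5 x .1730769 + 0.5 x 0 = .0865385. The sketch below reproduces the numbers from the classification counts above (9 of 52 domestic cars misclassified, 0 of 22 foreign). It also shows the total under the sample shares 52/74 and 22/74 as priors, which is an illustrative alternative and not something the output reports.

```python
def error_rates(confusion, priors):
    """Per-group and prior-weighted total error rates.

    confusion[i][j] = number of true group-i observations classified into group j.
    """
    per_group = [(sum(row) - row[i]) / sum(row) for i, row in enumerate(confusion)]
    total = sum(p * e for p, e in zip(priors, per_group))
    return per_group, total


lda_counts = [[43, 9], [0, 22]]                      # LDA classification table
print(error_rates(lda_counts, [0.5, 0.5]))           # equal priors, as in the output
print(error_rates(lda_counts, [52 / 74, 22 / 74]))   # priors = sample shares
```

With sample-share priors the total becomes 9/74, about .1216, which is simply the overall share of misclassified cars.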
. estat grsum   (descriptive statistics of the four variables by group)
Estimation sample discrim lda        Summarized by foreign

             |        foreign
        Mean | Domestic   Foreign |    Total
-------------+--------------------+----------
       price | 6072.423  6384.682 | 6165.257
         mpg | 19.82692  24.77273 |  21.2973
      weight | 3317.115  2315.909 | 3019.459
      length | 196.1346  168.5455 | 187.9324
-------------+--------------------+----------
           N |       52        22 |       74
One-way ANOVA for each variable:
. estat anova
Univariate ANOVA summaries

                                                                 Adj.
    Variable |   Model MS    Resid MS    Total MS    R-sq      R-sq       F   Pr > F
-------------+--------------------------------------------------------------------
       price |  1507382.7   6.336e+08   6.249e+08   .0024    -.0115   .1713   0.6802
         mpg |  378.15352   2065.3059   2042.1943   .1548      .143   13.18   0.0005
      weight |   15496779    28597399    28417939   .3514     .3424   39.02   0.0000
      length |   11767.15   24425.512    24252.11   .3251     .3158   34.69   0.0000
---------------------------------------------------------------------------------
Number of obs = 74     Model df = 1     Residual df = 72

Which discriminant function is produced? Use the canonical discriminant function approach:
. estat canontest
Canonical linear discriminant analysis

    |                                       | Like-
    | Canon.   Eigen-       Variance        | lihood
Fcn | Corr.    value     Prop.    Cumul.    | Ratio       F    df1    df2   Prob>F
----+---------------------------------------+---------------------------------------
  1 | 0.7494  1.28083   1.0000   1.0000     | 0.4384  22.094      4     69   0.0000 e
------------------------------------------------------------------------------------
Ho: this and smaller canon. corr. are zero;                            e = exact F

Show the discriminant function itself (standardized coefficients):
. estat loadings
Standardized canonical discriminant function coefficients

             |  function1
-------------+-----------
       price |  -1.084153
         mpg |   .3115969
      weight |    2.04874
      length |  -.4264069
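With two groups there is a single canonical function, and the canontest figures follow from its eigenvalue λ:
- canonical correlation = sqrt(λ/(1+λ))
- likelihood ratio (Wilks' lambda) = 1/(1+λ)
- exact F on (p, n-p-1) degrees of freedom.

A quick check using the reported eigenvalue, n = 74 and p = 4 variables:

```python
import math

eigenvalue = 1.28083  # from estat canontest
n, p = 74, 4          # observations and discriminating variables (two groups)

canon_corr = math.sqrt(eigenvalue / (1 + eigenvalue))
likelihood_ratio = 1 / (1 + eigenvalue)     # Wilks' lambda for the single function
df1, df2 = p, n - p - 1
f_stat = (1 - likelihood_ratio) / likelihood_ratio * df2 / df1  # exact F
print(round(canon_corr, 4), round(likelihood_ratio, 4), round(f_stat, 3), df1, df2)
```

The results match the table: 0.7494, 0.4384 and F = 22.094 on (4, 69) degrees of freedom.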
Display the classification functions:
. estat classfunctions
Classification functions

             |        foreign
             |  Domestic    Foreign
-------------+----------------------
       price |  .0013868   .0022795
         mpg |  4.577349   4.435253
      weight | -.0341788  -.0421185
      length |  2.534884   2.591428
       _cons | -241.4898  -231.8288
-------------+----------------------
      Priors |        .5         .5
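Each car is assigned to the group whose classification function gives the larger score. The minimal sketch below evaluates the two functions above for one car. Its values (price 4,099, mpg 22, weight 2,930, length 186) are assumed for illustration; they correspond to the first car in Stata's auto data, whose price matches observation 1 listed earlier. Turning the scores into posterior probabilities with a softmax assumes these are Fisher-type linear classification functions under the equal priors shown.

```python
import math

# Coefficients copied from the estat classfunctions output above
coef = {
    "Domestic": {"price": .0013868, "mpg": 4.577349, "weight": -.0341788,
                 "length": 2.534884, "_cons": -241.4898},
    "Foreign": {"price": .0022795, "mpg": 4.435253, "weight": -.0421185,
                "length": 2.591428, "_cons": -231.8288},
}


def scores(car):
    """Classification score of each group: constant plus coefficient-weighted values."""
    return {g: c["_cons"] + sum(c[v] * x for v, x in car.items())
            for g, c in coef.items()}


car = {"price": 4099, "mpg": 22, "weight": 2930, "length": 186}  # illustrative
s = scores(car)
assigned = max(s, key=s.get)  # group with the larger score

# Softmax of the scores as posterior probabilities (assumes equal priors, as above)
top = max(s.values())
expd = {g: math.exp(v - top) for g, v in s.items()}
post = {g: e / sum(expd.values()) for g, e in expd.items()}
print(assigned, {g: round(p, 3) for g, p in post.items()})
```

The Domestic score exceeds the Foreign score by about 2.55, so this car is classified as Domestic.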
Next: nonlinear discriminant analysis. First, LDA with the three clus5 groups:
. discrim lda price mpg weight length, group(clus5)
Linear discriminant analysis
Resubstitution classification summary

+---------+
| Key     |
|---------|
| Number  |
| Percent |
+---------+

             | Classified
  True clus5 |       1        2        3 |   Total
-------------+---------------------------+--------
           1 |      59        0        0 |      59
             |  100.00     0.00     0.00 |  100.00
             |                           |
           2 |       0       10        0 |      10
             |    0.00   100.00     0.00 |  100.00
             |                           |
           3 |       0        0        5 |       5
             |    0.00     0.00   100.00 |  100.00
-------------+---------------------------+--------
       Total |      59       10        5 |      74
             |   79.73    13.51     6.76 |  100.00
             |                           |
      Priors |  0.3333   0.3333   0.3333 |

With three groups there are two discriminant functions; everything above still applies.
. estat loadings
Standardized canonical discriminant function coefficients

             | function1  function2
-------------+----------------------
       price |  .9994096   .0619517
         mpg |  .1147418   .3214297
      weight |  .7125425   -2.05314
      length | -.6690039   2.838476
. estat grsum
Estimation sample discrim lda        Summarized by clus5

             |              clus5
        Mean |        1          2          3 |    Total
-------------+---------------------------------+----------
       price | 4846.746     9981.5    14091.2 | 6165.257
         mpg | 22.49153       17.4         15 |  21.2973
      weight | 2824.746       3662       4032 | 3019.459
      length | 183.4576      205.2      206.2 | 187.9324
-------------+---------------------------------+----------
           N |       59         10          5 |       74
Multivariate analysis of variance (MANOVA):
. estat manova

Number of obs = 74

W = Wilks' lambda      L = Lawley-Hotelling trace
P = Pillai's trace     R = Roy's largest root

    Source |  Statistic     df   F(df1,    df2) =      F   Prob>F
-----------+------------------------------------------------------
     clus5 | W   0.1043      2     8.0   136.0     35.63  0.0000 e
           | P   0.9304            8.0   138.0     15.00  0.0000 a
           | L   8.2507            8.0   134.0     69.10  0.0000 a
           | R   8.2102            4.0    69.0    141.63  0.0000 u
           |------------------------------------------------------
  Residual |            71
-----------+------------------------------------------------------
     Total |            73
------------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F
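With g = 3 groups, Wilks' lambda has an exact F transformation, which is why the W row above is flagged e (exact):

F = (1 - sqrt(Λ))/sqrt(Λ) x (n - p - 2)/p, on (2p, 2(n - p - 2)) degrees of freedom.

A quick check with Λ = 0.1043, n = 74 and p = 4. The small gap to 35.63 comes from Λ being rounded to four decimals in the output.

```python
import math


def wilks_exact_f_three_groups(lam, n, p):
    """Exact F for Wilks' lambda with g = 3 groups, n observations, p variables."""
    root = math.sqrt(lam)
    df1, df2 = 2 * p, 2 * (n - p - 2)
    f = (1 - root) / root * (df2 / df1)
    return f, df1, df2


f, df1, df2 = wilks_exact_f_three_groups(0.1043, n=74, p=4)
print(round(f, 2), df1, df2)
```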
Quadratic discriminant analysis (discrim qda) raises the correct-classification rate:
. discrimqda price mpg weight length, group(foreign)
Quadratic discriminant analysis
Resubstitution classification summary

+---------+
| Key     |
|---------|
| Number  |
| Percent |
+---------+

             | Classified
True foreign | Domestic   Foreign |   Total
-------------+--------------------+---------
    Domestic |       45         7 |      52
             |    86.54     13.46 |  100.00
             |                    |
     Foreign |        0        22 |      22
             |     0.00    100.00 |  100.00
-------------+--------------------+---------
       Total |       45        29 |      74
             |    60.81     39.19 |  100.00
             |                    |
      Priors |   0.5000    0.5000 |

Domestic cars classified correctly: 86.54%, up from 82.69% under LDA.
List each observation's true group, assigned group and posterior probabilities:
. estat list

+----------------------------------------------+
|     |   Classification   |   Probabilities   |
|     |                    |                   |
| Obs.| True      Class.   | Domestic  Foreign |
|-----+--------------------+-------------------|
|   1 | Domestic  Domestic |   1.0000   0.0000 |
|   2 | Domestic  Domestic |   1.0000   0.0000 |
|   3 | Domestic  Domestic |   0.9935   0.0065 |
|   4 | Domestic  Domestic |   1.0000   0.0000 |
|   5 | Domestic  Domestic |   1.0000   0.0000 |
|-----+--------------------+-------------------|
|   6 | Domestic  Domestic |   1.0000   0.0000 |
|   7 | Domestic  Foreign *|   0.2235   0.7765 |
|   8 | Domestic  Domestic |   1.0000   0.0000 |
|   9 | Domestic  Domestic |   1.0000   0.0000 |
|  10 | Domestic  Domestic |   1.0000   0.0000 |
|-----+--------------------+-------------------|
|  11 | Domestic  Domestic |   1.0000   0.0000 |
|  12 | Domestic  Domestic |   0.8931   0.1069 |
|  13 | Domestic  Domestic |   1.0000   0.0000 |
|  14 | Domestic  Foreign *|   0.2789   0.7211 |
|  15 | Domestic  Domestic |   1.0000   0.0000 |
* marks a misclassified observation.
Overall misclassification rate:
. estat errorrate
Error rate estimated by error count

             |        foreign
             | Domestic   Foreign |    Total
-------------+--------------------+----------
  Error rate | .1346154         0 | .0673077
-------------+--------------------+----------
      Priors |       .5        .5 |

The Domestic error rate falls from .1730769 (LDA) to .1346154. The priors are .5/.5, but in reality the two groups are not equally common:
. sum foreign

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         74    .2972973    .4601885          0          1

About 0.3 of the cars are foreign, so change the prior probabilities of the linear model and test again:
. discrim lda price mpg weight length, group(foreign) priors(0.7,0.3)
Linear discriminant analysis
Resubstitution classification summary

+---------+
| Key     |
|---------|
| Number  |
| Percent |
+---------+

             | Classified
True foreign | Domestic   Foreign |   Total
-------------+--------------------+---------
    Domestic |       44         8 |      52
             |    84.62     15.38 |  100.00
             |                    |
     Foreign |        1        21 |      22
             |     4.55     95.45 |  100.00
-------------+--------------------+---------
       Total |       45        29 |      74
             |    60.81     39.19 |  100.00
             |                    |
      Priors |   0.7000    0.3000 |
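In discriminant analysis, priors enter the posterior probability multiplicatively: P(group | x) is proportional to prior(group) x density(x | group). The minimal sketch below uses hypothetical log-density values for one borderline car, whose Foreign density is assumed to be 1.5 times its Domestic density. Moving from .5/.5 to .7/.3 priors flips its assignment from Foreign to Domestic. This is the kind of shift seen in the table above, where more domestic cars but one fewer foreign car are classified correctly.

```python
import math


def posterior(log_densities, priors):
    """Posterior group probabilities from log-densities (up to a shared constant) and priors."""
    top = max(log_densities)  # subtract the max for numerical stability
    weights = [p * math.exp(ld - top) for ld, p in zip(log_densities, priors)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical borderline car: its Foreign density is 1.5 times its Domestic density
log_f = [0.0, math.log(1.5)]            # [Domestic, Foreign]
equal = posterior(log_f, [0.5, 0.5])    # about [0.4, 0.6], so Foreign
skewed = posterior(log_f, [0.7, 0.3])   # about [0.609, 0.391], so Domestic
print(equal, skewed)
```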
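Principal component analysis
Reduce the dimensionality of the data with PCA: from the chosen variables, extract a few principal components.
. pca price mpg weight length turn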
Principal components/correlation              Number of obs    =         74
                                              Number of comp.  =          5
                                              Trace            =          5
    Rotation: (unrotated = principal)         Rho              =     1.0000

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      3.77621      3.01112             0.7552       0.7552
           Comp2 |       .76509      .488796             0.1530       0.9083
           Comp3 |      .276294      .139261             0.0553       0.9635
           Comp4 |      .137033     .0916582             0.0274       0.9909
           Comp5 |     .0453749            .             0.0091       1.0000
    --------------------------------------------------------------------------

The first two components together explain more than 90% of the total variance (cumulative 0.9083), so keep these two.
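In a correlation-matrix PCA the eigenvalues sum to the number of variables (the Trace of 5). Each component's proportion is therefore its eigenvalue divided by 5. A quick check of the 90% rule used above follows. For comparison, the common eigenvalue-greater-than-1 (Kaiser) rule would keep only the first component.

```python
eigenvalues = [3.77621, .76509, .276294, .137033, .0453749]  # from the pca output
trace = sum(eigenvalues)  # equals the number of variables (5), up to rounding

proportions = [ev / trace for ev in eigenvalues]
cumulative = []
running = 0.0
for prop in proportions:
    running += prop
    cumulative.append(running)

# Smallest number of components whose cumulative share reaches 90%
n_90 = next(i + 1 for i, c in enumerate(cumulative) if c >= 0.90)
# Alternative rule: keep components with eigenvalue > 1 (Kaiser criterion)
n_kaiser = sum(ev > 1 for ev in eigenvalues)
print([round(c, 4) for c in cumulative], n_90, n_kaiser)
```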
Principal components (eigenvectors)
------------------------------------------------------------------------------
Variable | Comp1 Comp2 Comp3 Comp4 Comp5 | Unexplained
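. arima gnp96, arima(2,1,2)   (the middle 1 takes the first difference)
AR: autoregressive terms; I: integration, here one (first) difference; MA: moving-average terms in the random error.
The two L2 (second-lag) terms are not significant and can be dropped.
. arima D1.gnp96, arima(2,0,2)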
Next, add a seasonal difference:
. arima gnp96, arima(2,1,2) sarima(0,1,1,4)   (seasonal period 4 for quarterly data, 12 for monthly data)
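arima(2,1,2) fits the ARMA(2,2) model to the first difference, and sarima(0,1,1,4) adds a lag-4 seasonal difference. The minimal sketch below applies the two differencing operations to a hypothetical quarterly series built from a linear trend plus a fixed four-quarter pattern. The first difference removes the trend, and the seasonal difference then removes the repeating pattern, leaving only zeros.

```python
def diff(series, lag=1):
    """Lag-`lag` difference y[t] - y[t - lag]; the first `lag` values are lost."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical quarterly series: trend 10 + 2t plus a repeating seasonal pattern
season = [5, -2, 0, 3]
y = [10 + 2 * t + season[t % 4] for t in range(12)]

d1 = diff(y)         # first difference (D1.): trend removed, pattern remains
s4d1 = diff(d1, 4)   # then a lag-4 seasonal difference: pattern removed
print(d1)
print(s4d1)          # [0, 0, 0, 0, 0, 0, 0]
```

Each differencing step shortens the series (12 to 11 to 7 values), which is one reason heavily differenced models have fewer usable observations.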
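Forecast from this model (the forecasts are for the differenced series), after extending the data by four quarters.
Then add the y option of predict, so that the forecasts refer to the undifferenced series.
Predict the residuals.
Fit an ARCH 3/2 model: it did not converge (possibly because the series shows no ARCH-type heteroskedasticity).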
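Test whether the series is stationary (unit-root test).
H0: the series has a unit root, i.e., is non-stationary. H0 is not rejected, so the series is non-stationary (a typical feature of financial data).
After first differencing, the series becomes stationary.
A plot of the first-differenced series against date shows it fluctuating up and down around a trend:
. scatter d1.gnp96 date
Phillips-Perron test: H0 is not rejected.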
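Panel data
Panel data combine time-series and cross-sectional data.
. xtset id year
. xtset id year, yearly
. xtsum cp ip
Summary of categorical data:
. xttab cp
Trend plot:
. xtline cp
. xtline cp, overlay   (draws all panels' curves in one graph, which makes them easier to compare)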
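We mainly cover static models.
1. Pooled model
. reg cp ip
. xtreg cp ip, fe   fixed-effects estimates [assumes the individuals differ; check the F test that all u_i = 0]
. predict fix_indic, u
. list fix_indic   (predicted individual effects)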
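xtreg, fe estimates the slope from within-panel variation only. It subtracts each panel's means from x and y, then regresses the demeaned y on the demeaned x. The minimal one-regressor sketch below uses hypothetical data generated as y = a_i + 2x with a panel-specific intercept a_i. The within slope recovers 2, whereas pooled OLS on the same data does not. The panel intercepts it returns play the role of the individual effects, up to an overall constant.

```python
from collections import defaultdict


def panel_means(ids, values):
    """Mean of `values` within each panel id."""
    total, count = defaultdict(float), defaultdict(int)
    for i, v in zip(ids, values):
        total[i] += v
        count[i] += 1
    return {i: total[i] / count[i] for i in total}


def within_estimator(ids, x, y):
    """One-regressor fixed-effects (within) slope and panel intercepts."""
    mx, my = panel_means(ids, x), panel_means(ids, y)
    xd = [xv - mx[i] for i, xv in zip(ids, x)]   # demeaned x
    yd = [yv - my[i] for i, yv in zip(ids, y)]   # demeaned y
    slope = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
    intercepts = {i: my[i] - slope * mx[i] for i in mx}
    return slope, intercepts


def pooled_slope(x, y):
    """Ordinary pooled OLS slope, ignoring the panel structure."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return sxy / sum((a - xbar) ** 2 for a in x)


# Hypothetical panel: y = a_i + 2x with a_1 = 1 and a_2 = 5
ids = [1, 1, 1, 2, 2, 2]
x = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
slope, intercepts = within_estimator(ids, x, y)
print(slope, intercepts, pooled_slope(x, y))
```

The within slope is 2.0 with intercepts 1 and 5. The pooled slope (about 3.09) absorbs the difference between the two panels' intercepts.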
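. xtreg cp ip, re   random-effects estimates (GLS regression)
Other estimators for the random-effects model:
. xtreg cp ip, mle   maximum-likelihood estimation
. xtreg cp ip, be    between-groups regression [the coefficients should be broadly stable across estimators]
2. Population-averaged model
. xtreg cp ip, pa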
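Choose between the fixed-effects and random-effects models with the Hausman test:
. xtreg cp ip, re   (random effects)
. est store random   (store the results)
. xtreg cp ip, fe   (fixed effects)
. est store fix   (store the results)
. hausman fix random   (run the test)
H0 is rejected, so use the fixed-effects model.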
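For a single coefficient the Hausman statistic reduces to H = (b_fe - b_re)^2 / (Var(b_fe) - Var(b_re)). Under H0 (random effects consistent and efficient) it is approximately chi-squared with 1 degree of freedom. A minimal sketch with hypothetical estimates follows; the real hausman command uses the full coefficient vectors and covariance matrices.

```python
import math


def hausman_one_coef(b_fe, b_re, se_fe, se_re):
    """Hausman statistic for a single coefficient and its chi-squared(1) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(b_fe) - Var(b_re) must be positive")
    h = (b_fe - b_re) ** 2 / var_diff
    p_value = math.erfc(math.sqrt(h / 2))  # upper tail of chi-squared with 1 df
    return h, p_value


# Hypothetical estimates, for illustration only
h, p = hausman_one_coef(b_fe=0.50, b_re=0.30, se_fe=0.10, se_re=0.08)
print(round(h, 2), round(p, 4))
```

Here H is about 11.1, well above the 5% critical value of 3.84, so H0 is rejected and fixed effects would be preferred, the same decision as in the class example.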