Machine Learning

范茂松 顾丽丽 李渊浩

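Contents

  • Certificate of Honor: Demonstration Teaching Course
    • Original certificate
    • Academician Zhou Zhihua and me
  • Machine Learning 1: Introduction
    • Machine Learning 1: Introduction
    • Lab: Setting up the development environment (complete on your own)
  • Machine Learning 2: Mathematical Foundations
    • Machine Learning 2: Mathematical Foundations
    • Lab: Python basic syntax and the three control structures (complete on your own)
    • Lab reference code
  • 3 Machine Learning Basics
    • 3 Machine Learning Basics
  • 4 The NumPy Scientific Computing Library
    • The NumPy scientific computing library
  • Linear Regression
    • Simple linear regression
    • Lab: Implementing simple linear regression (complete on your own)
    • Multiple linear regression
    • Hands-on code: Solving a regression model with the closed-form solution
    • Using Anaconda libraries and Wenjuanxing (问卷星)
    • Hands-on code: Multiple linear regression with sklearn
    • Gradient descent
    • Hands-on code: Batch gradient descent for linear regression
    • Model evaluation
    • Hands-on code: Stochastic gradient descent
    • Hands-on code: Mini-batch gradient descent
    • Polynomial regression
    • Ridge regression and its implementation
  • Logistic Regression
    • Basic concepts of logistic regression
    • Maximum likelihood and a detailed derivation of the logistic regression formulas
    • Hands-on code: Logistic regression with sklearn
    • Evaluating classification models, with code
    • Multi-class classification, with code
  • exam
    • Linear regression
    • Logistic regression
    • House price prediction
  • Decision Trees
    • Basic concepts of decision trees
    • Information content and information entropy
    • Information gain and a worked example
    • Computing information gain
    • Hands-on code: Decision trees with sklearn
    • Drawbacks of ID3; C4.5 and CART decision trees
    • Gini index worked example; pre-pruning and post-pruning
  • Support Vector Machines
    • Hyperplanes
    • The basic SVM problem
    • The Lagrangian dual function and the KKT conditions
    • Derivation of Eq. 6.11
    • The SMO algorithm
    • Soft margin and regularization
    • Kernel functions and kernel methods
    • Hands-on code: Predicting breast cancer with SVM
    • Support vector regression
    • Hands-on code: Predicting the air quality index with SVR
  • Bayesian Classifiers
    • Bayes' theorem
    • Principles of the naive Bayes classifier
    • Handling continuous values
    • Hands-on code: Predicting bank marketing data with a Bayesian classifier in sklearn
  • Ensemble Learning
    • Overview of ensemble learning and Boosting
    • The AdaBoost algorithm in detail
    • Hands-on code: Predicting breast cancer data with AdaBoost
    • Bagging and random forests
    • Hands-on code: Predicting bank marketing data with a random forest
    • Combination strategies for ensemble learning
  • Clustering Algorithms
    • Overview of clustering algorithms
    • The KMeans algorithm
    • Hands-on code: Implementing KMeans
  • Dimensionality Reduction Algorithms
    • Overview of dimensionality reduction algorithms
    • The PCA algorithm
    • The PCA algorithm, part 2
    • Hands-on code: PCA with sklearn
  • final-exam
    • exam1
    • exam2
    • exam3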
exam2

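We are given a dataset of advanced mathematics grades from a school abroad. It contains 30 features and 3 grade values, described as follows:

Features: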

1 school - student's school (binary: "GP" - Gabriel Pereira or "MS" - Mousinho da Silveira)

2 sex - student's sex (binary: "F" - female or "M" - male)

3 age - student's age (numeric: from 15 to 22)

4 address - student's home address type (binary: "U" - urban or "R" - rural)

5 famsize - family size (binary: "LE3" - less or equal to 3 or "GT3" - greater than 3)

6 Pstatus - parent's cohabitation status (binary: "T" - living together or "A" - apart)

7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)

9 Mjob - mother's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")

10 Fjob - father's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")

11 reason - reason to choose this school (nominal: close to "home", school "reputation", "course" preference or "other")

12 guardian - student's guardian (nominal: "mother", "father" or "other")

13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)

14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)

15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)

16 schoolsup - extra educational support (binary: yes or no)

17 famsup - family educational support (binary: yes or no)

18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)

19 activities - extra-curricular activities (binary: yes or no)

20 nursery - attended nursery school (binary: yes or no)

21 higher - wants to take higher education (binary: yes or no)

22 internet - Internet access at home (binary: yes or no)

23 romantic - with a romantic relationship (binary: yes or no)

24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)

25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)

26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)

27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)

28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)

29 health - current health status (numeric: from 1 - very bad to 5 - very good)

30 absences - number of school absences (numeric: from 0 to 93)
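
The list above mixes numeric columns with text-valued binary columns (for example school, sex, schoolsup) and nominal columns (for example Mjob, reason). Most regression models need numbers, so the text columns have to be encoded first. The following is only a minimal sketch of one common approach, assuming pandas is available; the four-row table is a hypothetical sample built from the column names above, not the exam data, and the 0/1 assignment for each binary label is an arbitrary choice.

```python
import pandas as pd

# Hypothetical mini-sample using a few of the column names listed above.
# These rows are made up for illustration; they are NOT the exam dataset.
df = pd.DataFrame({
    "school": ["GP", "GP", "MS", "MS"],
    "sex": ["F", "M", "F", "M"],
    "age": [15, 17, 16, 18],
    "Mjob": ["teacher", "health", "other", "at_home"],
    "schoolsup": ["yes", "no", "no", "yes"],
    "studytime": [2, 1, 3, 2],
})

# Binary text columns: map each of the two labels to 0 or 1
# (which label gets 1 is an arbitrary assumption here).
binary_maps = {
    "school": {"GP": 0, "MS": 1},
    "sex": {"F": 0, "M": 1},
    "schoolsup": {"no": 0, "yes": 1},
}
encoded = df.copy()
for col, mapping in binary_maps.items():
    encoded[col] = encoded[col].map(mapping)

# Nominal columns with more than two labels: one-hot encode them,
# producing one 0/1 indicator column per label.
encoded = pd.get_dummies(encoded, columns=["Mjob"], dtype=int)

print(encoded.columns.tolist())
print(encoded.head())
```

Ordinal columns such as Medu, traveltime, or famrel are already numeric and can be left as they are in a first attempt.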

 

Grade values:

31 G1 - first-period grade (numeric: from 0 to 20)

32 G2 - second-period grade (numeric: from 0 to 20)

33 G3 - final grade (numeric: from 0 to 20, output target)

 

Task:

Analyze this dataset carefully, build an appropriate model, and predict the final grade.
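
As a possible starting point for the analysis step, the sketch below prints summary statistics and the correlation of each numeric column with the target G3. It assumes pandas and NumPy. The data frame is synthetic stand-in data (not the exam dataset) and is deliberately generated so that G2 tracks G1 and G3 tracks G2, so the correlations it prints are a property of the fake data, not a finding about the real one.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in rows (NOT the exam data), using column names from the task.
rng = np.random.default_rng(1)
n = 100
G1 = rng.integers(5, 20, size=n)
G2 = np.clip(G1 + rng.integers(-2, 3, size=n), 0, 20)   # built to follow G1
G3 = np.clip(G2 + rng.integers(-1, 2, size=n), 0, 20)   # built to follow G2
df = pd.DataFrame({
    "age": rng.integers(15, 23, size=n),
    "studytime": rng.integers(1, 5, size=n),
    "absences": rng.integers(0, 30, size=n),
    "G1": G1,
    "G2": G2,
    "G3": G3,
})

# Summary statistics for a quick sanity check of value ranges.
print(df.describe().loc[["mean", "std", "min", "max"]])

# Pearson correlation of every other numeric column with the target G3,
# sorted from most positive to most negative.
corr_with_target = df.corr()["G3"].drop("G3").sort_values(ascending=False)
print(corr_with_target)
```

On the real data, the same two calls would be applied to the loaded table after the text columns have been encoded.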


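Requirements:

1. Build the model with a method you are familiar with, and implement the task in code on your own.

2. Once the model is trained, it must be visualized.

3. Output the parameters of the trained model.

4. Evaluate your trained model with an evaluation method you know well.

5. If you run into coding problems, you may search on Baidu; whispering or discussing with one another is strictly forbidden.

6. Any code found to be identical to another submission receives a score of 0.

7. Paste your finished code directly here and upload the figures you produce here as well; there is no need to upload separate files.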
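
To make the shape of a submission concrete, here is a rough sketch of the train, report parameters, evaluate, and visualize steps using multiple linear regression from sklearn, one of the methods covered in the course. It is only an illustration under stated assumptions: the data is synthetic (generated from a made-up linear rule, not the exam dataset), the four features are an arbitrary subset, the output file name `g3_pred_vs_actual.png` is hypothetical, and any other model from the course could be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the exam dataset): G3 is generated from a
# made-up linear rule plus noise so that the example is self-contained.
rng = np.random.default_rng(0)
n = 200
G1 = rng.integers(5, 20, size=n)
G2 = np.clip(G1 + rng.integers(-2, 3, size=n), 0, 20)
studytime = rng.integers(1, 5, size=n)
absences = rng.integers(0, 30, size=n)
G3 = 0.2 * G1 + 0.8 * G2 - 0.02 * absences + rng.normal(0, 1, size=n)

feature_names = ["G1", "G2", "studytime", "absences"]
X = np.column_stack([G1, G2, studytime, absences])

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, G3, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Output the parameters of the trained model.
print("intercept:", model.intercept_)
for name, coef in zip(feature_names, model.coef_):
    print(f"{name}: {coef:.3f}")

# Evaluate on the held-out rows with mean squared error and R^2.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.3f}  R^2={r2:.3f}")

# Visualize predicted against actual grades; points on the dashed
# diagonal are perfect predictions.
fig, ax = plt.subplots()
ax.scatter(y_test, y_pred)
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, linestyle="--")
ax.set_xlabel("Actual G3")
ax.set_ylabel("Predicted G3")
fig.savefig("g3_pred_vs_actual.png")
```

A predicted-versus-actual scatter is used here because a model with many features cannot be drawn as a single fitted line; a residual plot would be a reasonable alternative.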