Guangning Yu's Blog

Neural Network

2019-02-17 01:40:32  |  MachineLearning

Calculate the similarity of two vectors

2019-02-17 01:40:32  |  MachineLearning

Euclidean distance

  from sklearn.metrics.pairwise import euclidean_distances
  euclidean_distances([[1, 2, 3], [100, 200, 300]])
  # returns:
  # array([[  0.        , 370.42408129],
  #        [370.42408129,   0.        ]])

Cosine similarity

  from sklearn.metrics.pairwise import cosine_similarity
  cosine_similarity([[1, 2, 3], [100, 200, 300]])
  # returns (the two vectors are parallel, so every pairwise similarity is 1):
  # array([[1., 1.],
  #        [1., 1.]])

Pearson correlation

  from scipy.stats import pearsonr
  pearsonr([1, 2, 3], [100, 200, 300])
  # returns (1.0, 0.0): (Pearson's correlation coefficient, 2-tailed p-value)

Cosine Similarity and Pearson Correlation Coefficient

2019-02-17 01:40:32  |  MachineLearning
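The two measures are closely related: the Pearson correlation of two vectors equals the cosine similarity of the same vectors after mean-centering each one. A minimal NumPy sketch (the example vectors are arbitrary):

```python
import numpy as np

def cosine_sim(a, b):
    # cosine similarity: dot product divided by the product of the norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0, 5.0])
b = np.array([2.0, 2.0, 5.0, 9.0])

# Pearson correlation = cosine similarity of the mean-centered vectors
pearson_via_cosine = cosine_sim(a - a.mean(), b - b.mean())
print(pearson_via_cosine)
print(float(np.corrcoef(a, b)[0, 1]))  # agrees with NumPy's Pearson correlation
```

So cosine similarity ignores each vector's mean level, while Pearson correlation first removes it; on already-centered data the two coincide.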

Logistic Regression

2019-02-17 01:40:32  |  MachineLearning


  #!/usr/bin/env python
  # -*- coding: utf-8 -*-
  from urllib.request import urlopen
  from numpy import mat, ones, shape, exp

  def createDataSet():
      features = []
      labels = []
      url = 'https://raw.github.com/pbharrin/machinelearninginaction/master/Ch05/testSet.txt'
      lines = urlopen(url).readlines()
      for line in lines:
          line = line.decode().strip().split()
          features.append([1.0, float(line[0]), float(line[1])])  # set x0 to 1.0
          labels.append(int(line[2]))
      return features, labels

  def sigmoid(value):
      return 1.0 / (1 + exp(-value))

  def gradAscent(features, labels, alpha=0.001, iterations=500):
      '''
      Gradient ascent:
      - batch algorithm: every weight update scans the entire dataset
      '''
      featureMatrix = mat(features)
      labelMatrix = mat(labels).transpose()
      m, n = shape(featureMatrix)
      weights = ones((n, 1))
      for k in range(iterations):
          h = sigmoid(featureMatrix * weights)  # predictions for all samples
          error = labelMatrix - h               # prediction error
          weights = weights + alpha * featureMatrix.transpose() * error
      return weights
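The batch gradient-ascent update can be exercised without the network download. Below is a sketch using a tiny synthetic, linearly separable dataset (illustrative data, NumPy-array rewrite of the same update rule):

```python
import numpy as np

def sigmoid(value):
    return 1.0 / (1 + np.exp(-value))

def grad_ascent(features, labels, alpha=0.001, iterations=500):
    # batch gradient ascent: every iteration uses the full dataset
    X = np.asarray(features, dtype=float)               # m x n design matrix
    y = np.asarray(labels, dtype=float).reshape(-1, 1)  # m x 1 labels
    weights = np.ones((X.shape[1], 1))
    for _ in range(iterations):
        h = sigmoid(X @ weights)          # predicted probabilities
        error = y - h                     # gradient of the log-likelihood
        weights = weights + alpha * X.T @ error
    return weights

# tiny linearly separable dataset (x0 = 1.0 is the bias term)
features = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 6.0, 7.0], [1.0, 7.0, 8.0]]
labels = [0, 0, 1, 1]
w = grad_ascent(features, labels, alpha=0.01, iterations=5000)
preds = (sigmoid(np.asarray(features) @ w) > 0.5).astype(int).ravel()
print(preds.tolist())  # the fitted weights should separate the two classes
```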

Collaborative Filtering

2019-02-17 01:40:32  |  MachineLearning

user-based collaborative filtering

  1. for each user, find similar users by calculating the similarity of their ratings (e.g. Euclidean distance, Pearson correlation)
  2. for each item rated by the selected users, calculate a weighted rating according to each user's similarity
  3. select the top n new items for this user
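The three steps above can be sketched as follows (the ratings dictionary and user names are hypothetical, and the similarity measure here is Pearson correlation over co-rated items):

```python
from math import sqrt

# illustrative ratings: user -> {item: rating}
ratings = {
    'alice': {'a': 5.0, 'b': 3.0, 'c': 4.0},
    'bob':   {'a': 4.0, 'b': 3.0, 'd': 5.0},
    'carol': {'a': 1.0, 'b': 5.0, 'd': 2.0},
}

def pearson_sim(r1, r2):
    # Pearson correlation computed over the items both users rated
    common = [i for i in r1 if i in r2]
    n = len(common)
    if n < 2:
        return 0.0
    x = [r1[i] for i in common]
    y = [r2[i] for i in common]
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def recommend(user, n=3):
    scores, sim_sums = {}, {}
    for other in ratings:
        if other == user:
            continue
        sim = pearson_sim(ratings[user], ratings[other])
        if sim <= 0:
            continue  # step 1: keep only similar users
        for item, r in ratings[other].items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            # step 2: accumulate similarity-weighted ratings
            scores[item] = scores.get(item, 0.0) + sim * r
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    # step 3: rank by weighted average rating and take the top n
    ranked = sorted(((scores[i] / sim_sums[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:n]]

print(recommend('alice'))  # -> ['d'] (bob is similar to alice and rated 'd' highly)
```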

item-based collaborative filtering

  1. for each item, calculate its similarity to every other item
  2. select the top-rated items of this user
  3. for each selected item, find similar items and calculate a weighted rating according to each item's similarity
  4. select the top n new items for this user
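The item-based variant can be sketched the same way: invert the ratings to item vectors, compare items instead of users, and weight each candidate by its similarity to the items the user already rated. The data, names, and cosine-over-common-users similarity are illustrative assumptions:

```python
from math import sqrt

# illustrative ratings: user -> {item: rating}
ratings = {
    'alice': {'a': 5.0, 'b': 3.0, 'c': 4.0},
    'bob':   {'a': 4.0, 'b': 3.5, 'c': 5.0, 'd': 4.5},
    'carol': {'a': 1.0, 'b': 5.0, 'd': 2.0},
}

def item_vectors(ratings):
    # invert user -> {item: rating} into item -> {user: rating}
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items

def cosine_sim(v1, v2):
    # cosine similarity over the users who rated both items
    common = [u for u in v1 if u in v2]
    if not common:
        return 0.0
    num = sum(v1[u] * v2[u] for u in common)
    den = sqrt(sum(v1[u] ** 2 for u in common)) * sqrt(sum(v2[u] ** 2 for u in common))
    return num / den if den else 0.0

def recommend_items(user, n=3):
    items = item_vectors(ratings)  # in practice this similarity table is precomputed
    scores, sim_sums = {}, {}
    for rated_item, r in ratings[user].items():
        for other_item in items:
            if other_item in ratings[user]:
                continue  # only score items the user has not rated
            sim = cosine_sim(items[rated_item], items[other_item])
            if sim <= 0:
                continue
            # weight the user's own rating by item-item similarity
            scores[other_item] = scores.get(other_item, 0.0) + sim * r
            sim_sums[other_item] = sim_sums.get(other_item, 0.0) + sim
    ranked = sorted(((scores[i] / sim_sums[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:n]]

print(recommend_items('alice'))  # -> ['d'], the only item alice has not rated
```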

user-based or item-based?

  • the item-based method needs to maintain an item-item similarity table
  • for sparse datasets, the item-based method works better
  • for dense datasets, the two methods perform similarly