Guangning Yu's Blog

Graph

2019-01-28 18:01:15
### Graph Databases

- GraphQL
- Apache Giraph
- Apache GraphX
- Neo4j
- Titan
- OrientDB

### Graph Algorithms

- PageRank
- Connected Components
- Triangle Counting

### Graph Theory

- Vertex
- Edge

test jms

2019-02-17 01:40:29
  1. set proxy
  2. wget http://downloads.lightbend.com/scala/2.11.8/scala-2.11.8.rpm

Building Java Projects with Maven

2019-02-17 01:40:29  |  Java Maven

Create the directory structure

    mkdir -p src/main/java/hello
    mkdir -p src/test/java/hello

Create classes

src/main/java/hello/HelloWorld.java

    package hello;

    import org.joda.time.LocalTime;

    public class HelloWorld {
        public static void main(String[] args) {
            // Print the current local time using Joda-Time
            LocalTime currentTime = new LocalTime();
            System.out.println("The current local time is: " + currentTime);

            Greeter greeter = new Greeter();
            System.out.println(greeter.sayHello());
        }
    }

src/main/java/hello/Greeter.java

    package hello;

    public class Greeter {
        public String sayHello() {
            return "Hello world!";
        }
    }

Write a test

src/test/java/hello/GreeterTest.java

    package hello;

    import static org.hamcrest.CoreMatchers.containsString;
    import static org.junit.Assert.*;

    import org.junit.Test;

    public class GreeterTest {
        private Greeter greeter = new Greeter();

        @Test
        public void greeterSaysHello() {
            assertThat(greeter.sayHel

Crontab basics

2019-02-17 01:40:29
  • every 30 minutes
      0,30 * * * * ...
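
The minute field `0,30` above is a comma list: the job runs at minute 0 and minute 30 of every hour. The sketch below expands a minute field into the minutes it matches, which also shows that `*/30` selects the same minutes in cron implementations that support step values. `expand_minute_field` is a hypothetical helper written for illustration, not part of cron, and it handles only `*`, comma lists, `a-b` ranges and `/step` suffixes.

```python
def expand_minute_field(field):
    """Expand a crontab minute field into the sorted list of minutes it matches.

    Illustrative helper (an assumption, not a cron API). Supports '*',
    comma-separated lists, 'a-b' ranges and an optional '/step' suffix.
    """
    minutes = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        step = int(step) if step else 1
        if base == "*":
            start, end = 0, 59
        elif "-" in base:
            low, high = base.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(base)
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


print(expand_minute_field("0,30"))
print(expand_minute_field("*/30"))
```

Both calls print the same two minutes, so either spelling can be used for a half-hourly job.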

Plot decay function

2019-02-17 01:40:32
```python
import matplotlib.pyplot as plt
import numpy as np

alpha = [0.99, 0.9, 0.8]
x = np.arange(0, 100, 1)
y1 = np.vectorize(lambda t: alpha[0]**t)(x)
y2 = np.vectorize(lambda t: alpha[1]**t)(x)
```
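
The curves in this post have the form y = alpha^t for the three decay rates listed. As a rough companion that needs no plotting library, the sketch below evaluates the same curves and computes how many steps each one takes to fall to one half. `decay` and `half_life` are hypothetical helper names introduced here, not taken from the post.

```python
import math

alphas = [0.99, 0.9, 0.8]  # the decay rates used in the post


def decay(alpha, t):
    """Value of the decay curve alpha**t at step t."""
    return alpha ** t


def half_life(alpha):
    """Step t (possibly fractional) at which alpha**t equals 0.5."""
    return math.log(0.5) / math.log(alpha)


for a in alphas:
    print(f"alpha={a}: value at t=10 is {decay(a, 10):.4f}, "
          f"half-life is {half_life(a):.2f} steps")
```

The closer alpha is to 1, the slower the decay: alpha = 0.99 needs roughly twenty times as many steps as alpha = 0.8 to halve.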

Lodash Basics

2019-02-17 01:40:29
  • loop through object
      > var _ = require('lodash');
      > foo
      { a: { num: 1 }, b: { num: 2 }, c: { num: 3 } }
      > _.forEach(foo, (vlu, key) => {console.log(vlu, key);})
      { num: 1 } 'a'
      { num: 2 } 'b'
      { num: 3 } 'c'
      { a: { num: 1 }, b: { num: 2 }, c: { num: 3 } }
  • filter object of objects
      > foo
      { a: { num: 1 }, b: { num: 2 }, c: { num: 3 } }
      > _.pickBy(foo, (i) => i.num>=2)
      { b: { num: 2 }, c: { num: 3 } }

Setup Nginx on Ubuntu 14.04

2018-11-21 10:00:54
- start `nginx` service

  ```
  sudo service apache2 stop
  sudo service nginx start
  sudo service nginx status
  ```

- restart `nginx` when the server is rebooted

  ```
  sudo update-rc.d nginx defaults
  ```

Bash Basics

2019-02-17 01:40:29  |  shell
  • assign default value to variable
      DEFAULT=5
      RESULT=${VAR:-$DEFAULT}
  • trap error exit
      function rollback() {
        # $? holds the exit status of the script at the time the trap fires
        if [[ $? -ne 0 ]]; then
          echo "rollback"
        fi
      }
      set -ueo pipefail
      trap rollback EXIT
  • if file exists
      #!/bin/bash
      FILE="/etc/passwd"
      if [[ -f $FILE ]]; then
        echo "$FILE exists"
      else
        echo "$FILE doesn't exist"
      fi
  • if directory exists
      #!/bin/bash
      DIR="/var/log"
      if [[ -d $DIR ]]; then
        echo "$DIR exists"
      else
        echo "$DIR doesn't exist"
      fi
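  • eval
      function assert_var_exists() {
        # ${NAME+set} expands to "set" only if the variable NAME is defined
        local condition="[[ ! -n \"\${$1+set}\" ]]"
        if eval "$condition"; then
          echo "The variable ${1} does not exist. Exit."
          exit 1
        fi
      }
      function export_var() {
        assert_var_exists "$1"
        # indirect expansion: read the value of the variable whose name is in $1
        # (the original $(($1)) only works when that value is an integer)
        local var_value=${!1}
        eval "export $1=\$var_value"
        echo "$1=$var_value"
      }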

Git Basics

2019-02-17 01:40:29  |  git
  • Setting user name and email
      git config --global user.name "Guangning Yu"
      git config --global user.email "hi@guangningyu.com"
  • Showing remotes
      git remote -v
  • Changing a remote's URL
      git remote set-url origin git@github.com:guangningyu/flasky.git
  • Pulling from remotes
      git pull origin master
  • Pushing to remotes
      git push origin master
  • Commit using default message
      git commit --no-edit
  • Cancel last commit
      git reset HEAD~
  • Undo a git add
      git reset filename.txt
  • Define author & committer
      GIT_COMMITTER_NAME='Jane Doe' GIT_COMMITTER_EMAIL='jane@doe.com' git commit --author="John Doe <john@doe.com>" -m "This is authored by John Doe and committed by Jane Doe."
  • Undo git rm
      git checkout HEAD path/to/file
  • Check out a remote branch
      git branch -a
      git checkout test_branch
  • Create a new branch
      git checkout -b [name_of_your_new_branch]
  • Push a local branch to remote
      git push -u origi

Javascript Basics

2019-02-17 01:40:29  |  Javascript

select last element in array

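    my_array.slice(-1)[0]  // returns the last element; my_array is unchanged
    my_array.pop()         // returns the last element and removes it from my_array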

WeChat Mini Program Components and APIs

2017-12-18 23:15:24

What is Bitcoin

2017-12-12 23:04:19

Regression using Keras

2019-02-17 01:40:32  |  DeepLearning Keras
    #!/usr/bin/env python
    import urllib2
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.wrappers.scikit_learn import KerasRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import Pipeline

    def load_data():
        X = []
        Y = []
        data_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
        for line in urllib2.urlopen(data_url).readlines():
            line = map(float, line.split())
            X.append(line[0:13])   # 13 input features
            Y.append(line[13])     # target value
        return X, Y

    def basic_model():
        # create model
        model = Sequential()
        model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        # compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    d

Basic Terms and Concepts in Consumer Credit

2017-03-04 12:05:54

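Days past due (DPD)

The number of days a payment is overdue past the due date set in the contract. For loan products, counting starts on the first day after the payment due date (usually the next statement closing date). Credit cards are a special case: although the payment due date falls about 20 days after the statement closing date, days past due are still counted from the next statement closing date.

Delinquency bucket (bucket)

One period overdue is called M1, two periods M2, three periods M3, and so on. For credit cards, no days past due are counted between the payment due date and the next statement closing date, but the bucket for that interval is called M0.
Note that because months differ in length, the buckets differ in length as well.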

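Delinquency stage (stage)

By bucket, accounts are grouped into an early stage (front end), a middle stage (middle range), a late stage (hot core), and write-off.
There is no fixed rule for drawing the stage boundaries; each bank sets them according to its collection strategy, write-off policy, and product characteristics. For credit cards, for example, M1 is usually treated as front end, M2-M3 as middle range, M4 and above as hot core, and accounts already written off as write-off.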
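
The credit card example above can be written as a small lookup. This is a minimal sketch of that one example only; the function name `stage_for_bucket`, its arguments, and the choice to return `None` for M0 are assumptions made for illustration, since each bank draws these boundaries differently.

```python
def stage_for_bucket(bucket, written_off=False):
    """Map a delinquency bucket (1 for M1, 2 for M2, ...) to a collection stage.

    Follows the credit card example in the text: M1 is front end,
    M2-M3 is middle range, M4 and above is hot core, and accounts
    already written off are write-off.
    """
    if written_off:
        return "write-off"
    if bucket < 1:
        # ASSUMPTION: the text assigns no stage to M0, so return None.
        return None
    if bucket == 1:
        return "front end"
    if bucket <= 3:
        return "middle range"
    return "hot core"


for b in range(0, 6):
    print(f"M{b}: {stage_for_bucket(b)}")
```

A bank with a different write-off policy would change only the thresholds, not the structure of the mapping.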

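Coincident ratio (coincidental)

Coincident ratio = delinquent amount in each bucket for the current period / accounts receivable for the current period

The coincident ratio is one of the two common ways to calculate a delinquency rate; it describes the quality structure of the current period's receivables. Delinquency rates in public disclosures are computed on a coincidental basis unless stated otherwise.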


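Lagged ratio (lagged)

Lagged ratio = delinquent amount in each bucket for the current period / accounts receivable of the historical month corresponding to each bucket

The denominator of the coincident ratio is always the current period's receivables, yet its numerator actually arises from earlier receivables. To trace delinquency back to its origin, the lagged ratio replaces the denominator with the receivables of the corresponding earlier month.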
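
A small worked example may make the difference between the two ratios concrete. All figures below are made up for illustration, and the pairing of bucket Mn with the receivables from n months earlier is an assumption of this sketch; the text only says "the corresponding historical month" without fixing the offset.

```python
# Hypothetical month-end receivables, oldest first; the last entry is the current month.
receivables = [1000.0, 1100.0, 1200.0, 1500.0]

# Hypothetical delinquent amounts in the current month, keyed by bucket (1 = M1, ...).
delinquent = {1: 30.0, 2: 12.0, 3: 5.0}


def coincident_ratio(bucket):
    """Delinquent amount in the bucket divided by the current receivables."""
    return delinquent[bucket] / receivables[-1]


def lagged_ratio(bucket):
    """Delinquent amount in the bucket divided by the receivables it came from."""
    # ASSUMPTION: bucket Mn is traced back to the receivables n months earlier.
    return delinquent[bucket] / receivables[-1 - bucket]


for bucket in sorted(delinquent):
    print(f"M{bucket}: coincident {coincident_ratio(bucket):.2%}, "
          f"lagged {lagged_ratio(bucket):.2%}")
```

With these numbers the portfolio is growing, so the coincident M1 ratio (30 / 1500 = 2.0%) is lower than the lagged one (30 / 1200 = 2.5%): the larger current denominator dilutes delinquency that originated in a smaller book.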


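Cycle-end settlement (cycle end)

Cycle-end settlement is specific to credit cards. Because the credit card customer base is the largest and processing takes considerable time, many banks assign customers to different billing cycles, so a credit card product usually has several statement closing dates.
Banks must manage the customers of each cycle separately; accounting and collection units in particular work on a cycle basis.

Month-end settlement (month end)

Month-end reports present the figures as of the end of each month and apply to all consumer finance products. They are the usual choice for cross-product comparisons, because they give every product the same data cut-off date.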

Reference: 《互联网金融时代:消费信贷评分建模与应用》 (Consumer Credit Scoring Modeling and Applications in the Era of Internet Finance)

Neural Network

2019-02-17 01:40:32  |  MachineLearning

Hive Basics

2019-02-17 01:40:29  |  Hive
  • Date functions
      -- change date format
      from_unixtime(unix_timestamp('20150101', 'yyyyMMdd'), 'yyyy-MM-dd')
      -- add n days
      date_add('2015-11-01', 30) -- will return '2015-12-01'
      -- calculate date difference
      datediff('2015-12-01', '2015-11-01') -- will return 30
  • Generate row number
      row_number() over (DISTRIBUTE BY... SORT BY... DESC)
  • Get partition information
      analyze table xxx.yyy partition (dt = '2015-12-11') compute statistics;
      describe formatted xxx.yyy partition (dt = '2015-12-11');
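
The results quoted for the date functions above can be cross-checked outside Hive. The sketch below mirrors the three expressions with Python's `datetime` module; the function names are borrowed from Hive for readability, but these are plain Python helpers written for this note, not a Hive client.

```python
from datetime import date, datetime, timedelta


def change_format(text, src="%Y%m%d", dst="%Y-%m-%d"):
    """Re-format a date string, like from_unixtime(unix_timestamp(...), ...)."""
    return datetime.strptime(text, src).strftime(dst)


def date_add(day, n):
    """Add n days to an ISO date string (yyyy-MM-dd)."""
    return (date.fromisoformat(day) + timedelta(days=n)).isoformat()


def datediff(end, start):
    """Number of days from start to end, both ISO date strings."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


print(change_format("20150101"))
print(date_add("2015-11-01", 30))
print(datediff("2015-12-01", "2015-11-01"))
```

November has 30 days, which is why adding 30 days to 2015-11-01 lands exactly on 2015-12-01.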

Calculate the similarity of two vectors

2019-02-17 01:40:32  |  MachineLearning

Euclidean distance

    from sklearn.metrics.pairwise import euclidean_distances
    euclidean_distances([[1,2,3], [100,200,300]])
    # returns:
    # array([[  0.        , 370.42408129],
    #        [370.42408129,   0.        ]])

Cosine similarity

    from sklearn.metrics.pairwise import cosine_similarity
    cosine_similarity([[1,2,3],[100,200,300]])
    # returns:
    # array([[1., 1.],
    #        [1., 1.]])

Pearson correlation

    from scipy.stats import pearsonr
    pearsonr([1,2,3], [100,200,300])
    # returns (1.0, 0.0), i.e. (Pearson's correlation coefficient, 2-tailed p-value)
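
To see what the three library calls compute, here is a minimal pure-Python sketch of the same measures. It is written from the textbook definitions rather than taken from scikit-learn or SciPy, and it expresses the Pearson correlation as the cosine similarity of the mean-centered vectors, which is how the two measures relate.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


u, v = [1, 2, 3], [100, 200, 300]
print(euclidean(u, v), cosine(u, v), pearson(u, v))
```

The two vectors point in the same direction, so cosine similarity and Pearson correlation are both 1 (up to floating-point rounding) even though the Euclidean distance between them is large.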

Cosine Similarity and Pearson Correlation Coefficient

2019-02-17 01:40:32  |  MachineLearning

Setup Shadowsocks on Ubuntu server

2015-03-02 13:35:55

Install

    apt-get install python-pip
    pip install shadowsocks

Setup

Create config file /etc/shadowsocks.json:

    {
      "server": "your_ip_address",
      "server_port": 8388,
      "local_address": "127.0.0.1",
      "local_port": 1080,
      "password": "your_password",
      "timeout": 300,
      "method": "aes-256-cfb",
      "fast_open": false
    }

You can set multiple ports in the config file:

    {
      "server": "your_ip_address",
      "local_address": "127.0.0.1",
      "local_port": 1080,
      "port_password": {
        "8381": "password_1",
        "8388": "password_2"
      },
      "timeout": 300,
      "method": "aes-256-cfb"
    }
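
Before starting the server it can help to confirm that the file parses as JSON and to see which ports it declares. The sketch below does that for both config shapes shown above. `listening_ports` is a hypothetical helper written for this note, not part of shadowsocks, and it assumes only the `server_port` and `port_password` keys from the examples.

```python
import json


def listening_ports(config_text):
    """Return the server ports declared in a shadowsocks JSON config, sorted.

    Illustrative helper (an assumption, not a shadowsocks API): reads either
    the multi-port "port_password" map or the single "server_port" value.
    """
    config = json.loads(config_text)  # raises ValueError if the JSON is malformed
    if "port_password" in config:
        return sorted(int(port) for port in config["port_password"])
    return [int(config["server_port"])]


single = '{"server": "your_ip_address", "server_port": 8388, "password": "your_password"}'
multi = '{"server": "your_ip_address", "port_password": {"8381": "password_1", "8388": "password_2"}}'

print(listening_ports(single))
print(listening_ports(multi))
```

To check the real file, the same function can be called on the contents of `/etc/shadowsocks.json`.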

Start

    ssserver -c /etc/shadowsocks.json
    # run in the background
    ssserver -c /etc/shadowsocks.json -d start
    ssserver -c /etc/shadowsocks.json -d stop

Start on boot

Edit /etc/rc.local:

    /usr/local/bin/ssserver -c /etc/shadowsocks.json -d start
    exit 0

Logistic Regression

2019-02-17 01:40:32  |  MachineLearning


    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import urllib2
    from numpy import mat, ones, shape, exp, array, arange
    import matplotlib.pyplot as plt

    def createDataSet():
        features = []
        labels = []
        lines = urllib2.urlopen('https://raw.github.com/pbharrin/machinelearninginaction/master/Ch05/testSet.txt').readlines()
        for line in lines:
            line = line.strip().split()
            features.append([1.0, float(line[0]), float(line[1])])  # set x0 to 1.0
            labels.append(int(line[2]))
        return features, labels

    def sigmoid(value):
        return 1.0 / (1 + exp(-value))

    def gradAscent(features, labels, alpha=0.001, iterations=500):
        '''
        Gradient ascent:
        - a batch algorithm: every update of the regression coefficients
          traverses the entire data set
        '''
        featureMatrix = mat(features)
        labelMatrix = mat(labels).transpose()
        m, n = shape(featureMatrix)
        weights = ones((n, 1))
        for k in range(iterations):
            h = sigmoid(featureMatrix*weights)
            error = (labelMatrix - h)
            weig
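
The listing above breaks off inside the update loop. As a self-contained illustration of batch gradient ascent for logistic regression, here is a minimal pure-Python sketch. The weight update it uses, weights + alpha * X^T (y - h), is the standard rule for this model and an assumption about how the truncated loop continues; the toy data set is made up, and `grad_ascent` and `predict` are names chosen for this sketch.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def grad_ascent(features, labels, alpha=0.001, iterations=500):
    """Batch gradient ascent: every update uses the whole data set."""
    n = len(features[0])
    weights = [1.0] * n
    for _ in range(iterations):
        # prediction error (y - h) for every sample under the current weights
        errors = [
            label - sigmoid(sum(w * x for w, x in zip(weights, row)))
            for row, label in zip(features, labels)
        ]
        # step each weight along the log-likelihood gradient X^T (y - h)
        for j in range(n):
            gradient_j = sum(err * row[j] for err, row in zip(errors, features))
            weights[j] += alpha * gradient_j
    return weights


def predict(weights, row):
    """Classify as 1 when the predicted probability exceeds 0.5."""
    return 1 if sigmoid(sum(w * x for w, x in zip(weights, row))) > 0.5 else 0


# Made-up, linearly separable toy data; the first column is the constant x0 = 1.0.
toy_features = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
toy_labels = [0, 0, 1, 1]

weights = grad_ascent(toy_features, toy_labels, alpha=0.1, iterations=1000)
print([predict(weights, row) for row in toy_features])
```

Because the errors are computed once per iteration before any weight changes, all weights are updated simultaneously, which is what makes this the batch variant rather than a stochastic one.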