摘要

智能手机已经成为人们日常生活当中不可或缺的通信交流工具，通过智能收集可以随身随地的获取位置、通话记录、短信、微信等体现人们之间日常交互和社会关系的各种信息，人们之间的交互频率、时间、位置、地点、距离以及轨迹相似性等信息能够直接体现人们之间的交互关系以及关系强度。关系强度体现了人们之间的亲密程度，对研究人们之间的社会关系以及社交网络具有重要的意义。本文针对如何度量日常生活中人们之间的关系强度问题展开研究，提出了一个既可以对 GPS 数据进行处理又可以对基站数据进行处理，从日常轨迹、语义位置以及语义标签三个层次度量人们之间关系强度的层级模型 URSHV(User Relationship Strength Hierarchy Vote)。概括起来，主要研究内容和贡献如下：

首先，由于语义位置及语义标签与用户之间的关系强度密切相关，为此本文采用分段卡尔曼滤波算法对 GPS 位置轨迹数据进行降噪处理；采用基于密度的聚类算法对位置轨迹数据进行聚类，形成语义位置；在此基础上，采用基于规则的语义位置标注机制，通过反地理编码、语义标签推断以及输入自动补全等方式对语义位置进行语义标注；从而将 GPS 位置轨迹数据序列聚类成有意义的语义位置和语义标签，为后续的基于语义位置和语义标签计算用户之间的关系强度奠定了基础。

其次，为了从位置轨迹数据、语义位置以及语义标签三个层次计算用户之间的关系强度，采用 DTW 模型计算用户之间的空间距离来度量用户日常轨迹之间的相似度，进而使用轨迹序列熵值对用户每天轨迹的相似度进行加权处理，并将其作为用户之间的关系强度；采用主题模型 LDA 分别计算用户之间的基于物理位置和语义位置的行为模式的相似性，将其作为用户之间的关系强度；采用集成学习的思想对三个层次的度量结果进行投票，以投票结果作为最终的用户之间的关系强度。

最后，在上述研究基础上，基于 MIT 的 Reality Mining 项目的公开的数据集，利用该数据集当中的用户之间调查问卷的相似度，构造用户之间真实的关系强度作为测试基准，提出一种基于逆序对数的一致性评分度量方法对用户之间的关系强度进行度量，进而对 URSHV 模型的有效性进行实验验证，结果表明该模型能够有效地度量用户之间的关系强度。

关键词: 关系强度；轨迹相似度；DTW；熵；LDA；投票

ABSTRACT

Smart phones have become an integral part of daily life communication tools, we can collect intelligently location, call logs, text messages, WeChat anywhere which reflect a variety of information daily interactions and social relations between people. People interaction frequency, time, location, distance and the similarity of trajectory information reflects the strength of the relationship and the relationship between people.Relationship strength reflects the degree of intimacy between two different persons, which is of great importance in analyzing human’s social relationship as well as social network.

In this paper, we propose a URSHV(User Relationship Strength Hierarchy Vote), which can measure the relationship between people in daily life through GPS data from three levels, namely: daily trajectory, semantic locations and semantic labels.To sum up, the main research contents and contributions are as follows:

First of all, the strength of the relationship between users and semantic location with semantic labels are closely related, this paper uses segmented kalman filtering algorithm on GPS trajectory data to de-noising; using the clustering algorithm based on density of position trajectory data clustering, and form a semantic position; on this basis, the seman- tic annotation mechanism based on Rules, the semantic annotation of semantic encoding, geographic location by anti semantic label inference and input auto completion etc; the GPS position trajectory data sequence clustering into meaningful semantic position and semantic labels, which laid the foundation for the semantic location and semantic labels based on the strength calculation of the relationship between users.

Secondly, in order to calculate the strength of the relationship between users from the three levels of trajectory data, semantic location and semantic labels, using DTW model of spatial distance calculation between users to measure the similarity between the users, the use of trajectory sequence similarity of users every day to track the entropy weight- ing processing, and the strength of relationship between users the LDA were calculated using the topic model; semantic locations similarity and semantic labels based on behav- ior patterns among users, as the strength of the relationship between users; measurement results of three levels of the ensemble learning theory to vote, to vote as the strength of relationship between end users.

Finally, on the basis of the above study, based on the MIT reality mining project of publicly available data sets, similarity between users by using the data set of the ques- tionnaire, users construct between a real relationship strength as a benchmark for test- ing proposed an inverse logarithmic induced score measurement method to measure the strength of the relationship between users based on, and effectiveness model of URSHV for experiments, the results show that the model can effectively measure the strength of the relationship between users.

Key Words: relationship strength; trajectory similarity; DTW; entropy; LDA;

vote

摘要 ........................................................................................................................	i
ABSTRACT ...............................................................................................................	iii
第一章绪论 ............................................................................................................	1
1.1 研究背景 ...................................................................................................	1
1.2 研究现状 ...................................................................................................	2
1.2.1 社会关系强度的研究现状 ............................................................	2
1.2.2 基于移动数据的社会关系研究现状 ............................................	2
1.2.3 基于移动轨迹数据关系强度的研究现状 ....................................	2
1.3 研究内容 ...................................................................................................	3
1.4 论文结构 ...................................................................................................	5
第二章相关技术研究 ............................................................................................	7
2.1 轨迹数据预处理 .......................................................................................	7
2.1.1 滤波算法 ........................................................................................	7
2.1.2 聚类算法 ........................................................................................	9
2.2 时间序列相似度度量方法及序列熵值 ...................................................	11
2.2.1 编辑距离 ........................................................................................	12
2.2.2 Dynamic Time Warping ..................................................................	13
2.2.3 序列熵值 ........................................................................................	14
2.3 自然语言处理模型 ...................................................................................	15
2.3.1 Latent Dirichlet Allocation .............................................................	16
2.3.2 word2vec .........................................................................................	17
2.4 小结 ...........................................................................................................	18
第三章用户关系强度层级投票模型框架 ............................................................	19
3.1 模型框架描述 ...........................................................................................	19
3.1.1 SASLL 系统概述 ...........................................................................	19
3.1.2 用户关系强度计算模型概述 ........................................................	20
3.2 模型完整流程描述 ...................................................................................	21
第四章 GPS 数据处理及语义标签标注技术 ........................................................	23
4.1 SASLL 标注技术 ......................................................................................	23
4.2 计算对应语义位置 ...................................................................................	23
4.2.1 降低数据噪声 ................................................................................	24
4.2.2 剔除路上的点 ................................................................................	28
4.2.3 聚类得到语义位置 ........................................................................	30

ABSTRACT

目 录

目录