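Regional Job Demand and Salary Analysis Based on Big Data

Abstract

The booming Internet industry has given rise to many recruitment websites, and the market for Internet technologies is reflected clearly in the postings on those sites. Studying Internet technologies through recruitment websites, and extracting the skill requirements and salary levels of each region, is therefore a worthwhile undertaking.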

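This system designs and implements a big-data-based analysis of regional job demand and salaries. It provides six functions: data crawling, data cleaning, data storage, salary prediction, word segmentation statistics, and data presentation. Data crawling is written in Python, mainly using the requests, bs4, and re modules. Data cleaning uses Hive, with the code kept in .hql scripts. Data storage uses HBase: a Java program first reads the data from Hive and then writes it to HBase. The salary prediction algorithm is implemented in Java. Word segmentation statistics run on Spark and are also implemented in Java. Data presentation is a website built with Java Web: the back end uses the SSM framework to call the prediction and word segmentation algorithms, and the front end uses ECharts to display their results.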
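As an illustration of the crawling step, the following minimal sketch uses Python's re module to normalize a salary string taken from a job posting. The string format (ranges such as "10k-15k"), the function name `parse_salary_range`, and the use of thousands as the unit are assumptions made for illustration; the abstract does not specify the format used by the crawled sites.

```python
import re

# Hypothetical pattern: salary ranges such as "10k-15k" or "8K - 12K".
SALARY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*[kK]\s*-\s*(\d+(?:\.\d+)?)\s*[kK]")


def parse_salary_range(text):
    """Return (low, high, midpoint) in thousands, or None if no range is found."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = float(match.group(1)), float(match.group(2))
    return low, high, (low + high) / 2
```

A numeric midpoint of this kind is one possible input for a later prediction step; postings without a recognizable range return `None` and can be dropped during cleaning.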

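The final product is a website, because a website presents results in an intuitive and visually appealing way that is easy for users to browse. When a user clicks a region on the map, the site shows the salary prediction chart for that region and the statistics of the skill words associated with the corresponding keyword in that region.

Keywords: big data, Hive, HBase, linear programming, word segmentation statistics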

Analysis of Regional Job Requirements and Salary Based on Big Data Technology

Abstract

The Internet market is booming, which has led to the creation of many recruitment websites. The market for Internet technologies is well reflected on these websites. Studying Internet technologies through recruitment websites, and analyzing the skill requirements and salary levels in each region, would therefore be especially worthwhile.

This system covers the design and implementation of a regional job demand and salary analysis based on big data. It provides the following functions: data crawling, data cleaning, data storage, salary prediction, word segmentation statistics, and data presentation. Data crawling is implemented in Python, mainly with the requests, bs4, and re modules. Data cleaning uses the Hive framework, with the code kept in .hql scripts. Data storage uses the HBase framework: a Java program reads the data from Hive and then stores it in HBase. The salary prediction algorithm is implemented in Java. Word segmentation statistics run on the Spark framework and are also implemented in Java. Data presentation is a website built with Java Web: the back end uses the SSM framework to call the prediction and word segmentation algorithms, and the front end uses ECharts to present their results.
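The word segmentation statistics step can be pictured as a word count over job descriptions. The system runs this on Spark in Java; the sketch below is only a plain-Python stand-in for the same counting idea, and the skill vocabulary `SKILLS` and the function name `count_skills` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical skill vocabulary; a real system would derive it from the data.
SKILLS = {"java", "python", "spark", "hive", "hbase", "mysql"}


def count_skills(descriptions):
    """Count how often each known skill word appears across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Tokens are runs of lowercase letters, digits, '+' and '#'.
        tokens = re.findall(r"[a-z0-9+#]+", text.lower())
        counts.update(token for token in tokens if token in SKILLS)
    return counts
```

For example, `count_skills(["Java and Spark required", "Familiar with Hive, HBase, Java"])` counts `java` twice and `spark`, `hive`, and `hbase` once each. On Spark the same logic would typically be expressed as a map over tokens followed by a per-key sum.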

The final product is a website, because a website presents results in an intuitive and visually appealing way that is convenient for users to view. When the user clicks a region on the map, the salary prediction chart for that region and the statistics of the skill words associated with the corresponding keyword in that region are displayed.

Keywords: Big Data, Hive, HBase, Linear Programming, Word Segmentation Statistics
