How to calculate PV, UV, and conversion rate; how to count PV and UV
This article mainly shows how to compute PV (page views) and UV (unique visitors) with Spark. We compute PV and UV all the time, but is the way we usually compute them actually the best-performing one?
OK, let's start with an example.
First, take a look at the data:
{"flag": "sendTemplateMessage", "actionType": "success", "from": "sendTemplateMessage", "openId": "otU065OELPd_cccc", "timestamp": 1543309410741, "device": null, "ip": null, "bucket": 1, "data": {"templateName": "votePost", "appType": 1, "sendNum": 1}}
{"flag": "sendTemplateMessage", "actionType": "success", "from": "sendTemplateMessage", "openId": "otU065xxxxx", "timestamp": 1543309410741, "device": null, "ip": null, "bucket": 4, "data": {"templateName": "dailySignPush", "appType": 3, "sendNum": 1}}
{"flag": "sendTemplateMessage", "actionType": "success", "from": "sendTemplateMessage", "openId": "otU065OELPd_rvm-pppeee", "timestamp": 154330941, "device": null, "ip": null, "bucket": null, "data": ...
We then read the data line by line. Once it is read in, we need to work out, for each bucket and each template message, the PV and UV of sendNum: PV is the total of sendNum, and UV is the number of distinct users (openIds).
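Before the Spark version, here is a tiny plain-Python sketch that just makes these two definitions concrete, using the two complete sample records above (trimmed to the fields we need). The grouping key (appType, templateName, bucket) is an assumption based on the job output further down:

# Minimal illustration of the PV/UV definitions above (not the Spark job itself).
# The two records are trimmed copies of the two complete sample lines.
from collections import defaultdict

records = [
    {"openId": "otU065OELPd_cccc", "appType": 1, "bucket": 1,
     "templateName": "votePost", "sendNum": 1},
    {"openId": "otU065xxxxx", "appType": 3, "bucket": 4,
     "templateName": "dailySignPush", "sendNum": 1},
]

pv = defaultdict(int)   # (appType, templateName, bucket) -> sum of sendNum
uv = defaultdict(set)   # (appType, templateName, bucket) -> distinct openIds

for r in records:
    key = (r["appType"], r["templateName"], r["bucket"])
    pv[key] += r["sendNum"]       # PV: total messages sent for this key
    uv[key].add(r["openId"])      # UV: distinct users reached for this key

for key in pv:
    print("%s  PV=%d  UV=%d" % (key, pv[key], len(uv[key])))

The Spark script below does the same kind of aggregation, but distributed over an RDD.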
#!/usr/bin/python
# -*- coding: UTF-8 -*-
from pyspark import SparkContext, SparkConf
import json


def parseJson(log_line):
    json_dic = json.loads(log_line)
    print json_dic
    return (json_dic["flag"], json_dic["actionType"],
            json_dic["data"]["templateName"], json_dic["bucket"],
            json_dic[  # the rest of this tuple is truncated in the original


# the definitions of filterRdd and splitRdd, and the header of the function this
# fragment belongs to (a reduce-style merge of two values), are missing here:
    f, g, k = line1
    f2, g2, k2 = line2
    return (f, g + g2, k + k2)


def main():
    logFile = "/user/root/spark/sparkstudy02.txt"
    master = "yarn-client"
    appName = "Simple App sparkstudy02"
    conf = SparkConf().setAppName(appName).setMaster(master)
    sc = SparkContext(conf=conf)
    logData = sc.textFile(logFile)
    logStage1 = logData.map(lambda x: parseJson(x))
    logStage2 = logStage1.filter(lambda x: filterRdd(x))
    logStage3 = logStage2.map(lambda x: splitRdd(x))
    logStage4 = logStage3.reduceByKey(lambda x, y: ...)  # the reduce expression is garbled in the original
    logStage5 = logStage4.map(lambda x: transformRdd(x))
    logStage6 = logStage5.  # truncated in the original
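The listing above is truncated: the end of parseJson, the bodies of filterRdd, splitRdd and transformRdd, the name of the reduce function, and everything after logStage6 are lost. Purely as a reference, here is a hedged sketch of what the missing pieces might look like. The function names come from the calls in main(), but every body below, the assumed 7-field tuple returned by parseJson, and the assumed tail of main() (something like logStage6 = logStage5.collect() followed by printing each record) are guesses based on the sample data and on the output shown next:

# Hypothetical completion of the truncated helpers -- assumptions only, inferred
# from the calls in main() above and from the shape of the job output below.
import json


def parseJson(log_line):
    # assumed full version of the truncated parser above: one flat tuple per log line
    json_dic = json.loads(log_line)
    return (json_dic["flag"], json_dic["actionType"],
            json_dic["data"]["templateName"], json_dic["bucket"],
            json_dic["openId"], json_dic["data"]["appType"],
            json_dic["data"]["sendNum"])


def filterRdd(parsed):
    # assumed filter: keep only successfully sent template messages
    flag, actionType, templateName, bucket, openId, appType, sendNum = parsed
    return flag == "sendTemplateMessage" and actionType == "success"


def splitRdd(parsed):
    # assumed key/value layout, matching the output below:
    #   key   = (appType, templateName, bucket)
    #   value = (openId, sendNum, 1)   -- the trailing 1 is a per-record UV seed
    flag, actionType, templateName, bucket, openId, appType, sendNum = parsed
    return ((appType, templateName, bucket), (openId, sendNum, 1))


def reduceRdd(value1, value2):
    # assumed name for the reduce fragment above: keep one openId, add up the
    # sendNum totals (PV) and the UV seeds (note this alone does not de-duplicate openIds)
    f, g, k = value1
    f2, g2, k2 = value2
    return (f, g + g2, k + k2)


def transformRdd(pair):
    # assumed to be a no-op reshaping step before the result is collected
    return pair


if __name__ == "__main__":
    # quick local check (no Spark needed) on the first sample line above
    sample = ('{"flag": "sendTemplateMessage", "actionType": "success", '
              '"from": "sendTemplateMessage", "openId": "otU065OELPd_cccc", '
              '"timestamp": 1543309410741, "device": null, "ip": null, "bucket": 1, '
              '"data": {"templateName": "votePost", "appType": 1, "sendNum": 1}}')
    parsed = parseJson(sample)
    if filterRdd(parsed):
        print(splitRdd(parsed))  # prints the (key, value) pair for this record

Dropped into the script above, these definitions should at least reproduce the (key, value) shape seen in the output below.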
Running the job, the result is as follows:
((3, u'dailySignPush', 4), (u'otU065OELPd_rvm-lhpa2gPC7qbs', 2, 1))
((1, u'votePost', 1), (u'otU065OELPd_cccc', 1, 1))
((2, u'replyPost', 5), (u'otU065OELPd_rvm-eeee', 1, 1))
((3, u'dailySignPush', 4), (u'otU065OELPd_rvm-pppee