SparkR-Install
Views: 5,228
Published: 2019-06-14

This article is about 5,087 characters; reading it takes roughly 16 minutes.

1. Download R

 1.1 Download URL: https://cran.r-project.org/src/base/R-3/

 1.2 Configure the environment variables

 1.3 Test the installation:
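A minimal sketch of steps 1.2 and 1.3 from a Windows command prompt; the R version and install path below are assumptions, so adjust them to wherever R actually landed (the same values can also be set via System Properties > Environment Variables):

```shell
:: Assumed install path -- adjust to your actual R location.
setx R_HOME "C:\Program Files\R\R-3.3.2"
setx PATH "%PATH%;C:\Program Files\R\R-3.3.2\bin"

:: setx does not affect the current session: open a NEW command
:: prompt, then confirm R is found.
R --version
```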

 

2. Download Rtools33

 URL: https://cran.r-project.org/bin/windows/Rtools/

2.1 Configure the environment variables

2.2 Test that the installation succeeded
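A hedged sketch of 2.1 and 2.2, assuming Rtools33 was installed to its default C:\Rtools location (this release ships a gcc 4.6.3 toolchain):

```shell
:: Assumed default Rtools paths -- adjust if installed elsewhere.
setx PATH "%PATH%;C:\Rtools\bin;C:\Rtools\gcc-4.6.3\bin"

:: In a new command prompt, verify the toolchain is on PATH.
gcc --version
```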

3. Install RStudio

   URL: https://www.rstudio.com/products/rstudio/download/

     Just click through the installer with the default options.

    

4. Install the JDK and set the environment variables

  4.1 Configure the environment variables:

  4.2 Test:
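A sketch of 4.1 and 4.2; the JDK version and install path here are assumptions, so substitute your own:

```shell
:: Assumed JDK install path -- adjust to your version.
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_121"
setx PATH "%PATH%;C:\Program Files\Java\jdk1.8.0_121\bin"

:: In a new command prompt, verify the JDK is found.
java -version
```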

5. Download the Spark distribution

  5.1 URL: http://spark.apache.org/downloads.html

     5.2 Unzip the archive to the appropriate directory on a local disk

 

      

6. Install Spark and set the environment variables
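A sketch of the variables involved, assuming the archive from step 5.2 was extracted to D:\spark (the folder name matches the spark-1.6.0-bin-hadoop2.6 build used later in this post; adjust to your own path):

```shell
setx SPARK_HOME "D:\spark\spark-1.6.0-bin-hadoop2.6"
setx PATH "%PATH%;D:\spark\spark-1.6.0-bin-hadoop2.6\bin"
```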

    

   

7. Test SparkR
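One way to run the test, assuming %SPARK_HOME%\bin was added to PATH in step 6:

```shell
:: Launch the interactive SparkR shell; quit with q() when done.
sparkR
```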

  

  

  Note: if you see the warning "WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable", you need to install the local Hadoop libraries.

8. Download and install the Hadoop libraries

  URL: http://hadoop.apache.org/releases.html

  

   

 

9. Set the Hadoop environment variables
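A sketch assuming the Hadoop 2.6.0 release (matching the hadoop2.6 Spark build) was extracted to D:\hadoop; both the version and the path are assumptions:

```shell
setx HADOOP_HOME "D:\hadoop\hadoop-2.6.0"
setx PATH "%PATH%;D:\hadoop\hadoop-2.6.0\bin"
```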

   

   

10. Re-test SparkR

   10.1 If the test output is flooded with INFO messages, change INFO to WARN in the log4j file located under \spark\conf

    10.2 Edit the log4j file in conf:
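The edit amounts to one line. If conf only contains log4j.properties.template, copy it to log4j.properties first, then change the root logger level:

```
# spark\conf\log4j.properties
# Was: log4j.rootCategory=INFO, console
log4j.rootCategory=WARN, console
```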

    

       

     10.3 Re-run SparkR; the output is now much shorter

11. Run SparkR code

    Spark 2.0 added RSparkSql for SQL queries

    dataframe covers data-frame operations

    data-manipulation covers data transformation

    ml covers machine learning

   11.1 Use Ctrl+Alt+left mouse button to open a console in this folder

  11.2 Run spark-submit on the xxx.R file
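For example, to run the dataframe.R example from the console opened in that folder (assuming %SPARK_HOME%\bin is on PATH):

```shell
spark-submit dataframe.R
```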

 

12. Install the SparkR package

    12.1 Copy the SparkR folder from R\lib under the Spark install directory into ..\R-3.3.2\library. Note: copy the entire SparkR folder, not each individual file inside it.

    Source folder:

     Destination folder:
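The copy can be sketched as a single command; both paths are assumptions based on the folders mentioned above, so adjust them to your own layout:

```shell
:: /E copies the whole SparkR folder tree; /I treats the target as a directory.
xcopy /E /I "D:\spark\spark-1.6.0-bin-hadoop2.6\R\lib\SparkR" "C:\Program Files\R\R-3.3.2\library\SparkR"
```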

        

 

     12.2 Open the dataframe.R file in RStudio and run the code line by line with Ctrl+Enter

     The SparkR source of dataframe.R is as follows

## Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-DataFrame-example")
sqlContext <- sparkRSQL.init(sc)

# Create a simple local data.frame
localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))

# Convert local data frame to a SparkR DataFrame
df <- createDataFrame(sqlContext, localDF)

# Print its schema
printSchema(df)
# root
#  |-- name: string (nullable = true)
#  |-- age: double (nullable = true)

# Create a DataFrame from a JSON file
path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
peopleDF <- read.json(sqlContext, path)
printSchema(peopleDF)

# Register this DataFrame as a table.
registerTempTable(peopleDF, "people")

# SQL statements can be run by using the sql methods provided by sqlContext
teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")

# Call collect to get a local data.frame
teenagersLocalDF <- collect(teenagers)

# Print the teenagers in our dataset
print(teenagersLocalDF)

# Stop the SparkContext now
sparkR.stop()

13. RStudio run results

 Supplement: SparkR ships with a machine-learning example (D:\......\spark-1.6.0-bin-hadoop2.6\spark-1.6.0-bin-hadoop2.6\examples\src\main\r)

  The source code is as follows:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# To run this example use
# ./bin/sparkR examples/src/main/r/ml.R

# Load SparkR library into your R session
library(SparkR)

# Initialize SparkContext and SQLContext
sc <- sparkR.init(appName="SparkR-ML-example")
sqlContext <- sparkRSQL.init(sc)

# Train GLM of family 'gaussian'
training1 <- suppressWarnings(createDataFrame(sqlContext, iris))
test1 <- training1
model1 <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian")

# Model summary
summary(model1)

# Prediction
predictions1 <- predict(model1, test1)
head(select(predictions1, "Sepal_Length", "prediction"))

# Train GLM of family 'binomial'
training2 <- filter(training1, training1$Species != "setosa")
test2 <- training2
model2 <- glm(Species ~ Sepal_Length + Sepal_Width, data = training2, family = "binomial")

# Model summary
summary(model2)

# Prediction (Currently the output of prediction for binomial GLM is the indexed label,
# we need to transform back to the original string label later)
predictions2 <- predict(model2, test2)
head(select(predictions2, "Species", "prediction"))

# Stop the SparkContext now
sparkR.stop()

  Run results:

    

    

END~

Reposted from: https://www.cnblogs.com/jackchen-Net/p/6648755.html
