Data Solution 2019(6) MySQL Data Source
Make sure our connection to the database is good:
> grant all privileges on database.* to root@'142.xxx.xxx.xxx' identified by 'xxxxxx';
> flush privileges;
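Before going through Zeppelin, it can help to probe the connection with plain JDBC from the same machine. A minimal sketch (the `jdbcUrl` and `canConnect` helpers are my own, and the host, database, and credentials are placeholders, not real values):

```scala
import java.sql.DriverManager

// Build the JDBC URL the same way the Spark reader below will use it.
def jdbcUrl(host: String, port: Int, database: String): String =
  s"jdbc:mysql://$host:$port/$database"

// Hypothetical connectivity probe: true if a connection can be opened.
def canConnect(url: String, user: String, password: String): Boolean =
  try {
    val conn = DriverManager.getConnection(url, user, password)
    conn.close()
    true
  } catch {
    case _: Exception => false
  }
```

Once the grant above is in place, `canConnect(jdbcUrl("45.55.xx.xx", 3306, "sillycat_services"), "root", "xxxxxx")` should come back true.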
In my Zeppelin Notebook, I can use this to load the dependencies:
%spark.dep
z.load("mysql:mysql-connector-java:5.1.47")
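If I recall correctly, newer Zeppelin releases deprecate `%spark.dep` in favor of interpreter settings; an alternative is to put the same Maven coordinate in the Spark interpreter configuration:

```
spark.jars.packages    mysql:mysql-connector-java:5.1.47
```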
Connect to the Table and Database
val homeAdvisorCompanysRawDF = sqlContext.read
  .format("jdbc")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("url", "jdbc:mysql://45.55.xx.xx:3306/sillycat_services")
  .option("user", "root")
  .option("password", "xxxxxx")
  .option("dbtable", "copy_home_companys")
  .load()
homeAdvisorCompanysRawDF.printSchema()
homeAdvisorCompanysRawDF.registerTempTable("homeadvisorcompanys")
Use the Function with One Parameter
val checkPhone: (String => Int) = (phone: String) => {
  val regexStr = "^(1\\-)?[0-9]{3}\\-?[0-9]{3}\\-?[0-9]{4}$"
  if (phone.matches(regexStr)) {
    20
  } else {
    10
  }
}
val checkPhoneColumn = udf(checkPhone)
val phoneDF = homeAdvisorCompanysRawDF.withColumn("phoneScore", checkPhoneColumn(homeAdvisorCompanysRawDF("phone")))
phoneDF.select("phone", "phoneScore").show(2)
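Since the UDF only wraps a plain Scala function, the scoring rule can be exercised outside Spark. A small sketch (my own `phoneScore` helper with the same regex and scores, plus a null guard added, since `matches` on a null column value would throw):

```scala
// Same regex and scoring rule as checkPhone, minus the Spark wrapper.
val regexStr = "^(1\\-)?[0-9]{3}\\-?[0-9]{3}\\-?[0-9]{4}$"

// Null-safe version: a null phone scores 10, like any non-matching value.
def phoneScore(phone: String): Int =
  if (phone != null && phone.matches(regexStr)) 20 else 10

// "123-456-7890" and "1-123-456-7890" match; partial numbers do not.
```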
Using the Function with Multiple Parameters
val checkAddress = (location: String, street_address: String, address_locality: String, address_region: String, postal_code: String) => {
  if (location != null && !location.isEmpty() && postal_code != null && !postal_code.isEmpty()) {
    20
  } else {
    10
  }
}
val checkAddressColumn = udf(checkAddress)
val addressDF = phoneDF.withColumn("addressScore", checkAddressColumn(phoneDF("location"), phoneDF("street_address"), phoneDF("address_locality"), phoneDF("address_region"), phoneDF("postal_code")))
addressDF.select("phone", "phoneScore", "location", "postal_code", "addressScore").show(2)
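Note that checkAddress receives five columns but only location and postal_code affect the score. That predicate can also be checked as plain Scala (my own `addressScore` helper; the sample values are made up):

```scala
// Same predicate as checkAddress: non-empty location and postal_code score 20.
def addressScore(location: String, postalCode: String): Int =
  if (location != null && !location.isEmpty && postalCode != null && !postalCode.isEmpty) 20
  else 10
```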
Sum up all the related columns and get a totalScore
val columnsToSum = List(col("phoneScore"), col("addressScore"))
val resultDF = addressDF.withColumn("totalScore", columnsToSum.reduce(_ + _))
resultDF.select("phone", "phoneScore", "location", "postal_code", "addressScore", "totalScore").show(2)
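Here `columnsToSum.reduce(_ + _)` folds the Column list with Spark's overloaded `+` into a single expression, equivalent to `col("phoneScore") + col("addressScore")`. The same fold on plain values shows the mechanics (the sample scores for one row are made up):

```scala
// Fold a list of per-column scores into a total, mirroring the Column reduce.
val sampleScores = List(20, 10) // hypothetical phoneScore and addressScore for one row
val totalScore = sampleScores.reduce(_ + _)
// totalScore == 30
```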
References:
https://mvnrepository.com/artifact/mysql/mysql-connector-java/5.1.47
https://zeppelin.apache.org/docs/latest/interpreter/spark.html