MySQL Solutions – On Premises and Cloud.: Analyze your MySQL Database Service(MDS) data with Zeppelin

MDS-Heatwave

Data Analysis using Zeppelin

The Complete Guide to access MDS with HeatWave by Zeppelin

In this guide we will walk through how to install and access
MDS with Heatwave.

1.
What is MDS and HeatWave ?

2.
What is Zeppelin  ?

3.
Environment details

4.
How to Install Zeppelin ?

5.
Network Consideration

6.
How to access HeatWave

7.
MDS Data Analysis using Zeppelin

8.
Conclusion

We
are living in the data world and
as the data is increasing with large
velocity, it is important to get the results quicker until earlier MySQL has
challenged to produce results against larger data size in faster way how ever
there is “paradigm shift” on current solutions of MySQL and

Now, MySQL (only in PaaS model of OCI) comes with HeatWave ,which is
a distributed, scalable, shared-nothing, in-memory, hybrid columnar, query
processing engine designed for extreme performance.

It is enabled when you add a heatwave cluster to a MySQL DB System.

So MySQL database service(MDS) will give faster results and  then needed some data analysis tool to make
meaningful of the data , get more insight of data.

In this blog let me introduce Apache
Zeppelin for MDS data analysis.

Apache Zeppelin , an open
source multipurpose notebook which help users to represent and analyze your
data in the form of graphs or charts so that it help the organization to take
quick decision.

We will explore each item in details…

Overall, My Idea is to
show you quick demo , how easily you can connect MDS through Zeppelin.

Zepplein can be installed anywhere (any public/private  cloud
, On-Premises).

What is MDS and Heatwave ?

Oracle MySQL Database
Service(MDS)
is a fully managed database service that lets developers
quickly develop and deploy secure, cloud native applications using the
world’s most popular open source database.

MySQL Database Service is the only MySQL cloud service with an
integrated, high performance, in-memory query accelerator – Heatwave. It enables customers to run
sophisticated analytics directly against their operational MySQL
databases—eliminating the need for complex, time-consuming, and expensive
data movement and integration with a separate analytics database.

MySQL Database Service is 100% built, managed, and supported by the
OCI and MySQL engineering teams.

more info:- https://www.oracle.com/mysql/

Heatwave:- https://www.oracle.com/mysql/heatwave

MDS Business benefits :-

https://mysqlsolutionsarchitect.blogspot.com/2022/02/understanding-mysql-database-servicemds.html

What is Zeppelin ?

Zeppelin is web-based
notebook that enables data-driven, interactive  data analytics and collaborative documents
with language like SQL, Scala, Python, R and more.

Things
like data ingestion, data exploration, data visualization, and data analytics
can be done in the zeppelin notebook.

Features of Apache Zeppelin

v  Data Ingestion.

v  Data Discovery

v  Data Analytics

v  Data
Visualization and Collaboration

More
info:-
https://zeppelin.apache.org/

Network Consideration

Make sure the port of zeppelin 8080
is whitelisted in your environment

and in case you are using Oracle Cloud Infrastructure (OCI) then make
sure ingress rule is configured to white list the compute instance where
Zeppelin is installed and also compute instance is able to ping pong MDS
instance.

To access the Zeppelin, always access with proper user with Interpreter access otherwise any user
can access your Zeppelin,

Sometime anonymous user get into the portal of Zeppelin by entering
the Zeppelin public IP address but however if your interpreter is restricted
with particular user then your workspaces is safer.

Hence, It is important to have data source authorization in Apache
Zeppelin

https://zeppelin.apache.org/docs/0.10.0/setup/security/datasource_authorization.html

Note:- I couldn’t dig much in details about security and my major focused
on where you are , how are you accessing the Zeppelin , how eill have seamless
experiences with MDS and HeatWave and do awesome visualization and data
analysis

Installation of Zeppelin

In this blog, installation of Zeppelin will be on Oracle
Cloud Infrastructure (https://www.oracle.com/in/cloud/)

 With below
specifications

 

Step 1
#Install JDK

sudo yum install
java-11-openjdk-devel

Step 2:- Download the Zeppelin by
using below command

wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.0/zeppelin-0.10.0-bin-all.tgz

Step 3 :- create user and give
the permission

sudo adduser -d /home/opc/zeppelin -s /sbin/nologin zeppelin

sudo chown -R zeppelin:zeppelin /home/opc/zeppelin

Step 4:- rename zeppelin site
template to zeppelin site

cd /home/opc/zeppelin/conf

sudo cp zeppelin-site.xml.template zeppelin-site.xml

sudo cp zeppelin-site.xml.template
zeppelin-site.xml

Step 5 :- #Start Zeppelin

Connect Zeppelin

http://<IP Address of Compute Instance/Local IP>::8080/#/

 

Connect MDS and Heatwave through Zeppelin

Download MySQL Connector/J

https://dev.mysql.com/downloads/connector/j/

rpm -ivh
mysql-connector-java-8.0.28-1.el7.noarch.rpm

warning: mysql-connector-java-8.0.28-1.el7.noarch.rpm: Header V4 RSA/SHA256
Signature, key ID 3a79bd29: NOKEY

error:
Failed dependencies:

        java-headless >=
1:1.8.0 is needed by mysql-connector-java-1:8.0.28-1.el7.noarch

[

Fix:-
yum -y install java-headless
 

Create a MySQL Interpreter

#Create directory called mds
under zeppelin’s interpreter folder

# mkdir mds

#move the “mysql-connector-java.jar” file into MDS folder

#cp /usr/share/java/mysql-connector-java.jar
/home/opc/zeppelin/interpreter/mds/

Navigate to the  Interpreter

http://<IP
Address of Compute Instance/Local IP>:8080/#/interpreter

Search for MDS interpreter and
fill below details

 

 

 

Finally once changes are final
then it looks like below

Access data from MDS and Heatwave

Please ensure MDS and
Heatwave is up and running.

Create a notebook and access MDS with Heatwave via Zeppelin

Awesome! , Zeppelin Connected to MDS

Let’s do data analysis by using zeppelin

##Run Below SQL statement to generate report

use airportdb;

SELECT airline.airlinename, SUM(booking.price) as
price_tickets, count(*) as nb_tickets

FROM booking, flight, airline, airport_geo

WHERE booking.flight_id=flight.flight_id AND

airline.airline_id=flight.airline_id
AND

flight.from=airport_geo.airport_id
AND

airport_geo.country
= “UNITED STATES”

GROUP BY

airline.airlinename

ORDER BY

nb_tickets desc,
airline.airlinename limit 10;

More
info about Heatwave:- 
https://www.oracle.com/mysql/heatwave/

Conclusion

Apache Zeppelin is a kind of tool, which makes Data
Scientist life smooth, they can do everything they need in one place. Things
like data ingestion, data exploration, data visualization, and data analytics
can be done in the zeppelin notebook and MDS HeatWave is a massively parallel,
high performance, in-memory query accelerator for Oracle MySQL Database Service
that accelerates MySQL performance by orders of magnitude for analytics and
mixed workloads and cost lesser than Specialist analytics product like Amazon
Redshift, Aurora ,Snowflake ,Azure Synpase ,google Big Query etc.

Leave a Reply

Your email address will not be published.