This tutorial is adapted from the Web Age course https://www.webagesolutions.com/courses/WA3208-programming-on-azure-databricks-with-pyspark-sql-and-scala.
In this tutorial, you will learn how to create and use SQL Notebooks in Databricks that enable developers and business users to query data cataloged as tables using standard SQL commands.
This tutorial depends on the resources you created in the Creating a Table in Databricks lab, where you:
• Uploaded a CSV file to the Databricks environment, and
• Created and cataloged a table in Databricks.
Part 1 – Create an SQL Notebook
1. Click through Create > Notebook
In the Create Notebook page that opens, perform the following activities:
2. For Name, enter SqlNB
3. For Default Language, select SQL
4. For Cluster, select a currently active cluster (provision one if necessary)
5. Click Create
A new SQL notebook will be provisioned for you, and you can start coding in it.
Part 2 – Work in the SQL Notebook
Databricks SQL notebooks accept most standard SQL commands, including DDL and DML statements. For the full list of supported SQL commands, visit
https://docs.databricks.com/sql/language-manual/index.html
Note: SQL commands and identifiers (including table and view names) are case-insensitive in Databricks SQL.
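For example, keyword casing does not matter; the following two commands are equivalent (a minimal illustration — run each in its own cell), and the same holds for table and view names:
show tables
SHOW TABLES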
1. Enter the following command (press Shift-Enter to submit):
-- this is a comment
Note: Whenever possible (always), add comments to document your code.
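Databricks SQL supports both single-line and bracketed comments; a quick sketch:
-- a single-line comment runs to the end of the line
/* a bracketed comment
   can span multiple lines */
select 1 as demo -- a comment can also follow code on the same line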
In the Creating a Table in Databricks lab you created and cataloged the txn_2022_03_15_csv table in the default database.
2. Enter the following command:
show tables
You should see the following output (some of the detail will differ):
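As your workspace grows, show tables also accepts an optional database qualifier and a name pattern; a minimal sketch (the txn* pattern here is just an example):
show tables in default like 'txn*'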
3. Enter the following command to read the meta-information about the table txn_2022_03_15_csv as cataloged in the default database:
desc txn_2022_03_15_csv
You should see the following output:
You can download the table as a CSV file by clicking the rightmost button (the down arrow) in the footer of the output.
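If you need table-level metadata beyond the column list (such as the storage location and provider), a describe extended variant reports it as well; for example:
desc extended txn_2022_03_15_csv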
Note: The Databricks UI supports command-name auto-completion via the Tab key. Use it to your advantage to save development time and avoid frustrating typos.
4. To get a taste of the data, enter the following command to fetch the first 5 rows:
select * from txn_2022_03_15_csv limit 5
You should see:
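Note that limit without an order by returns an arbitrary subset of rows, so the preview may differ from run to run. For a deterministic preview, sort first; a sketch ordering by the DC column (any column would do):
select * from txn_2022_03_15_csv order by DC limit 5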
5. Enter the following command:
select count(*) row_count from txn_2022_03_15_csv
You should see the following output:
Take note of the time it took the command to complete (reported as 3.02 seconds above).
Databricks notebooks come with an embedded visualization capability that allows you to view the output of your SQL queries as various graphs and charts. Let’s see how it works.
6. Enter the following command:
select DC, count(*) DC_COUNT from txn_2022_03_15_csv group by DC
You will recall from the Creating a Table in Databricks lab that DC (the debit/credit transaction indicator) is only populated for BILL_PAY and TRANSFER transactions, hence the null in the first row, which acts as an “all other transactions” bucket.
The null value in the DC column is uninformative at best; we can fix this by replacing it with a more descriptive label, N/A, using the ifnull function.
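If you would like to inspect a few of those rows before applying the fix, a quick sketch:
select * from txn_2022_03_15_csv where DC is null limit 5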
7. Enter the following command:
select ifnull(DC, 'N/A') DC, count(*) DC_COUNT from txn_2022_03_15_csv group by DC
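An equivalent formulation uses the more general coalesce function, which returns its first non-null argument:
select coalesce(DC, 'N/A') DC, count(*) DC_COUNT from txn_2022_03_15_csv group by DC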
8. Click the middle (Graph) button at the bottom of the output to render the output as a bar graph.
You should see the following output:
If you hover your mouse pointer over a bar, you will see the number of rows that fall into the specific GROUP BY bucket.
The Plot Options … button that appears next to the graph icon at the bottom does not offer some of the options you might reasonably expect here, such as selecting a different color for the visuals. That (arguably, purely cosmetic) deficiency might be addressed in newer versions of the Databricks notebook UI.
Let’s see what you can achieve with other visuals, like the Pie chart.
9. Expand the Graph button, and select Pie
You should see the following output:
This is the “donut” shape that is preselected for the Pie plot option (you can confirm this by clicking Plot Options …).
Note: You can download your notebook by clicking through File > Export > Source File
A text file named after the notebook with a .sql extension (SqlNB.sql, in our case) will be generated and downloaded to your local computer.
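For reference, the exported file is plain SQL annotated with Databricks cell markers; its first lines typically look something like the sketch below (exact markers may vary across Databricks versions):
-- Databricks notebook source
-- this is a comment

-- COMMAND ----------

show tables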
Note: Databricks automatically saves your notebook in the cloud so you do not need to worry about losing your work.