Skip to main content

Relational Database in Python-SQLite

Another post starts with you beautiful people!
Hope you have enjoyed and learnt my previous post about file importing
Today we will learn how as a Data Scientist we connect to the database step by step-

Before starting the code first you must read about the very powerful library -SQLAlchemy which we will use in our exercises and you can find the details here- SQLAlchemy documentation

How To Create a Database Engine-
Here, we're going to fire up our very first SQL engine. We'll create an engine to connect to the SQLite database 'Chinook.db'. You can find the download and installation steps of this sample database from here- download sqlite sample database

An engine is just a common interface to a database, and the information it requires to
connect to one is contained in a connection string, such as sqlite:///C:\sqlite\db\chinook.db .Here, sqlite is the database driver, while chinook.db is a SQLite db file contained in the local directory.

A little bit of background on the Chinook database: the Chinook database contains information about a semi-fictional digital media store in which media data is real and customer, employee and sales data has been manually created.

We can see the database structure through sqlite3 tool and it looks like as below-

Now let's create a engine and see what tables it contains-

Output-

Now, it's time for liftoff! In this exercise, we'll perform the Hello World of SQL queries-
SELECT, in order to retrieve all columns of the table employees in the Chinook database.

Run the above query in your notebook and find what Employees table has !

Customizing the Hello World of SQL Queries-
Congratulations on executing your first SQL query! Now we're going to figure out how to customize your query in order to:
Select specified columns from a table;
Select a specified number of rows;
Import column names from the database table.

Output-

Filtering our database records using SQL's WHERE-
We can now execute a basic SQL query to select records from any table in our database and we can also perform simple query customizations to select particular columns and numbers of rows.
We can filter any SELECT statement by any condition using a WHERE clause. This is called filtering our records.

Output-

Ordering our SQL records with ORDER BY-
We can also order our SQL query results. For example, if you wanted to get all records from the Employees table of the Chinook database and order them in increasing order by the column BirthDate.

Pandas and The Hello World of SQL Queries-
Here, we'll take advantage of the power of pandas to write the results of our SQL query to a DataFrame in one swift line of Python code!

Output-

Pandas for more complex querying-
Here, we'll become more familiar with the pandas function read_sql_query() by using it to execute a more complex query:
a SELECT statement followed by both a WHERE clause AND an ORDER BY clause.
We'll build a DataFrame that contains the rows of the Employee table for which the EmployeeId is greater than or equal to 6 and we'll order these entries by BirthDate.

Output-

Relationships between tables: INNER JOIN
Here, we'll perform our first INNER JOIN!
For each record in the Albums table, we'll extract the Title along with the Name of the Artist. The latter will come from the Artist table and so we will need to INNER JOIN these two tables on the ArtistID column of both.

Output-

Filtering our INNER JOIN-
Congrats on performing your first INNER JOIN! You're now going to finish this chapter with one final exercise in which you perform an INNER JOIN and filter the result using a WHERE clause.

Output-

Try the above examples in your notebook and explore other databases also!
In the next post we will learn CRUD operations in relational database.

Comments

Post a Comment

Popular posts from this blog

Machine Learning-Cross Validation & ROC curve

Another post starts with you beautiful people! Hope you enjoyed my previous post about improving your model performance by  confusion metrix . Today we will continue our performance improvement journey and will learn about Cross Validation (k-fold cross validation) & ROC in Machine Learning. A common practice in data science competitions is to iterate over various models to find a better performing model. However, it becomes difficult to distinguish whether this improvement in score is coming because we are capturing the relationship better or we are just over-fitting the data. To find the right answer of this question, we use cross validation technique. This method helps us to achieve more generalized relationships. What is Cross Validation? Cross Validation is a technique which involves reserving a particular sample of a data set on which we do not train the model. Later, we test the model on this sample before finalizing the model. Here are the steps involved in...

Relational Database in Python-CRUD operations

Another post starts with you beautiful people! Hope you have enjoyed and learn from my previous post about  Relational Database In Python-sqlite Previously, we used the Table object to reflect a table from an existing database , but what if you wanted to create a new table? We'd still use the Table object; however, we'd need to replace the autoload and autoload_with parameters with Column objects . The Column object takes a name, a SQLAlchemy type with an optional format, and optional keyword arguments for different constraints. After defining the table, we can create the table in the database by using the .create_all() method on metadata and supplying the engine as the only parameter. You can find more details about metadata here-  tell me more about metadata! Output:- Well done! When creating a table, it's important to carefully think about what data types each column should be. Constraints and Data Defaults- We're now going to practice creating a t...

My solution to HackerEarth's Identify the dance form challenge

Another post starts with you beautiful people! Today an interesting deep learning challenge is finished in  HackerEarth  and I got 91.17026 mAP score in the leader board. One drawback I see in HackerEarth is due to small dataset many participants manually prepare the submission files and show 100% score in the leader board. Many aspiring data scientists see this and become nervous. Even with getting score 75+, they become demotivated and leave their experiments in between the challenge. Also the winning approach is not disclosed after the challenge. With this post I will try to motivate my all aspiring data scientists and I will share my solution so that in their next challenge they can easily get 85+ score or even 92+ score :) Problem statement An event management company organized an evening of Indian classical dance performances to celebrate the rich, eloquent, and elegant art of dance. After the event, the company plans to create a micro site to promote and raise aw...