
Understanding the keras workflow with Google Colaboratory


Another post starts with you beautiful people!
Hope you have learnt the core concepts of Deep Learning from my previous post. If not, please visit it once, because it is required before creating our first keras model. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. The keras workflow has the following four steps- specify the architecture, compile the model, fit the model, and predict. Let's understand how we can achieve each step-
  1. Specify the architecture:- In the first step you define the architecture of your model- how many layers do you want? how many nodes in each layer? what activation function do you want to use?
  2. Compile the model:- This step specifies the loss function and some details about optimization.
  3. Fit the model:- This step is the cycle of backpropagation and optimization, in which the model weights are updated with your data.
  4. Predict:- In this last step you use your model to make predictions.
Now we will explore each step with respect to code. Hope you have set up your Google Colab notebook as mentioned in my previous post. Open the notebook and import the required basic libraries as below-
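A minimal sketch of that imports cell, assuming the standalone keras package (on newer installations the same classes live under tensorflow.keras):

import numpy as np
import pandas as pd

# keras pieces used to build the model
from keras.models import Sequential
from keras.layers import Dense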

The other two imports besides pandas and numpy are the keras libraries we use to build our model. Next, we will load a dataset in Colab and then read it before working through the next three steps. You can upload any local dataset using the following code-
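In Colab the upload cell looks like this- files.upload() opens a file picker in the notebook and returns a dict of the uploaded file contents:

from google.colab import files

# opens a file-picker widget and uploads the chosen file(s)
uploaded = files.upload()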

Next, we will read the dataset to find the number of columns, because when building a keras model we need to specify the input shape, and the number of columns is the number of nodes in the input layer-
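A sketch of that cell, assuming a regression dataset named 'hourly_wages.csv' with a target column 'wage_per_hour'- both names are placeholders, so substitute your own file and column:

# hypothetical file and column names - use your own dataset here
df = pd.read_csv('hourly_wages.csv')

predictors = df.drop('wage_per_hour', axis=1).values
target = df['wage_per_hour'].values

# number of predictor columns = number of nodes in the input layer
n_cols = predictors.shape[1]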

Next, we will start to build our model. The first line of the model specification initializes a Sequential model. For this step we use Sequential(), which is a linear stack of layers. Sequential models require that each layer has weights connecting only to the one layer coming directly after it in the network diagram. Then we start adding layers using the .add() method of the model. Here the standard layer type is the Dense layer. It is called Dense because all the nodes in the previous layer connect to all the nodes in the current layer. In each layer we define the number of nodes as the first argument, then an activation function, then the input shape-
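Putting that together, the model-specification cell could look like this (the layer sizes of 50 and 32 are just the example values discussed below):

# set up the model: a linear stack of layers
model = Sequential()

# first hidden layer: 50 nodes; input_shape tells keras how many columns to expect
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))

# second hidden layer: 32 nodes
model.add(Dense(32, activation='relu'))

# output layer: a single node holding the regression prediction
model.add(Dense(1))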

In the input_shape argument, we pass the number of columns followed by a comma and then a blank value, which means there can be any number of rows or data points. The last layer has only 1 node because this is the output layer, and it matches those diagrams where we ended with only a single node as the output or the prediction of the model. So our model has two hidden layers and an output layer. In the hidden layers we are using 50 and 32 nodes as an example, but you can put a much larger number of nodes here and keras will do all the maths for you! So don't be afraid to try a bigger network.

Next, we will compile and fit the model. The compile() method has two arguments- the first is the optimizer, which basically controls the learning rate. There are a few algorithms that can tune the learning rate automatically; 'Adam' is one of the best algorithms for that task. The second argument is the loss function; 'mean squared error' is a common choice for a regression problem-
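The compile step is then a single line:

# 'adam' tunes the learning rate automatically; MSE suits regression
model.compile(optimizer='adam', loss='mean_squared_error')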

After compiling we can fit our model using the fit() method. Fitting the model means keras applies backpropagation and gradient descent with our data to update the weights-
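Fitting is just as short (predictors and target are the arrays we built from the uploaded dataset above):

# backpropagation + gradient descent update the weights from our data
model.fit(predictors, target)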

When you run the above cells, you will see output showing the optimization progress, with the loss value printed at the end of each epoch.
Now change the number of nodes to 150 in each hidden layer and see how the loss value in the output changes. That's it! You now know how to specify, compile, and fit a deep learning model using keras!

Now we will learn how to use keras for a classification problem. There are a few changes required: for the loss function we use the most common classification loss, 'categorical_crossentropy'; we add a metric, 'accuracy', to print the accuracy score at the end of each epoch; and we change the output activation function to 'softmax' so that the predictions can be interpreted as probabilities. To understand each change we will apply a keras model to a classification problem dataset-
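A sketch of the loading cell, assuming an all-numeric version of the titanic dataset saved as 'titanic.csv' (a placeholder name) with a 'survived' column as the target:

# hypothetical file name - any all-numeric titanic CSV with a 'survived' column works
df = pd.read_csv('titanic.csv')
df.head()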


In this problem our goal is to take information about the passengers and predict which ones survived. So we will separate the target variable from the dataframe, convert this target from a class vector (integers) to a binary class matrix using keras's to_categorical, and then perform all four basic steps of the keras workflow-
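Those four steps could look like this (the hidden-layer size of 32 is an arbitrary example):

from keras.utils import to_categorical

# separate the predictors from the target column
predictors = df.drop('survived', axis=1).values
n_cols = predictors.shape[1]

# convert the integer class vector into a binary class matrix
target = to_categorical(df['survived'])

# 1. specify the architecture
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(n_cols,)))
model.add(Dense(2, activation='softmax'))

# 2. compile with a classification loss and an accuracy metric
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

# 3. fit the model
model.fit(predictors, target)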

Here we are using 'Stochastic Gradient Descent' (SGD) as the optimizer, which you learnt about in my last post. In the output layer we are using 2 nodes because we have two prediction classes- survived or not survived. You can try more nodes in the hidden layer.

We have learnt how to use keras for classification as well as regression problems. Now we will learn how to save and reload our model, and then make predictions on new data. For saving, keras has the save() method; for reloading a saved model, keras has the load_model() function; and for making predictions, a keras model has the predict() method-
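A sketch of that cell ('model_file.h5' is a placeholder name, and here we simply reuse predictors as stand-in new data):

from keras.models import load_model

# save the trained model to an HDF5 file
model.save('model_file.h5')

# reload it later and make predictions
my_model = load_model('model_file.h5')
predictions = my_model.predict(predictors)

# with a two-node softmax output, the second column is the probability of surviving
probability_true = predictions[:, 1]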

Great! Now it's time to learn how to use different learning rates and select the best one to optimize our model. Although we are using an optimizer algorithm like SGD in our model compilation, there may still be a situation where your model stops improving at some point. One cause of this is the dying neuron problem, where a node starts outputting zero for every input. To solve this you should experiment with the optimization, for example by trying other learning rates or another optimizer algorithm. We will use the same titanic dataset, but here we will create a function get_new_model() that returns a fresh, unoptimized model for each learning rate we test, as shown below-
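A sketch of that helper- the two 100-node hidden layers are an assumption, and any architecture works as long as every learning rate starts from the same fresh model:

def get_new_model():
    # returns a fresh, unoptimized model so each learning rate starts from scratch
    model = Sequential()
    model.add(Dense(100, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(2, activation='softmax'))
    return model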

Next, we will use this function, iterate over a list of learning rates, and compile and fit a model for each-
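The loop could look like this (the three learning rates- very small, medium, and large- are example values):

from keras.optimizers import SGD

# example learning rates to compare
lr_to_test = [0.000001, 0.01, 1]

for lr in lr_to_test:
    print('\nTesting model with learning rate: %f' % lr)
    model = get_new_model()
    my_optimizer = SGD(lr=lr)
    model.compile(optimizer=my_optimizer, loss='categorical_crossentropy')
    model.fit(predictors, target)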

Once you run the above cell, you can see the training loss for all three learning rates we have mentioned.

You can see that with a learning rate of 0.01 the loss is the lowest! In this way you can select an appropriate learning rate.
If you remember, we cross validated (for example with k-fold) our training data before training our various machine learning models. In deep learning, instead of cross validation we use a validation split, because deep learning is generally applied to large datasets and the repeated training that cross validation requires would take too long. keras makes this step easy for us- we just need to pass the keyword argument 'validation_split' to the fit() method. For a classification problem the code will look like below-
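A sketch, reusing the titanic model from above and holding out 30% of the data:

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# 30% of the rows are held out and scored at the end of every epoch
model.fit(predictors, target, validation_split=0.3)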

In general we should keep training our model until it stops improving. keras provides a way to do this with 'Early Stopping'. For early stopping we need to create an early stopping monitor before fitting the model. This monitor takes one argument, 'patience', which is how many epochs the model can go without improving before we stop training. Generally 2 or 3 is a good choice for patience. You then pass the monitor in the 'callbacks' argument of the fit() method as below-
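A sketch, again reusing the titanic model:

from keras.callbacks import EarlyStopping

# stop if the validation score fails to improve for 2 epochs in a row
early_stopping_monitor = EarlyStopping(patience=2)

model.fit(predictors, target, validation_split=0.3, epochs=30,
          callbacks=[early_stopping_monitor])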

Here we are passing epochs as 30 rather than relying on a small default; since early stopping will halt optimization automatically once it is no longer helping, it is okay to specify a generous maximum number of epochs.

Always remember, creating a great model in deep learning requires experimentation. So start doing it with different architectures, more or fewer layers, etc. In the next post we will work on an image classification problem and use the keras library to solve it. Till then- Go chase your dreams, have an awesome day, make every second count, and see you later in my next post.
