Another post starts with you beautiful people!
I hope you have enjoyed my previous post about Exploring The File Import where we learned about flat file import.
But there are a number of datatypes that cannot be saved easily to flat files, such as lists and dictionaries.
In this post we will deal with pickle, Excel, SAS, HDF5 and MATLAB files.
First, we will see how to import a pickle file-
In this exercise, we'll import the pickle package, open a previously pickled data structure from a file, and load it. You can find more details in the Python documentation for the pickle module.
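Here is a minimal sketch of that round trip. The dictionary and the file name `data.pkl` are made up for illustration (the post's own pickled file is not reproduced here), so the sketch first writes a small pickle to a temporary folder and then loads it back.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical data standing in for the previously pickled structure.
original = {"name": "example", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"  # hypothetical file name

    # Pickle files are binary, so write with mode "wb" ...
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # ... and read with mode "rb".
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(type(loaded))
print(loaded)
```

One caution: only unpickle files you trust, because loading a pickle can execute arbitrary code.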
Run the above code snippet in your notebook and discover what Python datatype it yields.
Second, we will see how to load an Excel file-
Whether you like it or not, any working data scientist will need to deal with Excel spreadsheets at some point in time. You won't always want to do so in Excel, however!
Here, we'll learn how to use pandas to import Excel spreadsheets and how to list the names of the sheets in any loaded .xls file.
Specifically, we'll be loading and checking out the spreadsheet 'PRIO Battle Deaths Dataset 3.1.xls', modified from the Peace Research Institute Oslo's (PRIO) dataset.
This data contains age-adjusted mortality rates due to war in various countries over several years.
An .xls file may contain more than one sheet, so the code snippet below shows how to read each sheet-
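One way to read every sheet is sketched below, using `pd.ExcelFile`, its `sheet_names` attribute, and its `parse()` method. The helper names `load_all_sheets` and `sheet_shapes` are my own, not part of pandas, and I am assuming the workbook sits in your working directory.

```python
import pandas as pd


def load_all_sheets(path):
    """Return a dict mapping each sheet name to a DataFrame."""
    # ExcelFile opens the workbook once; the with-block closes it again.
    with pd.ExcelFile(path) as xls:
        # sheet_names lists every sheet in the workbook.
        return {name: xls.parse(name) for name in xls.sheet_names}


def sheet_shapes(sheets):
    """Summarise each loaded sheet as (rows, columns)."""
    return {name: df.shape for name, df in sheets.items()}
```

In a notebook you could then call `sheets = load_all_sheets('PRIO Battle Deaths Dataset 3.1.xls')` and print `sheet_shapes(sheets)`. Note that pandas needs the xlrd package installed to read the older .xls format.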
Third, we will see how to load a SAS file and plot the data-
In this exercise, we'll figure out how to import a SAS file as a DataFrame using the SAS7BDAT class from the sas7bdat package together with pandas.
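A rough sketch of the idea follows. The function names `load_sas` and `plot_histogram` are hypothetical helpers of mine, and the file path and column name you pass in depend on your own dataset. I am assuming the sas7bdat package is installed, which is why it is imported inside the function.

```python
def load_sas(path):
    """Read a .sas7bdat file into a pandas DataFrame."""
    # Imported here so the sketch can be pasted before the package is installed.
    from sas7bdat import SAS7BDAT

    # The context manager closes the file once the DataFrame is built.
    with SAS7BDAT(path) as reader:
        return reader.to_data_frame()


def plot_histogram(df, column):
    """Draw a histogram of one numeric column of the DataFrame."""
    import matplotlib.pyplot as plt

    df[[column]].hist()
    plt.ylabel("count")
    plt.show()
```

In a notebook you would call `df_sas = load_sas(...)` with the path to your file, look at `df_sas.head()`, and then pass one of its numeric columns to `plot_histogram`. If you prefer to stay within pandas, `pd.read_sas()` is an alternative way to read the same format.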
Next, we will learn how to load an HDF5 file-
In this exercise, we'll import an HDF5 file using the h5py library. We'll also print out its datatype to confirm we have imported it correctly. The LIGO data used here, plus loads of documentation and tutorials on signal processing, can be found on the LIGO Dataset page.
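The sketch below shows the shape of that step without needing the LIGO download. It builds a tiny stand-in file first, and the file name plus the group and dataset names (`strain/Strain`, `meta/Duration`) are assumptions made for illustration, not a description of the real file's layout.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.hdf5")  # hypothetical file name

    # Build a small stand-in file; intermediate groups are created automatically.
    with h5py.File(path, "w") as f:
        f.create_dataset("strain/Strain", data=np.arange(5.0))
        f.create_dataset("meta/Duration", data=4096)

    # Open the file read-only and inspect it.
    with h5py.File(path, "r") as data:
        file_type = type(data)
        top_level_keys = sorted(data.keys())

print(file_type)
print(top_level_keys)
```

An HDF5 file behaves much like a nested dictionary: the top-level keys are groups, and each group holds datasets or further groups.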
Extracting data from your HDF5 file-
In this exercise, we'll extract some of the LIGO experiment's actual data from the HDF5 file and visualize it. You can find more about this type of file in the HDF5 documentation.
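To show the extraction step on its own, here is a sketch that uses a synthetic sine wave in place of the real measurements. The dataset path `strain/Strain`, the sample rate, and the helper `plot_strain` are all assumptions of mine, so adjust them to whatever keys your file actually contains.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np

sample_rate = 1000  # samples per second, chosen for this sketch
num_samples = 2000
signal = np.sin(2 * np.pi * 5 * np.arange(num_samples) / sample_rate)

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.hdf5")  # hypothetical file name

    # Write the synthetic signal so there is something to extract.
    with h5py.File(path, "w") as f:
        f.create_dataset("strain/Strain", data=signal)

    with h5py.File(path, "r") as data:
        group = data["strain"]
        # [:] copies the dataset into a NumPy array; do this while the file is open.
        strain = group["Strain"][:]

# Build a matching time axis in seconds.
time = np.arange(num_samples) / sample_rate


def plot_strain(time, strain):
    """Plot the extracted values against time."""
    import matplotlib.pyplot as plt

    plt.plot(time, strain)
    plt.xlabel("time (s)")
    plt.ylabel("strain")
    plt.show()
```

Calling `plot_strain(time, strain)` in your notebook draws the curve. If you come across older tutorials that read a dataset with `.value`, note that this attribute was removed in h5py 3, and indexing with `[:]` or `[()]` is the current way.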
How to load a MATLAB file-
In this exercise, we'll figure out how to load a MATLAB file using scipy.io.loadmat().
This file contains gene expression data from the Albeck Lab at UC Davis, which provides the dataset for download along with some great documentation.
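As a small sketch of how `scipy.io.loadmat()` behaves, the code below saves a toy array to a .mat file and loads it back. The file name and the variable name `expression` are invented for this example and are not taken from the Albeck Lab file.

```python
import tempfile
from pathlib import Path

import numpy as np
import scipy.io

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "example.mat")  # hypothetical file name

    # Save a toy array under a made-up MATLAB variable name.
    scipy.io.savemat(path, {"expression": np.arange(6.0).reshape(2, 3)})

    # loadmat reads every variable in the file at once.
    mat = scipy.io.loadmat(path)

print(type(mat))

# Keys that start with "__" are file metadata added by loadmat.
variables = [key for key in mat if not key.startswith("__")]
print(variables)
print(mat["expression"].shape)
```

Each MATLAB variable comes back as a NumPy array stored under its variable name, so once you know the keys you can slice and plot the data as usual.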
Run the above code in your notebook and discover what Python datatype it yields.