This tutorial is adapted from Web Age course Applied Data Science with Python.
This tutorial provides quick overview of Python modules and high-power features, NumPy library, pandas library, SciPy library, scikit-learn library, Jupyter notebooks and Anaconda distribution.
1.1 Importing Modules
Command
|
Comments |
from lib import x
from lib import x as y
|
Importing a single object from a module(lib) allows direct referencing of the object, e.g
x() or y() Introduces potential variable name collision with other module imports |
from lib import * |
Not clean — potential variable name clobbering |
import lib as alias | You import the whole module giving it an alias; use the ‘.’ dot prefix to access objects in it:
alias.x() That’s the preferred way to import modules |
1.2 Listing Methods in a Module
dir(module_alias), e.g. dir(np) or dir('module_name'), e.g. dir('sys')
1.3 Creating Your Own Modules
# my_utils.py def foo(a, b): return a + b
├── my_library # directory │ ├── __init__.py # empty file, without it, Python will not allow import from a directory │ ├── my_utils.py # your module (def …)
from my_library import my_utils my_utils.foo(4, 2)
def foo(a, b): return a + b if __name__ == "__main__": # Application entry point is here
1.4 Random Numbers
import random # can alias the import random.random() # a float in a range of [0, 1) random.randint(x,y) # an integer in a range of [x,y]
Notes:
Python uses the Mersenne Twister as the pseudo-random number generator. It produces 53-bit precision floats and has a generation cycle of 2**19937-1 (the cycle is 6002 digits long). The underlying implementation in C, which is both fast and threadsafe.
1.5 Zipping Lists
a = [1,2,3,4,5] b = [10,20,30,40,50] [str(x) + ':' + str(y) for x, y in zip(a,b)] Output: ['1:10', '2:20', '3:30', '4:40', '5:50']
1.6 List Comprehension
y = 10 [ i**2 + y for i in range (1, 20) if (i % 2 == 0) ] Output: [14, 26, 46, 74, 110, 154, 206, 266, 334]
1.7 Python Data Science-Centric Libraries
1.8 NumPy
1.9 NumPy Arrays
import numpy as np
# Aliasing it as np – the accepted convention
# Simple arrays:
a1 = np.array ([1,2,3,4,5])
# Takes in a Python list as input and outputs a numpy.ndarray object
a2 = np.arange(5)
# A numpy.ndarray object holding [0, 1, 2, 3, 4]
# A matrix structure (note the nested brackets)
m = np.array ([[1,2,3,4], [5,6,7,8]])
m.shape
Output: (2, 4)
1.10 Select NumPy Operations
10 * np.arange(5) + np.arange(5) Output: array([ 0, 11, 22, 33, 44]) np.cos( np.pi * np.array([0,1,2,3])) Output: array([ 1., -1., 1., -1.])
a = np.array ([1,2,3,4,5]) a [a % 2 == 0] Output: array([2, 4])
m = np.array ([1,2,3,4,5,6,7,8]).reshape(4,2) Output: a 4x2 matrix
1.11 SciPy
Notes:
The SciPy module includes the following sub-packages:
constants: physical constants and conversion factors (since version 0.7.0[5]) cluster: hierarchical clustering, vector quantization, K-means fftpack: Discrete Fourier Transform algorithms integrate: numerical integration routines interpolate: interpolation tools io: data input and output lib: Python wrappers to external libraries linalg: linear algebra routines misc: miscellaneous utilities (e.g. image reading/writing) ndimage: various functions for multi-dimensional image processing optimize: optimization algorithms including linear programming signal: signal processing tools sparse: sparse matrix and related algorithms spatial: KD-trees, nearest neighbors, distance functions special: special functions stats: statistical functions weave: tool for writing C/C++ code as Python multiline strings
1.12 pandas
1.13 Creating a pandas DataFrame
import pandas as pd import numpy as np # You build a DataFrame as a matrix (array of arrays) m = np.array ([1,2,3,4,5,6,7,8]).reshape(4,2) df = pd.DataFrame(m, columns = ["Col1", "Col2"]) # Now you have this structure: Col1 Col2 0 1 2 1 3 4 2 5 6 3 7 8
1.14 Fetching and Sorting Data
df.Col1 # or df['Col1'] # Col1 values with the their indexes as the left-most column Output: 0 1 1 3 2 5 3 7 Name: Col1, dtype: int32 df.iloc[1, :] # Getting the second row via its index Output: Col1 3 Col2 4 Name: 1, dtype: int32 df.Col2.sort_values(ascending=False) # Sorting values in descending order (ascending order is default) Output: 3 8 2 6 1 4 0 2 Name: Col2, dtype: int32
1.15 Scikit-learn
1.16 Matplotlib
1.17 Python Dev Tools and REPLs
1.18 IPython
1.19 Jupyter
jupyter notebook
Notes:
The name Jupyter is an indirect acronym of the three core languages it was designed for: JUlia, PYThon, and R.
1.20 Jupyter Operation Modes
1.21 Jupyter Common Commands
Notes:
You can preview and edit the command shortcuts by navigating to Help > Edit Keyboard Shortcuts using the menu bar. Unfortunately, for now, Jupyter does not support macros / scripting.
1.22 Anaconda
Notes:
The conda package manager supports the following commands:
clean | Remove unused packages and caches |
config | Modify configuration values in .condarc. This is modeled after the git config command. Writes to the user .condarc file (C:\Users\Mikhail\.condarc) by default |
create |
Create a new conda environment from a list of specified packages
|
help | Displays a list of available conda commands and their help strings |
info | Display information about current conda install |
install | Installs a list of packages into a specified conda environment |
list | List linked packages in a conda environment |
package | Low-level conda package utility. (EXPERIMENTAL) |
remove | Remove a list of packages from a specified conda environment |
uninstall | Alias for conda remove. See conda remove –help. |
search | Search for packages and display associated information. The input is a MatchSpec, a query language for conda packages. |
update | Updates conda packages to the latest compatible version. This command accepts a list of package names and updates them to the latest versions that are compatible with all other packages in the environment. Conda attempts to install the newest versions of the requested packages. To accomplish this, it may update some packages that are already installed, or install additional packages. To prevent existing packages from updating, use the –no-update-deps option. This may force conda to install older versions of the requested packages, and it does not prevent additional dependency packages from being installed. If you wish to skip dependency checking altogether, use the ‘–force’ option. This may result in an environment with incompatible packages, so this option must be used with great caution |
upgrade | Alias for conda update. See conda update –help |