{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab 1 - Basic iPython Tutorial\n", "\n", "Authors:\n", "\n", "v1.0 (2014 Fall) Rishi Sharma \\*\\*\\*, Sahaana Suri \\*\\*\\*, Kangwook Lee \\*\\*\\*, Kannan Ramchandran \\*\\*\\* \n", "v1.1 (2015 Fall) Kabir Chandrasekher \\*\\*, Max Kanwal \\*\\*, Kangwook Lee \\*\\*\\*, Kannan Ramchandran \\*\\*\\* \n", "v1.2 (2016 Spring) Ashvin Nair \\*, Kabir Chandrasekher \\*\\*, Kangwook Lee \\*\\*\\*, Kannan Ramchandran \\*\\*\\*\n", "\n", "modified from Berkeley Python Bootcamp 2013 https://github.com/profjsb/python-bootcamp\n", "\n", "and Python for Signal Processing http://link.springer.com/book/10.1007%2F978-3-319-01342-8\n", "\n", "and EE 123 iPython Tutorial http://inst.eecs.berkeley.edu/~ee123/sp14/lab/python_tutorial.ipynb\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## General iPython Notebook usage instructions (overview)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Start by clicking `Help >> User Interface Tour` to get yourself familiar with the iPython Notebook environment.\n", "- Click the `Play` button to run and advance a cell. The short-cut for it is `shift-enter`.\n", "- To add a new cell, either select `\"Insert >> Insert New Cell Below\"` or click the `Plus` button.\n", "- You can change the cell mode from code to text in the pulldown menu. Use `Markdown` for writing text.\n", "- You can change the text in `Markdown` cells by double-clicking it. The short-cut for this is `enter`.\n", "- To save your notebook, either select `\"File >> Save and Checkpoint\"` or hit `Command-s` for Mac and `Ctrl-s` for Windows.\n", "- To undo edits within a cell, hit `Command-z` for Mac and `Ctrl-z` for Windows.\n", "- `Help >> Keyboard Shortcuts` has a list of all useful keyboard shortcuts.\n", "- The `Help` menu also has links to many reference docs you may find useful this semester (e.g. Markdown, Python, NumPy, Matplotlib, SciPy)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Follow the instructions on the class website to install python (if you're reading this, you've probably already done that):\n", "\n", "http://ipython.org/install.html\n", "\n", "Make sure you install the notebook dependencies for iPython in addition to the basic package" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tab Completion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One useful feature of iPython is tab completion:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = 1\n", "y = 2\n", "x_plus_y = x+y\n", "\n", "# type `x_` then hit TAB to auto-complete the variable\n", "# then press shift + enter to run the cell\n", "print x_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Help" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful feature is the help command. Type any function followed by `?` and run the cell to return a help window. Hit the `x` button to close it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "abs?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Floats and Integers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Doing math in Python is easy, but note that there are `int` and `float` types in Python. Integer division returns the floor. Always check this when debugging!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "59 / 87" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "59 / 87.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The division operator ('`/`') was changed from floor division (for integer operands) to float division in the update from Python 2.x to Python 3.x. In Python 2, running `from __future__ import division` gives `/` the Python 3 behavior; use `//` when you actually want floor division." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from __future__ import division\n", "\n", "59 / 87" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Strings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Double quotes and single quotes are the same thing. \n", "- `'+'` concatenates strings." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# This is a comment\n", "\"Hi \" + 'Bye'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Printing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some fancy ways of printing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "numStudents = 83" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"There are currently %d students enrolled in EE 126.\" % numStudents" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"At least {0} of you will probably end up dropping. That is {1}%.\"\\\n", "    .format(15, 15.0/numStudents * 100 )" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"Good Luck! 
Prepare to work hard and learn a lot of cool stuff!\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A list is a mutable sequence of data, i.e. it can be modified after it is created. See http://stackoverflow.com/questions/8056130/immutable-vs-mutable-types-python for more info. If you are not careful, using mutable data structures can lead to bugs in code that passes common data to many different functions.\n", "\n", "Important functions: \n", "- Create a list using square brackets `[ ]`.\n", "- `'+'` concatenates lists. \n", "- `len(x)` gets the length of list `x`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = [1, 2, \"asdf\"] + [4, 5, 6]\n", "\n", "print x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print len(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tuples" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A tuple is like a list, but immutable. Tuples are created using round brackets `( )`. \n", "\n", "They are usually used as inputs and outputs to functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "t = (1, 2, \"asdf\") + (3, 4, 5)\n", "print t" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# cannot do assignment\n", "t[0] = 10\n", "\n", "# errors in ipython notebook appear inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Arrays (NumPy)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A numpy array is like a list with multidimensional support and more functions. We will be using it a lot.\n", "\n", "Arithmetic operations on numpy arrays correspond to elementwise operations. 
\n", "\n", "Important functions:\n", "\n", "- `.shape` returns the dimensions of the array\n", "\n", "- `.ndim` returns the number of dimensions. \n", "\n", "- `.size` returns the number of entries in the array\n", "\n", "- `len()` returns the first dimension\n", "\n", "\n", "To use functions in numpy, we have to import numpy to our workspace. This is done by the command `import numpy`. By convention, we rename `numpy` as `np` for convenience." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np # by convention, import numpy as np\n", "\n", "x = np.array( [ [1, 2, 3], [4, 5, 6] ] )\n", "\n", "print x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"Number of dimensions:\", x.ndim" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"Dimensions:\", x.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"Size:\", x.size" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"Length:\", len(x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "a = np.array([1, 2, 3])\n", "\n", "print \"a=\", a\n", "\n", "print \"a*a=\", a * a #element-wise arithmetic" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "b = np.array(np.ones((3,3)))*2\n", "print \"b=\\n\", b\n", "c = np.array(np.ones((3,3)))\n", "print \"c=\\n\", c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiply elementwise:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"b*c =\\n\", b*c" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "Now multiply as matrices (not arrays):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"b*c =\\n\", np.dot(b,c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can just convert to or create a matrix instead of an array and then use normal multiplication:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"b*c =\\n\", np.matrix(b)*np.matrix(c)\n", "\n", "d = np.matrix([[1,1j,0],[1,2,3]])\n", "e = np.matrix([[1],[1j],[0]])\n", "\n", "print \"d*e =\\n\", d*e" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Slicing for numpy arrays" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numpy uses pass-by-reference semantics so it creates views into the existing array, without implicit copying. This is particularly helpful with very large arrays because copying can be slow." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = np.array([1,2,3,4,5,6])\n", "print x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We slice an array from a to b-1 with `[a:b]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y = x[0:4]\n", "print y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since slicing does not copy the array, changing `y` changes `x`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "y[0] = 7\n", "print x\n", "print y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To actually copy x, we should use .copy():" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = np.array([1,2,3,4,5,6])\n", "y = x.copy()\n", "y[0] = 7\n", "print x\n", "print y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Commonly used Numpy functions: r\\_ and c\\_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use `r_` to create integer sequences\n", "\n", "`r_[0:N]` creates an array listing every integer from 0 to N-1\n", "\n", "`r_[0:N:m]` creates an array listing every `m` th integer from 0 to N-1 " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from numpy import r_ # import r_ function from numpy directly, so that we can call r_ directly instead of np.r_\n", "\n", "print r_[-5:5] # every integer from -5 ... 4\n", "\n", "print r_[0:5:2] # every other integer from 0 ... 4\n", "\n", "print abs( r_[-5:5] )\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "r\\_ stands for row concatenation, and the function r\\_[-5:5] is saying to row concatenate every element generated in the range [-5:5]. 
This is just one use case for row concatenation, but as you can imagine there are many others. The same goes for its cousin the c\\_ function, which performs a column concatenation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from numpy import r_\n", "from numpy import c_\n", "\n", "row1 = [[1,0,0]]\n", "row2 = [[0,1,0]]\n", "row3 = [[0,0,1]]\n", "\n", "# we want to stack these three rows to create a 3x3 identity matrix\n", "# this is where the r_ function comes in handy\n", "\n", "print np.r_[row1,row2,row3] # 3x3 identity matrix appending vectors as rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What would have happened above if you had used `c_` instead of `r_` on `row1`, `row2`, and `row3`? \n", "Try it in the box below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print np.c_[# fill in # ] # vector created by appending elements as new columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some more examples:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X = np.eye(3) # 3x3 Identity Matrix\n", "Y = np.ones([3,3]) # 3x3 Matrix of ones\n", "print \"X = \"\n", "print X\n", "print \"Y = \"\n", "print Y\n", "\n", "Z = r_[X,Y] # concatenate y to x as rows\n", "print \"\\n Row Concatenated [X ; Y] : \"\n", "print Z\n", "\n", "W = c_[X,Y] # concatenate y to x as columns\n", "print \"\\n Column Concatenated [X Y] : \\n \"\n", "print W" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plotting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this class we will use `matplotlib.pyplot` to plot signals and images.\n", "\n", "To begin with, we import `matplotlib.pyplot` as `plt` (again for convenience)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt # by convention, we import pyplot as plt\n", "from numpy import r_ # import r_ function from numpy\n", "\n", "\n", "x = r_[:1:0.01] # if you don't specify a number before the colon, the start value defaults to 0\n", "a = np.exp( -x )\n", "b = np.sin( x*10.0 )/4.0 + 0.5\n", "\n", "# plot in browser instead of opening new windows\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`plt.plot(x, a)` plots `a` against `x`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure()\n", "plt.plot( x, a )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have started a figure, you can keep plotting to the same figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure()\n", "plt.plot( x, a, 'green' )\n", "plt.plot( x, b )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To draw separate plots, create a second figure" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure()\n", "plt.plot( x, a )\n", "plt.figure()\n", "plt.plot( x, b )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To label the axes, use `plt.xlabel()` and `plt.ylabel()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure()\n", "plt.plot( x, a )\n", "plt.plot( x, b )\n", "\n", "plt.xlabel( \"time\" )\n", "plt.ylabel( \"space\" )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also add a title and a legend using `plt.title()` and `plt.legend()`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, 
"outputs": [], "source": [ "plt.figure()\n", "plt.plot( x, a )\n", "plt.plot( x, b )\n", "plt.xlabel( \"time\" )\n", "plt.ylabel( \"space\" )\n", "\n", "plt.title( \"Most important graph in the world\" )\n", "\n", "plt.legend( (\"e^-x\", \"sin(x)\") )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many options you can specify in `plot()`, such as color and linewidth. You can also change the axis using `plt.axis`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure()\n", "plt.plot( x, a ,':r',linewidth=20)\n", "plt.plot( x, b ,'--k')\n", "plt.xlabel( \"time\" )\n", "plt.ylabel( \"space\" )\n", "\n", "plt.title( \"Most important graph in the world\" )\n", "\n", "plt.legend( (\"blue\", \"red\") )\n", "\n", "plt.axis( [0, 4, -2, 3] )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many other plotting functions. For example, we will use `plt.imshow()` for showing images and `plt.stem()` for plotting discretized signal" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# image\n", "plt.figure()\n", "\n", "data = np.outer( a, b ) # plotting the outer product of a and b\n", "\n", "plt.imshow(data)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# stem plot\n", "plt.figure()\n", "plt.stem(x[::5], a[::5]) # subsample by 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# xkcd style plots \n", "# Note: requires matplotlib version 1.3.1 or higher\n", "plt.xkcd()\n", "plt.plot( x, a )\n", "plt.plot( x, b )\n", "plt.xlabel( \"time\" )\n", "plt.ylabel( \"space\" )\n", "\n", "plt.title( \"Most important graph in the world\" )\n", "\n", "plt.legend( (\"blue\", \"red\") )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### To turn off xkcd 
style plotting, restart the kernel or run the command `plt.rcdefaults()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logic" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### For-Loop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indentation matters in Python. Everything indented belongs to the loop:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for i in [4, 6, \"asdf\", \"jkl\"]:\n", "    print i\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for i in r_[0:10]:\n", "    print i" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### If-Else" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The same goes for If-Else:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "if 1 != 0:\n", "    print \"1 != 0\"\n", "elif 1 == 0: \n", "    print \"1 = 0\"\n", "else:\n", "    print \"Huh?\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random Library" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*The numpy random library should be your resource for all Monte Carlo simulations which require generating instances of random variables.*\n", "\n", "The documentation for the library can be found here: http://docs.scipy.org/doc/numpy/reference/routines.random.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function $\\operatorname{rand}()$ can be used to generate a uniform random number in the range $[0,1)$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from numpy import random\n", "\n", "print random.rand() # random number\n", "print random.rand(5) # random vector\n", "print random.rand(3,3) # random matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how we can use 
this to generate a fair coin toss (i.e. a discrete $\\mathcal{B}\\left(\\frac{1}{2}\\right)$ random variable)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = round(random.rand()) # Bernoulli(1/2) random variable\n", "print x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's generate several fair coin tosses and plot a histogram of the results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "k = 100\n", "x1 = [round(random.rand()) for _ in xrange(k)]\n", "plt.figure()\n", "plt.hist(x1)\n", "\n", "# we could also use numpy's round function to element-wise round the vector\n", "x2 = np.round(random.rand(k))\n", "plt.figure()\n", "plt.hist(x2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do something similar for several other distributions, and allow the histogram to give us a sense of what the distribution looks like. As we increase the number of samples $k$ drawn from the distribution, the histogram looks more and more like the actual distribution." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "k = 1000\n", "\n", "discrete_uniform = random.randint(0,10,size=k) # k discrete uniform random variables between 0 and 9\n", "plt.figure(figsize=(6,3))\n", "plt.hist(discrete_uniform)\n", "plt.title('discrete uniform')\n", "\n", "continuous_uniform = random.rand(k)\n", "plt.figure(figsize=(6,3))\n", "plt.hist(continuous_uniform)\n", "plt.title('continuous uniform')\n", "\n", "std_normal = random.randn(k) # randn generates elements from the standard normal\n", "plt.figure(figsize=(6,3))\n", "plt.hist(std_normal)\n", "plt.title('standard normal')\n", "\n", "# To generate a normal distribution with mean mu and standard deviation sigma, we must mean shift and scale the variable\n", "mu = 100\n", "sigma = 40\n", "normal_mu_sigma = mu + random.randn(k)*sigma\n", "plt.figure(figsize=(6,3))\n", "plt.hist(normal_mu_sigma)\n", "plt.title('N(' + str(mu) + ',' + str(sigma) + ')')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "^ We could do this all day with all sorts of distributions. I think you get the point." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Specifying a Discrete Probability Distribution for Monte Carlo Sampling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function takes $n$ samples from a discrete probability distribution specified by the two arrays `distribution` and `values`.\n", "\n", "As an example, let us suppose a random variable $X$ follows the following distribution:\n", "\n", "$$\n", "X = \\begin{cases} 1 \\ \\text{w/ probability 0.1} \\\\ 2 \\ \\text{w/ probability 0.4} \\\\ 3 \\ \\text{w/ probability 0.2} \\\\ 4 \\ \\text{w/ probability 0.2} \\\\ 5 \\ \\text{w/ probability 0.05} \\\\ 6 \\ \\text{w/ probability 0.05} \\end{cases}\n", "$$\n", "\n", "Then we would have:\n", "`distribution` = $\\begin{bmatrix} 0.1 , 0.4 , 0.2 , 0.2 , 0.05 , 0.05 \\end{bmatrix}$ and\n", "`values` = $\\begin{bmatrix} 1, 2, 3, 4, 5, 6 \\end{bmatrix}$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def nSample(distribution, values, n):\n", "    if sum(distribution) != 1:\n", "        distribution = [distribution[i] / sum(distribution) for i in range(len(distribution))]\n", "    rand = [random.rand() for i in range(n)]\n", "    rand.sort()\n", "    samples = []\n", "    samplePos, distPos, cdf = 0, 0, distribution[0]\n", "    while samplePos < n:\n", "        if rand[samplePos] < cdf:\n", "            samplePos += 1\n", "            samples.append(values[distPos])\n", "        else:\n", "            distPos += 1\n", "            cdf += distribution[distPos]\n", "    return samples" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# collect k samples from X and plot the histogram\n", "samplesFromX = nSample([0.1, 0.4, 0.2, 0.2, 0.05, 0.05], [1, 2, 3, 4, 5, 6], k)\n", "plt.hist(samplesFromX)\n", "plt.ylim((0,1000))\n", "print \"Wow, if we normalized the y-axis that would be a PMF. Incredible! 
I should try that.\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Normalize the samples from X to plot the probability mass function below:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## $\\mathcal{Q}$uestion 1: Sampling and Plotting a Binomial Random Variable\n", "\n", "A binomial random variable $X \\sim \\text{Bin}(n,p)$ can be thought of as the number of heads in $n$ coin flips where each flip has probability $p$ of coming up heads. We can equivalently think of it as the sum of $n$ Bernoulli random variables: $X = \\sum_{i=1}^{n}X_i$ where $X_i \\sim \\text{Bernoulli}(p)$. \n", "\n", "In this question, you will put your new plotting skills to work and sample the values of a binomial random variable. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def plot_binomial(n, trials=[10,50,100,1000,10000], p_values=[0.1,0.2,0.5,0.8]):\n", " \"\"\"\n", " On different figures, plot a histogram of the results of the given\n", " number of trials of the binomial variable with parameters n and p for all\n", " values in the given list.\n", " \"\"\"\n", " \n", "plot_binomial(100) #Feel free to play around with other values of n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you have plotted many values of a few different binomial random variables, do the results coincide with what you expect them to?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## $\\mathcal{Q}$uestion 2: Monty Hall Proof and Simulation\n", "\n", "Simulate the Monty Hall problem for the case when you do switch doors and when you don't. Run the simulation 100,000 times for each case and determine the simulated probability of winning given each strategy. You can use the function below as a guide (or not). Make sure your simulation results match the analytical result you expect. 
If you need a refresher on the Monty Hall problem, the following video may be of some help. Additionally, provide a rigorous proof of the probability of winning the prize given a strategy of switching doors vs. a strategy of staying put." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from IPython.display import YouTubeVideo\n", "YouTubeVideo('Zr_xWfThjJ0')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate the Monty Hall problem below:\n", "A version of the simulation that does not use NumPy at all is given. Your job is to re-implement the simulation using as many functions from the NumPy library as you can!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from __future__ import division\n", "import time\n", "\n", "def MontyHall(switch, n):\n", "    # switch is a boolean which tells you whether or not to switch from the original door chosen\n", "    # n is the number of times you want to simulate the action\n", "    results = []\n", "    for trial in range(n):\n", "        car_door = int(random.random()*3+1)\n", "        my_door = int(random.random()*3+1)\n", "        other_goat_doors = [i for i in range(1,4) if (i!=car_door and i!=my_door)]\n", "        if len(other_goat_doors)==1:\n", "            revealed_door = other_goat_doors[0]\n", "        else:\n", "            # numpy's randint excludes the upper bound, so (0,2) picks index 0 or 1\n", "            revealed_door = other_goat_doors[random.randint(0,2)]\n", "        if switch:\n", "            my_door = [i for i in range(1,4) if (i!=my_door and i!=revealed_door)][0]\n", "        else:\n", "            my_door = my_door\n", "        result = my_door == car_door\n", "        results.append(result)\n", "    return sum(results)/n\n", " \n", " \n", "def MontyHall_numpy(switch, n):\n", "    # switch is a boolean which tells you whether or not to switch from the original door chosen\n", "    # n is the number of times you want to simulate the action\n", " \n", "    # Your beautiful code here... 
#\n", "    pass\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "start = time.clock()\n", "print \"Probability of Winning if You Switch Doors: \", MontyHall(True,100000)\n", "print \"Probability of Winning if You Don't Switch: \" , MontyHall(False,100000)\n", "end = time.clock()\n", "total1 = end - start" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "start = time.clock()\n", "print \"Probability of Winning if You Switch Doors: \", MontyHall_numpy(True,100000)\n", "print \"Probability of Winning if You Don't Switch: \" , MontyHall_numpy(False,100000)\n", "end = time.clock()\n", "total2 = end - start" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print \"w/o NumPy:\\t %f s\\nw/ NumPy:\\t %f s\" %(total1,total2)\n", "print \"Total Speedup: \" + str(total1/total2) + \"x\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For reference, our Monte Carlo simulation using NumPy was actually slower than the version without NumPy. Why should we bother with NumPy at all then?\n", "\n", "While many of you may not have used NumPy previously, or may not see the purpose in learning this library, we strongly urge you to force yourself to use NumPy as much as possible while doing virtual labs. It will benefit you significantly in this class. Furthermore, NumPy is widely used in industry and academic research, and so it will benefit you greatly to become comfortable with it this semester! NumPy (and using matrix operations rather than loops) is your friend when it comes to dealing with lots of data or doing elaborate simulations.\n", "\n", "In this simple simulation there was basically no manipulation of data, and only one loop. If you find yourself using many loops or list comprehensions to process data, think about using NumPy. 
:)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Provide a rigorous proof for the results you observed:" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }