hatena

<body>
*1320221317*Draw a scatter plot in NumPy.
I had been neglecting the NumPy + Matplotlib environment because I was having trouble setting it up, but now that I can't leave it alone any longer, I decided to take the approach of "setting clear goals and learning only the minimum necessary to achieve them in a time trial".

I have ipython, numpy and matplotlib installed. However, the matplotlib tutorial doesn't show anything when I hit show, so I'm not sure if it's installed or not. Last time I did that much and left it. <a href='http://matplotlib.sourceforge.net/users/pyplot_tutorial.html'>Pyplot tutorial &#8212; Matplotlib v1.1.0 documentation</a>

First, a quick look at the NumPy tutorial. <a href='http://www.scipy.org/Tentative_NumPy_Tutorial#head-864862d3f2bb4c32f04260fac61eb4ef34788c4c'>Tentative NumPy Tutorial -</a>

ndarray is the parent class of matrix, which is an N-dimension array. ndarray is created by array. import * and I don't like it because it makes it hard to know what came from where, but I'll follow it anyway.
>|python|
In [1]: from numpy import *

In [2]: array([1, 2, 3])
Out[2]: array([1, 2, 3])
||<

It is an N-dimensional array, which means that it is extended to take values in xs[3, 3], just as Python's ordinary sequence type can take values in xs[3]. And just as xs[:] can, xs[:, :] is valid. interesting.

I tried array([1, 2, 3]).transpose() to test vector operations, but it didn't work as expected.
>|python|
In [3]: x = array([1, 2, 3])

In [4]: x.transpose()
Out[4]: array([1, 2, 3])
||<
You can see why by looking at x.shape. If you think that matrices have a two-dimensional structure and vectors have a one-dimensional structure, you are wrong. So I reshaped (3,) from a one-dimensional structure to (1, 3) and now it behaves as expected.
>||
In [5]: x.shape
Out[5]: (3,)

In [6]: x.reshape(1, 3)
Out[6]: array([[1, 2, 3]])

In [7]: _.transpose()
Out[7]: 
array([[1],
       [2],
       [3]])
||<

I know that using matrix would solve the problem, but there are rumors that matrix is full of traps, so I decided to avoid it. I wonder what exactly are the traps. I wonder if c_ or r_ would make it easier.

>|python|
In [14]: c_[[1, 2, 3]]
Out[14]: 
array([[1],
       [2],
       [3]])
||<

When you do an operation on an array and a scalar, the scalar is broadcast to each element of the array, and you can pass an array as the index of the array, which is selected if the array is an int, or selected if the array is a bool, where True is selected. And you can do destructive assignments to it. Cool.

>|python|
In [19]: x = array([1, 2, 3])

In [20]: x > 1
Out[20]: array([False,  True,  True], dtype=bool)

In [21]: x + 1
Out[21]: array([2, 3, 4])

In [22]: x[x % 2 == 1] -= 1

In [23]: x
Out[23]: array([0, 2, 2])
||<

x.view() is shallow copy and x.copy() is deep copy.

There is max, min, sum. cov, mean, std, var. svd. Easy.

Now it's time to write a scatter plot. I'm not sure what to do, but I'll try the simplest from pylab import * for now. ref: <a href='http://d.hatena.ne.jp/pashango_p/ 20090618/1245333148'>Memorandum on matplotlib, a python graph library - Pashango's Blog</a> By the way, I imported pylab and pressed TAB on pylab. to see what members were there. I was so disappointed that there were 917 members. fromfunction is one of them?

>||
In [27]: from pylab import *

In [28]: scatter(randn(100), randn(100))
Out[28]: <matplotlib.collections.CircleCollection at 0x103fabe90>
||<

randn creates an array of random numbers that follow a standard normal distribution. And I heard that in some environments, show() here brings up a graph window, but in my environment, it does not. I was looking up the problem and found that I could just output the graph to a file. ref: <a href='http://loumo.jp/wp/archive/20100219183704/'>Drawing graphs with matplotlib. - loumo.jp</a>

>||
In [30]: import matplotlib

In [31]: matplotlib.pyplot.savefig('test.png')

In [32]: !open test.png
||<
<img src="http://gyazo.com/43e25e76797368a364a92e5d7e4bffa0.png">

It worked. By the way, I forgot to mention that my environment is MacOSX 10.6.8 and open is a MacOSX command. It opens the file with an appropriate application. In my case, it opens a preview. And preview automatically reloads the open file if it is changed. Well, I guess it doesn't really matter if show() doesn't work.

You may want to try to redraw the graph here with different parameters, but if you redraw it as is, it will be appended to the previous graph. If you want to experiment interactively, you need to remember that clf() is a function to clear and redraw a graph. (I guess it stands for clear figure.)

summary
>||
In [1]: from numpy import *
In [27]: from pylab import *
In [30]: import matplotlib

In [28]: scatter(randn(100), randn(100))
Out[28]: <matplotlib.collections.CircleCollection at 0x103fabe90>

In [31]: matplotlib.pyplot.savefig('test.png')
In [32]: !open test.png

# Try to redraw with different parameters
In [33]: clf()

In [39]: data = c_[randn(100), randn(100)]

In [40]: data = data.dot(array([[1, 1], [0, 1]])) # try to slant it

In [42]: scatter(data[:, 0], data[:, 1])
Out[42]: <matplotlib.collections.CircleCollection at 0x104411290>

In [43]: matplotlib.pyplot.savefig('test.png')
||<

<img src="http://gyazo.com/8f6627b2553050a9c3d13fd304a41e2c.png">

PS: I forgot to mention that marker="+" can be used as an optional argument to scatter.

PS: Specifying alpha=0.5 is cool.

*1320222463* How to combine ipython and pdb to make life easier.
If an unforeseen exception occurs:.
>||
In [55]: scatter(data[:, 0], data[:, 1], marker=".")
---------------------------------------------------------------------------
(omitted)
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, faceted, verts, **kwargs)
   5736             sym = syms.get(marker)
   5737             if sym is None and verts is None:
-> 5738                 raise ValueError('Unknown marker symbol to scatter')
   5739             numsides, rotation, symstyle = syms[marker]
   5740 

ValueError: Unknown marker symbol to scatter
||<

Where the hell is the list of available symbols (<a href='http://matplotlib.sourceforge.net/api/artist_api.html?highlight=marker#matplotlib.lines.Line2D. set_marker'>matplotlib.lines.Line2D.set_marker</a>)? Line2D.set_marker</a>).

>||
In [56]: pdb
Automatic pdb calling has been turned ON

In [57]: scatter(data[:, 0], data[:, 1], marker=".")
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes.py(5738)scatter()
   5737             if sym is None and verts is None:
-> 5738                 raise ValueError('Unknown marker symbol to scatter')
   5739             numsides, rotation, symstyle = syms[marker]

ipdb> syms
{'x': (4, 0.7853981633974483, 2), '^': (3, 0, 0), 'd': (4, 0, 0), 'h': (6, 0, 0), '+': (4, 0, 2), 'o': (0, 0, 3), 'p': (5, 0, 0), 's': (4, 0.7853981633974483, 0), 'v': (3, 3.141592653589793, 0), '8': (8, 0, 0), '<': (3, 4.71238898038469, 0), '>': (3, 1.5707963267948966, 0)}
||<

Hmmm, not many. I wish I had left the dots in, so I didn't have to include four different triangles.

By the way, you can get out of pdb mode with q. If you type pdb one more time, Auto-pdb is turned off.

And by the way, as for "where can I find a list of symbols I can use?", I found it in help(scatter), sorry.
</body>

Hatena Diary 2011-11-02

This page is auto-translated from /nishio/Hatena2011-11-02 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.