[[!meta title="HDF5 and h5py"]]

[h5py][] is a [[Python]] interface to version 5 of the [Hierarchical
Data Format][HDF] library. HDF5 provides a mature, stable, open
way to store data. The HDF5 [tutorial][] is an excellent
introduction to the basic concepts of HDF5.

Useful utilities included with the HDF5 library:

* `h5dump` (command line HDF5 extraction)
* `h5stat` (command line HDF5 database statistics)

I'll walk through the HDF5 tutorial with `h5py` to give you a feel for
how things work. It may help to keep the following HDF5-to-filesystem
concept map in mind:

<table>
<tr><th>HDF5</th><th>filesystem</th></tr>
<tr><td>dataset</td><td>file</td></tr>
<tr><td>attribute</td><td>metadata/header</td></tr>
<tr><td>group</td><td>directory</td></tr>
</table>
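
To make the concept map concrete, here is a minimal sketch. The names `map.h5`, `experiment`, `temperatures`, and `units` are hypothetical, and I use the in-memory `core` driver so that nothing is written to disk.

```python
import h5py
import numpy

# In-memory file (assumption: nothing needs to be saved to disk).
with h5py.File('map.h5', 'w', driver='core', backing_store=False) as f:
    group = f.create_group('experiment')             # group ~ directory
    dset = group.create_dataset(                     # dataset ~ file
        'temperatures', data=numpy.array([20.5, 21.0, 19.8]))
    dset.attrs['units'] = 'celsius'                  # attribute ~ metadata

    # Objects are addressed with filesystem-like paths.
    path = dset.name
    units = f['/experiment/temperatures'].attrs['units']
    kinds = {name: type(obj).__name__
             for name, obj in f['experiment'].items()}

print(path, units, kinds)
```

Listing a group's items plays the role of `ls` on a directory.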

[h5py]: http://code.google.com/p/h5py/
[HDF]: http://www.hdfgroup.org/HDF5/
[tutorial]: http://www.hdfgroup.org/HDF5/Tutor/
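
Creating a file
---------------

    >>> import h5py
    >>> f = h5py.File('file.h5', 'w')

Creating a dataset
------------------

    >>> import numpy
    >>> f = h5py.File('dset.h5', 'w')
    >>> f['dset'] = numpy.zeros((6,4), dtype=numpy.int32)
    >>> f.close()

The relevant lines of the `h5dump` output:

    DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 6, 4 ) / ( 6, 4 ) }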
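
Assigning an array to `f['dset']` is a shorthand. As a sketch of the more explicit route, `create_dataset` takes the shape and dtype directly, so you don't have to build the NumPy array first. The file name `dset-sketch.h5` and the in-memory `core` driver are my own choices for illustration.

```python
import h5py
import numpy

with h5py.File('dset-sketch.h5', 'w', driver='core', backing_store=False) as f:
    # Same shape and type as assigning numpy.zeros((6, 4), dtype=numpy.int32),
    # but no array is allocated on the Python side.
    dset = f.create_dataset('dset', shape=(6, 4), dtype=numpy.int32)
    shape = dset.shape
    dtype = dset.dtype
    # Elements that were never written read back as the fill value (0 by default).
    all_zero = not dset[...].any()
```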

Reading from and writing to a dataset
-------------------------------------

    >>> f = h5py.File('dset.h5', 'w')
    >>> f['dset'] = numpy.arange(24, dtype=numpy.int32).reshape((4, 6))
    >>> f['dset']
    <HDF5 dataset "dset": shape (4, 6), type "<i4">
    >>> f['dset'][...]
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23]])
    >>> f.close()

The relevant lines of the `h5dump` output:

    DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
    DATA {
    (0,0): 0, 1, 2, 3, 4, 5,
    (1,0): 6, 7, 8, 9, 10, 11,
    (2,0): 12, 13, 14, 15, 16, 17,
    (3,0): 18, 19, 20, 21, 22, 23
    }
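
Creating a dataset and writing into an existing one are different operations. The following sketch (hypothetical file name `rw-sketch.h5`, in-memory `core` driver) reads the whole dataset with `[...]` and then overwrites the stored values in place.

```python
import h5py
import numpy

with h5py.File('rw-sketch.h5', 'w', driver='core', backing_store=False) as f:
    f['dset'] = numpy.arange(24, dtype=numpy.int32).reshape((4, 6))
    dset = f['dset']

    data = dset[...]          # read everything into a numpy.ndarray
    dset[...] = data * 2      # overwrite the stored values in place
    doubled = dset[...]
    last_row = dset[3]        # read a single row
```

Note that `f['dset'] = ...` only works while the name `dset` is unused; once the dataset exists, writes go through indexing as above.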

Creating an attribute
---------------------

Using our file from the previous example:

    >>> f = h5py.File('dset.h5', 'a')
    >>> dset = f['dset']
    >>> dset.attrs['Units'] = [100, 200]
    >>> f.close()
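
The `h5dump` output now lists the attribute after the dataset's data:

    DATASET "dset" {
       DATATYPE  H5T_STD_I32LE
       DATASPACE  SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
       DATA {
       (0,0): 0, 1, 2, 3, 4, 5,
       (1,0): 6, 7, 8, 9, 10, 11,
       (2,0): 12, 13, 14, 15, 16, 17,
       (3,0): 18, 19, 20, 21, 22, 23
       }
       ATTRIBUTE "Units" {
          DATATYPE  H5T_STD_I32LE
          DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
       }
    }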
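
Reading an attribute back works much like reading from a dictionary. This is a small sketch with a hypothetical in-memory file `attr-sketch.h5`; only the `Units` attribute comes from the example above.

```python
import h5py
import numpy

with h5py.File('attr-sketch.h5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('dset', data=numpy.arange(24).reshape((4, 6)))
    dset.attrs['Units'] = [100, 200]

    units = dset.attrs['Units']          # comes back as a numpy array
    names = list(dset.attrs.keys())      # attrs behaves like a dictionary
    has_units = 'Units' in dset.attrs
```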

Creating a group
----------------

    >>> f = h5py.File('group.h5', 'w')
    >>> g = f.create_group('/MyGroup')
    >>> g
    <HDF5 group "/MyGroup" (0 members)>
    >>> f.close()

Creating groups using absolute and relative names
-------------------------------------------------

    >>> f = h5py.File('groups.h5', 'w')
    >>> g1 = f.create_group('/MyGroup')
    >>> g2 = f.create_group('/MyGroup/Group_A')
    >>> g3 = g1.create_group('Group_B')
    >>> list(f['MyGroup'].keys())
    ['Group_A', 'Group_B']
    >>> f.close()
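
Absolute and relative names end up at the same object, which you can check through the `name` property. The sketch below uses a hypothetical in-memory file `groups-sketch.h5`; the `/A/B/C` path is also just an illustration.

```python
import h5py

with h5py.File('groups-sketch.h5', 'w', driver='core', backing_store=False) as f:
    g1 = f.create_group('/MyGroup')
    g1.create_group('Group_B')                    # name relative to g1
    via_absolute = f['/MyGroup/Group_B'].name
    via_relative = g1['Group_B'].name
    # require_group returns the existing group instead of raising an error.
    again = f.require_group('/MyGroup/Group_B').name
    # Missing intermediate groups are created along the way.
    f.create_group('/A/B/C')
    children_of_a = list(f['/A'].keys())
```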

Creating datasets in groups
---------------------------

Using our file from the previous example:

    >>> f = h5py.File('groups.h5', 'a')
    >>> f['/MyGroup/dset1'] = [3, 3]
    >>> g = f['/MyGroup/Group_A']
    >>> g['dset2'] = [2, 10]
    >>> f.close()

Both datasets appear in the `h5dump` output with the same type and shape:

    DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }

    DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
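
If you'd rather inspect the tree from Python than from `h5dump`, `visititems` walks every group and dataset below a starting point. This is a sketch on a hypothetical in-memory file `tree-sketch.h5`; the `collect` helper is my own name, not part of h5py.

```python
import h5py

with h5py.File('tree-sketch.h5', 'w', driver='core', backing_store=False) as f:
    g = f.create_group('/MyGroup')
    f['/MyGroup/dset1'] = [3, 3]
    a = g.create_group('Group_A')
    a['dset2'] = [2, 10]

    datasets = {}

    def collect(name, obj):
        # Called once for every group and dataset below f;
        # name is the path relative to f, without a leading slash.
        if isinstance(obj, h5py.Dataset):
            datasets[name] = obj[...].tolist()

    f.visititems(collect)
```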

Reading from or writing to a subset of a dataset
------------------------------------------------

Just use the [NumPy slice indexing][slice] you're used to.

    >>> f = h5py.File('hype.h5', 'w')
    >>> f['IntArray'] = numpy.ones((8, 10))
    >>> dset = f['IntArray']
    >>> dset[...]
    array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
           [ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]])
    >>> f['IntArray'][:,5:] = 2
    >>> dset[...]
    array([[ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.]])
    >>> dset[1:4,2:6] = 5
    >>> f['IntArray'][...]
    array([[ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  5.,  5.,  5.,  5.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  5.,  5.,  5.,  5.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  5.,  5.,  5.,  5.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.],
           [ 1.,  1.,  1.,  1.,  1.,  2.,  2.,  2.,  2.,  2.]])
    >>> f.close()

[slice]: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
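
Slices work for reading as well as writing, and only the selected region is pulled out of the file. Here is a sketch that repeats the writes above on a hypothetical in-memory file `slice-sketch.h5` and then reads a few subsets back.

```python
import h5py
import numpy

with h5py.File('slice-sketch.h5', 'w', driver='core', backing_store=False) as f:
    dset = f.create_dataset('IntArray', data=numpy.ones((8, 10)))
    dset[:, 5:] = 2
    dset[1:4, 2:6] = 5

    block = dset[1:4, 2:6]            # just the 3x4 block written above
    every_other_row = dset[::2, 0]    # steps are supported too
    row_sum = dset[1].sum()           # one row: 1+1+5+5+5+5+2+2+2+2
```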

Datatypes
---------

Your array's `numpy.dtype` will be preserved.

    >>> f = h5py.File('dtype.h5', 'w')
    >>> f['complex'] = 2 + 3j
    >>> f['complex'].dtype
    dtype('complex128')
    >>> type(f['complex'][()])
    <class 'numpy.complex128'>
    >>> f['complex array'] = [1 + 2j, 3 + 4j]
    >>> f['complex array'].dtype
    dtype('complex128')
    >>> type(f['complex array'][()])
    <class 'numpy.ndarray'>
    >>> f.close()

`h5py` stores the complex values as an HDF5 compound datatype, as the
`h5dump` output shows:

    DATATYPE  H5T_COMPOUND {
    ...
    }
    ...
    DATASET "complex array" {
       DATATYPE  H5T_COMPOUND {
       ...
       }
       DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
    ...
    }
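
The same holds for other dtypes. As a sketch (the file name `dtype-sketch.h5` and the dataset names `small ints` and `floats` are hypothetical), you can either let the NumPy array decide the dtype or request one explicitly with `create_dataset`.

```python
import h5py
import numpy

with h5py.File('dtype-sketch.h5', 'w', driver='core', backing_store=False) as f:
    f['complex array'] = numpy.array([1 + 2j, 3 + 4j])
    f['small ints'] = numpy.array([1, 2, 3], dtype=numpy.int8)
    # The dtype can also be requested explicitly when creating the dataset.
    f.create_dataset('floats', data=[1, 2, 3], dtype=numpy.float32)

    dtypes = {name: str(dset.dtype) for name, dset in f.items()}
    roundtrip = f['complex array'][...]
```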

Chunking and extendible datasets
--------------------------------

Extendible datasets must be chunked, and `maxshape` sets how far they
can grow.

    >>> f = h5py.File('ext.h5', 'w')
    >>> f['simple'] = [1, 2, 3]  # not chunked
    >>> f['simple'].resize((6,))
    Traceback (most recent call last):
      ...
    TypeError: Only chunked datasets can be resized
    >>> c = f.create_dataset('chunked', (3,), numpy.int32, maxshape=(6,), chunks=(2,))
    >>> c[:] = [1, 2, 3]
    >>> c.resize((6,))
    >>> c[...]
    array([1, 2, 3, 0, 0, 0])
    >>> c.resize((6, 2))
    Traceback (most recent call last):
      ...
    TypeError: New shape length (2) must match dataset rank (1)

The "chunkiness" of data is not listed by `h5dump` by default
(`h5dump -p` adds the storage layout):

    DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 6 ) / ( 6 ) }
    DATA {
    (0): 1, 2, 3, 0, 0, 0
    }

    DATATYPE  H5T_STD_I32LE
    DATASPACE  SIMPLE { ( 3 ) / ( 3 ) }

but `h5py` reports it through the `chunks` property:

    >>> f = h5py.File('ext.h5', 'a')
    >>> f['chunked'].chunks
    (2,)
    >>> f['simple'].chunks is None
    True
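
A common use for extendible datasets is appending. The sketch below assumes an unlimited first axis (`maxshape=(None,)`); the file name `ext-sketch.h5`, the dataset name `log`, and the `append` helper are hypothetical names of my own, not part of h5py.

```python
import h5py
import numpy

with h5py.File('ext-sketch.h5', 'w', driver='core', backing_store=False) as f:
    # maxshape=(None,) lets the first axis grow without limit.
    log = f.create_dataset('log', shape=(0,), maxshape=(None,),
                           dtype=numpy.int32, chunks=(2,))

    def append(dset, values):
        """Grow dset along axis 0 and write values into the new space."""
        start = dset.shape[0]
        dset.resize((start + len(values),))
        dset[start:] = values

    append(log, [1, 2, 3])
    append(log, [4, 5])
    contents = log[...].tolist()
    chunks = log.chunks
    maxshape = log.maxshape
```

The chunk size is a tuning knob: small chunks like `(2,)` keep the example readable, but real data usually wants larger ones.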

[[!tag tags/programming]]