1 [[!meta title="HDF5 and h5py"]]
3 [h5py][] is a [[Python]] interface to the [Hierarchical Data
4 Format][HDF] library, version 5. It provides a mature, stable, open
5 way to store data. The HDF5 [tutorial][] provides an excellent
6 introduction to the basic concepts of HDF5.
8 Useful utilities included with the HDF5 library:
* `h5dump` (command-line HDF5 extraction)
* `h5stat` (command-line HDF5 file statistics)
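If those tools aren't installed, h5py can produce a rough outline of a file's structure by itself. The sketch below only loosely resembles `h5dump -H` (header-only output). It uses `Group.visititems`, which calls a function with every member's path and object. The `outline` helper and the `example.h5` file it inspects are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy


def outline(filename):
    """List groups and datasets in a file, loosely like `h5dump -H` (hypothetical helper)."""
    lines = []

    def visit(name, obj):
        # visititems() passes each member's path relative to the starting group
        # (the root group itself is skipped) together with the opened object.
        if isinstance(obj, h5py.Dataset):
            lines.append(f'DATASET /{name} shape={obj.shape} dtype={obj.dtype}')
        elif isinstance(obj, h5py.Group):
            lines.append(f'GROUP /{name}')
        # Implicitly returning None tells h5py to keep walking.

    with h5py.File(filename, 'r') as f:
        f.visititems(visit)
    return lines


# Build a tiny throwaway file to describe (names are illustrative).
path = os.path.join(tempfile.mkdtemp(), 'example.h5')
with h5py.File(path, 'w') as f:
    group = f.create_group('MyGroup')
    group['dset'] = numpy.zeros((6, 4), dtype=numpy.int32)

lines = outline(path)
for line in lines:
    print(line)
```

Because `visititems` stops as soon as the callback returns something other than `None`, the same hook can also serve as a search.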
There's also [[HDFView]], which provides a nice graphical interface.
I'll walk through the HDF5 tutorial with `h5py` to give you a feel for
how things work. It may help to keep in mind the following
HDF5-to-filesystem concept map:
20 <tr><th>HDF5</th><th>filesystem</th></tr>
21 <tr><td>dataset</td><td>file</td></tr>
22 <tr><td>attribute</td><td>metadata/header</td></tr>
23 <tr><td>group</td><td>directory</td></tr>
26 [h5py]: http://code.google.com/p/h5py/
27 [HDF]: http://www.hdfgroup.org/HDF5/
28 [tutorial]: http://www.hdfgroup.org/HDF5/Tutor/
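To make the concept map concrete, here is a minimal sketch that creates a group, puts a dataset in it, attaches an attribute, and then reads everything back through filesystem-like paths. The file name `concepts.h5` is hypothetical; the object names are borrowed from the tutorial.

```python
import os
import tempfile

import h5py
import numpy

# Hypothetical throwaway file; object names mirror the tutorial's.
path = os.path.join(tempfile.mkdtemp(), 'concepts.h5')

with h5py.File(path, 'w') as f:
    grp = f.create_group('MyGroup')                           # group ~ directory
    dset = grp.create_dataset('dset', data=numpy.arange(6))   # dataset ~ file
    dset.attrs['Units'] = [100, 200]                          # attribute ~ metadata

with h5py.File(path, 'r') as f:
    by_absolute = f['/MyGroup/dset']        # absolute path from the root group
    by_relative = f['MyGroup']['dset']      # relative path, like `cd MyGroup`
    full_name = by_relative.name            # objects know their absolute path
    data = by_absolute[...]                 # copy the contents into a NumPy array
    units = by_absolute.attrs['Units']      # attributes hang off their object
    members = list(f['MyGroup'].keys())     # like `ls MyGroup`

print(full_name, data, units, members)
```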
35 >>> f = h5py.File('file.h5', 'w')
51 >>> f = h5py.File('dset.h5', 'w')
52 >>> f['dset'] = numpy.zeros((6,4), dtype=numpy.int32)
61 DATATYPE H5T_STD_I32LE
62 DATASPACE SIMPLE { ( 6, 4 ) / ( 6, 4 ) }
75 Reading from and writing to a dataset
76 -------------------------------------
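The session in this section creates its dataset with the item-assignment shorthand, `f['dset'] = ...`. Below is a sketch of the more explicit `create_dataset` spelling, which also accepts storage options such as `chunks` and `compression`, followed by reads of the whole dataset and of a single row. The file name `explicit.h5` and the second dataset `later` are hypothetical.

```python
import os
import tempfile

import h5py
import numpy

path = os.path.join(tempfile.mkdtemp(), 'explicit.h5')  # hypothetical file name
values = numpy.arange(24, dtype=numpy.int32).reshape((4, 6))

with h5py.File(path, 'w') as f:
    # Same effect as f['dset'] = values
    f.create_dataset('dset', data=values)
    # Or allocate first (numeric datasets start filled with zeros) and write later
    later = f.create_dataset('later', shape=(4, 6), dtype=numpy.int32)
    later[...] = values * 10

with h5py.File(path, 'r') as f:
    dset = f['dset']
    info = (dset.shape, dset.dtype)
    whole = dset[...]            # every element, as an in-memory NumPy array
    row = dset[1]                # just the second row
    later = f['later'][...]

print(info)
print(whole)
```

Indexing a `Dataset` is what copies data out of the file; the `Dataset` object itself is only a handle, which is why the reads happen before the file is closed.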
80 >>> f = h5py.File('dset.h5', 'w')
81 >>> f['dset'] = numpy.arange(24, dtype=numpy.int32).reshape((4, 6))
84 <HDF5 dataset "dset": shape (4, 6), type "<i4">
86 array([[ 0, 1, 2, 3, 4, 5],
87 [ 6, 7, 8, 9, 10, 11],
88 [12, 13, 14, 15, 16, 17],
89 [18, 19, 20, 21, 22, 23]])
98 DATATYPE H5T_STD_I32LE
99 DATASPACE SIMPLE { ( 4, 6 ) / ( 4, 6 ) }
101 (0,0): 0, 1, 2, 3, 4, 5
102 (1,0): 6, 7, 8, 9, 10, 11
          (2,0): 12, 13, 14, 15, 16, 17
          (3,0): 18, 19, 20, 21, 22, 23
110 Creating an attribute
111 ---------------------
113 Using our file from the previous example:
117 >>> f = h5py.File('dset.h5', 'a')
119 >>> dset.attrs['Units'] = [100, 200]
128 DATATYPE H5T_STD_I32LE
129 DATASPACE SIMPLE { ( 6, 4 ) / ( 6, 4 ) }
134 (3,0): 12, 13, 14, 15,
135 (4,0): 16, 17, 18, 19,
136 (5,0): 20, 21, 22, 23
139 DATATYPE H5T_STD_I32LE
140 DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
153 >>> f = h5py.File('group.h5', 'w')
154 >>> g = f.create_group('/MyGroup')
156 <HDF5 group "/MyGroup" (0 members)>
169 Creating groups using absolute and relative names
170 -------------------------------------------------
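Names behave much like filesystem paths: a leading `/` starts from the file's root group, and anything else is relative to the group you call the method on. The sketch below (file and group names are illustrative) shows both, plus two h5py conveniences: missing intermediate groups are created for you, and `require_group` returns an existing group instead of raising an error the way `create_group` does.

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), 'names.h5')  # hypothetical file name

with h5py.File(path, 'w') as f:
    g1 = f.create_group('/MyGroup')           # absolute: starts at the root group
    g2 = g1.create_group('Group_B')           # relative: starts at g1
    g3 = g1.create_group('/MyGroup/Group_A')  # a leading '/' is absolute, even on g1
    deep = f.create_group('Outer/Inner')      # the missing 'Outer' is created too
    again = f.require_group('MyGroup')        # returns the existing group, no error
    names = [g1.name, g2.name, g3.name, deep.name]
    again_name = again.name
    children = sorted(f['MyGroup'].keys())

print(names, again_name, children)
```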
173 >>> f = h5py.File('groups.h5', 'w')
174 >>> g1 = f.create_group('/MyGroup')
175 >>> g2 = f.create_group('/MyGroup/Group_A')
176 >>> g3 = g1.create_group('Group_B')
179 >>> f['MyGroup'].keys()
180 ['Group_A', 'Group_B']
197 Creating datasets in groups
198 ---------------------------
200 Using our file from the previous example:
203 >>> f = h5py.File('groups.h5', 'a')
204 >>> f['/MyGroup/dset1'] = [3, 3]
205 >>> g = f['/MyGroup/Group_A']
206 >>> g['dset2'] = [2, 10]
217 DATATYPE H5T_STD_I32LE
218 DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
227 DATATYPE H5T_STD_I32LE
228 DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
237 Reading from or writing to a subset of a dataset
238 ------------------------------------------------
Just use the [NumPy slice indexing][slice] you're used to.
244 >>> f = h5py.File('slice.h5', 'w')
245 >>> f['IntArray'] = numpy.ones((8, 10))
246 >>> dset = f['IntArray']
248 array([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
249 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
250 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
251 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
252 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
253 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
254 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
255 [ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
256 >>> f['IntArray'][:,5:] = 2
258 array([[ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
259 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
260 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
261 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
262 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
263 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
264 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
265 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.]])
266 >>> dset[1:4,2:6] = 5
267 >>> f['IntArray'][...]
268 array([[ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
269 [ 1., 1., 5., 5., 5., 5., 2., 2., 2., 2.],
270 [ 1., 1., 5., 5., 5., 5., 2., 2., 2., 2.],
271 [ 1., 1., 5., 5., 5., 5., 2., 2., 2., 2.],
272 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
273 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
274 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
275 [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.]])
278 Here's an example of altering a scalar value:
282 >>> f = h5py.File('scalar.h5', 'w')
287 >>> f['int'][...] = 2
I haven't been able to track down official documentation for the
`dataset[...]` syntax, but it is mentioned in [the 1.3 release
announcement][message] that Andrew Collette, h5py's author, sent to the
`scipy-user` list.
297 [slice]: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
298 [message]: http://mail.scipy.org/pipermail/scipy-user/2010-February/024364.html
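A few more selection forms, sketched with a made-up `select.h5` file: stepped slices, an index list, a full-shape boolean mask, and two ways of reading a scalar dataset. h5py's fancy indexing is narrower than NumPy's (for instance, an index list is limited to one axis and expected in increasing order), so check anything beyond these cases against the h5py documentation. For a scalar dataset, an empty tuple `[()]` returns the stored value, while `[...]` also works and gives a result with no dimensions.

```python
import os
import tempfile

import h5py
import numpy

path = os.path.join(tempfile.mkdtemp(), 'select.h5')  # hypothetical file name

with h5py.File(path, 'w') as f:
    dset = f.create_dataset('IntArray', data=numpy.arange(80).reshape((8, 10)))
    f['int'] = 7                              # a scalar (zero-dimensional) dataset

    stepped = dset[::2, 1:8:3]                # rows 0, 2, 4, 6; columns 1, 4, 7
    picked = dset[[1, 4, 6], :2]              # increasing index list on one axis
    big = dset[dset[...] >= 75]               # full-shape boolean mask -> 1-D result

    dset[7, :] = -1                           # writes go straight to the file
    last_row = dset[7]

    scalar = f['int'][()]                     # empty tuple: the stored value
    whole = f['int'][...]                     # Ellipsis: a zero-dimensional result

print(stepped, picked, big, last_row, scalar, whole, sep='\n')
```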
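Your array's `numpy.dtype` is preserved, even for types such as complex numbers that HDF5 stores as compound types (note the `H5T_COMPOUND` entries in the `h5dump` output).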
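A quick way to check this yourself, sketched with an arbitrary selection of dtypes and a hypothetical `dtypes.h5` file: write each array, read it back, and compare dtypes and values.

```python
import os
import tempfile

import h5py
import numpy

path = os.path.join(tempfile.mkdtemp(), 'dtypes.h5')  # hypothetical file name
originals = {
    'int16': numpy.array([1, -2, 3], dtype=numpy.int16),
    'float32': numpy.array([0.5, 1.5], dtype=numpy.float32),
    'complex64': numpy.array([1 + 2j], dtype=numpy.complex64),
    'bool': numpy.array([True, False]),
    'record': numpy.array([(1, 2.0)], dtype=[('id', 'i4'), ('value', 'f8')]),
}

with h5py.File(path, 'w') as f:
    for name, array in originals.items():
        f[name] = array                       # one dataset per dtype

round_trip = {}
with h5py.File(path, 'r') as f:
    for name in originals:
        round_trip[name] = f[name][...]       # read back into NumPy

for name, array in originals.items():
    print(name, array.dtype, '->', round_trip[name].dtype)
```

Booleans are a case similar to complex numbers: HDF5 has no native boolean type, so h5py stores them as an enumeration and maps them back to `numpy.bool_` on reading.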
306 >>> f = h5py.File('dtype.h5', 'w')
307 >>> f['complex'] = 2 + 3j
308 >>> f['complex'].dtype
310 >>> type(f['complex'][...])
312 >>> f['complex array'] = [1 + 2j, 3 + 4j]
313 >>> f['complex array'].dtype
315 >>> type(f['complex array'][...])
316 <type 'numpy.ndarray'>
325 DATATYPE H5T_COMPOUND {
337 DATASET "complex array" {
338 DATATYPE H5T_COMPOUND {
342 DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
362 Chunking and extendible datasets
363 --------------------------------
365 Extendible datasets must be chunked.
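In current h5py, how far a dataset may grow is fixed by the `maxshape` argument when it is created; `None` marks an axis as unlimited, and giving `maxshape` turns chunking on automatically. The sketch below shows the common grow-then-write pattern for appending batches. The file name, chunk size, and batch contents are arbitrary choices for illustration.

```python
import os
import tempfile

import h5py
import numpy

path = os.path.join(tempfile.mkdtemp(), 'grow.h5')  # hypothetical file name

with h5py.File(path, 'w') as f:
    # Start empty; axis 0 may grow without limit (maxshape entry None).
    log = f.create_dataset('log', shape=(0,), maxshape=(None,),
                           dtype=numpy.int32, chunks=(4,))
    for start in (0, 3, 6):
        batch = numpy.arange(start, start + 3, dtype=numpy.int32)
        old = log.shape[0]
        log.resize((old + batch.size,))   # grow first...
        log[old:] = batch                 # ...then fill the new tail
    final_shape = log.shape
    chunks = log.chunks
    maxshape = log.maxshape
    values = log[...]

print(final_shape, chunks, maxshape, values)
```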
369 >>> f = h5py.File('ext.h5', 'w')
370 >>> f['simple'] = [1, 2, 3] # not chunked
375 Traceback (most recent call last):
377 TypeError: Only chunked datasets can be resized
    >>> c = f.create_dataset('chunked', (3,), numpy.int32, chunks=(2,), maxshape=(6,))
384 array([1, 2, 3, 0, 0, 0])
386 Traceback (most recent call last):
388 TypeError: New shape length (2) must match dataset rank (1)
The "chunkiness" of data is not listed by a plain `h5dump` (the `-p`/`--properties` option adds storage-layout details),
397 DATATYPE H5T_STD_I32LE
398 DATASPACE SIMPLE { ( 6 ) / ( 6 ) }
400 (0): 1, 2, 3, 0, 0, 0
404 DATATYPE H5T_STD_I32LE
405 DATASPACE SIMPLE { ( 3 ) / ( 3 ) }
415 >>> f = h5py.File('ext.h5', 'a')
416 >>> f['chunked'].chunks
    >>> f['simple'].chunks is None
422 [[!tag tags/programming]]