Comparing marshal, pickle, and cPickle
Feb 20, 2014
This article applies to Python 2.x.
marshal, pickle, and cPickle can all be used to serialize data. cPickle is the pickle module implemented in C, and it is much faster than pickle.
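As a minimal sketch of what the three modules have in common, the snippet below round-trips the same object through each module's `dumps`/`loads` pair. The sample `data` dictionary and the `round_trip` helper are hypothetical names used for illustration, and the `ImportError` fallback is only an assumption so the snippet also runs where cPickle is unavailable.

```python
import marshal
import pickle

try:
    import cPickle  # Python 2.x: pickle implemented in C
except ImportError:
    # Assumption: on interpreters without cPickle, fall back to pickle
    # so the sketch still runs.
    cPickle = pickle

# Hypothetical sample data built only from basic built-in types.
data = {"name": "example", "values": [1, 2, 3], "nested": {"ok": True}}


def round_trip(module, obj):
    """Serialize obj with module.dumps and read it back with module.loads."""
    return module.loads(module.dumps(obj))


for module in (marshal, pickle, cPickle):
    # All three modules expose the same dumps/loads pair.
    assert round_trip(module, data) == data
```

All three also offer `dump` and `load`, which take a file object instead of returning or accepting a string.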
The three modules compare as follows.
Speed
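According to tests by Stack Overflow user juliomalegria, the three modules rank as follows for dump and load speed:
- dump: marshal is fastest, cPickle is second, and pickle is slowest.
- load: cPickle is fastest, marshal is second, and pickle is slowest.
Overall, marshal and cPickle each win one of the two operations, and the gap between them is small. pickle lags far behind both.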
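A comparison like this can be repeated on your own data with `timeit`. The sketch below is not the original benchmark. The `payload`, the `time_module` helper, and the repeat count are all assumptions, and the numbers will vary with the data, the Python version, and the pickle protocol.

```python
import marshal
import pickle
import timeit

try:
    import cPickle  # Python 2.x
except ImportError:
    cPickle = pickle  # assumption: fallback so the sketch runs without cPickle

# Hypothetical payload; the data used in the original test is not reproduced.
payload = [{"id": i, "name": "item%d" % i, "score": i * 0.5} for i in range(1000)]


def time_module(module, obj, repeat=20):
    """Return (dump_seconds, load_seconds), each summed over `repeat` calls."""
    blob = module.dumps(obj)
    dump_time = timeit.timeit(lambda: module.dumps(obj), number=repeat)
    load_time = timeit.timeit(lambda: module.loads(blob), number=repeat)
    return dump_time, load_time


results = {}
for name, module in (("marshal", marshal), ("pickle", pickle), ("cPickle", cPickle)):
    results[name] = time_module(module, payload)
    print("%-8s dump %.4fs  load %.4fs" % ((name,) + results[name]))
```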
In an answer to that question, Johan Dahlin gave the reason: cPickle optimizes the data it stores, while marshal does not.
cPickle has a smarter algorithm than marshal and is able to do tricks to reduce the space used by large objects. That means it’ll be slower to decode but faster to encode as the resulting output is smaller. marshal is simplistic and serializes the object straight as-is without doing any further analyze it. That also answers why the marshal loading is so inefficient, it simply has to do more work – as in reading more data from disk – to be able to do the same thing as cPickle.
marshal and cPickle are really different things in the end, you can’t really get both fast saving and fast loading since fast saving implies analyzing the data structures less which implies saving a lot of data to disk.
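The quoted explanation turns on how much data each module writes, which is easy to check for a given object. The sketch below only measures output sizes and does not assume which module produces the smaller result. The `payload` (one sub-list referenced many times) is a hypothetical example, and the sizes depend on the data, the Python version, and the pickle protocol.

```python
import marshal
import pickle

try:
    import cPickle  # Python 2.x
except ImportError:
    cPickle = pickle  # assumption: fallback so the sketch runs without cPickle

# Hypothetical payload: fifty references to the same 100-element list.
shared = list(range(100))
payload = [shared] * 50

sizes = {
    "marshal": len(marshal.dumps(payload)),
    "cPickle (default protocol)": len(cPickle.dumps(payload)),
    "cPickle (protocol 2)": len(cPickle.dumps(payload, 2)),
}

for name in sorted(sizes):
    print("%-28s %d bytes" % (name, sizes[name]))
```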
Portability
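marshal is not a general-purpose serialization module, and its format is not guaranteed to be the same across Python versions (that is, marshal data is not guaranteed to be portable between versions).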
This is not a general “persistence” module.
it may change between Python versions (although it rarely does) (quoted from the official documentation)
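Two quick checks illustrate these limits. marshal exposes its format number as `marshal.version`, and it rejects objects outside the built-in types it supports, such as instances of user-defined classes. The `Point` class and the `can_marshal` helper below are hypothetical and exist only for this sketch.

```python
import marshal


class Point(object):
    """Hypothetical user-defined class, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def can_marshal(obj):
    """Return True if marshal can serialize obj, False if it refuses."""
    try:
        marshal.dumps(obj)
        return True
    except ValueError:
        # marshal raises ValueError for unsupported object types.
        return False


# The format number depends on the interpreter that runs this.
print("marshal format version: %d" % marshal.version)
print("built-in types:  %s" % can_marshal([1, "a", 2.5]))
print("class instance:  %s" % can_marshal(Point(1, 2)))
```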
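cPickle is the recommended module. It is a standard serialization module, users do not have to worry about portability between Python versions, and it performs well.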
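When data has to move between interpreters, the pickle protocol can be chosen explicitly. The sketch below uses protocol 2, which was introduced in Python 2.3. The sample `data` is hypothetical, and whether a given pickle loads cleanly elsewhere still depends on the types it contains, so treat this as an illustration rather than a guarantee.

```python
import pickle

try:
    import cPickle  # Python 2.x
except ImportError:
    cPickle = pickle  # assumption: fallback so the sketch runs without cPickle

# Hypothetical sample data built only from basic built-in types.
data = {"id": 7, "tags": ("a", "b"), "ratio": 0.25}

# The second argument selects the pickle protocol explicitly.
blob = cPickle.dumps(data, 2)

# Pickles written with protocol 2 or later start with a protocol marker:
# the byte 0x80 followed by the protocol number.
assert blob[:2] == b"\x80\x02"

restored = cPickle.loads(blob)
assert restored == data
```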
Conclusion
Forget marshal and pickle, and make friends with cPickle!