Document Type
Conference Proceeding
Publication Date
5-2017
Publisher
IEEE, IFIP, IEEE Communications Society
Abstract
The industry-standard Packet CAPture (PCAP) format for storing network packet traces is normally readable only serially because it lacks delimiters, indexing, and blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP that identifies a record boundary within a small number of bytes of the access point. Unlike related heuristic methods, whose nonzero probability of error can limit scalability, the new method offers a correctness guarantee for a well-formed file and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm as an extension to the Hadoop framework, along with a performance comparison to serial ingestion. Finally, we present a number of similar storage types that could use a modified version of RAPCAP for random access.
Recommended Citation
Please use the publisher's recommended citation.
Comments
Presented at the 2nd IFIP/IEEE International Workshop on Analytics for Network and Service Management (AnNet 2017), held in conjunction with the IFIP/IEEE International Symposium on Integrated Network Management, May 8, 2017, in Lisbon, Portugal.