This post is superseded by an updated benchmark.

In this post I benchmark the performance of 5 popular graph/network packages. This was inspired by two questions I had:

Recently, I have been working with large networks (millions of vertices and edges) and often wonder what is the best currently available package/tool that would scale well and handle large scale network analysis tasks. Having tried out a few (networkx in Python and igraph in R), but on different problems, I thought it would be nice to have a head to head comparison.

Running large scale computations is also much easier nowadays thanks to the growth of cloud computing, with virtual machines that can be spun up easily. I think the trend of powerful single machines will eliminate a lot of the need for enterprise clusters, so it will be interesting to see how far we can push a single machine using optimised algorithms.

To replicate the benchmark study and for the full code, please refer to my github repository. Instructions on how to set up and install the packages are also located in the repository. The benchmark was carried out using a Google Compute n1-standard-16 instance (16 vCPU Haswell 2.3GHz, 60 GB memory).

Networkx is written in Python, while the other four packages are based on C / C++ but have Python APIs. Igraph has an R and a Mathematica binding as well, but to be consistent the following benchmark was based on the Python one. The other 3 libraries (snap, networkit and graph-tool) have an additional emphasis on performance, with multi-processing capabilities built in.

Selecting what tasks to compare on is not really a trivial task, with each package offering various tools and capabilities. In the end, I decided to focus on 5 specific problems: loading the data, single source shortest path, page rank, k-core decomposition and connected components. Loading is more of an I/O task while the other 4 are common graph algorithms.

Disclaimer: I try as much as possible to specify the same parameters for each algorithm, but differences in API across the packages could translate to actual differences in how the algorithm is run and in the final output. Edit: Some of the observed differences in performance might be a result of different stopping criteria used - see algorithms for more information.

3 datasets from the Stanford Large Network Dataset Collection were used in the exercise:

While it is easiest to rank the packages based on the run time of the algorithms, that is only one of the many considerations of what makes a good package. I try to offer a more subjective view based on my experience with these packages. Here's a list of other comparative benchmarks for the interested viewer to check out:

- Graph-tool performance comparison, compares graph-tool with igraph and networkx.
- SNAP research paper, compares snap with igraph and networkx.
- Networkit technical paper, compares networkit with igraph and graph-tool.

Most of them were written in 2015/2016 and it will be interesting to see if anything has changed since then.

Results

All timings reported are normalised to reflect the run time for a single run of the task.

Networkx is much slower than any of the other libraries. Across all computation tasks and for all datasets it is around 10 times slower than the next slowest library. For example, it took 67s to run the single source shortest path problem on the Pokec dataset compared to 6.8s for networkit (the next slowest), and page rank took more than 10 minutes to run compared to 1 minute for igraph. Hence, I left it out of the comparison plots. Full results can be seen from the table below:

Here are the run times of the remaining four packages:

Looking at the plots above, graph-tool and networkit load data much more slowly than the other two libraries. I was reading the datasets as tab delimited files, and graph-tool basically uses Python code to parse the input. The other 3 packages should be using C libraries to read the files, which results in better performance.

Networkit and graph-tool take the top spot in most of the tests, with graph-tool having the shortest run time for the single source shortest path and connected components problems, and networkit winning the race for k-core and page rank.

When networkit is fast, it is extremely fast. On the Pokec dataset it takes just 0.2s to run the page rank algorithm (graph-tool: 1.7s, igraph: 59.6s, snap: 19.5s). For the k-core decomposition it is also 10 times faster than all other competitors, or 2,000 times faster than networkx. That is consistent with the findings of their research paper, where they claim that using some of the latest state of the art algorithms led to their processing speed being faster by an order of magnitude. However, for the shortest path problem (not analysed in their paper) it lags behind all other packages.
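The timings reported here are normalised to a single run of each task, i.e. total elapsed wall-clock time divided by the number of repetitions. A minimal sketch of such a timing harness is below; the function and variable names are my own illustration, not taken from the benchmark repository, and the toy workload stands in for a real call such as a package's page rank routine.

```python
import time

def benchmark(task, n_runs=3):
    """Call `task` n_runs times and return the normalised (per-run) wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(n_runs):
        task()
    return (time.perf_counter() - start) / n_runs

# Toy stand-in for a graph algorithm call on a loaded graph,
# e.g. lambda: nx.pagerank(g) if networkx were being timed.
per_run = benchmark(lambda: sum(i * i for i in range(100_000)), n_runs=5)
print(f"{per_run:.4f}s per run")
```

Using `time.perf_counter` rather than `time.time` avoids clock-adjustment artefacts; averaging over several runs smooths out transient load on the machine.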