Ritwik Gupta and Anusha Sinha discuss SEI work on NetFlow that aims to distinguish human-generated flows from machine-generated flows to identify human actors on networks and potential network threats more easily.
With millions of flows crossing a given network each day, clustering and labeling them is challenging. For analysts to act on potential network threats, flows must be labeled as quickly and efficiently as possible, often in near-real time. In this SEI Cyber Talk episode, Ritwik Gupta and Anusha Sinha discuss SEI work on NetFlow that aims to find the most effective way of clustering flows to determine which are generated by humans and which are generated by machines. The work involves constructing a graph with server IPs as nodes and formulating the flow-partitioning problem as an instance of max-flow, using two super nodes and various similarity metrics between server IPs. Partitioning is then accomplished by finding a minimum cut in the graph. Ritwik and Anusha also discuss sparsifying the abstract flow network via spectral similarity of graph Laplacians as a technique for improving the algorithm’s efficiency. The ability to quickly label human-generated and machine-generated flows could help analysts identify potential network threats and support network profiling efforts.
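To make the min-cut formulation concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the SEI implementation: the episode description does not say how the two super nodes are wired to the server IPs or which similarity metrics are used. The sketch assumes each IP carries a hypothetical `human_score` and `machine_score` (used as capacities to the `HUMAN` and `MACHINE` super nodes) and that pairwise similarity is a single integer weight. The IP addresses and all numbers are made up, and max-flow is computed with a plain Edmonds-Karp loop.

```python
from collections import defaultdict, deque

SOURCE, SINK = "HUMAN", "MACHINE"  # the two super nodes (hypothetical names)

def build_network(human_score, machine_score, similarity):
    """Capacity map: super nodes attach to every IP, similar IPs attach to each other."""
    cap = defaultdict(lambda: defaultdict(int))
    for ip in human_score:
        cap[SOURCE][ip] += human_score[ip]   # cost of labeling ip as machine
        cap[ip][SINK] += machine_score[ip]   # cost of labeling ip as human
    for (a, b), weight in similarity.items():
        cap[a][b] += weight                  # undirected similarity edge,
        cap[b][a] += weight                  # stored as two directed arcs
    return cap

def bfs_parents(cap, start, goal):
    """Breadth-first search over arcs with remaining capacity."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, c in list(cap[u].items()):
            if c > 0 and v not in parent:
                parent[v] = u
                if v == goal:
                    return parent
                queue.append(v)
    return parent  # goal unreachable: keys are the source side of a min cut

def min_cut_partition(human_score, machine_score, similarity):
    """Return (human_side, machine_side, cut_weight) from a minimum cut."""
    cap = build_network(human_score, machine_score, similarity)
    flow = 0
    while True:
        parent = bfs_parents(cap, SOURCE, SINK)
        if SINK not in parent:
            break
        # Walk back from the sink to recover the augmenting path.
        path, v = [], SINK
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck  # use up forward capacity
            cap[w][u] += bottleneck  # add residual capacity
        flow += bottleneck
    human_side = set(parent) - {SOURCE}
    machine_side = set(human_score) - human_side
    return human_side, machine_side, flow

# Made-up example: four server IPs from documentation address ranges.
human_score = {"192.0.2.10": 9, "192.0.2.11": 5, "198.51.100.7": 1, "198.51.100.8": 4}
machine_score = {"192.0.2.10": 1, "192.0.2.11": 4, "198.51.100.7": 9, "198.51.100.8": 5}
similarity = {
    ("192.0.2.10", "192.0.2.11"): 6,
    ("198.51.100.7", "198.51.100.8"): 6,
    ("192.0.2.11", "198.51.100.8"): 1,
}

human_side, machine_side, cut_weight = min_cut_partition(
    human_score, machine_score, similarity
)
print(sorted(human_side), sorted(machine_side), cut_weight)
```

In this toy input, the strong similarity edges pull the two ambiguous IPs (`192.0.2.11` and `198.51.100.8`) toward their more confidently scored neighbors, which is the behavior a min-cut partition is meant to capture. Sparsification, as discussed in the episode, would aim to reduce the number of similarity edges before this step; it is not shown here.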