[PAM-24]
Wang, Chao and Finamore, Alessandro and Michiardi, Pietro and Gallo, Massimo and Rossi, Dario,
"Data Augmentation for Traffic Classification"
Passive and Active Measurements (PAM)
apr.
2024,
arXiv Conference Runner-up
@inproceedings{PAM-24,
title = {{Data Augmentation for Traffic Classification}},
author = {Wang, Chao and Finamore, Alessandro and Michiardi, Pietro and Gallo, Massimo and Rossi, Dario},
year = {2024},
month = apr,
booktitle = {Passive and Active Measurements (PAM)},
note = {bestpaperrunnerup},
arxiv = {https://arxiv.org/abs/2401.10754},
howpublished = {https://arxiv.org/abs/2401.10754}
}
@inproceedings{DR:CoNEXT-21a,
author = {Gioacchini, Luca and Vassio, Luca and Mellia, Marco and Drago, Idilio and Ben Houidi, Zied and Rossi, Dario},
title = {DarkVec: Automatic Analysis of Darknet Traffic with Word Embeddings},
booktitle = {ACM CoNEXT, Runner-up for best paper award},
year = {2021},
note = {bestpaperrunnerup},
month = dec,
partner = {polito-mellia},
howpublished = {https://nonsns.github.io/paper/rossi21conext-a.pdf},
topic = {network-security}
}
Darknets are passive probes listening to traffic reaching IP addresses that host no services. Traffic reaching them is unsolicited by nature and often induced by scanners, malicious senders and misconfigured hosts. Its peculiar nature makes it a valuable source of information to learn about malicious activities. However, the massive amount of packets and sources that reach darknets makes it hard to extract meaningful insights. In particular, multiple senders contact the darknet while performing similar and coordinated tasks, which are often commanded by common controllers (botnets, crawlers, etc.). How to automatically identify and group those senders that share similar behaviors remains an open problem. We here introduce DarkVec, a methodology to identify clusters of senders (i.e., IP addresses) engaged in similar activities on darknets. DarkVec leverages word embedding techniques (e.g., Word2Vec) to capture the co-occurrence patterns of sources hitting the darknets. We extensively test DarkVec and explore its design space in a case study using one month of darknet data. We show that, with a proper definition of service, the generated embeddings can be easily used to (i) associate unknown senders’ IP addresses with the correct known labels (more than 96% accuracy), and (ii) identify new attack and scan groups of previously unknown senders. We contribute DarkVec source code and datasets to the community, also to stimulate the use of word embeddings to automatically learn patterns on generic traffic traces.
@inproceedings{DR:ITC-20,
author = {Navarro, Jose M. and Rossi, Dario},
title = {HURRA! Human readable router anomaly detection},
booktitle = {International Teletraffic Congress (ITC32)},
month = sep,
year = {2020},
volume = {},
pages = {},
doi = {},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi20itc.pdf},
topic = {ad-fs}
}
This paper presents HURRA, a system that aims to reduce the time spent by human operators in the process of network troubleshooting. To do so, it comprises two modules that are plugged after any anomaly detection algorithm: (i) a first attention mechanism that ranks the present features in terms of their relation with the anomaly, and (ii) a second module able to incorporate previous expert knowledge seamlessly, without any need for human interaction or decisions. We show the efficacy of these simple processes on a collection of real router datasets obtained from tens of ISPs, which exhibit a rich variety of anomalies and a very heterogeneous set of KPIs, and for which we gather ground truth manually annotated by the operator solving the troubleshooting ticket. Our experimental evaluation shows that (i) the proposed system is effective in achieving high levels of agreement with the expert, that (ii) even a simple statistical approach is able to extract useful information from expert knowledge gained in past cases to further improve performance, and finally that (iii) the main difficulty in live deployment concerns the automated selection of the anomaly detection algorithm and the tuning of its hyper-parameters.
@inproceedings{DR:NOSSDAV-18,
author = {Samain, Jacques and Carofiglio, Giovanna and Tortelli, Michele and Rossi, Dario},
title = {A simple yet effective network-assisted signal for enhanced DASH quality of experience},
booktitle = {28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV'18)},
month = jun,
year = {2018},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi18nossdav.pdf}
}
We propose and evaluate simple signals coming from in-network telemetry that are effective to enhance the quality of DASH streaming. Specifically, in-network caching is known to positively affect DASH streaming quality but at the same time negatively affect the controller stability, increasing the quality switch ratio. Our contributions are to first (i) consider the broad spectrum of interaction between the network and the application, and then (ii) devise how to effectively exploit in a DASH controller a very simple signal (i.e., per-quality hit ratio) that can be exported by frameworks such as Server and Network Assisted DASH (SAND) at a fairly low rate (i.e., a timescale of tens of seconds). Our thorough experimental campaign confirms the soundness of the approach (which significantly improves performance with respect to network-blind DASH), as well as its robustness (i.e., tuning is not critical) and practical appeal (i.e., due to its simplicity and compatibility with SAND).
[PAM-18b]
da Hora, Diego Neves and Asrese, Alemnew Sheferaw and Christophides, Vassilis and Teixeira, Renata and Rossi, Dario,
"Narrowing the gap between QoS metrics and Web QoE using Above-the-fold metrics"
International Conference on Passive and Active Network Measurement (PAM), Recipient of the Best dataset award
mar.
2018,
Conference Award
@inproceedings{DR:PAM-18b,
title = {Narrowing the gap between QoS metrics and Web QoE using Above-the-fold metrics},
author = {da Hora, Diego Neves and Asrese, Alemnew Sheferaw and Christophides, Vassilis and Teixeira, Renata and Rossi, Dario},
booktitle = {International Conference on Passive and Active Network Measurement (PAM), Recipient of the Best dataset award},
address = {Berlin, Germany},
month = mar,
year = {2018},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi18pam-b.pdf}
}
Page load time (PLT) is still the most common application Quality of Service (QoS) metric to estimate the Quality of Experience (QoE) of Web users. Yet, recent literature abounds with proposals for alternative metrics (e.g., Above The Fold, SpeedIndex and their variants) that aim at better estimating user QoE. The main purpose of this work is thus to thoroughly investigate a mapping between established and recently proposed objective metrics and user QoE. We obtain ground truth QoE via user experiments where we collect and analyze 3,400 Web accesses annotated with QoS metrics and explicit user ratings on a scale of 1 to 5, which we make available to the community. In particular, we contrast domain expert models (such as ITU-T and IQX) fed with a single QoS metric, to models trained using our ground-truth dataset over multiple QoS metrics as features. Results of our experiments show that, albeit very simple, expert models have a comparable accuracy to machine learning approaches. Furthermore, the model accuracy improves considerably when building per-page QoE models, which may raise scalability concerns as we discuss.
[ITC28a]
Araldo, Andrea and Dan, Gyorgy and Rossi, Dario,
"Stochastic Dynamic Cache Partitioning for Encrypted Content Delivery"
ITC28, Runner-up for best paper award and recipient of the IEEE ComSoc/ISOC Internet Technical Committee Best paper award 2016-2017
sep.
2016,
Conference Award
@inproceedings{DR:ITC28a,
title = {Stochastic Dynamic Cache Partitioning for Encrypted Content Delivery},
author = {Araldo, Andrea and Dan, Gyorgy and Rossi, Dario},
year = {2016},
month = sep,
booktitle = {ITC28, Runner-up for best paper award and recipient of the IEEE ComSoc/ISOC Internet Technical Committee Best paper award 2016-2017},
topic = {icn,optimization,streaming},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi16itc28-a.pdf}
}
In-network caching is an appealing solution to cope with the increasing bandwidth demand of video, audio and data transfer over the Internet. Nonetheless, an increasing share of content delivery services adopt encryption through HTTPS, which is not compatible with traditional ISP-managed approaches like transparent and proxy caching. This raises the need for solutions involving both Internet Service Providers (ISP) and Content Providers (CP): by design, the solution should preserve business-critical CP information (e.g., content popularity, user preferences) on the one hand, while allowing for a deeper integration of caches in the ISP architecture (e.g., in 5G femto-cells) on the other hand. In this paper we address this issue by considering a content-oblivious ISP-operated cache. The ISP allocates the cache storage to various content providers so as to maximize the bandwidth savings provided by the cache: the main novelty lies in the fact that, to protect business-critical information, the ISP only needs to measure the aggregated miss rates of the individual CPs and does not need to be aware of the objects that are requested, as in classic caching. We propose a cache allocation algorithm based on a perturbed stochastic subgradient method, and prove that the algorithm converges close to the allocation that maximizes the overall cache hit rate. We use extensive simulations to validate the algorithm and to assess its convergence rate under stationary and non-stationary content popularity. Our results (i) attest to the feasibility of content-oblivious caches and (ii) show that the proposed algorithm comes within 10% of the global optimum in our evaluation.
[SIGCOMM-QoE-16]
Bocchi, Enrico and De Cicco, Luca and Rossi, Dario,
"Measuring the Quality of Experience of Web users"
ACM SIGCOMM Workshop on QoE-based Analysis and Management of Data Communication Networks (Internet-QoE 2016), selected as best paper in the workshop for reprint in ACM SIGCOMM Comput. Commun. Rev.
aug.
2016,
Conference Award
@inproceedings{DR:SIGCOMM-QoE-16,
title = {Measuring the Quality of Experience of Web users},
author = {Bocchi, Enrico and De Cicco, Luca and Rossi, Dario},
year = {2016},
month = aug,
booktitle = {ACM SIGCOMM Workshop on QoE-based Analysis and Management of Data Communication Networks (Internet-QoE 2016), selected as best paper in the workshop for reprint in ACM SIGCOMM Comput. Commun. Rev.},
topic = {internetmeasurement,qoe},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi16internet-qoe.pdf}
}
[INFOCOM-IC-16]
Bocchi, Enrico and De Cicco, Luca and Rossi, Dario,
"Web QoE: Moving beyond Google’s SpeedIndex"
Finalist at the IEEE INFOCOM Innovation Challenge
apr.
2016,
Conference Runner-up
@inproceedings{DR:INFOCOM-IC-16,
title = {Web QoE: Moving beyond Google's SpeedIndex},
author = {Bocchi, Enrico and De Cicco, Luca and Rossi, Dario},
year = {2016},
month = apr,
booktitle = {Finalist at the IEEE INFOCOM Innovation Challenge},
note = {bestpaperrunnerup},
topic = {qoe,internetmeasurement},
howpublished = {https://nonsns.github.io/paper/rossi16infocom-innovation-challenge.pdf}
}
[CoNEXT-15]
Cicalese, Danilo and Auge, Jordan and Joumblatt, Diana and Friedman, Timur and Rossi, Dario,
"Characterizing IPv4 Anycast Adoption and Deployment"
ACM CoNEXT, awarded the IRTF Applied Network Research Prize at IETF96
dec.
2015,
Conference Award
@inproceedings{DR:CoNEXT-15,
title = {Characterizing IPv4 Anycast Adoption and Deployment},
author = {Cicalese, Danilo and Auge, Jordan and Joumblatt, Diana and Friedman, Timur and Rossi, Dario},
booktitle = {ACM CoNEXT, awarded the IRTF Applied Network Research Prize at IETF96},
address = {Heidelberg},
month = dec,
year = {2015},
topic = {anycast,internetmeasurement},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi15conext.pdf}
}
This paper provides a comprehensive picture of IP-layer anycast adoption in the current Internet. We carry out multiple IPv4 anycast censuses, relying on latency measurements from PlanetLab. Next, we leverage our novel technique for anycast detection, enumeration, and geolocation to quantify anycast adoption in the Internet. Our technique is scalable and, unlike previous efforts that are bound to exploiting DNS, is protocol-agnostic. Our results show that major Internet companies (including tier-1 ISPs, over-the-top operators, Cloud providers and equipment vendors) use anycast: we find that a broad range of TCP services are offered over anycast, the most popular of which include HTTP and HTTPS by anycast CDNs that serve websites from the top-100k Alexa list. Additionally, we complement our characterization of IPv4 anycast with a description of the challenges we faced to collect and analyze large-scale delay measurements, and the lessons learned.
[TRAC-15]
Bocchi, Enrico and Safari, Ali and Traverso, Stefano and Finamore, Alessandro and Di Gennaro, Valeria and Mellia, Marco and Munafo, Maurizio and Rossi, Dario,
"Impact of Carrier-Grade NAT on Web Browsing"
6th International Workshop on TRaffic Analysis and Characterization (TRAC), Best paper award
aug.
2015,
Conference Award
@inproceedings{DR:TRAC-15,
title = {Impact of Carrier-Grade NAT on Web Browsing},
author = {Bocchi, Enrico and Safari, Ali and Traverso, Stefano and Finamore, Alessandro and Di Gennaro, Valeria and Mellia, Marco and Munafo, Maurizio and Rossi, Dario},
year = {2015},
booktitle = {6th International Workshop on TRaffic Analysis and Characterization (TRAC), Best paper award},
month = aug,
topic = {internetmeasurement,passivemeasurement},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi15trac.pdf}
}
Public IPv4 addresses are a scarce resource. While IPv6 adoption is lagging, Network Address Translation (NAT) technologies have been deployed over the last years to alleviate IPv4 address scarcity and the high rental cost of addresses. In particular, Carrier-Grade NAT (CGN) is a well known solution to mask a whole ISP network behind a limited amount of public IP addresses, significantly reducing expenses. Despite its economic benefits, CGN can introduce connectivity issues, which have spurred considerable effort in research, development and standardization. However, to the best of our knowledge, little effort has been dedicated to investigating the impact that CGN deployment may have on users' traffic. This paper fills the gap. We leverage passive measurements from an ISP network deploying CGN and, by means of the Jensen-Shannon divergence, we contrast several performance metrics considering customers being offered public or private addresses. In particular, we gauge the impact of CGN presence on users' web browsing experience. Our results attest that CGN is a mature and stable technology as, if properly deployed, it does not harm users' web browsing experience. Indeed, while our analysis reveals expected stochastic differences of certain indexes (e.g., the difference in the path hop count), the measurements related to the quality of users' browsing are otherwise unperturbed. Interestingly, we also observe that CGN
@inproceedings{DR:TRAC-14,
title = {A per-Application Account of Bufferbloat: Causes and Impact on Users},
author = {Araldo, Andrea and Rossi, Dario},
booktitle = {5th International Workshop on TRaffic Analysis and Characterization (TRAC), Best paper award},
year = {2014},
address = {Nicosia, Cyprus},
month = aug,
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi14trac.pdf}
}
We propose a methodology to gauge the extent of queueing delay (aka bufferbloat) in the Internet, based on purely passive measurement of TCP traffic. We implement our methodology in Tstat and make it available as open source software. We leverage Deep Packet Inspection (DPI) and behavioral classification of Tstat to break down the queueing delay across different applications, in order to evaluate the impact of bufferbloat on user experience. We show that there is no correlation between the ISP traffic load and the queueing delay, thus confirming that bufferbloat is related only to the traffic of each single user (or household). Finally, we use frequent itemset mining techniques to associate the amount of queueing delay seen by each host with the set of its active applications, with the goal of investigating the root cause of bufferbloat.
@inproceedings{DR:ITC-13,
title = {Modeling the interdependency of low-priority congestion control and active queue management},
author = {Gong, YiXi and Rossi, Dario and Leonardi, Emilio},
booktitle = {The 25th International Teletraffic Congress (ITC25), Runner-up for Best Paper Award},
year = {2013},
month = sep,
note = {bestpaperrunnerup},
howpublished = {https://nonsns.github.io/paper/rossi13itc.pdf}
}
Recently, a negative interplay has been shown to arise when scheduling/AQM techniques and low-priority congestion control protocols are used together: namely, AQM resets the relative level of priority among congestion control protocols. This work explores this issue by (i) studying a fluid model that describes system dynamics of heterogeneous congestion control protocols competing on a bottleneck link governed by AQM and (ii) proposing a system level solution able to reinstate priorities among protocols.
@inproceedings{DR:NTMS-12,
author = {},
title = {Adaptive Probabilistic Flooding for Multi-path Routing},
booktitle = {IFIP NTMS, Best paper award},
year = {2012},
pages = {1-6},
note = {bestpaperaward},
howpublished = {https://nonsns.github.io/paper/rossi12ntms.pdf}
}
In this work, we develop a distributed routing algorithm for topology discovery, suitable for ISP transport networks, that is however inspired by opportunistic algorithms used in ad hoc wireless networks. We propose a plug-and-play control plane, able to find multiple paths toward the same destination, and introduce a novel algorithm, called adaptive probabilistic flooding, to achieve this goal. By keeping a small amount of state in routers taking part in the discovery process, our technique significantly limits the amount of control messages exchanged with flooding and, at the same time, only minimally affects the quality of the discovered multiple paths with respect to the optimal solution. Simple analytical bounds, confirmed by results gathered with extensive simulation on several topologies (up to 10,000 nodes), show our approach to be of high practical interest.