Loading…

Semi-Automated Protocol Disambiguation and Code Generation

For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protoco...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2021-02
Main Authors: Yen, Jane, Lévai, Tamás, Ye, Qinyuan, Ren, Xiang, Govindan, Ramesh, Raghavan, Barath
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:For decades, Internet protocols have been specified using natural language. Given the ambiguity inherent in such text, it is not surprising that protocol implementations have long exhibited bugs. In this paper, we apply natural language processing (NLP) to effect semi-automated generation of protocol implementations from specification text. Our system, SAGE, can uncover ambiguous or under-specified sentences in specifications; once these are clarified by the spec author, SAGE can generate protocol code automatically. Using SAGE, we discover 5 instances of ambiguity and 6 instances of under-specification in the ICMP RFC; after clarification, SAGE is able to automatically generate code that interoperates perfectly with Linux implementations. We show that SAGE generalizes to BFD, IGMP, and NTP. We also find that SAGE supports many of the conceptual components found in key protocols, suggesting that, with some additional machinery, SAGE may be able to generalize to TCP and BGP.
ISSN:2331-8422