Research Activity
 
General Statement
I address three professional fields:
 
     Applied Research in Textual Information Access and Knowledge Management
     Design and Engineering of Natural Language Processing Systems
     Project Management of Advanced IT and AI Projects
 
 
Areas of Interest
I currently investigate research topics in:
 
     Automatic Natural Language Understanding
     Frame Semantics, Shallow Semantic Parsing, and Semantic Role Labeling
     Machine Learning Approaches to Natural Language Processing
     Question Answering
     Textual Entailment and Paraphrasing
 
 
Research Directions
Area descriptions and personal contributions
 
Shallow Semantic Parsing
Work in this area attempts the conceptual analysis of natural language, oriented to a general understanding of text meaning. The analysis is performed by assigning semantic labels from a predefined set to the bare sentence constituents. Popular reference models, which also provide large annotated text collections, are PropBank and FrameNet. The specialized task of Semantic Role Labeling is usually defined on PropBank semantic roles, and it attracted considerable interest in international competitions (CoNLL 2004 and 2005). Conversely, FrameNet provides a more complex model based on the notion of Frame, i.e. the abstract representation of a specific event, situation, or property (e.g. “Commerce Scenario”). Each Frame defines its local set of participants (e.g. “Buyer”, “Goods”, “Money”, “Seller”). FrameNet-based Shallow Semantic Parsing is very recent, and a first international evaluation task was organized at SemEval 2007. My first works in this area contributed experiments in feature engineering and in the assessment of Tree Kernels, and eventually led to a well-ranked system at CoNLL 2005. More recently, I have focused on the more challenging FrameNet-based analysis, making it the topic of my forthcoming Ph.D. dissertation (see section below).
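As a rough illustration of what a frame-based annotation looks like, the sketch below stores one frame instance for a single sentence. The sentence, the class name FrameAnnotation, and its fields are assumptions made for this example only; they are not the FrameNet data format, nor the representation used in my system.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FrameAnnotation:
    """One frame instance evoked in a sentence (illustrative structure only)."""
    frame: str                                            # frame name, e.g. "Commerce Scenario"
    target: str                                           # the word evoking the frame
    roles: Dict[str, str] = field(default_factory=dict)   # participant -> text span

    def is_consistent_with(self, sentence: str) -> bool:
        """Check that the target and every role filler occur in the sentence."""
        spans = [self.target, *self.roles.values()]
        return all(span in sentence for span in spans)


# Hypothetical example sentence, annotated by hand.
sentence = "Mary bought a book from John for ten dollars"
annotation = FrameAnnotation(
    frame="Commerce Scenario",
    target="bought",
    roles={
        "Buyer": "Mary",
        "Goods": "a book",
        "Seller": "John",
        "Money": "ten dollars",
    },
)

for role, span in annotation.roles.items():
    print(f"{role}: {span}")
```

A parser for this task would have to produce such a structure automatically: first detect the frame evoked by the target word, then find and label the constituents filling its participants.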
 
Question Answering
Question Answering (QA) is the Natural Language Processing (NLP) task of automatically answering users' questions expressed in natural language. In the traditional task definition (TREC, CLEF), a QA system looks for answers in a collection of plain text documents, such as news agency reports or newspaper columns, and returns short answers. FBK-irst previously developed a successful QA system (“Diogene”), which participated in international competitions. I supported its development by generating text pattern libraries (Master's Thesis), performing error analysis, and designing the re-engineering of its architecture (WebFAQ Project). More recently, I have been working on the QALL-ME European Project, concerning ontology-based QA on mobile devices for tourism and entertainment. I designed and developed the QALL-ME core functions for ontology population from multiple information sources, and for web-based ontology querying in SPARQL, exploiting the Jena Semantic Web Framework.
 
Textual Entailment and Inference
Recognizing Textual Entailment (TE) is the task of automatically assessing whether, given two general text fragments T (the “text”) and H (the “hypothesis”), the meaning of T entails the meaning of H. For example, entailment holds between T=“Aspirin reduces the risk of heart attack” and H=“Aspirin prevents heart attack”. T and H may differ in a wide range of lexical and syntactic variations while still preserving entailment. TE provides a general framework for the evaluation of NLP techniques addressing language variability. Successful evaluation campaigns (RTE Challenges 2005, 2006, and 2007) have recently been organized. I first contributed to TE research by addressing the problem of automatically collecting from the Web large sets of abstract entailment patterns, which are used to handle TE examples such as the one above. I also focused on the problem of producing strict guidelines for TE evaluation and for the development of reference data sets. Later, while visiting ISI, I ran experiments on automatically learning selectional preferences over the terms appearing in entailment patterns, thus improving precision.
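The role of an abstract entailment pattern can be sketched in a few lines. The example below is a toy under stated assumptions: the single pattern, the {X}/{Y} slot notation, and the function names are invented for illustration, and plain string matching stands in for the syntactic matching a real system would need.

```python
import re

# Hypothetical pattern library: (text template, entailed hypothesis template).
PATTERNS = [
    ("{X} reduces the risk of {Y}", "{X} prevents {Y}"),
]


def template_to_regex(template):
    """Turn a template with {X}/{Y} slots into an anchored regular expression."""
    parts = re.split(r"(\{[XY]\})", template)
    pieces = []
    for part in parts:
        if part in ("{X}", "{Y}"):
            pieces.append("(?P<%s>.+?)" % part[1])   # slot -> named group
        else:
            pieces.append(re.escape(part))           # literal text
    return re.compile("^" + "".join(pieces) + "$", re.IGNORECASE)


def entails(text, hypothesis):
    """Return True if some pattern rewrites `text` into `hypothesis`."""
    for lhs, rhs in PATTERNS:
        match = template_to_regex(lhs).match(text)
        if match and rhs.format(**match.groupdict()).lower() == hypothesis.lower():
            return True
    return False


print(entails("Aspirin reduces the risk of heart attack",
              "Aspirin prevents heart attack"))
```

Selectional preferences, in this picture, would restrict which terms may fill X and Y, so that a pattern is not applied where the entailment does not hold.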
 
Spoken Dialog Systems
I approached this area very recently, jointly with the AMI2 Lab at the University of Trento. Our goal is to show that Shallow Semantic Parsing (described above) can improve the state of the art in real-world conversational systems. Current Spoken Language Understanding technology is based on simple concept annotation of word sequences, where the interdependencies between concepts and their compositional semantics are generally neglected. This prevents effective handling of language phenomena, and consequently limits the design of complex Spoken Dialog systems. Therefore, we aim at integrating and evaluating the promising Shallow Semantic Parsing framework in current Spoken Dialog models. I performed preliminary experiments on a reference corpus of domain-specific Italian dialogs developed by the LUNA Project. The results achieved (SLT 2008, see Publications below) showed the feasibility and effectiveness of the approach. Further work is now in progress.
 
Engineering of NLP Architectures
Effective applied research must commit to showing its impact very early. In this spirit, I consider the engineering of real-world NLP systems a very relevant part of my activity. Since NLP is a very young field, robust analytical models are still entirely missing. This has a heavy impact on the design of software system architectures, which must rely on continuously improving empirical methods. Thus, critical architectural features must be ensured. To mention a few: modularity (to support the quick embedding of newly available subsystems), scalability (to support the continuous growth of data-driven models), and distribution (to allow parallel processing of large data sets and the exploitation of specialized remote services). A special role in NLP is played by error analysis: it constitutes the most valuable feedback from a running system to its originating research, and it is only effective when it relies on a sound system design. These principles set my design priorities whenever an experimental workbench is worth developing into a functional prototype.
 
 
PhD Dissertation - forthcoming
 
My Doctoral Dissertation focuses on the application of Frame Semantics in Textual Information Processing. It has three main goals: 1) achieve automatic, robust, multi-language FrameNet-based Shallow Semantic Parsing (see dedicated section) of free text, 2) improve the state of the art in this area, and 3) apply these achievements to enhance real-world, end-user NLP applications.
The foundations of the work rely on past research in Semantic Role Labeling (SRL), where I give special attention to Machine Learning techniques for the processing of structured objects. Namely, I intensively exploit the consolidated joint framework of Support Vector Machines with Tree Kernels for the treatment of syntactic trees. In this respect, my thesis leverages excellent SRL work by Alessandro Moschitti and Daniele Pighin. My novel effort is in porting state-of-the-art PropBank-based SRL to the far more expressive framework of Frame Semantics. In this perspective, I refer to the valuable lexical-semantic resource made available by the FrameNet Project at University of Berkeley. FrameNet includes definitions for more than 800 semantic frames (e.g. “Commerce Scenario” and “Killing”), and more than 4000 local, frame-based participants (e.g. “Buyer”, “Goods”, “Seller” for the former, and “Killer”, “Victim”, “Instrument” for the latter).
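To give an idea of how a Tree Kernel compares two syntactic trees, here is a minimal sketch of the classic subset tree kernel of Collins and Duffy, which counts the tree fragments two parses share. It is an illustration, not the implementation used in my thesis: the tuple encoding of trees, the function names, and the decay parameter lam are assumptions of this example, and the SVM learner that would consume the kernel values is omitted.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.

def production(node):
    """Grammar production at a node: its label plus (child label, is_leaf) pairs."""
    children = tuple(
        (c, True) if isinstance(c, str) else (c[0], False) for c in node[1:]
    )
    return (node[0], children)


def internal_nodes(tree):
    """All non-leaf nodes of a tree, in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def delta(n1, n2, lam):
    """Weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    result = lam
    # Equal productions guarantee that the children align pairwise.
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            result *= 1.0 + delta(c1, c2, lam)
    return result


def tree_kernel(t1, t2, lam=1.0):
    """Sum delta over all pairs of internal nodes of the two trees."""
    return sum(
        delta(n1, n2, lam)
        for n1 in internal_nodes(t1)
        for n2 in internal_nodes(t2)
    )


# Hypothetical toy parses.
t1 = ("S", ("NP", ("N", "Mary")), ("VP", ("V", "runs")))
t2 = ("S", ("NP", ("N", "John")), ("VP", ("V", "runs")))
print(tree_kernel(t1, t2))
```

With lam below 1, larger fragments are down-weighted. In an SVM setting, the matrix of such kernel values between training trees replaces an explicit feature vector, which is why no manual enumeration of syntactic features is needed.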
Automatic semantic annotation in such a rich environment requires careful architectural design (see dedicated section above) to manage the complexity of the underlying machine learning models. My system implementation pays special attention to flexibility, while keeping computation time under control. The resulting features include multi-language and multi-domain portability, transparent parallel processing, and a mechanism of selective information sharing which, for example, enables the sharing of semantic roles across different frame models.
An extensive, yet preliminary, evaluation has been carried out. The system has been tested against the natural FrameNet benchmark, and against a corpus of Italian Spoken Dialog transcriptions developed by the LUNA Project. Current results are published in SLT 2008 and ICDM 2008 (see Publications below). Work in progress includes the integration of Frame Semantics into Spoken Dialog models, and later the application to Question Answering and Information Retrieval.
 
 
Research Projects
I have worked on the following projects:
 
   LUNA - European Union, IST-FP6-33549
    Spoken Language Understanding in Multilingual Communication Systems
 
   QALL-ME - European Union, IST-FP6-33860
    Question Answering Learning Technologies in a Multilingual
    and Multimodal Environment
 
   OntoText - Provincia Autonoma di Trento
    From Text to Knowledge for the Semantic Web
 
   Dot.Kom - European Union, IST-FP5-34038
    From Text to Knowledge for the Semantic Web
 
   WebFAQ - Provincia Autonoma di Trento
    Web Flexible Access and Quality
 
 
My pleasure co-authoring with
 
Roberto Basili (University of Rome “Tor Vergata”); Rahul Bhagat (University of Southern California); Elena Cabrio (FBK-irst, Trento); Timothy Chklovski (Structured Commons Inc., former USC-ISI); Danilo Croce (University of Rome “Tor Vergata”); Ido Dagan (Bar-Ilan University, Tel-Aviv); Diego De Cao (University of Rome “Tor Vergata”); Aldo Gangemi (STLab, ISTC-CNR, Rome); Ana-Maria Giuglea (Accenture); Alfio Gliozzo (STLab, ISTC-CNR, and Reinvent Technology, Vancouver); Roberto Gretter (FBK-irst, Trento); Eduard Hovy (University of Southern California); Milen Kouylekov (FBK-irst, Trento); Bernardo Magnini (FBK-irst, Trento); Alessandro Moschitti (University of Trento); Matteo Negri (FBK-irst, Trento); Patrick Pantel (Yahoo! Research, former USC-ISI); Davide Picca (Université de Lausanne, Switzerland); Daniele Pighin (FBK-irst, Trento); Valentina Presutti (STLab, ISTC-CNR, Rome); Giuseppe Riccardi (University of Trento); Idan Szpektor (Bar-Ilan University, Tel-Aviv); Hristo Tanev (Joint Research Center, European Union); Sara Tonelli (FBK-irst and Ca’ Foscari University, Venezia).
 
 
Publications
 
Bonaventura Coppola, Alessandro Moschitti, Giuseppe Riccardi
Shallow Semantic Parsing for Spoken Language Understanding
In Proceedings of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference, short paper (NAACL HLT 2009).
Boulder, Colorado.   BibTeX --forthcoming.
 
Bonaventura Coppola, Aldo Gangemi, Alfio Gliozzo, Davide Picca, Valentina Presutti
Frame Detection over the Semantic Web
In Proceedings of the 6th European Semantic Web Conference (ESWC 2009).
Heraklion, Greece.  BibTeX --forthcoming.
 
Roberto Basili, Diego De Cao, Danilo Croce, Bonaventura Coppola, Alessandro Moschitti
Cross-Language Frame Semantics Transfer in Parallel Corpora
In Proceedings of the 10th International Conference on Intelligent Text Processing
and Computational Linguistics (CICLing 2009), Best Paper Award.
Mexico City, Mexico.  BibTeX
 
Bonaventura Coppola, Alessandro Moschitti, Daniele Pighin
Generalized Framework for Syntax-based Relation Mining
In Proceedings of the IEEE International Conference on Data Mining (ICDM 2008).
Pisa, Italy.  BibTeX
 
Bonaventura Coppola, Alessandro Moschitti, Sara Tonelli, Giuseppe Riccardi
Automatic FrameNet-based Annotation of Conversational Speech
In Proceedings of the IEEE Workshop on Spoken Language Technology (SLT 2008).
Goa, India.  BibTeX
 
E. Cabrio, B. Coppola, R. Gretter, M. Kouylekov, B. Magnini, M. Negri
Question Answering Based Annotation for a Corpus of Spoken Requests
In Proceedings of the Workshop on the Semantic Representation of
Spoken Language (SRSL7). Salamanca, Spain.  BibTeX
 
Patrick Pantel, Rahul Bhagat, Bonaventura Coppola, Timothy Chklovski, Eduard Hovy
ISP: Learning Inferential Selectional Preferences
In Proceedings of North American Association for Computational Linguistics / Human Language Technology (NAACL HLT 2007). Rochester, NY.  BibTeX
 
Milen Kouylekov, Matteo Negri, Bernardo Magnini, Bonaventura Coppola
FBK-irst at CLEF 2007
In Proceedings of the Cross Language Evaluation Forum (CLEF 2007).
Budapest, Hungary.  BibTeX
 
Matteo Negri, Milen Kouylekov, Bernardo Magnini, Bonaventura Coppola
Reconstructing DIOGENE: ITC-irst at TREC 2006
In Proceedings of the Fifteenth Text Retrieval Conference (TREC 2006).
Gaithersburg, MD.  BibTeX
 
Milen Kouylekov, Matteo Negri, Bernardo Magnini, Bonaventura Coppola
Towards Entailment-based Question Answering: ITC-irst at CLEF 2006
In Working Notes for the Cross Language Evaluation Forum Workshop (CLEF 2006).
Alicante, Spain.   BibTeX
 
Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin, Roberto Basili
Semantic Tree Kernels to classify Predicate Argument Structures
In Proceedings of 17th European Conference on Artificial Intelligence (ECAI 2006).
Riva del Garda, Italy.   BibTeX
 
Alessandro Moschitti, Bonaventura Coppola, Daniele Pighin, Roberto Basili
Engineering of Syntactic Features for Shallow Semantic Parsing
In Proceedings of the ACL05 Workshop on Feature Engineering for Machine Learning
in Natural Language Processing, 2005. Ann Arbor, USA.  BibTeX
 
Alessandro Moschitti, Ana-Maria Giuglea, Bonaventura Coppola, Roberto Basili
Hierarchical Semantic Role Labeling
In Proceedings of the 9th Conference on Computational Natural Language Learning - Shared Task (CoNLL 2005). Ann Arbor, USA.  BibTeX
 
Idan Szpektor, Hristo Tanev, Ido Dagan, Bonaventura Coppola
Scaling Web-based Acquisition of Entailment Relations
In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pages 41--48, Barcelona, Spain.  BibTeX
 
H. Tanev, M. Kouylekov, M. Negri, B. Coppola, B. Magnini
Multilingual Pattern Libraries for Question Answering: a Case Study for
Definition Questions
In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), volume VI, pages 1935--1938, Lisboa, Portugal.  BibTeX
 
 
University Dissertations
 
Master of Science Thesis (Laurea Specialistica)
Apprendimento Automatico di Pattern di Risposta per Sistemi di Question Answering
(Automatic Answer Pattern Acquisition for Question Answering Systems)
University of Rome “Tor Vergata”, Faculty of Engineering, 2003.
 
Bachelor of Science Thesis (Laurea)
Alberi di decisione per l’estrazione automatica di terminologia da testi
(Decision Trees for Automatic Terminology Extraction from Texts)
University of Rome “Tor Vergata”, Faculty of Engineering, 2001.