Multimodal modeling of collaborative problem-solving facets in triads
Published in: | User modeling and user-adapted interaction 2021-09, Vol.31 (4), p.713-751 |
---|---|
Format: | Article |
Language: | English |
Summary: | Collaborative problem-solving (CPS) is ubiquitous in everyday life, including work, family, and leisure activities. With collaborations increasingly occurring remotely, next-generation collaborative interfaces could enhance CPS processes and outcomes with dynamic interventions or by generating feedback for after-action reviews. Automatic modeling of CPS processes (called facets here) is a precursor to this goal. Accordingly, we build automated detectors of three critical CPS facets (construction of shared knowledge, negotiation and coordination, and maintaining team function) derived from a validated CPS framework. We used data from 32 triads who collaborated via commercial videoconferencing software to solve challenging problems in a visual programming task. We generated transcripts of 11,163 utterances using automatic speech recognition, which were then coded by trained humans for evidence of the three CPS facets. We used both standard and deep sequential learning classifiers to model the human-coded facets from linguistic, task-context, facial-expression, and acoustic–prosodic features in a team-independent fashion. We found that models relying on nonverbal signals yielded above-chance accuracies (area under the receiver operating characteristic curve, AUROC) ranging from .53 to .83, with increases in model accuracy when language information was included (AUROCs from .72 to .86). Deep sequential learning methods showed no advantage over standard classifiers. Overall, Random Forest classifiers using language and task-context features performed best, achieving AUROC scores of .86, .78, and .79 for construction of shared knowledge, negotiation/coordination, and maintaining team function, respectively. We discuss application of our work to real-time systems that assess CPS and intervene to improve CPS outcomes. |
---|---|
ISSN: | 0924-1868 1573-1391 |
DOI: | 10.1007/s11257-021-09290-y |
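
The summary reports AUROC scores obtained in a "team-independent fashion", meaning that utterances from one team are never used for both training and testing. The record does not say how folds were built or which AUROC implementation was used, so the following is only a minimal sketch of those two ideas using the Python standard library. The function names `auroc` and `team_independent_folds` are hypothetical and are not taken from the article.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def team_independent_folds(team_ids, n_folds):
    """Yield (train_idx, test_idx) pairs in which no team appears on both sides.

    Assumption: teams are assigned to folds round-robin in sorted order;
    the article may have used a different assignment.
    """
    by_team = defaultdict(list)
    for idx, team in enumerate(team_ids):
        by_team[team].append(idx)
    teams = sorted(by_team)
    for fold in range(n_folds):
        test_teams = set(teams[fold::n_folds])
        test_idx = [i for t in teams if t in test_teams for i in by_team[t]]
        train_idx = [i for t in teams if t not in test_teams for i in by_team[t]]
        yield train_idx, test_idx
```

In use, one would fit a classifier on the utterances at `train_idx`, score those at `test_idx`, and pass the held-out labels and scores to `auroc`. A value of 0.5 corresponds to chance, which is the baseline against which the reported scores are read.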