We propose the Speech Event Extraction (SpeechEE) challenge on the XLLM platform. Unlike traditional textual event extraction, SpeechEE aims to detect event triggers and arguments directly from speech audio, even when transcripts are unavailable. The SpeechEE task is defined as follows: given a speech audio input consisting of a sequence of acoustic frames, the goal is to extract structured event records comprising four elements: 1) the event type, 2) the event trigger, 3) the event argument roles, and 4) the corresponding event arguments [1].
The challenge comprises three subtasks, ordered from easiest to hardest.
Task 1: Event Detection.
This subtask aims to identify event trigger words and classify them into the corresponding event types. Participants are required to develop models that accurately extract event triggers. A trigger is considered correctly extracted if both the trigger mention and the event type match the reference. The evaluation metric for this subtask is the F1 score.
Task 2: Event Argument Extraction.
This subtask aims to identify event arguments and classify their argument roles. An argument is considered correctly extracted if its event type, argument role, and argument mention all match the reference. The evaluation metric for this subtask is the F1 score.
Task 3: Event Quadruple Extraction.
This subtask aims to extract the complete event record quadruple, including the event trigger, event type, argument role, and argument mention. A quadruple is considered correctly extracted only if all four elements match the reference. The evaluation metric for this subtask is the F1 score.
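For concreteness, below is a minimal, unofficial sketch of the tuple-level F1 shared by all three subtasks; the official implementation is in "/challenge/scoring.py". It assumes each prediction and reference is reduced to a hashable tuple: (trigger, type) for Task 1, (type, role, mention) for Task 2, and the full quadruple for Task 3.

from collections import Counter

def f1_score(pred_tuples, gold_tuples):
    # Micro F1 over multisets of extracted tuples; an item counts as
    # correct only if all of its elements exactly match a reference.
    pred, gold = Counter(pred_tuples), Counter(gold_tuples)
    correct = sum((pred & gold).values())
    if correct == 0:
        return 0.0
    precision = correct / sum(pred.values())
    recall = correct / sum(gold.values())
    return 2 * precision * recall / (precision + recall)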
Finally, we use the following score for ranking:
overall score = 0.3 × s1 + 0.3 × s2 + 0.4 × s3,
where s1, s2, and s3 are the F1 scores of Tasks 1, 2, and 3, respectively.
We provide Python scripts for evaluation; please refer to "/challenge/scoring.py" in the baseline code.
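In code, the ranking score is simply the weighted combination above (a direct restatement of the formula, not the official script):

def overall_score(s1, s2, s3):
    # Weighted ranking score; Task 3 carries the largest weight.
    return 0.3 * s1 + 0.3 * s2 + 0.4 * s3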
The dataset for this challenge is derived from ACE2005-EN+, a benchmark dataset for event extraction in English. It extends the original ACE05-EN data by considering multi-token event triggers and pronoun roles [2]. The dataset contains 33 event types and 22 argument roles; the detailed schema is recorded in "/challenge/event-schema.json" in the baseline code.
An example training sample:
{"id": "train-3", "event": [{"trigger": "landed", "type": "Transport", "arguments": [{"name": "boat", "role": "Vehicle"}, {"name": "men", "role": "Artifact"}, {"name": "shores", "role": "Destination"}]}]}
The audio datasets are available on Google Drive, and the label JSON files are in the baseline code under "/challenge/data/ACE05EN".
Please submit your predicted results as a single JSON file named "results.json", in the following format:
[ { "id": "test-0", "event": [{ "trigger": "advance", "type": "Transport", "arguments": [{ "name": "elements", "role": "Artifact" }, { "name": "city", "role": "Origin" }, { "name": "Baghdad", "role": "Destination" }] }] }, { "id": "test-1", "event": [{ ... }] }, ... ]
Participants can submit at Codabench. We will review the submissions and publish the ranking here. Feel free to contact us if you have any questions (fir430179@gmail.com).
Link to the code: https://github.com/SpeechEE/SpeechEE
[1] Wang, B., Zhang, M., Fei, H., Zhao, Y., Li, B., Wu, S., ... & Zhang, M. (2024). SpeechEE: A Novel Benchmark for Speech Event Extraction. In Proceedings of the 32nd ACM International Conference on Multimedia (pp. 10449–10458).
[2] Lin, Y., Ji, H., Huang, F., & Wu, L. (2020). A Joint Neural Model for Information Extraction with Global Features. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7999–8009). Association for Computational Linguistics.