Live Football Commentary Dataset

Dataset Overview

Live Football Commentary (LFC) is a dataset of transcribed football commentary from 40 matches of the J-League 2021 season. It consists of 40 text files in SRT format, one per match. Each file records, in chronological order from the start to the end of the match, the utterance number, time information (start time and end time), and utterance content. The match list is included in the readme.txt file (in Japanese).

Public Data

The dataset contains the following data:
  • Transcriptions of football commentary audio:
    • Commentary on 40 J-League 2021 season matches
    • SRT format: structured data including utterance numbers, time information (start time, end time), and utterance content
  • Match list included in the dataset's readme.txt file (in Japanese)
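
The SRT structure described above (utterance number, start and end time, utterance content) can be read with a few lines of Python. The following is a minimal sketch, not an official loader. It assumes the files follow the common SRT layout (a number line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, then one or more text lines, with blank lines between entries) and are UTF-8 encoded. The names Utterance, parse_srt, and load_match are hypothetical and are not part of the dataset.

```python
import re
from dataclasses import dataclass
from typing import List

# Timestamp line in the common SRT layout, e.g. "00:01:02,500 --> 00:01:05,000".
TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Utterance:
    number: int    # utterance number
    start_ms: int  # start time in milliseconds from the start of the file
    end_ms: int    # end time in milliseconds
    text: str      # utterance content (may span several lines)


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to integer milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> List[Utterance]:
    """Parse SRT text into a list of utterances, skipping malformed entries."""
    # Drop a leading byte-order mark and normalize Windows line endings.
    content = content.lstrip("\ufeff").replace("\r\n", "\n")
    utterances = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.split("\n")
        if len(lines) < 2:
            continue
        match = TIME_RE.search(lines[1])
        if not lines[0].strip().isdigit() or match is None:
            continue
        fields = match.groups()
        utterances.append(
            Utterance(
                number=int(lines[0]),
                start_ms=_to_ms(*fields[:4]),
                end_ms=_to_ms(*fields[4:]),
                text="\n".join(lines[2:]).strip(),
            )
        )
    return utterances


def load_match(path: str) -> List[Utterance]:
    """Read one match file. UTF-8 encoding is an assumption, not a documented fact."""
    with open(path, encoding="utf-8-sig") as f:
        return parse_srt(f.read())
```

Calling load_match on one of the 40 files should yield one Utterance per entry in chronological order. If the files use a different encoding or timestamp layout, the encoding argument and TIME_RE would need to be adjusted.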

Important

  • Redistribution and commercial use are prohibited.
  • AIST is not responsible for any damages resulting from the use of the data.
  • When publishing research results that use this dataset, please cite the following paper.

    Taiga Someya, Tatsuya Ishigaki, and Hiroya Takamura, "Live Football Commentary (LFC): A Large‑Scale Dataset for Building Football Commentary Generation Models". In Proceedings of the 18th International Natural Language Generation Conference (INLG), 2025.

Data Release Information

Release Date: September 3, 2025

Contact: kirt-contact-ml@aist.go.jp

Data Usage Terms

  1. Purpose

    These terms set forth the conditions for using the Live Football Commentary Dataset (hereinafter referred to as "the Data"), in order to ensure its proper use for research purposes and appropriate management from the perspectives of privacy and ethics.

  2. Scope of Application and Consent
    1. The Data is provided to users who agree to these terms.
    2. By downloading the Data, a user is deemed to have agreed to these terms.
  3. Scope of Permission
    1. The Data may be used only for non-commercial academic research purposes.
    2. Commercial use of the Data (directly or indirectly for profit) is prohibited.
    3. Redistribution, publication, sale, or licensing of the Data to third parties is prohibited.
    4. If the Data is modified in whole or in part to create a new dataset, commercial use of that dataset (directly or indirectly for profit) and its redistribution, publication, sale, or licensing to third parties are likewise prohibited. All other matters concerning the new dataset are also subject to these terms.
  4. Citation and Credit

    Users must cite the following paper when publishing research results (papers, reports, presentations, etc.) using this dataset:

    Taiga Someya, Tatsuya Ishigaki, and Hiroya Takamura, "Live Football Commentary (LFC): A Large‑Scale Dataset for Building Football Commentary Generation Models". In Proceedings of the 18th International Natural Language Generation Conference (INLG), 2025.
  5. Data Management and Protection
    1. Users are responsible for properly managing the Data and ensuring that third parties cannot access it.
    2. If the Data contains personal or sensitive information, users must ensure confidentiality in accordance with applicable laws and regulations (e.g., the Act on the Protection of Personal Information, GDPR, etc.).
    3. Upon termination of use, users are responsible for properly deleting and disposing of the Data and any copies thereof.
  6. Prohibited Acts

    Users must not engage in the following acts:

    1. Acts that violate the conditions of these terms
    2. Conducting research that is ethically problematic using the Data
  7. Suspension of Use and Liability
    1. If the provider of the Data determines that a user has violated these terms, the provider may demand that the user cease use of the Data.
    2. Users shall be liable for damages caused by improper use of the Data.
  8. Disclaimer
    1. The provider does not guarantee the accuracy, completeness, or applicability of the Data.
    2. The provider assumes no responsibility for any damages arising from the use of the Data.
  9. Governing Law and Jurisdiction

    These terms shall be governed by the laws of Japan.

  10. Changes to the Terms

    The contents of these terms may be changed at the discretion of the provider. If changes are made, they will be posted on the website and will take effect from the time of posting.

Live Football Commentary Dataset Copyright © 2025 National Institute of Advanced Industrial Science and Technology (AIST) [2025PRO-3272]