BabyLM Challenge
Sample-efficient pretraining on a developmentally plausible corpus
Submissions should be implemented in Hugging Face's Transformers library. Participants can choose whatever model architecture they wish, as long as submissions can assign log-likelihoods or pseudo log-likelihoods to strings of text.
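As a concrete illustration, here is a minimal sketch of both kinds of scoring in Transformers. The checkpoints named below (`gpt2`, `bert-base-uncased`) are stand-ins, not BabyLM baselines, and this is one straightforward way to compute these scores rather than the official evaluation code:

```python
# Minimal sketch of assigning (pseudo) log-likelihoods to strings with
# Transformers. Checkpoint names are placeholders, not BabyLM baselines.
import torch
from transformers import (AutoModelForCausalLM, AutoModelForMaskedLM,
                          AutoTokenizer)

clm_tok = AutoTokenizer.from_pretrained("gpt2")
clm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def log_likelihood(text: str) -> float:
    """Summed token log-probability under a causal LM."""
    ids = clm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the model returns the mean cross-entropy over the
        # (len - 1) predicted tokens; negate and rescale to a summed log-prob.
        loss = clm(input_ids=ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

mlm_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def pseudo_log_likelihood(text: str) -> float:
    """Mask each position in turn and sum log P(token | rest),
    as in masked language model scoring (Salazar et al., 2020)."""
    ids = mlm_tok(text, return_tensors="pt").input_ids[0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = ids.clone()
            masked[i] = mlm_tok.mask_token_id
            logits = mlm(input_ids=masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

print(log_likelihood("The child is reading a storybook."))
print(pseudo_log_likelihood("The child is reading a storybook."))
```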
Submission Tracks
There are three competition categories: multimodal, which is new for this year's competition, strict, and strict-small. We will additionally accept papers in a dedicated paper track.
• Strict and Strict-Small Tracks: The strict and strict-small tracks require that submissions are trained on 100M words (for strict) or 10M words (for strict-small) of text data. Unlike last year, participants can construct their own pretraining datasets, as long as they stay under the 100M or 10M word budget (a simple word-count check is sketched after this list). To help you get started, we release 100M and 10M word pretraining datasets, which are largely similar to the ones developed for last year's competition. Submissions in these tracks will be evaluated on language-only evaluation tasks.
• Multimodal Track: Submissions must be trained on 100M words or fewer; however, they can be trained on any amount of non-linguistic data. Submissions will be evaluated on language-only and language-vision tasks, meaning that successful entrants will likely pretrain on paired text-image data. To help you get started, we are releasing a 50M word text-only dataset and a 50M word paired text-image dataset.
• Paper Track: In this track, we will accept any contribution related to human-scale or data-efficient language modeling, or cognitive modeling using language models. While contributions can describe language modeling architectures, we also welcome new datasets and evaluation tasks.
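For participants building their own datasets, a quick sanity check against the word budget might look like the sketch below. It assumes a directory of plain-text files (the directory name is hypothetical) and uses whitespace splitting as a rough proxy; the official word counts may be computed differently:

```python
# Rough word-budget check for a custom pretraining corpus. Whitespace
# splitting is only a proxy for the organizers' word-counting method.
from pathlib import Path

BUDGET = 100_000_000  # strict track; use 10_000_000 for strict-small

def corpus_word_count(corpus_dir: str) -> int:
    """Count whitespace-separated tokens across all .txt files."""
    total = 0
    for path in Path(corpus_dir).rglob("*.txt"):
        with path.open(encoding="utf-8") as f:
            for line in f:
                total += len(line.split())
    return total

n = corpus_word_count("my_pretraining_data")  # hypothetical directory
print(f"{n:,} words; within budget: {n <= BUDGET}")
```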
Pretraining Data
[ Click here to access data (via OSF) ] To help you get started, we distribute pretraining datasets for the strict, strict-small, and multimodal tracks. Note, however, that unlike last year, participants are free to construct their own datasets if they wish. Below, we give a few more details on the datasets we provide:
• Text-only Dataset: Our text-only dataset is an updated version of last year's BabyLM training corpus. It comes in 10M and 100M word variants, consists mostly of transcribed speech, and has a large proportion of simplified language, such as child-directed speech, children's storybooks, and Simple English Wikipedia.
• Multimodal Dataset: Our multimodal dataset consists of a 50M word down-sampled version of our text-only dataset, and 50M words of paired image-text data.
See the updated call for papers for a detailed breakdown of the pretraining datasets.
Evaluation Pipeline
Models will be evaluated on a shared pipeline, which will be released on GitHub in late April. The evaluation pipeline will come in two variants: a text-only evaluation, which is required for strict and strict-small track participants, and a vision-language variant, which is required for multimodal track participants.
Results Submissions
The deadline for results submissions is September 16, 23:59 Anywhere on Earth (UTC-12).
Submissions must be made through OpenReview. To complete your submission, please prepare the following two items:
• A Hugging Face link to your models.
• A download link to your results, assembled via the `collect_results.py` script in babylm/evaluation-pipeline-2024.
Paper Submissions
All participants must submit a paper along with their model submissions. This can be a short technical description of the proposed approach or a longer contribution of up to 8 pages. The deadline for paper submissions is September 20, 23:59 Anywhere on Earth (UTC-12).
Submissions will be made through our OpenReview portal. Note that hyperparameters and design decisions should be stated in the paper and also entered into a submission form, to ensure a consistent format and ease of future use.
Submissions of both types are:
• given unlimited space for references,
• given unlimited space for appendices,
• given extra space for ethics/limitations, though these sections are optional.
We allow dual submission of archival papers. In the event that an archival paper is accepted to both BabyLM and another venue, it may appear in only one venue's proceedings (i.e., it must be withdrawn from the other).
BabyLM will hold its own review process, and the proceedings will appear in their own volume. The acceptance criteria are based on soundness and fit: we plan to reject only submissions that make incorrect or unjustified claims, or that are unrelated to the BabyLM topic. Other feedback will be directed at improving submissions.
Outstanding Paper Awards
In addition to track winners, we will also grant several "outstanding paper" awards. We intend to give these awards to submissions that are innovative or unusual, or that make novel and significant connections between language modeling and research topics in psycholinguistics.