BabyLM Challenge
Submissions should be implemented in Huggingface's Transformers library. Participants can choose whatever model architecture they wish, as long as submissions can assign log-likelihoods or pseudo log-likelihoods to strings of text.
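To make the requirement concrete, here is a minimal sketch of what "assigning log-likelihoods to strings" means for a causal language model. The bigram probabilities below are invented purely for illustration; an actual submission would replace them with a Transformers model's predicted token distributions (and a masked model would instead sum log-probabilities of each token given its bidirectional context, i.e., a pseudo log-likelihood).

```python
import math

# Hand-specified toy bigram distribution, standing in for a real
# language model's next-token probabilities (values are invented).
BIGRAM = {
    ("<s>", "the"): 0.5, ("<s>", "a"): 0.5,
    ("the", "cat"): 0.6, ("the", "dog"): 0.4,
    ("a", "cat"): 0.3, ("a", "dog"): 0.7,
}

def log_likelihood(tokens):
    """Sum of log P(token | previous token) under the toy bigram model."""
    total = 0.0
    prev = "<s>"
    for tok in tokens:
        total += math.log(BIGRAM[(prev, tok)])
        prev = tok
    return total

# "the cat" (p = 0.5 * 0.6) scores higher than "a cat" (p = 0.5 * 0.3):
assert log_likelihood(["the", "cat"]) > log_likelihood(["a", "cat"])
```

The evaluation pipeline only needs scores of this form, so any architecture that can produce per-string (pseudo) log-likelihoods is eligible.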
Submission Tracks
There are four competition tracks: Interaction, which is new for this year's competition; Multimodal; Strict; and Strict-Small. We will additionally accept papers in a dedicated Paper track.
• Strict and Strict-Small Tracks: The strict and strict-small tracks require that submissions be trained on no more than 100M words (strict) or 10M words (strict-small) of text data. Unlike last year, participants can construct their own pretraining datasets, as long as they stay under the 100M or 10M word budget. To help you get started, we release 100M and 10M word pretraining datasets, which are largely similar to the ones developed for last year's competition. Submissions in these tracks will be evaluated on language-only evaluation tasks.
• Multimodal Track: Submissions must be trained on 100M words or fewer, but may be trained on any amount of non-linguistic data. Submissions will be evaluated on both language-only and language-vision tasks, meaning that successful entrants will likely pre-train on text-image data. To help you get started, we are releasing a 50M word text-only dataset and a 50M word paired text-image dataset.
• Interaction Track: The Interaction track debuts this year to allow for interaction between multiple agents during training. We distinguish between a submission model, i.e., the participants' entry into the competition, and an external model, i.e., a secondary model used in the training pipeline of the submission model but not submitted to the competition. External models must come from a predetermined list of models available on the BabyLM website. External models may be fine-tuned or distilled without restriction. However, the submission model may be exposed to no more than 100M word tokens (multiple exposures of the same data, e.g., over several epochs, are allowed); this word budget includes both text generated by external models and pre-existing corpora. Additionally, the submission model may not generate more than 100M words during training. Finally, the external model's weights, hidden states, and output distribution may not be revealed to the submission model.
• Paper Track: In this track, we will accept any contribution related to human-scale or data-efficient language modeling, or cognitive modeling using language models. While contributions can describe language modeling architectures, we also welcome new datasets and evaluation tasks.
Pretraining Data
[Click here to access data (via OSF)]

To help you get started, we distribute pretraining datasets for the strict, strict-small, and multimodal tracks. Note, however, that unlike last year, participants are free to construct their own datasets if they wish. Below, we give a few more details on the datasets we provide:
• Text-only Dataset: Our text-only dataset is an updated version of last year's BabyLM training corpus. It comes in 10M and 100M word variants, consists mostly of transcribed speech, and has a large proportion of simplified language, such as child-directed speech, children's storybooks, and Simple English Wikipedia.
• Multimodal Dataset: Our multimodal dataset consists of a 50M word down-sampled version of our text-only dataset, and 50M words of paired image-text data.
Evaluation Pipeline
Models will be evaluated on a shared pipeline, which will be released on GitHub in late April. The evaluation pipeline will come in two variants: a text-only evaluation, which is required for strict and strict-small track participants, and a vision-language variant required for multimodal track participants.
Results Submissions
Details on how to submit results and papers will be shared soon. In the meantime, check out the timeline for tentative dates.
Paper Submissions
Along with their model submissions, all participants must submit a paper. This can be a short technical description of the proposed approach or a longer contribution of up to 8 pages.
Submissions will be made through our OpenReview portal. Note that hyperparameters and design decisions should be stated in the paper, but also entered into a form to ensure a consistent format and ease of future use.
Submissions of both types are:
• given unlimited space for references,
• given unlimited space for appendices,
• given extra space for ethics and limitations sections, though these sections are optional.
We allow dual submissions of archival papers. If an archival paper is accepted by both BabyLM and another venue, it can only appear in one of their proceedings (i.e., it must be withdrawn from one venue).
BabyLM will hold its own review process, and the proceedings will appear in their own volume. The acceptance criteria are based on soundness and fit: We plan only to reject submissions that make incorrect or unjustified claims or else are not related to the BabyLM topic. Other feedback will be directed toward improving submissions.
Outstanding Paper Awards
In addition to track winners, we will also present several "outstanding paper" awards. We intend to give these awards to submissions that are innovative or unusual, or that make novel and significant connections between language modeling and research topics in psycholinguistics.