BIDS: The Standard That Makes Brain Data Reproducible
The Most Expensive Filing Cabinet Problem in Science
Here's a story that should make you uncomfortable.
In 2015, the Open Science Collaboration, a group of 270 researchers coordinated by the Center for Open Science, published its attempt to reproduce the results of 100 published psychology studies. They followed the original methods as closely as possible, used the same statistical tests, ran the same protocols. The result? Only about 36% of the replications produced a statistically significant result.
The study became a landmark of what is now called the "replication crisis," and it sent shockwaves through the scientific community. But here's what's less discussed: a meaningful share of reproducibility failures across science has nothing to do with bad science. It has to do with bad filing.
Researchers couldn't reproduce results because they couldn't access the original data. Or when they could, the data was organized in some ad-hoc folder structure that only made sense to the person who created it three years ago (and even they had forgotten what "final_FINAL_v2_CORRECTED" meant). Column headers were cryptic. File formats were inconsistent. Metadata was either missing or buried in a lab notebook somewhere.
The brain data was fine. The brain data was always fine. The problem was that nobody could find it, read it, or trust it.
This is the problem BIDS was built to solve.
What BIDS Actually Is (And Why It's Simpler Than You Think)
BIDS stands for Brain Imaging Data Structure. If that sounds intimidating, here's a more honest description: it's a set of rules for naming your files and organizing your folders.
That's it. It's not software. It's not a platform. It's not a file format. It's a convention, a community agreement about where files should go, what they should be called, and what information should accompany them. Think of it as the Dewey Decimal System for brain data.
The specification was first published in 2016 by a team led by Krzysztof Gorgolewski at Stanford. The original paper in Scientific Data laid out a simple insight: if every neuroscience lab organizes its data the same way, everything downstream gets easier. Sharing gets easier. Analysis gets easier. Reproducibility goes from aspirational to achievable.
Before BIDS, a typical neuroimaging dataset might look like this:
    my_experiment/
        subject1/
            run1_raw.eeg
            run1_events.txt
            notes.docx
        subj_02/
            EEG_session1_raw.bdf
            triggers.csv
        participant3/
            data_final.set
            README_important.txt
Three subjects, three different naming conventions, three different file formats, and zero chance that any automated tool could process all of them without custom code. Multiply this by every lab in the world and you start to see the scale of the problem.
Here's what the same dataset looks like in BIDS:
    my_experiment/
        dataset_description.json
        participants.tsv
        sub-01/
            eeg/
                sub-01_task-rest_eeg.edf
                sub-01_task-rest_eeg.json
                sub-01_task-rest_channels.tsv
                sub-01_task-rest_events.tsv
        sub-02/
            eeg/
                sub-02_task-rest_eeg.edf
                sub-02_task-rest_eeg.json
                sub-02_task-rest_channels.tsv
                sub-02_task-rest_events.tsv
        sub-03/
            eeg/
                sub-03_task-rest_eeg.edf
                sub-03_task-rest_eeg.json
                sub-03_task-rest_channels.tsv
                sub-03_task-rest_events.tsv
Every subject follows the same pattern. Every file name tells you exactly what it contains. Every data file has a companion JSON sidecar with metadata. A researcher in Tokyo, a grad student in Berlin, and a Python script on a cloud server can all understand this dataset instantly.
That's the entire value proposition of BIDS. Predictability.
What Is the Anatomy of a BIDS Filename?
One of the most clever things about BIDS is its naming convention. Every filename is a chain of key-value pairs separated by underscores, with the data type as a suffix.
Take this filename: sub-01_ses-pre_task-nback_run-02_eeg.edf
Each piece communicates something specific:
| Key | Value | Meaning |
|---|---|---|
| sub | 01 | Subject number 01 |
| ses | pre | The pre-intervention session |
| task | nback | The N-back working memory task |
| run | 02 | The second run of this task |
| eeg (suffix) | .edf (extension) | EEG data stored in European Data Format |
The beauty is that both humans and machines can parse this. A script can split on underscores, split each segment on hyphens, and immediately know the subject, session, task, and run. No guessing. No custom parsing. No hunting through README files to figure out what "experiment_3b_corrected" means.
The keys are standardized. You can't invent your own. This feels restrictive at first, but that restriction is the point. When everyone uses the same keys, every tool in the ecosystem can read every dataset.
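To make the "split on underscores, then on hyphens" idea concrete, here is a minimal sketch of a filename parser. The function name and the `KNOWN_ENTITIES` set are illustrative assumptions: the set holds only a handful of keys, while the specification defines the full list.

```python
from pathlib import Path

# Illustrative subset of BIDS entity keys (an assumption for this sketch);
# the specification defines the complete, authoritative list.
KNOWN_ENTITIES = {"sub", "ses", "task", "acq", "run"}


def parse_bids_filename(filename: str) -> dict:
    """Split a BIDS-style filename into entities, suffix, and extension."""
    name = Path(filename).name
    # Partition on the first dot so multi-part extensions stay together.
    stem, dot, extension = name.partition(".")
    # Everything before the last underscore is key-value pairs; the last
    # segment is the suffix (for example "eeg").
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        key, separator, value = pair.partition("-")
        if not separator or not value:
            raise ValueError(f"Malformed key-value pair: {pair!r}")
        if key not in KNOWN_ENTITIES:
            raise ValueError(f"Unknown entity key: {key!r}")
        entities[key] = value
    return {"entities": entities, "suffix": suffix, "extension": dot + extension}


parsed = parse_bids_filename("sub-01_ses-pre_task-nback_run-02_eeg.edf")
print(parsed["entities"])  # subject, session, task, and run as a dict
print(parsed["suffix"], parsed["extension"])
```

Rejecting unknown keys mirrors the point above: a made-up key such as `experiment-3b` fails loudly instead of being silently accepted.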
The Sidecar Files: Where Metadata Lives
Raw data without context is just numbers. BIDS solves this with sidecar files, JSON and TSV files that travel alongside your data and describe everything the filename can't.
For every EEG recording, BIDS requires (or strongly recommends) several companion files:
The JSON sidecar (*_eeg.json) contains recording parameters: sampling frequency, reference electrode, power line frequency, hardware filters, the manufacturer of the device, and more. This is the metadata that makes your recording interpretable.
The channels file (*_channels.tsv) lists every channel by name, type (EEG, EOG, ECG, etc.), units, and sampling frequency. If a channel was marked as bad, that's recorded here too.
The events file (*_events.tsv) logs every event that occurred during the recording: stimulus onset, button presses, markers you dropped into the data. Each event gets an onset time, a duration, and a descriptive label.
The electrodes file (*_electrodes.tsv) records the 3D coordinates of each electrode. This is critical for source localization and for comparing data across different EEG systems.
The coordsystem file (*_coordsystem.json) specifies which coordinate system those electrode positions use (e.g., CTF, MNI, or a digitizer-specific system).
This might seem like a lot of files for one recording. But every single one solves a problem that has bitten researchers in practice. "What was the sampling rate?" Check the JSON. "Which channel was the reference?" JSON. "What happened at the 45-second mark?" Events TSV. "Where was electrode Cz positioned?" Electrodes TSV.
No more emailing the original researcher six months later asking, "Hey, do you remember what filter you used?"
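As a rough illustration of what a sidecar answers, the sketch below builds a small `*_eeg.json` payload and checks it for the fields BIDS-EEG treats as required. The field names follow the specification as I understand it, so confirm them against the version you target. The values (reference, manufacturer, channel count) and the helper name `missing_required` are hypothetical.

```python
import json

# Fields marked as required for *_eeg.json in the BIDS-EEG specification
# (listed from the spec as understood here; verify for your BIDS version).
REQUIRED_FIELDS = (
    "TaskName",
    "SamplingFrequency",
    "EEGReference",
    "PowerLineFrequency",
    "SoftwareFilters",
)


def missing_required(sidecar: dict) -> list:
    """Return the required field names that are absent from a sidecar dict."""
    return [field for field in REQUIRED_FIELDS if field not in sidecar]


# Hypothetical recording parameters, for illustration only.
sidecar = {
    "TaskName": "rest",
    "SamplingFrequency": 500,
    "EEGReference": "Cz",
    "PowerLineFrequency": 50,
    "SoftwareFilters": "n/a",  # "n/a" states that no software filter was applied
    "Manufacturer": "ExampleVendor",
    "EEGChannelCount": 32,
}

print(missing_required(sidecar))
print(json.dumps(sidecar, indent=2))
```

A check like this is no substitute for the BIDS Validator, but it shows why the questions above ("What was the sampling rate?") have exactly one place to look.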
BIDS for EEG: The Details That Matter
BIDS started with MRI in 2016, but EEG support was formally added in 2019 through the BIDS-EEG extension (BEP006). This was a big deal for the electrophysiology community, because EEG data has its own quirks that MRI conventions don't address.
Here's what the BIDS-EEG specification covers:
| Element | Requirement | Details |
|---|---|---|
| File formats | Required | European Data Format (.edf), BrainVision (.vhdr/.vmrk/.eeg), EEGLAB (.set), or BioSemi (.bdf) for raw data |
| Sampling frequency | Required | Recorded in the JSON sidecar in Hz |
| Reference electrode | Required | The electrode used as reference during recording |
| Power line frequency | Required | 50 Hz or 60 Hz depending on country |
| Channel descriptions | Recommended | TSV file listing all channels with type and units |
| Event markers | Recommended | TSV file with onset, duration, and trial_type columns |
| Electrode positions | Recommended | TSV file with x, y, z coordinates per electrode |
| Coordinate system | Conditional | Required if electrode positions are provided |
| Hardware filters | Recommended | High-pass and low-pass filter settings during acquisition |
| Software filters | Required | Any post-acquisition filters applied to the data, or "n/a" if none were applied |
A few things are worth highlighting. First, BIDS doesn't force you to use a specific EEG file format, but it limits you to a few well-supported ones. This is intentional. Proprietary formats that require vendor-specific software to read defeat the purpose of a sharing standard.
Second, the metadata requirements for EEG are more detailed than for MRI in some ways. EEG analysis is extremely sensitive to reference electrode choice, filter settings, and electrode placement. Failing to document these can make a dataset essentially unusable for re-analysis. BIDS makes these fields required or strongly recommended precisely because they're the ones researchers most often forget to record.
Here's something most people don't realize about the reproducibility problem in EEG research: two analyses of the same recording that use different reference electrodes can produce markedly different topographic maps. The same data, the same brain, the same moment in time, but different results depending on a single metadata field. BIDS makes reference electrode documentation mandatory because an undocumented reference can make a dataset impossible to re-analyze correctly. It's not glamorous. But it might be the single most important field in the entire sidecar JSON.
The dataset_description.json: Your Dataset's ID Card
Every BIDS dataset has one file at the root level that acts as an identity card: dataset_description.json. This file tells anyone (human or machine) what they're looking at before they open a single data file.
Here's what a typical one looks like:
    {
      "Name": "Resting-State EEG During Focused Attention",
      "BIDSVersion": "1.9.0",
      "DatasetType": "raw",
      "License": "CC-BY-4.0",
      "Authors": [
        "Jane Smith",
        "John Doe"
      ],
      "Acknowledgements": "Data collected at the Cognitive Neuroscience Lab",
      "HowToAcknowledge": "Please cite Smith & Doe (2026)",
      "DatasetDOI": "doi:10.18112/openneuro.ds004521"
    }
The BIDSVersion field is particularly important. The BIDS specification evolves over time, adding new modalities and refining existing ones. By recording which version of BIDS your dataset follows, you ensure that validators and analysis tools know exactly which rules apply.
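Here is one way a tool might use that field, sketched under assumptions: the helper name `read_bids_version` and the `MINIMUM_SUPPORTED` threshold are hypothetical, and real validators do considerably more. The sketch checks the two fields the specification requires in this file (`Name` and `BIDSVersion`) and turns the version string into something comparable.

```python
import json

# A trimmed copy of the example description, inlined so the sketch is self-contained.
description_text = """
{
  "Name": "Resting-State EEG During Focused Attention",
  "BIDSVersion": "1.9.0",
  "DatasetType": "raw"
}
"""


def read_bids_version(description: dict) -> tuple:
    """Return BIDSVersion as a tuple of ints, e.g. "1.9.0" becomes (1, 9, 0)."""
    for field in ("Name", "BIDSVersion"):
        if field not in description:
            raise KeyError(f"dataset_description.json is missing {field!r}")
    return tuple(int(part) for part in description["BIDSVersion"].split("."))


# Hypothetical minimum version that some downstream tool might support.
MINIMUM_SUPPORTED = (1, 8, 0)

version = read_bids_version(json.loads(description_text))
is_supported = version >= MINIMUM_SUPPORTED
print(version, is_supported)
```

Comparing tuples of integers avoids the classic string-comparison trap where "1.10.0" sorts before "1.9.0".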
The participants.tsv: Who Was in Your Study
At the root level, BIDS also expects a participants.tsv file that records demographic and group information about each subject:
participant_id age sex group
sub-01 28 M control
sub-02 31 F experimental
sub-03 27 F control
This file is what connects the anonymized sub-XX labels in your folder structure to the demographic variables your analysis needs. And because it's a simple TSV (tab-separated values), any programming language, any spreadsheet application, and any human with eyes can read it.
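As a small illustration of how little code a TSV needs, the sketch below reads the example table with Python's standard `csv` module and summarizes it. The table is inlined as a string (with real tab separators) so the example is self-contained; in practice you would open `participants.tsv` from the dataset root.

```python
import csv
import io
from collections import Counter

# The example table from above, with tab separators made explicit.
participants_tsv = (
    "participant_id\tage\tsex\tgroup\n"
    "sub-01\t28\tM\tcontrol\n"
    "sub-02\t31\tF\texperimental\n"
    "sub-03\t27\tF\tcontrol\n"
)

# DictReader maps each row to {column name: value}; every value arrives as a string.
rows = list(csv.DictReader(io.StringIO(participants_tsv), delimiter="\t"))

group_sizes = Counter(row["group"] for row in rows)
mean_age = sum(int(row["age"]) for row in rows) / len(rows)

print(dict(group_sizes))
print(round(mean_age, 1))
```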

Tools That Make BIDS Practical
The BIDS specification is just a document. It doesn't organize your data for you. But the ecosystem of tools built around it does. Here are the ones worth knowing:
BIDS Validator
The BIDS Validator checks whether your dataset actually conforms to the specification. It catches missing files, incorrect naming, invalid metadata values, and structural errors. You can run it three ways:
- Browser: Visit bids-standard.github.io/bids-validator and drag in your dataset folder. No installation needed.
- Command line: Install via npm (npm install -g bids-validator) and run bids-validator /path/to/dataset.
- Python: Use bids-validator as a library in your analysis scripts.
Always validate before sharing. It takes seconds and catches the errors that would take someone else hours to diagnose.
MNE-BIDS (Python)
MNE-BIDS is the bridge between MNE-Python (one of the most widely used open-source EEG analysis libraries) and BIDS. It can convert raw EEG data into BIDS format, create all the required sidecar files automatically, and read BIDS datasets directly into MNE data structures.
For Neurosity Crown users working in Python, the typical workflow looks like this: export raw EEG data using the Neurosity Python SDK, load it into MNE-Python as a Raw object, then use mne_bids.write_raw_bids() to output a fully compliant BIDS dataset.
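The last two steps of that workflow might look roughly like the sketch below. It makes several assumptions: the samples have already been exported and arrive as an 8-by-N array in microvolts, the function name `write_crown_recording` is hypothetical, and the Neurosity export step is not shown. It needs `numpy`, `mne`, and `mne-bids` installed, plus `pybv` for writing BrainVision files. The imports sit inside the function so the module still loads where those packages are absent.

```python
# Channel names and sampling rate as described for the Crown in this article.
CROWN_CHANNELS = ["CP3", "C3", "F5", "PO3", "PO4", "F6", "C4", "CP4"]
CROWN_SFREQ = 256.0


def write_crown_recording(samples_uv, bids_root, subject="01", task="rest", line_freq=60):
    """Write an (8, n_samples) array of microvolt samples as a BIDS EEG recording.

    Hypothetical helper: how samples_uv is obtained from the device is not shown.
    """
    # Imported lazily so this file can be loaded without the scientific stack.
    import numpy as np
    import mne
    from mne_bids import BIDSPath, write_raw_bids

    # MNE stores EEG in volts, so convert from microvolts (assumed input unit).
    data_volts = np.asarray(samples_uv, dtype=float) * 1e-6

    info = mne.create_info(CROWN_CHANNELS, sfreq=CROWN_SFREQ, ch_types="eeg")
    raw = mne.io.RawArray(data_volts, info)
    raw.info["line_freq"] = line_freq  # becomes PowerLineFrequency in the sidecar

    bids_path = BIDSPath(subject=subject, task=task, datatype="eeg", root=bids_root)
    # In-memory data has no source file to copy, so allow_preload and an
    # explicit output format are needed.
    write_raw_bids(
        raw,
        bids_path,
        format="BrainVision",
        allow_preload=True,
        overwrite=True,
    )
    return bids_path
```

Calling `write_crown_recording(samples, "my_experiment")` should produce the `sub-01/eeg/` folder with the data file and its sidecars. Run the BIDS Validator afterwards rather than assuming the output is complete.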
BIDS Starter Kit
The BIDS Starter Kit is a collection of templates, tutorials, and example datasets maintained by the BIDS community. If you're new to BIDS, this is where you should start. It includes template sidecar JSON files that you can fill in for your specific setup, which is far easier than writing them from scratch.
BIDS Apps
Here's where the standardization really pays off. BIDS Apps are containerized analysis pipelines that take a BIDS dataset as input and produce results as output. Because BIDS datasets all follow the same structure, these apps don't need custom configuration for each dataset.
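The reason no per-dataset configuration is needed is that BIDS Apps share a common command-line shape: an input dataset, an output folder, and an analysis level. The sketch below shows that shape with `argparse`. It is a hypothetical entry point, not any particular app's code, and the paths are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser following the common BIDS App command-line convention."""
    parser = argparse.ArgumentParser(description="Hypothetical BIDS App entry point")
    parser.add_argument("bids_dir", help="Root folder of the input BIDS dataset")
    parser.add_argument("output_dir", help="Folder where results are written")
    parser.add_argument(
        "analysis_level",
        choices=["participant", "group"],
        help="Process subjects individually or run the group stage",
    )
    parser.add_argument(
        "--participant_label",
        nargs="+",
        default=[],
        help="Subject labels to process, without the 'sub-' prefix",
    )
    return parser


# Placeholder paths, parsed from a list instead of sys.argv for illustration.
args = build_parser().parse_args(
    ["/data/my_experiment", "/data/out", "participant", "--participant_label", "01", "02"]
)
print(args.analysis_level, args.participant_label)
```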
| Tool | Purpose | Language |
|---|---|---|
| BIDS Validator | Checks dataset compliance with the BIDS specification | JavaScript / Browser |
| MNE-BIDS | Converts EEG data to BIDS format and reads BIDS into MNE | Python |
| BIDS Starter Kit | Templates and tutorials for building BIDS datasets | Documentation |
| HeuDiConv | Converts DICOM images to BIDS-formatted NIfTI | Python |
| BIDScoin | GUI-based BIDS conversion tool | Python |
| PyBIDS | Queries and indexes BIDS datasets programmatically | Python |
| OpenNeuro | Public repository requiring BIDS format for uploads | Web platform |
The existence of this ecosystem is the strongest argument for using BIDS. You're not just organizing files for neatness. You're plugging into a pipeline that hundreds of tools and thousands of researchers already speak fluently.
Why BIDS Matters More for EEG Than You Think
If you work with MRI data, BIDS is nice to have. If you work with EEG data, BIDS is arguably essential. Here's why.
MRI data is relatively self-describing. A NIfTI file header contains the voxel dimensions, the acquisition matrix, the orientation. You can open an MRI file and, with some effort, figure out what you're looking at even without metadata.
EEG data is not like that. A raw EEG file is a matrix of voltage values. Without knowing the sampling rate, the reference electrode, the electrode positions, the filter settings, and the event markers, that matrix is meaningless. It's just numbers. You can't even tell which direction is "up" in the data without knowing the montage.
This means EEG research has a particularly acute metadata problem. And BIDS addresses it head-on by making the most critical metadata fields required rather than optional. The specification essentially forces you to document the things that future-you (and other researchers) will desperately need to know.
The OpenNeuro Connection
OpenNeuro is a free, open-access repository for neuroscience data. It requires BIDS formatting for all uploads. As of 2026, it hosts thousands of datasets across MRI, EEG, MEG, and other modalities, all searchable, all downloadable, all structured identically.
This is the payoff of standardization at scale. Want to run your analysis pipeline on someone else's resting-state EEG dataset from the other side of the world? Download it from OpenNeuro, point your BIDS App at it, and go. No reformatting. No guessing. No detective work.
For the neuroscience community, OpenNeuro and BIDS together represent something profound: the infrastructure for cumulative science. Instead of every lab reinventing the wheel with its own data organization, everyone builds on the same foundation.
Structuring Neurosity Crown Data in BIDS
The Neurosity Crown exports raw EEG through its JavaScript SDK and Python SDK. The data comes out as time-stamped samples from 8 channels (CP3, C3, F5, PO3, PO4, F6, C4, CP4) at 256 Hz. Converting this to BIDS involves a few steps.
Step 1: Export the raw data. Use the Neurosity SDK to record raw brainwave data and save it to a file. The JavaScript SDK provides brainwaves("raw") which streams all 8 channels at 256 Hz. Save the timestamps and voltage values.
Step 2: Convert to an accepted format. BIDS-EEG accepts EDF, BrainVision, EEGLAB (.set), and BioSemi (.bdf) files. MNE-Python can write to BrainVision format, so the easiest path is to load your Crown data into an MNE Raw object and export from there.
Step 3: Create the sidecar JSON. Document the sampling frequency (256), the reference (the Crown's reference electrode configuration), the power line frequency for your region, and the hardware information (Neurosity Crown, 8 channels).
Step 4: Create the channels TSV. List all 8 channels with their 10-20 positions, type (EEG), and units (microvolts).
Step 5: Add events. If your experiment involved stimuli or task markers, log these in the events TSV with onset times relative to the recording start.
Step 6: Organize the folder structure. Place everything in the BIDS hierarchy: sub-XX/eeg/ with properly named files.
Step 7: Validate. Run the BIDS Validator on your dataset. Fix any issues. Repeat until it passes.
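Steps 4 through 6 can be sketched with nothing but the standard library. The helper names and the example events below are hypothetical, and "uV" is used here as the microvolt unit string, so check the units notation your BIDS version expects. The sketch writes into a temporary folder so it leaves nothing behind.

```python
import csv
import tempfile
from pathlib import Path

# Channel names as described for the Crown in this article.
CROWN_CHANNELS = ["CP3", "C3", "F5", "PO3", "PO4", "F6", "C4", "CP4"]


def write_tsv(path, header, rows):
    """Write a tab-separated file with one header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def write_crown_tables(root, subject, task, events):
    """Create sub-XX/eeg/ and write the channels and events tables into it."""
    eeg_dir = Path(root) / f"sub-{subject}" / "eeg"
    eeg_dir.mkdir(parents=True, exist_ok=True)
    prefix = f"sub-{subject}_task-{task}"
    # Step 4: one row per channel with name, type, and units.
    channel_rows = [(name, "EEG", "uV") for name in CROWN_CHANNELS]
    write_tsv(eeg_dir / f"{prefix}_channels.tsv", ["name", "type", "units"], channel_rows)
    # Step 5: onsets and durations in seconds, relative to the recording start.
    write_tsv(eeg_dir / f"{prefix}_events.tsv", ["onset", "duration", "trial_type"], events)
    return eeg_dir


# Hypothetical events: (onset, duration, trial_type).
example_events = [(1.5, 0.0, "cue"), (4.0, 2.0, "stimulus")]

with tempfile.TemporaryDirectory() as root:
    eeg_dir = write_crown_tables(root, "01", "focus", example_events)
    written = sorted(path.name for path in eeg_dir.iterdir())
    channel_lines = (
        (eeg_dir / "sub-01_task-focus_channels.tsv").read_text(encoding="utf-8").splitlines()
    )

print(written)
print(channel_lines[0])
```

The data file and its JSON sidecar from Steps 2 and 3 would sit next to these two tables under the same filename prefix.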
The Crown's integration with BrainFlow and Lab Streaming Layer (LSL) also provides conversion pathways. BrainFlow can output data in formats that MNE-BIDS can directly ingest, which streamlines the process considerably.
The fact that the Crown provides open data access through standard APIs is what makes BIDS compatibility possible. A device that locks you into a proprietary ecosystem can't participate in open science. The Crown was designed with the opposite philosophy: your brain data belongs to you, and you should be able to take it anywhere.
Where BIDS Is Heading
The BIDS specification isn't static. The community maintains it through a public GitHub repository, and new extensions are proposed and reviewed through a formal process called BIDS Extension Proposals (BEPs).
Recent and in-progress extensions include support for fNIRS (functional near-infrared spectroscopy), motion capture data, eye tracking, and physiological recordings like heart rate and respiration. The trend is clear: BIDS is expanding from a neuroimaging standard to a comprehensive neuroscience data standard.
For EEG researchers specifically, the BIDS-EEG derivatives specification is one to watch. It will standardize how to organize processed EEG data, including filtered data, independent component analysis results, source-localized activity, and statistical outputs. Right now, BIDS formally covers raw data and a general framework for derivatives, but not EEG-specific processed outputs. Once those are standardized, the entire analysis chain from acquisition to publication can be BIDS-compliant.
There's also growing interest in BIDS for real-time and consumer-grade EEG data. As devices like the Neurosity Crown bring EEG out of the lab and into daily life, the amount of brain data being generated is exploding. BIDS provides the framework to make that data scientifically useful rather than just a pile of recordings on someone's hard drive.
The Bigger Picture: Data Organization as Scientific Infrastructure
BIDS might seem like a mundane topic. Folder structures and naming conventions don't have the glamour of a new brain imaging technique or a breakthrough cognitive model. But think about it this way.
The Human Genome Project didn't just sequence DNA. It created a shared format for storing and sharing genetic data. That format, and the tools built around it, made modern genomics possible. BIDS is attempting something similar for neuroscience: building the data infrastructure that makes cumulative brain science possible.
Every time a researcher organizes their EEG data in BIDS and uploads it to OpenNeuro, they're adding a brick to that infrastructure. Every BIDS App that gets built means one less custom script that someone else has to write. Every validation pass means one less dataset that's lost to the "I can't figure out your folder structure" problem.
For developers building with brain data, whether from the Neurosity Crown or any other EEG system, BIDS is worth learning not because it's required, but because it's the language the rest of the neuroscience community speaks. When your data speaks that language, it plugs into an entire ecosystem of tools, repositories, and collaborators.
And that ecosystem is growing fast.
The next time you're staring at a folder full of EEG recordings and wondering how to organize them, remember: someone already solved this problem. The solution has been peer-reviewed, battle-tested by thousands of labs, and supported by most major analysis tools in the field.
Your brain data deserves better than final_FINAL_v2_CORRECTED. Give it a structure that the entire scientific community can read.

