Machine Learning for EEG Classification
Your Brain Is a Terrible Communicator
Here's a fact that should bother you more than it probably does. Right now, as you read this sentence, your brain is producing a symphony of electrical signals across roughly 86 billion neurons. Those signals contain information about your level of focus, your emotional state, whether you're about to lose interest in this paragraph, and about a thousand other things.
And we can record those signals. We've been able to since 1929. Stick electrodes on someone's scalp, amplify the voltage, and you get EEG: a real-time readout of your brain's electrical activity.
But here's the problem. What you get is a wall of squiggly lines: channels of data undulating at different frequencies, overlapping and interfering with each other, contaminated by eye blinks and jaw clenches and the 60 Hz hum of the power outlet across the room. Somewhere in that mess is the signal that tells you whether this person is focused or daydreaming, calm or anxious, imagining moving their left hand or their right.
Finding that signal by staring at the data? Impossible. Finding it by writing simple rules? Sometimes. But for anything beyond the most basic brain states, the patterns are too complex, too variable, too buried in noise for a human to describe explicitly.
This is why machine learning exists in this field. Not as a buzzword. Not as an upgrade. As the only viable path from "raw voltage fluctuations" to "this person is focused."
And if you're a developer who wants to build anything that responds to brain data, understanding how this classification works isn't optional. It's the entire game.
The Pipeline: From Skull to Label
Before we talk about specific algorithms, you need the mental model. Every EEG classification system, from a PhD student's MATLAB script to the Neurosity Crown's N3 chipset, follows the same basic pipeline. Understanding this pipeline is the trunk of the tree. Everything else is branches.
Step 1: Record the EEG. Electrodes on the scalp pick up voltage fluctuations. The Crown uses 8 channels at positions CP3, C3, F5, PO3, PO4, F6, C4, and CP4, sampling 256 times per second. Each sample is a vector of 8 numbers representing the voltage at each electrode. Over one second, that's 2,048 numbers.
Step 2: Preprocess. The raw signal is messy. You filter out frequencies you don't care about (typically bandpass to 1-50 Hz). You remove artifacts from eye blinks and muscle movements. You might re-reference the channels. Preprocessing doesn't add information, but it removes noise that would confuse everything downstream.
Step 3: Extract features. This is where the magic starts. You take a window of preprocessed EEG (say, 2 seconds of data) and compute numbers that describe what's happening in that window. Power in the alpha band. Ratio of theta to beta. Coherence between two channels. These numbers are your features, and they compress 4,096 raw data points into maybe 20-50 meaningful measurements.
Step 4: Classify. Feed those features into a machine learning algorithm that's been trained to map feature vectors to labels. "Focused." "Relaxed." "Left hand motor imagery." The algorithm outputs a prediction.
Step 5: Use the prediction. Trigger a notification, adjust music tempo, move a cursor on screen, or log data for later analysis.
That's it. Five steps. Each one matters enormously, and screwing up any single step can make the whole pipeline useless. But the two steps that determine whether your system actually works are 3 and 4. Feature extraction and classification. Let's go deep on both.
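Before dissecting steps 3 and 4, here's what the whole pipeline looks like in code. This is a minimal sketch with scipy and scikit-learn; the filter design, the simplified two-band features, and the `X_raw`/`y` training arrays are illustrative assumptions, not the Crown's actual implementation:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 256  # Crown sampling rate (Hz)

def preprocess(window):
    """Step 2: bandpass filter to 1-50 Hz. window shape: (channels, samples)."""
    sos = butter(4, [1, 50], btype="bandpass", fs=FS, output="sos")
    return sosfiltfilt(sos, window, axis=1)

def extract_features(window):
    """Step 3 (simplified): alpha and beta band power per channel."""
    freqs, psd = welch(window, fs=FS, nperseg=FS, axis=1)
    alpha = psd[:, (freqs >= 8) & (freqs < 13)].mean(axis=1)
    beta = psd[:, (freqs >= 13) & (freqs < 30)].mean(axis=1)
    return np.concatenate([alpha, beta])  # 16 features for 8 channels

# Step 4: train and apply a classifier.
# X_raw: (n_windows, 8, 512) two-second windows; y: their labels -- assumed given.
X = np.array([extract_features(preprocess(w)) for w in X_raw])
clf = LinearDiscriminantAnalysis().fit(X, y)
label = clf.predict(X[-1:])  # Step 5 builds application logic on this output
```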
Feature Extraction: Teaching Your Model What to Look At
Raw EEG data is terrible input for a classifier. Not because it lacks information, but because it has too much, spread across too many dimensions, buried under too much noise. A 2-second window from 8 channels at 256Hz gives you 4,096 numbers. Most of those numbers are redundant. Many are noise. A few contain the signal you care about.
Feature extraction is the art of boiling those 4,096 numbers down to the 20-50 that matter.
Frequency-Domain Features
The most common and most reliable EEG features are spectral. You decompose each channel's signal into frequency components (usually with a Fast Fourier Transform or Welch's method) and compute the power in standard frequency bands.
| Band | Frequency Range | Associated States | Common Use in Classification |
|---|---|---|---|
| Delta | 0.5-4 Hz | Deep sleep, unconsciousness | Sleep staging, anesthesia depth |
| Theta | 4-8 Hz | Drowsiness, memory encoding, meditation | Attention monitoring, meditation detection |
| Alpha | 8-13 Hz | Relaxed wakefulness, eyes closed, inhibition | Relaxation detection, workload estimation |
| Beta | 13-30 Hz | Active thinking, focus, motor planning | Focus detection, motor imagery |
| Gamma | 30-50 Hz | Cross-modal integration, higher cognition | Cognitive load, binding/perception tasks |
For an 8-channel device, computing band power across all 5 bands gives you 40 features. Add ratios like theta/beta (an attention marker) and alpha asymmetry between hemispheres (an emotional valence marker), and you're up to 50-60 features. That's already enough to build surprisingly good classifiers for many tasks.
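In code, those band power features come almost for free from Welch's method. A minimal sketch, assuming a preprocessed `window` of shape (8 channels, samples) sampled at 256 Hz:

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def band_power_features(window, fs=256):
    """Per-channel power in 5 bands (40 features) plus theta/beta ratios (8 more)."""
    freqs, psd = welch(window, fs=fs, nperseg=fs, axis=1)
    power = {name: psd[:, (freqs >= lo) & (freqs < hi)].mean(axis=1)
             for name, (lo, hi) in BANDS.items()}
    theta_beta = power["theta"] / power["beta"]  # per-channel attention marker
    return np.concatenate(list(power.values()) + [theta_beta])
```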
Time-Domain Features
Sometimes the shape of the waveform itself carries information. Common time-domain features include:
- Hjorth parameters: Activity (variance of the signal), mobility (mean frequency), and complexity (change in frequency). Three numbers per channel that capture the signal's statistical character. Fast to compute. Surprisingly informative. See the sketch after this list.
- Zero-crossing rate: How often the signal crosses zero. Correlates loosely with dominant frequency but captures something slightly different.
- Signal statistics: Mean, variance, skewness, kurtosis. Simple but effective, especially kurtosis, which captures how "spiky" the signal is.
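The Hjorth parameters mentioned above reduce to a few lines of numpy. A sketch, assuming `x` is one channel's samples as a 1-D array:

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity for a 1-D signal x."""
    dx = np.diff(x)    # discrete first derivative
    ddx = np.diff(dx)  # discrete second derivative
    activity = np.var(x)
    mobility = np.sqrt(np.var(dx) / np.var(x))
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# Three numbers per channel: 24 time-domain features for an 8-channel window.
# features = np.concatenate([hjorth_parameters(ch) for ch in window])
```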
Connectivity Features
Here's where things get interesting. The features above treat each channel independently. But the brain doesn't work in isolation. Different regions communicate, synchronize, and desynchronize. Connectivity features capture these relationships.
Coherence measures how correlated two channels are at a specific frequency. High alpha coherence between frontal and parietal sites might indicate a sustained attention state. Phase-locking value captures whether two channels maintain a consistent phase relationship, even if their amplitudes differ. Granger causality estimates directional information flow: is channel C3 driving activity at C4, or vice versa?
Connectivity features are computationally expensive and you need enough channels for them to be meaningful, but they capture information that per-channel features completely miss. On an 8-channel device like the Crown, you have 28 unique channel pairs, each of which can produce coherence values across 5 frequency bands. That's 140 additional features if you want them.
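A sketch of per-band coherence features with scipy, assuming a `window` of shape (8 channels, samples). One caveat: coherence estimated from a single short window is noisy, so in practice you'd use longer windows or average across segments:

```python
import numpy as np
from itertools import combinations
from scipy.signal import coherence

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def coherence_features(window, fs=256):
    """Mean coherence per band for all 28 channel pairs -> 140 features."""
    feats = []
    for i, j in combinations(range(window.shape[0]), 2):  # 28 pairs for 8 channels
        freqs, cxy = coherence(window[i], window[j], fs=fs, nperseg=fs)
        for lo, hi in BANDS.values():
            feats.append(cxy[(freqs >= lo) & (freqs < hi)].mean())
    return np.array(feats)  # shape: (140,)
```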
More features are not always better. With a small dataset (which EEG datasets almost always are), adding too many features actually hurts classification accuracy. This is called the "curse of dimensionality." If you have 200 features but only 100 training examples, your classifier will memorize noise instead of learning real patterns. A good rule of thumb: keep your feature count well below your number of training examples. If you have 500 labeled windows of EEG, start with 20-30 features and add more only if they demonstrably help.
The Classifiers: Four Algorithms That Actually Work
There are dozens of ML algorithms you could throw at EEG features. But in practice, four dominate the literature, and for good reason. Each has a specific strength that maps to a specific EEG challenge.
Linear Discriminant Analysis (LDA)
LDA is the workhorse of BCI classification, and it has been since the 1990s. The idea is elegant: find the linear combination of features that best separates two classes. If you're classifying "focused" vs "relaxed," LDA finds the axis in feature space where the two states are maximally spread apart and minimally spread within each group.
Why does LDA dominate BCI? Three reasons. First, it's fast. On embedded hardware, LDA classification takes microseconds. Second, it has very few parameters to tune, which means it's hard to overfit (more on that later). Third, it works surprisingly well with small datasets because it makes a strong assumption (equal covariance matrices for each class) that acts as a built-in regularizer.
The downside? LDA can only draw straight lines between classes. If the true boundary between "focused" and "relaxed" in feature space is curved or wiggly, LDA will miss it. For many EEG tasks, the linear assumption holds well enough. For complex multi-class problems, it starts to struggle.
Support Vector Machines (SVM)
SVMs are what you reach for when LDA's linear boundary isn't enough. An SVM finds the hyperplane that separates two classes with the maximum margin, the widest possible gap between the nearest data points of each class. But the real power comes from the kernel trick: by projecting your features into a higher-dimensional space (without actually computing the projection, which is the trick part), SVMs can learn nonlinear decision boundaries.
For EEG, the radial basis function (RBF) kernel is the go-to. It lets the SVM carve out curved, flexible boundaries in feature space. The cost is two hyperparameters (C and gamma) that need tuning, which means you need proper cross-validation (seriously, we'll get there).
SVMs with RBF kernels consistently rank among the top classifiers in BCI competitions. They're particularly strong for motor imagery classification, where the boundaries between "imagine moving left hand" and "imagine moving right hand" in feature space aren't cleanly linear.
Random Forest
A Random Forest is an ensemble of decision trees, typically hundreds of them, each trained on a random subset of your data and a random subset of your features. To classify a new sample, every tree votes, and the majority wins.
Random Forests have a property that makes them especially attractive for EEG work: they're resistant to noisy features. If 30 of your 50 features are garbage, a Random Forest will naturally figure this out and lean on the 20 that matter. The trees that happen to use informative features will agree with each other. The trees that use noise will disagree. The consensus filters out the junk.
They also give you feature importance scores for free. After training, you can ask "which features contributed most to classification?" This is invaluable for EEG, where knowing that theta/beta ratio matters more than gamma power for your specific task helps you understand the neuroscience, not just the accuracy.
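Getting those importance scores takes two lines in scikit-learn. A sketch, assuming a feature matrix `X`, labels `y`, and a `feature_names` list (all hypothetical here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X: (n_windows, n_features); y: labels; feature_names: list of strings -- assumed given.
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

# Rank features by their average contribution to the trees' split decisions.
for idx in np.argsort(forest.feature_importances_)[::-1][:10]:
    print(f"{feature_names[idx]}: {forest.feature_importances_[idx]:.3f}")
```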
k-Nearest Neighbors (kNN)
kNN is the simplest classifier here and the most underrated for certain EEG tasks. To classify a new sample, kNN finds the k most similar samples in the training set and assigns the majority label. That's it. No training phase, no model parameters, no assumptions about the data distribution.
kNN works well for EEG when you have a relatively small, clean feature set and you're doing subject-dependent classification (more on this distinction shortly). It's also useful as a baseline. If a more complex algorithm can't beat kNN on your dataset, something is probably wrong with your features, not your classifier.
The weakness? kNN scales poorly. It has to store and search the entire training set at prediction time. For real-time BCI with thousands of training samples, this can be too slow. And it's extremely sensitive to irrelevant features: since distances are computed across all dimensions, noisy features can dominate the neighbor calculation.
- LDA: Best when you have few training samples, need real-time speed, and the classes are roughly linearly separable. The safe default for any BCI project.
- SVM (RBF kernel): Best when boundaries are nonlinear and you have enough data to tune hyperparameters. The strongest single classifier for most BCI competitions.
- Random Forest: Best when you have noisy or high-dimensional features and want interpretability through feature importance scores. Strong and hard to mess up.
- kNN: Best as a baseline or for subject-dependent models with clean features. Simple, no training, but slow at prediction time and sensitive to irrelevant features.
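To see how the four compare on your own data, here's a side-by-side sketch with scikit-learn, assuming a chronologically ordered feature matrix `X` and labels `y` (the time-based split is deliberate; the next sections explain why):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X, y assumed given. First 80% trains, last 20% tests -- never random for EEG.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    # SVM and kNN are distance-based, so standardize their features in-pipeline.
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, clf in classifiers.items():
    print(f"{name}: {clf.fit(X_train, y_train).score(X_test, y_test):.2%}")
```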
The Comparison That Matters
Let's put these algorithms side by side on the dimensions that actually determine which one you should use.
| Dimension | LDA | SVM (RBF) | Random Forest | kNN |
|---|---|---|---|---|
| Training speed | Very fast | Moderate | Moderate | None (lazy learner) |
| Prediction speed | Very fast | Fast | Fast | Slow (searches full dataset) |
| Small dataset performance | Excellent | Good | Good | Good |
| Noise robustness | Moderate | Good | Excellent | Poor |
| Nonlinear boundaries | No | Yes | Yes | Yes |
| Hyperparameters to tune | 0-1 | 2 (C, gamma) | 2-3 (trees, depth, features) | 1 (k) |
| Interpretability | High (weights are meaningful) | Low (black box with kernels) | Moderate (feature importance) | Low (no explicit model) |
| Overfitting risk | Low | Moderate | Low | Moderate |
| Typical BCI accuracy | 70-85% | 75-90% | 72-88% | 65-82% |
| Best use case | Real-time BCI, embedded systems | Competition-grade accuracy | Noisy features, feature selection | Baselines, small clean datasets |
Notice something about those accuracy numbers. Even the best classifier on this list, SVM, tops out around 90% in typical BCI tasks. That might sound decent until you compare it to image classification, where 99%+ is routine. The gap tells you something fundamental about EEG data: it's hard. The signals are weak, the noise is strong, and the patterns shift over time and between people. Getting from 80% to 90% on EEG often requires more effort than getting from 50% to 80%.

Cross-Validation: The One Thing You Cannot Skip
Here is where I need to tell you something that will save you months of wasted work. The single most common mistake in EEG classification, the one that fills published papers with inflated results and fills GitHub repos with models that don't actually work, is improper evaluation.
The mistake looks like this. You extract features from your EEG data. You train a classifier. You test it on... the same data you trained on. Or a random subset of that data. Your accuracy is 95%. You celebrate. You write a paper. Someone else tries your method. It gets 55%.
What happened? You didn't evaluate your model. You evaluated your model's ability to memorize your data.
Why Random Splits Don't Work for EEG
In most ML tutorials, you randomly split your dataset 80/20 into train and test sets. This works fine for image classification or spam detection. It's catastrophic for EEG.
Here's why. EEG data is temporally autocorrelated. A sample at time t=10.0 seconds and a sample at t=10.5 seconds are not independent. They share noise characteristics, they share the person's overall brain state at that moment, they might share the same artifact from a head movement. If one lands in your training set and the other in your test set, your classifier isn't learning brain states. It's learning to recognize temporal neighborhoods.
The fix is straightforward but non-negotiable: always split EEG data by time blocks, not by random sampling. If you have 30 minutes of recording, maybe the first 20 minutes are training and the last 10 are testing. Or better yet, use k-fold cross-validation where each fold is a contiguous block of time.
K-Fold Cross-Validation Done Right
Here's the procedure that actually gives you reliable accuracy estimates (a code sketch follows the list):
- Divide your recording into k contiguous blocks (5 or 10 is standard).
- For each fold, hold out one block as the test set. Train on the remaining k-1 blocks.
- Record the accuracy on each held-out block.
- Report the mean and standard deviation across all folds.
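With scikit-learn, `KFold` with `shuffle=False` (the default) makes each fold a contiguous block of indices, which is exactly what EEG needs. A minimal sketch, assuming chronologically ordered `X` and `y`:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold

# X: (n_windows, n_features) in chronological order; y: labels -- assumed given.
# shuffle=False keeps each fold a contiguous block of time.
scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=False).split(X):
    clf = LinearDiscriminantAnalysis().fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"accuracy: {np.mean(scores):.2%} +/- {np.std(scores):.2%}")
```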
The standard deviation matters as much as the mean. If your classifier gets 90% on fold 1 and 55% on fold 4, your mean of 72.5% is hiding a serious problem: your model's performance depends on which segment of the recording it sees, which means it's probably picking up on non-stationarity or session-level confounds rather than stable brain patterns.
Fitting any data-dependent step (normalization, feature selection, dimensionality reduction) before cross-validation is a subtle but devastating form of data leakage. If you normalize your features using the mean and standard deviation of the entire dataset before splitting into folds, information from the test set leaks into the training set through those statistics. Always fit your feature normalization (and any other data-dependent preprocessing) on the training fold only, then apply the same transformation to the test fold. Scikit-learn's Pipeline object handles this correctly. Doing it by hand almost always introduces leakage.
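A minimal leakage-safe sketch, again assuming `X` and `y`: because the scaler lives inside the pipeline, each cross-validation iteration re-fits it on the training fold only and merely applies it to the held-out fold.

```python
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Each CV iteration fits the scaler on the training fold, then transforms the test fold.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=KFold(n_splits=5, shuffle=False))
print(scores.mean(), scores.std())
```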
The Overfitting Trap (And Why EEG Falls In Every Time)
Overfitting is a problem in all of machine learning. But EEG classification is especially prone to it, for reasons that are worth understanding deeply.
Small datasets. A typical EEG experiment might give you 20-30 minutes of labeled data per participant. After windowing, that's maybe 500-1,000 labeled samples. In computer vision, that's a rounding error. In EEG, that's your entire dataset.
High-dimensional features. If you compute band power, ratios, connectivity, and time-domain features across 8 channels, you can easily generate 200+ features. When your feature count approaches your sample count, classifiers start memorizing rather than generalizing. This is the curse of dimensionality, and it's the reason feature selection isn't optional for EEG; it's survival.
Non-stationarity. Your brain doesn't produce the same signal for "focused" at 9 AM and at 3 PM. EEG statistics drift over time, meaning the patterns your classifier learned in minute 5 may not apply in minute 25. A model that overfits to early-session patterns will fail on late-session data.
Researcher degrees of freedom. This one is the quiet killer. With so many choices in the pipeline (which features, which frequency bands, which classifier, which hyperparameters, which cross-validation scheme), it's easy to inadvertently optimize for your specific dataset through trial and error. You try 50 configurations, pick the one that works best, and report that number. But that "best" number is partially luck, and it won't reproduce on new data.
How to Fight Overfitting
The defenses are well-known but poorly followed:
- Feature selection. Use Random Forest feature importance, mutual information, or recursive feature elimination to prune your feature set before classification. Fewer features means less room for the classifier to memorize noise.
- Regularization. LDA is inherently regularized. For SVM, a smaller C value increases regularization. For any model, err on the side of simpler.
- Nested cross-validation. If you're tuning hyperparameters, you need two levels of cross-validation: an outer loop for estimating performance and an inner loop for selecting hyperparameters. Single-loop cross-validation with hyperparameter tuning leaks test information into your model selection. See the sketch after this list.
- Report variance. Always report standard deviation across folds, not just mean accuracy. If the variance is high, your model isn't stable, and the mean is misleading.
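Nested cross-validation sounds elaborate but is only a few lines in scikit-learn: wrap the hyperparameter search in `GridSearchCV` (the inner loop), then cross-validate that whole object (the outer loop). A sketch assuming `X` and `y`; the parameter grid is illustrative:

```python
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Inner loop picks C and gamma; outer loop estimates how well the whole
# tune-and-train procedure performs on blocks it never saw during tuning.
inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]},
    cv=KFold(n_splits=5, shuffle=False),
)
scores = cross_val_score(inner, X, y, cv=KFold(n_splits=5, shuffle=False))
print(scores.mean(), scores.std())
```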
Subject-Dependent vs Subject-Independent: The Divide That Defines Everything
Now for the part that genuinely changes how you think about this field. There's a question lurking behind every EEG classification project that matters more than which algorithm you choose or which features you extract.
Does your model need to work on people it's never seen before?
If the answer is no, congratulations. You're building a subject-dependent model. You train on Alice's data and test on Alice's data (different sessions or time blocks, of course). Life is relatively good. Accuracies of 85-95% are achievable for many tasks. Alice's brain has consistent patterns, your model learns them, and those patterns hold up across sessions.
If the answer is yes, you're building a subject-independent model. You train on data from 20 people and test on person 21, someone whose data your model has never encountered. And this is where the floor drops out.
Why Brains Are Like Fingerprints
Here's the "I had no idea" moment of this entire guide. The spatial pattern of electrical activity on your scalp is as individual as your fingerprint. Not figuratively. A 2015 study by Brigham and Kumar showed that EEG-based biometric identification (recognizing who someone is purely from their brainwave patterns) can achieve over 95% accuracy across 100+ individuals. Your brain's electrical signature is so distinctive that it could serve as a password.
This is fascinating for identity verification. It's devastating for classification.
When you train a subject-dependent model, you're learning one person's unique neural fingerprint for each state. "Alice focused" has a specific, reproducible pattern. "Alice relaxed" has a different pattern. Your classifier learns the difference.
But "Alice focused" and "Bob focused" can look wildly different in raw feature space. Alice might show strong alpha suppression when she focuses. Bob might show beta enhancement without much alpha change. Carol might show both, plus a connectivity shift between frontal and parietal sites that Alice and Bob don't exhibit. Same cognitive state, three completely different neural implementations.
A subject-independent model has to see through these individual differences to the underlying commonality. That requires vastly more training data, more sophisticated features, and often a different algorithmic approach entirely.
Bridging the Gap
The field has developed several strategies for this challenge:
- Transfer learning. Train a base model on data from many people. Then fine-tune it on a small amount of data from the new person. This captures general EEG patterns in the base model and individual quirks in the fine-tuning.
- Domain adaptation. Mathematically align the feature distributions of different subjects so that "focused" occupies the same region of feature space regardless of who produced it.
- Riemannian geometry. Represent EEG data as covariance matrices and classify them using the geometry of the space of symmetric positive definite matrices. This approach is naturally invariant to some forms of between-subject variability and has won multiple BCI competitions.
Each of these strategies adds complexity. But if you're building a product that needs to work for thousands of users, not just one person in a lab, this complexity isn't optional.
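As one concrete example of the Riemannian approach, here's a sketch using the third-party pyriemann package (an assumption on my part: check its docs for the current API). It maps raw windows to covariance matrices and classifies by Riemannian distance to each class's mean matrix:

```python
# Assumes: pip install pyriemann (third-party; API may differ across versions).
from pyriemann.estimation import Covariances
from pyriemann.classification import MDM
from sklearn.pipeline import make_pipeline

# X_windows: (n_windows, n_channels, n_samples) raw EEG windows; y: labels -- assumed.
# Covariances turns each window into a channels-x-channels covariance matrix;
# MDM (minimum distance to mean) classifies in the Riemannian geometry of SPD matrices.
clf = make_pipeline(Covariances(estimator="oas"), MDM())
clf.fit(X_windows, y)
predictions = clf.predict(X_windows)
```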
Where Neurosity Fits: ML at the Edge
Feature extraction, classification, cross-validation, subject variability: everything we've discussed so far is a problem that any EEG classification system has to solve. The question for a developer is whether you solve those problems yourself or build on top of a system that's already solved them.
The Neurosity Crown's N3 chipset runs trained ML classifiers directly on the hardware. When you call neurosity.focus() through the SDK, you're not getting raw alpha/beta ratios. You're getting the output of a classification model that was trained on data from thousands of sessions, handles between-subject variability, and runs inference in real time on the device itself. No cloud round-trip. No latency penalty from network calls. No raw brain data leaving the hardware.
For many applications, these pre-trained models are exactly what you need. You don't have to build a focus classifier from scratch. You subscribe to the focus stream and build your application logic around it.
But for developers who want to go deeper, who want to build custom classifiers for mental states or cognitive events that the default models don't cover, the Crown gives you everything you need for that too. Raw EEG at 256 Hz from 8 channels through the JavaScript or Python SDK. Stream that data into your own feature extraction pipeline. Train your own SVM, your own Random Forest, your own neural network. The Crown becomes the data source, and your classifier becomes the brain.
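Wiring your own classifier to a live stream is mostly buffering. Here's a sketch of the glue, deliberately agnostic about the data source (SDK callback, LSL, file replay); `preprocess`, `extract_features`, and the trained `clf` are from the earlier sketches, and `handle_prediction` is a hypothetical hook for your application logic:

```python
import numpy as np
from collections import deque

FS, WINDOW_SECONDS = 256, 2
buffer = deque(maxlen=FS * WINDOW_SECONDS)  # rolling two-second window

def on_raw_sample(sample):
    """Call this with each incoming 8-value voltage vector from your raw stream."""
    buffer.append(sample)
    if len(buffer) == buffer.maxlen:
        window = np.array(buffer).T                      # (channels, samples)
        features = extract_features(preprocess(window))  # earlier pipeline sketch
        state = clf.predict(features.reshape(1, -1))[0]  # your trained classifier
        handle_prediction(state)                         # hypothetical app hook
```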
This is the architecture that makes sense for the current state of the field. Pre-trained models handle the common cases. Custom classifiers handle the edge cases. And the same hardware supports both.
The Path Forward
If you've read this far, you understand something that most people who dabble in EEG never grasp. The hard part isn't the algorithm. LDA, SVM, Random Forest: these are each a single import in scikit-learn. The hard part is everything around the algorithm. Choosing the right features. Evaluating honestly. Respecting the gap between subject-dependent and subject-independent performance. Resisting the siren song of inflated accuracy numbers.
Here's what I'd suggest if you're getting started:
- Start subject-dependent. Record your own EEG doing two distinct tasks. Extract band power features. Train an LDA. See what accuracy you get with proper block-wise cross-validation. This teaches you the pipeline without the added complexity of between-subject variability.
- Respect the baseline. Before you try fancy algorithms, establish what chance-level accuracy looks like for your task (50% for two classes, 33% for three, and so on). If your fancy model barely beats chance after proper cross-validation, the problem is probably your features, not your classifier.
- Add complexity gradually. Move from LDA to SVM only if LDA's linear boundaries aren't enough. Add connectivity features only if per-channel features plateau. Try subject-independent models only after you've mastered subject-dependent ones.
- Read the competition results. BCI Competition datasets (available for free) include data, labels, and results from teams worldwide. You can benchmark your pipeline against published baselines and see exactly where you stand.
The field of EEG classification is at a genuinely exciting inflection point. Consumer hardware like the Crown puts research-quality EEG data in the hands of developers who can iterate at software speed. The algorithms are mature. The tooling is there. The bottleneck has shifted from "can we classify brain states?" to "what do we build once we can?"
That second question is the one worth staying up late for.

