Selçuk Korkmaz
@selcukorkmaz.bsky.social
Passionate about data & innovation | Building tools to simplify complex problems | Advocate for open-access research & AI-driven solutions | https://selcukorkmaz.github.io/
8/8 In short:
Leak-free design
Real signal
Correct inductive bias
Distribution alignment
Appropriate loss and algorithm
Without these, high performance is only an illusion.
November 29, 2025 at 5:28 PM
7/8 Finally, hyperparameter tuning is just optimization within the design you have already chosen. The true structure of the model is determined by the loss function and the algorithm. Tuning cannot fix a fundamentally wrong design.
November 29, 2025 at 5:28 PM
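A minimal sketch of the point above, assuming a scikit-learn setup (the data, grid, and model here are illustrative, not from the thread): the search optimizes a hyperparameter, but the loss and the model family are fixed the moment you pick the estimator.

```python
# Sketch: tuning searches within a fixed design (scikit-learn assumed;
# the grid and synthetic data are illustrative).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=200)

# The search tunes alpha, but the squared-error loss and the linear
# model family are fixed by the choice of Ridge itself.
search = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```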
6/8 Another core assumption is distribution alignment: the data the model sees during training must come from the same distribution as the data it will face in reality. If these differ, errors are inevitable.
November 29, 2025 at 5:28 PM
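One simple way to check this in practice, sketched here with scipy's two-sample KS test (the arrays, the simulated shift, and the 0.01 threshold are illustrative assumptions): compare each feature's training distribution against the data the model actually receives.

```python
# Sketch: a per-feature drift check between training data and incoming data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 3))
X_prod = rng.normal(loc=0.6, scale=1.0, size=(1000, 3))   # simulated shift in production

for j in range(X_train.shape[1]):
    stat, p = ks_2samp(X_train[:, j], X_prod[:, j])
    flag = "possible drift" if p < 0.01 else "ok"
    print(f"feature {j}: KS={stat:.3f}, p={p:.3g} -> {flag}")
```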
5/8 When this bias matches the problem, the model generalizes well. When it doesn’t, the model may look perfect during training but fail immediately in practice. Capacity control, regularization, and appropriate architecture choices are the main tools to manage this.
November 29, 2025 at 5:28 PM
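A small sketch of capacity control via regularization, on synthetic data (everything here is illustrative): the same high-capacity polynomial model is fit with and without a ridge penalty, and the train/test gap shows how regularization keeps capacity in check.

```python
# Sketch: capacity control via regularization (synthetic, illustrative).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# High-capacity model with no penalty: free to chase the noise in the training set.
overfit = make_pipeline(PolynomialFeatures(15), StandardScaler(), LinearRegression())
# Same capacity, but the ridge penalty shrinks the coefficients.
regular = make_pipeline(PolynomialFeatures(15), StandardScaler(), Ridge(alpha=1.0))

for name, model in [("no regularization", overfit), ("ridge", regular)]:
    model.fit(X_tr, y_tr)
    print(name, "train:", round(model.score(X_tr, y_tr), 2),
          "test:", round(model.score(X_te, y_te), 2))
```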
4/8 Third, inductive bias. This is the model’s built-in view of how the world works.
Decision trees assume sharp splits.
Linear models assume linear relationships.
Deep learning assumes complex structure that can be captured by composing many simple transformations.
November 29, 2025 at 5:28 PM
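A toy sketch of matched vs. mismatched inductive bias (the step-shaped target and the models are illustrative): on the same data, a decision tree's split-based bias matches the steps, while a straight line cannot.

```python
# Sketch: the same data, two inductive biases (synthetic target).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.where(X[:, 0] > 0, 1.0, -1.0) + rng.normal(scale=0.1, size=400)  # step-shaped target

linear = LinearRegression().fit(X, y)                 # assumes a straight line
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)   # assumes sharp splits

print("linear R^2:", round(linear.score(X, y), 3))    # bias mismatched
print("tree   R^2:", round(tree.score(X, y), 3))      # bias matches the steps
```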
3/8 Second, the data must contain real signal. There must be a learnable relationship. If noise dominates or the sample size is too small, even the best algorithm fails. Models cannot learn from noise.
November 29, 2025 at 5:28 PM
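A quick sanity check for this, sketched with scikit-learn on synthetic data (the estimator and sizes are illustrative): cross-validate on the real labels and on pure-noise labels; if both land near chance, there is nothing to learn.

```python
# Sketch: is there any learnable signal? (synthetic data, illustrative).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y_signal = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # real relationship
y_noise = rng.integers(0, 2, size=300)                  # pure-noise labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("with signal:", cross_val_score(clf, X, y_signal, cv=5).mean())
print("pure noise: ", cross_val_score(clf, X, y_noise, cv=5).mean())  # expected near chance
```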
2/8 The first requirement is a leak-free experimental design. No information from the test set should leak into the training process. If leakage exists, the model isn't really performing; it has simply seen the answers beforehand.
November 29, 2025 at 5:28 PM
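A minimal sketch of this leakage pattern, assuming a scikit-learn workflow (the synthetic data and models are illustrative): fitting a scaler on the full dataset lets test-set statistics leak into training, while putting preprocessing inside a Pipeline keeps every fold leak-free.

```python
# Sketch: leaky vs. leak-free preprocessing (synthetic data, illustrative).
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Leaky: the scaler sees the full dataset, so test-set statistics
# influence the features the model is trained on.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)
leaky = LogisticRegression().fit(X_tr, y_tr)

# Leak-free: preprocessing lives inside the Pipeline, so the scaler
# is fit only on the training folds during cross-validation.
clean = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(clean, X, y, cv=5)
print("leaky test accuracy:", round(leaky.score(X_te, y_te), 3))
print("leak-free CV accuracy:", round(scores.mean(), 3))
```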