Hacker News
riku_iki
3 months ago
on: Kimi K2 Thinking, a SOTA open-source trillion-para...
I assume the training set components also have priorities: low-priority data is trained on only a few times, near the beginning of pretraining, while higher-priority data is trained on multiple times, right up to the end.
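To make the guess concrete, here is a minimal Python sketch of that kind of priority schedule. Everything in it is a hypothetical illustration, not Kimi K2's (or anyone's) actual recipe. The `Source` class, its `repeats` and `end_fraction` fields, and the example sizes are all made up. Each occurrence of an example gets a random time in `[0, end_fraction]` of training, so a low-priority source is seen once and only early, while a high-priority source is repeated and stays in the mix until the end.

```python
import random
from dataclasses import dataclass


@dataclass
class Source:
    """A pretraining data source. All fields are illustrative assumptions."""
    name: str
    num_examples: int
    repeats: int          # how many times each example is seen in total
    end_fraction: float   # fraction of training (0..1] after which the source is no longer sampled


def build_schedule(sources, seed=0):
    """Assign every (source, example) occurrence a random time in [0, end_fraction]
    of training, then return the occurrences sorted by that time."""
    rng = random.Random(seed)
    events = []
    for src in sources:
        for _ in range(src.repeats):
            for idx in range(src.num_examples):
                # random() is in [0, 1), so t falls within this source's active window
                t = rng.random() * src.end_fraction
                events.append((t, src.name, idx))
    events.sort()  # training order = increasing time
    return events


# Hypothetical mix: a big low-priority source seen once in the first 20% of training,
# and a small high-priority source repeated 4 times across the whole run.
sources = [
    Source("low_priority_web", num_examples=1000, repeats=1, end_fraction=0.2),
    Source("high_priority_curated", num_examples=100, repeats=4, end_fraction=1.0),
]
schedule = build_schedule(sources)
```

Sampling a random time per occurrence keeps the sketch short. A real pipeline would more likely work in token budgets and mixture weights per training phase, but the shape is the same: the cutoff and repeat count together act as the "priority".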