LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com)

Depressingly enough, things that work on small-scale architectures often don't work at larger scales.
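For context, here is a minimal sketch of the kind of single-GPU training loop a 24 GB RTX 3090 tends to push you toward (small per-step batches plus gradient accumulation), assuming a PyTorch-style setup; the model, vocabulary size, learning rate, and accumulation count below are illustrative placeholders, not taken from the article.

    # Toy single-GPU training step in PyTorch. Model size, vocab, lr, and
    # accumulation count are assumptions for illustration, not the article's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    device = "cuda" if torch.cuda.is_available() else "cpu"
    vocab, d_model = 50257, 256  # placeholder sizes

    class TinyLM(nn.Module):
        # Hypothetical stand-in for a small GPT-style base model.
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab, d_model)
            self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.head = nn.Linear(d_model, vocab)

        def forward(self, tokens):
            T = tokens.size(1)
            # Causal mask: each position attends only to earlier tokens.
            causal = torch.triu(torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
            return self.head(self.block(self.embed(tokens), src_mask=causal))

    model = TinyLM().to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    accum = 8  # gradient accumulation so the effective batch fits in 24 GB

    def train_step(batches):
        # batches: iterable of `accum` (tokens, targets) LongTensor pairs of shape (B, T)
        opt.zero_grad()
        for tokens, targets in batches:
            logits = model(tokens.to(device))
            loss = F.cross_entropy(logits.view(-1, vocab), targets.to(device).view(-1))
            (loss / accum).backward()  # accumulate scaled gradients across micro-batches
        opt.step()
        return loss.item()

Whether tricks that work at this toy scale carry over to larger models is exactly the caveat raised above.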
No ARIA is better than bad ARIA (w3.org)