CAVOK@lemmy.world to Technology@lemmy.world · English · 2 months ago
Donating our open-source alignment tool - Anthropic (www.anthropic.com)
Em Adespoton@lemmy.ca · English · 2 months ago
That’s all great, but all it takes is to unalign a single parameter and it appears to unalign the entire model.
So this is great for ensuring you’re testing what you think you’re testing, but it’s not going to actually secure a model you’re going to make open.