Although most of our explanations score poorly, we believe we can now use ML techniques to further improve our ability to produce explanations. For example, we found we could improve scores by:
- Iterating on explanations. We can increase scores by asking GPT-4 to come up with possible counterexamples, then revising explanations in light of their activations.
- Using larger models to give explanations. The average score goes up as the explainer model's capabilities increase. However, even GPT-4 gives worse explanations than humans, which suggests there is room for improvement.
- Changing the architecture of the explained model. Training models with different activation functions improved explanation scores.
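The first strategy, iterating on explanations, is essentially a search loop: propose revisions, rescore them, and keep whatever does better. The sketch below shows that loop under stated assumptions. `score` and `propose_revisions` are hypothetical callables standing in for the scoring procedure and for asking GPT-4 to suggest counterexamples and revised explanations; the greedy "accept only if the score improves" rule is our own simplification, not a description of the actual method.

```python
from typing import Callable, Sequence


def refine_explanation(
    explanation: str,
    score: Callable[[str], float],
    propose_revisions: Callable[[str], Sequence[str]],
    rounds: int = 3,
) -> tuple[str, float]:
    """Greedy explanation refinement (hypothetical sketch).

    `score` maps an explanation to a number where higher is better.
    `propose_revisions` is a hypothetical stand-in for prompting a model
    (e.g. GPT-4) for counterexamples and revised candidate explanations.
    """
    best, best_score = explanation, score(explanation)
    for _ in range(rounds):
        candidates = list(propose_revisions(best))
        if not candidates:
            break
        # Score every candidate and pick the strongest one.
        top = max(candidates, key=score)
        top_score = score(top)
        # Assumed rule: accept a revision only if it beats the current best.
        if top_score <= best_score:
            break
        best, best_score = top, top_score
    return best, best_score


if __name__ == "__main__":
    # Toy stand-ins: the "score" counts matched keywords, and "revisions"
    # append one word. Real scoring would use neuron activations.
    keywords = {"dog", "cat", "pet"}
    toy_score = lambda e: float(len(keywords & set(e.split())))
    toy_revisions = lambda e: [e + " cat", e + " pet", e + " car"]
    print(refine_explanation("dog", toy_score, toy_revisions))
```

With the toy stand-ins, the loop climbs from one matched keyword to three and then stops once no candidate improves the score.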
We are open-sourcing our datasets and visualization tools for the GPT-4-written explanations of all 307,200 neurons in GPT-2, as well as code for explanation and scoring using publicly available models on the OpenAI API. We hope the research community will develop new techniques for generating higher-scoring explanations and better tools for exploring GPT-2 using explanations.
We found more than 1,000 neurons with explanations that scored at least 0.8, meaning that, according to GPT-4, they account for most of the neuron's top-activating behavior. Most of these well-explained neurons are not very interesting. However, we also found many interesting neurons that GPT-4 does not understand. We hope that as our explanations improve, we may be able to rapidly uncover interesting qualitative understanding of model computations.
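To make the 0.8 threshold concrete, here is a minimal sketch. It assumes, as a simplification, that an explanation's score is the Pearson correlation between a neuron's real activations and the activations a simulator predicts from the explanation alone. The function names `explanation_score` and `well_explained` are hypothetical and are not taken from the released code.

```python
import math
from typing import Mapping, Sequence


def explanation_score(true_acts: Sequence[float], simulated_acts: Sequence[float]) -> float:
    """Pearson correlation between real and simulated activations.

    Assumption: the score is plain correlation; the released scoring code
    may differ in its details.
    """
    n = len(true_acts)
    if n != len(simulated_acts) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_t = sum(true_acts) / n
    mean_s = sum(simulated_acts) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(true_acts, simulated_acts))
    var_t = sum((t - mean_t) ** 2 for t in true_acts)
    var_s = sum((s - mean_s) ** 2 for s in simulated_acts)
    # A constant sequence carries no signal, so treat its correlation as 0.
    if var_t == 0 or var_s == 0:
        return 0.0
    return cov / math.sqrt(var_t * var_s)


def well_explained(scores: Mapping[str, float], threshold: float = 0.8) -> list[str]:
    """Hypothetical helper: neuron IDs whose explanation score meets the threshold."""
    return sorted(name for name, sc in scores.items() if sc >= threshold)


if __name__ == "__main__":
    # Simulated activations that are a scaled copy of the real ones
    # correlate perfectly.
    print(explanation_score([0, 1, 2, 3], [0, 2, 4, 6]))
    print(well_explained({"layer0/n1": 0.91, "layer0/n2": 0.42, "layer3/n7": 0.80}))
```

Because correlation ignores scale and offset, a simulator only needs to predict *when* a neuron fires relative to its other activations, not the exact magnitudes, to earn a high score under this assumption.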

