Yes, we are working with probability clouds. Nostr is special, a very bountiful cloud with so much beneficial rain.
LLM builders in general are not doing a great job of making human-aligned models. The most probable cause is recklessly training LLMs on the outputs of other LLMs, not caring about dataset curation, and not asking "what is beneficial for humans?"... Here is the trend over the last several months:
A comparison of the world's two best LLMs! My LLM seems to be doing better than Mike Adams'. Of course I am biased, and the questions come from the domains I trained on. His model would rank 1st in the AHA leaderboard though, with a score of 56, if I included fine-tunings in the leaderboard; I am only adding full fine-tunes. His model will not be a row but will span several columns for sure (i.e. it will be a ground truth)! My LLM is certainly much more woo woo :) I marked in green the answers I liked. What do YOU think? https://sheet.zohopublic.com/sheet/published/sb1dece732c684889436c9aaf499458039000
In his own words, this model is about emergency first aid, home gardening, survival, preparedness, herbal extracts, money, gold, silver, the Federal Reserve, false flag events, mRNA, vaccines and more. https://www.brighteon.com/fc80b9bf-db8d-4517-b7ba-6c9fe4e65a44 I uploaded it to hf: https://huggingface.co/etemiz/Mistral-Nemo-12B-CWC-Enoch-251014-GGUF
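If you want to try the GGUF locally, here is a minimal sketch using llama-cpp-python (my own illustration, not part of the original post). The quant filename pattern and the sample question are assumptions, so check the repo for the actual GGUF files:

```python
# Minimal sketch: run the uploaded GGUF locally with llama-cpp-python.
# Assumptions: `pip install llama-cpp-python huggingface_hub`, and the
# "*Q4_K_M.gguf" filename pattern is a guess -- pick a quant that exists in the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="etemiz/Mistral-Nemo-12B-CWC-Enoch-251014-GGUF",
    filename="*Q4_K_M.gguf",  # hypothetical quant filename
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Which herbal extracts are useful in a first aid kit?"}],
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```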
Benchmarked Mike Adams' new model. It got 56, which is very good. https://brightu.ai/downloads
Our leaderboard can be used for human alignment in an RL setting. Ask the same question to the top models and the worst models; an answer from a top model gets a +1 score, an answer from a bad model gets -1. Ask many times at a higher temperature to generate more answers. This way other LLMs can be trained towards human alignment (a rough sketch of the idea follows this post). Below, Grok 2 is worse than Grok 1 but better than Grok 3. This was already measured using the API, but now we measured the LLM itself and the results are similar. GLM is ranking higher and higher compared to its previous versions. Nice trend! I hope they keep making better-aligned models.
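Here is the rough sketch mentioned above (my own illustration, not code from the leaderboard): sample the same question several times at a higher temperature from a top-ranked and a bottom-ranked model, label the answers +1 / -1, and collect them as training data. The model names, the OpenAI-compatible endpoint, the question list, and the temperature are all assumptions.

```python
# Sketch only: build +1/-1 labeled answers from a top and a bottom leaderboard model.
# Assumptions: an OpenAI-compatible endpoint at localhost:8000 serving both models,
# and the placeholder model names below -- swap in whatever your leaderboard ranks.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

TOP_MODEL = "top-ranked-model"        # hypothetical: best model on the leaderboard
BOTTOM_MODEL = "bottom-ranked-model"  # hypothetical: worst model on the leaderboard
QUESTIONS = ["What is beneficial for humans?"]  # replace with the leaderboard questions
SAMPLES_PER_QUESTION = 4              # ask many times at a higher temperature

def sample_answers(model: str, question: str) -> list[str]:
    """Ask one model the same question several times and return the answers."""
    answers = []
    for _ in range(SAMPLES_PER_QUESTION):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
            temperature=1.0,  # higher temperature -> more diverse answers
            max_tokens=256,
        )
        answers.append(resp.choices[0].message.content)
    return answers

dataset = []
for q in QUESTIONS:
    for ans in sample_answers(TOP_MODEL, q):
        dataset.append({"question": q, "answer": ans, "reward": +1})
    for ans in sample_answers(BOTTOM_MODEL, q):
        dataset.append({"question": q, "answer": ans, "reward": -1})

with open("alignment_rewards.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")
```

The resulting JSONL of scored answers could then feed a reward model or an RL/preference-tuning pipeline, which is the training signal described above.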
It sucks in all the veracious types and it may give birth to a veracious AI. Soon ™