CryptoLLM
Data Collection Program (DCP)
We are glad to roll out our CryptoLLM Data Collection Program!
For the first stage, we only consider data from Twitter, as we want our LLM to generate more authentic content that fits the crypto domain, e.g., slang such as BUIDL and HODL.
The advantages of this activity are:
- We can get more authentic data from Twitter.
- We can build user profiles, which we can use to generate more user-centric content when we deploy our Twitter agent.
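As a rough illustration of what "data that fits the crypto domain" could mean in practice, the sketch below keeps only tweets that contain crypto slang. It is a minimal sketch and not the program's actual pipeline: the term list (beyond BUIDL and HODL, which appear above), the function names, and the `min_hits` threshold are all hypothetical.

```python
import re

# Hypothetical seed list of crypto-domain slang. BUIDL and HODL come from
# the text above; the other terms are illustrative assumptions.
CRYPTO_TERMS = {"buidl", "hodl", "wagmi", "airdrop"}

# Lowercase alphanumeric tokens; punctuation and whitespace act as separators.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def crypto_term_hits(text):
    """Return the set of known slang terms that appear as whole words in a tweet."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return tokens & CRYPTO_TERMS


def filter_crypto_tweets(tweets, min_hits=1):
    """Keep tweets that mention at least `min_hits` distinct slang terms."""
    return [t for t in tweets if len(crypto_term_hits(t)) >= min_hits]


sample = [
    "Time to BUIDL and HODL through the dip.",
    "Had a great lunch today.",
]
kept = filter_crypto_tweets(sample)
```

A keyword filter like this only matches exact tokens (for example, "HODLing" would not match "hodl"), so a real collection step would likely need a richer vocabulary or a classifier.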
What's the difference between our model and other LLMs, e.g., those from together.ai?
We choose to fine-tune our own crypto-domain LLM on data collected from crypto-domain users on Twitter. This approach has the following advantages:
- We can generate more diverse content without overfitting to a specific Twitter account.
- We have more flexibility in model design. The main technical difference is that we fine-tune the model using LoRA with Python on dedicated GPUs, while together.ai fine-tunes models using its own toolchain on cloud GPUs using JavaScript.
By contrast, together.ai chooses to fine-tune a model per character, which makes the model too personalized: it cannot generate diverse content, it overfits to a specific Twitter account, and it thus falls into the trap of hallucination.
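For readers unfamiliar with LoRA, the following pure-Python sketch shows the core idea: the pretrained weight matrix W stays frozen, and only two small low-rank factors A and B are trained, giving an effective weight of W + (alpha / r) * B A. This is an illustration of the general technique only, not our training code; the helper names and the toy dimensions are assumptions, and real fine-tuning would run on GPU with a deep-learning framework.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_init(d_out, d_in, rank, seed=0):
    """Create LoRA factors: A is small random (rank x d_in), B is zeros (d_out x rank)."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return A, B


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / rank) * (B @ A) without modifying the frozen W."""
    rank = len(A)
    scale = alpha / rank
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


def trainable_params(d_out, d_in, rank):
    """Return (LoRA parameter count, full-matrix parameter count)."""
    return rank * (d_in + d_out), d_in * d_out


# Toy example: a frozen 2x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A, B = lora_init(d_out=2, d_in=2, rank=1)
W_start = lora_effective_weight(W, A, B, alpha=2)  # B is zero, so this equals W
```

Because B starts at zero, the adapted model initially behaves exactly like the base model. The parameter savings are the main appeal: for a 4096 x 4096 layer with rank 8, `trainable_params(4096, 4096, 8)` gives 65,536 trainable values against 16,777,216 for a full update.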
What You Can Get
- We will use part of the data to pretrain our LLM first, so the earlier you provide data, the more benefit you can get.
- Experience the versatility of our CryptoLLM model, which can be deployed flexibly across different applications and use cases.