Friday, September 22, 2023

LLMs can’t find any more data, what are they going to do now? – Bitext. We help AI understand humans.


What’s the Problem in the AI Market?

Companies are investing heavily in developing LLM-based applications, with GPT, LLaMa, MPT, Falcon, etc. Since all these models rely on very similar datasets and architectures, they tend to be indistinguishable from one another in practice. This lack of differentiation leads to AI applications that offer undifferentiated experiences, since they are based on similar models with similar data and similar architectures.

What Solutions are Available?

Since architectures are mostly made public via open source, data seems to offer one possible path. At Bitext we have produced data for NLP/NLU/AI applications for several years. To address this issue, we have now produced “Hybrid Datasets” (as in “hybrid cars”). We call them hybrid because they are a combination of manual and synthetic data, created with a methodology that combines NLG technology with curation by linguists and vertical experts.

What are the advantages of “Hybrid Datasets”? Two.

First. They are synthetic, but they still avoid the typical problems of the generative approach:

  • Hallucination free. The corpus is 100% hallucination free. This makes it particularly suitable for high-quality LLM fine-tuning.
  • Bias free. The corpus includes tagging for offensive language generated from human-curated dictionaries.
  • PII free. The corpus is 100% free of Personally Identifiable Information; there are no actual names, only placeholders or slots.
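The placeholder idea in the last bullet can be illustrated with a minimal sketch. The `{{Slot Name}}` syntax, the sample template, and the helper names `find_slots` and `fill_slots` are assumptions made for illustration; the article does not describe the corpus format.

```python
import re

# Hypothetical slot syntax: {{Slot Name}}. The real corpus format may differ.
SLOT_PATTERN = re.compile(r"\{\{(.+?)\}\}")


def find_slots(text: str) -> list[str]:
    """Return the slot names in the order they appear in the text."""
    return SLOT_PATTERN.findall(text)


def fill_slots(text: str, values: dict[str, str]) -> str:
    """Replace each slot with a caller-supplied value; unknown slots stay as-is."""
    return SLOT_PATTERN.sub(lambda m: values.get(m.group(1), m.group(0)), text)


# Invented example template: it carries slots instead of real names or numbers.
template = "Hello {{Customer Name}}, your order {{Order Number}} has been cancelled."

print(find_slots(template))
print(fill_slots(template, {"Customer Name": "<test user>"}))
```

Because the text only ever carries slots, any concrete value is supplied by whoever consumes the data, so no real personal data needs to be stored in the corpus itself.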

Second. The datasets have extensive synthetic tagging: text is not just raw text, but text enriched with information on what kind of language variation it expresses, which means:

  • a request like “can u send me a new pw?” will be tagged as “colloquial”
  • another request like “just cancel right now the f***g order” will be tagged as “offensive”

There are 12 different tags like these. Different vertical datasets can be created with this tagging, for example a training dataset for younger people based on colloquial texts. We will come back with more on this in Part 2.
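Selecting a vertical dataset by tag can be sketched roughly as follows. The `Utterance` record, the `select` helper, and the third (untagged) sample sentence are assumptions made for illustration; only the “colloquial” and “offensive” tags come from the examples above, and the actual dataset schema is not described here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Utterance:
    """One training example plus its language-variation tags (hypothetical schema)."""
    text: str
    tags: frozenset[str] = field(default_factory=frozenset)


corpus = [
    Utterance("can u send me a new pw?", frozenset({"colloquial"})),
    Utterance("just cancel right now the f***g order", frozenset({"offensive"})),
    Utterance("I would like to reset my password.", frozenset()),  # invented example
]


def select(corpus, include=(), exclude=()):
    """Keep utterances that carry every tag in `include` and none in `exclude`."""
    include, exclude = set(include), set(exclude)
    return [u for u in corpus if include <= u.tags and not (exclude & u.tags)]


# A colloquial training subset that also filters out offensive language.
colloquial_set = select(corpus, include={"colloquial"}, exclude={"offensive"})
print([u.text for u in colloquial_set])
```

Keeping tags as a set on each example means the same corpus can be sliced into different vertical datasets without duplicating or rewriting the text.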
