{"id":168626,"date":"2021-06-09T22:01:04","date_gmt":"2021-06-09T17:01:04","guid":{"rendered":"https:\/\/venturebeat.com\/?p=2695704"},"modified":"2021-06-09T22:01:04","modified_gmt":"2021-06-09T17:01:04","slug":"eleutherai-claims-new-nlp-model-approaches-gpt-3-level-performance","status":"publish","type":"post","link":"https:\/\/www.technologyforyou.org\/eleutherai-claims-new-nlp-model-approaches-gpt-3-level-performance\/","title":{"rendered":"EleutherAI claims new NLP model approaches GPT-3-level performance"},"content":{"rendered":"<div id=\"boilerplate_2682874\" class=\"post-boilerplate boilerplate-before\">\n<p><em>Elevate your enterprise data technology and strategy at <a href=\"https:\/\/venturebeat.com\/event\/transform-2021\/register\/#\" data-type=\"URL\" target=\"_blank\" rel=\"noreferrer noopener\">Transform 2021<\/a><\/em>. <\/p>\n<hr class=\"wp-block-separator is-style-wide\">\n<\/div>\n<p>AI-powered language systems have transformative potential, particularly in the enterprise. They\u2019re already being used to drive chatbots, translate natural language into structured query language, create application layouts and spreadsheets, and improve the accuracy of web search products. Perhaps the best-known AI text-generator, OpenAI\u2019s <a href=\"https:\/\/venturebeat.com\/2021\/06\/01\/microsoft-gpt-3-and-the-future-of-openai\/\">GPT-3<\/a>, is being used in more than 300 different apps by tens of thousands of developers and producing 4.5 billion words per day.<\/p>\n<p>As interest in AI rises in business, advisory firm Mordor Intelligence forecasts that the natural language processing (NLP) market will more than <a href=\"https:\/\/www.cio.com\/article\/3543296\/the-business-value-of-nlp-5-success-stories.html#:~:text=As%20interest%20in%20AI%20rises,revenue%20in%202019%20by%202025.\">triple<\/a> its revenue by 2025. 
But noncommercial, open source efforts are concurrently gaining steam, as evidenced by the progress made by <a href=\"https:\/\/venturebeat.com\/2021\/01\/15\/ai-weekly-meet-the-people-trying-to-replicate-and-open-source-openais-gpt-3\/\">EleutherAI<\/a>. A grassroots collection of AI researchers, EleutherAI this week released GPT-J-6B (GPT-J), a model the group claims performs nearly on par with an equivalent-sized GPT-3 model on various tasks.<\/p>\n<p>\u201cWe think it\u2019s probably fair to say this is currently the best open source autoregressive language model you can get by a pretty wide margin,\u201d Connor Leahy, one of the founding members of EleutherAI, told VentureBeat.<\/p>\n<p>GPT-J is what\u2019s known as a <a href=\"https:\/\/venturebeat.com\/2021\/01\/12\/google-trained-a-trillion-parameter-ai-language-model\/\">Transformer<\/a> model, which means it weighs the influence of different parts of input data rather than treating all the input data the same. Transformers don\u2019t need to process the beginning of a sentence before the end. Instead, they identify the context that confers meaning to a word in the sentence, enabling them to process input data in parallel.<\/p>\n<p>The Transformer architecture forms the backbone of language models including GPT-3 and Google\u2019s <a href=\"https:\/\/venturebeat.com\/2018\/11\/02\/google-open-sources-bert-a-state-of-the-art-training-technique-for-natural-language-processing\/\">BERT<\/a>, but EleutherAI claims that GPT-J took less time to train compared with other large-scale model developments. 
The researchers attribute this to the use of JAX, a Python library designed for machine learning research, as well as training on Google\u2019s <a href=\"https:\/\/venturebeat.com\/2021\/05\/18\/google-details-new-ai-accelerator-chips\/\">tensor processing units (TPUs)<\/a>, application-specific integrated circuits (ASICs) developed specifically to accelerate AI.<\/p>\n<h2>Training GPT-J<\/h2>\n<p>EleutherAI says that GPT-J contains roughly 6 billion parameters, the parts of the machine learning model learned from historical training data. It was trained over the course of five weeks on 400 billion tokens from a dataset created by EleutherAI called The Pile, an 835GB collection of 22 smaller datasets including academic sources (e.g., arXiv, PubMed), communities (StackExchange, Wikipedia), code repositories (GitHub), and more. Tokens are a way of separating pieces of text into smaller units in natural language, and they can be words, characters, or parts of words.<\/p>\n<div>\n<div id=\"attachment_2695742\" class=\"wp-caption aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-2695742 size-large\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2021\/06\/Screenshot-2021-06-09T122044.178.png?w=776&amp;resize=776%2C502&amp;strip=all\" alt=\"EleutherAI\" width=\"776\" height=\"502\" data-recalc-dims=\"1\"><\/p>\n<p class=\"wp-caption-text\">Above: GPT-J can solve basic math problems.<\/p>\n<p><em>Image Credit: EleutherAI<\/em><\/p>\n<\/div>\n<\/div>\n<p>For compute, EleutherAI was able to leverage the TPU Research Cloud, a Google Cloud initiative that supports projects with the expectation that the results of the research will be shared via code and models. 
GPT-J\u2019s code and the trained model are open sourced under the MIT license and can be used for free using <a href=\"http:\/\/colab.research.google.com\/github\/kingoflolz\/mesh-transformer-jax\/blob\/master\/colab_demo.ipynb\">Hugging Face\u2019s Transformers platform<\/a> or EleutherAI\u2019s website.<\/p>\n<p>GPT-J is more capable than the two models EleutherAI previously released, <a href=\"https:\/\/venturebeat.com\/2021\/05\/15\/gpt-3s-free-alternative-gpt-neo-is-something-to-be-excited-about\/\">GPT-Neo 1.3B and GPT-Neo 2.7B<\/a>. For example, it can&nbsp;perform addition and subtraction and prove simple mathematical theorems, like \u201cAny cyclic group is abelian.\u201d It can also answer reading comprehension questions from a popular test dataset (BoolQ) and generate pseudocode.<\/p>\n<div>\n<div id=\"attachment_2695743\" class=\"wp-caption aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-2695743 size-large\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2021\/06\/Screenshot-2021-06-09T122103.730.png?w=768&amp;resize=768%2C319&amp;strip=all\" alt=\"EleutherAI\" width=\"768\" height=\"319\" data-recalc-dims=\"1\"><\/p>\n<p class=\"wp-caption-text\">Above: GPT-J proving a theorem.<\/p>\n<p><em>Image Credit: EleutherAI<\/em><\/p>\n<\/div>\n<\/div>\n<p>\u201c[OpenAI\u2019s] GPT-2 was about 1.5 billion parameters and doesn\u2019t have the best performance since it\u2019s a bit old, GPT-Neo was about 2.7 billion parameters but somewhat underperforms equal-sized GPT-3 models, GPT-J, the new one, is now 6B \u2014 sized similar to the Curie model of OpenAI, we believe,\u201d Leahy said.<\/p>\n<h2>Looking ahead<\/h2>\n<p>EleutherAI plans to eventually deliver the code and weights needed to run a model similar, though not identical, to the full \u201cDaVinci\u201d GPT-3. (Weights are parameters within a neural network that transform input data.) 
For comparison, the full GPT-3 contains 175 billion parameters and was trained on 499 billion tokens from a 45TB dataset.<\/p>\n<p>Language models like GPT-3 often amplify biases encoded in data. Training data is often sourced in part from communities with&nbsp;<a href=\"https:\/\/venturebeat.com\/2020\/08\/07\/researchers-quantify-bias-in-reddit-content-sometimes-used-to-train-ai\/\">pervasive<\/a> gender, race, and religious prejudices. OpenAI notes that this can lead to placing words like \u201cnaughty\u201d or \u201csucked\u201d near female pronouns and \u201cIslam\u201d near words like \u201cterrorism.\u201d Other studies, like one published in April by researchers at Intel, MIT, and the Canadian Institute for Advanced Research (CIFAR), have found high levels of stereotypical bias in some of the most popular models.<\/p>\n<div>\n<div id=\"attachment_2695744\" class=\"wp-caption aligncenter\"><img decoding=\"async\" loading=\"lazy\" class=\"wp-image-2695744 size-large\" src=\"https:\/\/venturebeat.com\/wp-content\/uploads\/2021\/06\/Screenshot-2021-06-09T122138.245.png?w=774&amp;resize=774%2C392&amp;strip=all\" alt=\"EleutherAI\" width=\"774\" height=\"392\" data-recalc-dims=\"1\"><\/p>\n<p class=\"wp-caption-text\">Above: GPT-J answering a word problem.<\/p>\n<p><em>Image Credit: EleutherAI<\/em><\/p>\n<\/div>\n<\/div>\n<p>But EleutherAI claims to have performed \u201cextensive bias analysis\u201d on The Pile and made \u201ctough editorial decisions\u201d to exclude datasets it felt were \u201cunacceptably negatively biased\u201d toward certain groups or views.<\/p>\n<p>While EleutherAI\u2019s model might not be cutting edge in terms of its capabilities, it could go a long way toward solving a common problem in tech: the disconnect between research and engineering teams. 
As Hugging Face CEO Cl\u00e9ment Delangue told VentureBeat in a recent interview, tech giants provide black-box NLP APIs while also releasing open source repositories that can be hard to use or aren\u2019t well-maintained. EleutherAI\u2019s efforts could help enterprises realize the business value of NLP without having to do much of the legwork themselves.<\/p>\n<p><a href=\"http:\/\/feedproxy.google.com\/~r\/venturebeat\/SZYF\/~3\/MkOfz9f2k54\/\">Source Link<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Elevate your enterprise data technology and strategy at Transform 2021. AI-powered language systems have transformative potential, particularly in the enterprise. They\u2019re already being used to drive chatbots, translate natural language into structured query language, create application layouts and spreadsheets, and improve the accuracy of web search products. 
Perhaps the best-known AI text-generator, OpenAI\u2019s GPT-3, is [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27765,27766,14083],"tags":[20,37,73,16418,16672,17574,420,16413,22972,14383,16794,76,3350,3351,15210,22830],"class_list":{"0":"post-168626","1":"post","2":"type-post","3":"status-publish","4":"format-standard","6":"category-artificial-intelligence-news","7":"category-machine-learning-news","8":"category-technology-industry-news","9":"tag-ai","10":"tag-artificial-intelligence","11":"tag-big-data","12":"tag-category-computers-electronics","13":"tag-category-science-computer-science","14":"tag-category-science-mathematics-statistics","15":"tag-cloud","16":"tag-dev","17":"tag-eleutherai","18":"tag-enterprise","19":"tag-language-model","20":"tag-machine-learning","21":"tag-natural-language-processing","22":"tag-nlp","23":"tag-open-source","24":"tag-vb-home-page"},"_links":{"self":[{"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/posts\/168626","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/comments?post=168626"}],"version-history":[{"count":0,"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/posts\/168626\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/media?parent=168626"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.technologyforyou.org\/wp-json\/wp\/v2\/categories?post=168626"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.technologyforyou.org\/wp
-json\/wp\/v2\/tags?post=168626"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}