
GitHub sued over Copilot for alleged “unprecedented scale” software piracy


GitHub Copilot “ignores, violates, and removes the Licenses offered by thousands — possibly millions — of software developers, thereby accomplishing software piracy on an unprecedented scale”. 

That’s according to what promises to be a closely watched class action lawsuit, filed on November 3 by programmer Matthew Butterick and the Joseph Saveri Law Firm in San Francisco federal court.

Critically, however, their case does not rest primarily on alleged copyright breaches.

The two are suing GitHub, its parent, Microsoft, and its AI-technology partner, OpenAI.

GitHub Copilot is effectively an “auto-complete” for coders. It was trained on public GitHub repositories and spits out lines of code from simple prompts. Whilst many coders welcome the way it saves them from writing boilerplate code ad nauseam, it has met with some disquiet since its June 2021 launch.
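To illustrate the kind of completion at issue (a hypothetical sketch in Python, not actual Copilot output; the function name and pattern are invented for illustration), a developer types little more than a signature and the tool proposes a body:

import re

# The "prompt": a developer types a comment and a bare signature, e.g.
#     # check that an email address is well formed
#     def is_valid_email(address: str) -> bool:
#
# A Copilot-style assistant then suggests a plausible body along these lines:
def is_valid_email(address: str) -> bool:
    """Return True if the address looks like a syntactically valid email."""
    pattern = r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"
    return re.match(pattern, address) is not None

print(is_valid_email("dev@example.com"))  # True
print(is_valid_email("not-an-email"))     # False

Completions of this sort are the boilerplate time-saver many developers welcome; the dispute concerns cases where a suggestion matches existing licensed code.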


GitHub sued: Copilot class action lawsuit is extensive

Open source advocates have been concerned since the product’s launch that it appears to auto-complete with verbatim chunks of code written under copyleft licences that may prohibit this kind of reuse.
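By way of illustration (a generic sketch, not code from any party to the case; the author name is invented), a copyleft-licensed source file conventionally opens with a notice like the one below, and it is this attribution and licence text that verbatim completions are said to strip:

# Copyright (C) 2020 Jane Example
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.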

Recent critics of Copilot include Tim Davis, professor of computer science at Texas A&M University.

GitHub’s then-CEO Nat Friedman noted in 2021: “In general: (1) training ML systems on public data is fair use (2) the output belongs to the operator, just like with a compiler. We expect that IP and AI will be an interesting policy discussion around the world in the coming years, and we’re eager to participate!”

Other prominent concerns have been that the product (charged at $10 per month) is “laundering bias through opaque systems” and perpetuating bloated, lazily written code as coders grow accustomed to neither meaningfully thinking about nor critically reviewing the code Copilot produces for them.

As the complaint [pdf] alleges: “The Defendants stripped Plaintiffs’ and the Class’s attribution, copyright notice, and license terms from their code in violation of the Licenses and Plaintiffs’ and the Class’s rights. Defendants used Copilot to distribute the now-anonymized code to Copilot users as if it were created by Copilot.”

It adds: “Copilot often simply reproduces code that can be traced back to open-source repositories or open-source licensees. Contrary to and in violation of the Licenses, code reproduced by Copilot never includes attributions to the underlying authors.” The plaintiffs request a trial by jury and, the complaint shows, “seek to recover injunctive relief and damages as a result and consequence of Defendants’ unlawful conduct”.

GitHub and Microsoft have been contacted for comment.

Copilot class action raises “numerous questions of law”

The complaint alleges that “numerous questions of law or fact common to the entire Class arise from Defendants’ conduct”, which it goes on to itemise.


GitHub Copilot’s launch triggered an immediate firestorm around potential copyright and open source licensing breaches. Most commentators at the time suggested it would be tough to win a case on those grounds, whatever moral qualms many may have had. Interestingly, the class action does not centre on this.

One commentator, former MEP Felix Reda, noted at the time that in the EU, at least, it was likely legal. He wrote: “Since the EU Copyright Directive of 2019, text & data mining is permitted. Even where commercial uses are concerned, rights holders who do not want their copyright-protected works to be scraped for data mining must opt out in machine-readable form such as robots.txt. Under European copyright law, scraping GPL-licensed code, or any other copyrighted work, is legal, regardless of the licence used. In the US, scraping falls under fair use; this has been clear at least since the Google Books case.”
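For illustration (a minimal sketch; the user-agent token is invented, as no standard token for such crawlers exists), a machine-readable opt-out of the kind Reda describes could be expressed in robots.txt as a rule refusing a data-mining crawler:

# robots.txt at the site root — blocks a (hypothetical) text-and-data-mining bot
User-agent: ExampleTDMBot
Disallow: /

The “Disallow: /” directive tells a crawler that honours the file not to fetch any path on the site.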

GitHub Copilot open source complaint: It’s going to be a tricky case

Assessing the legal risks for Microsoft when GitHub Copilot launched and triggered these immediate concerns, The Stack sought comment at the time from a number of lawyers specialising in open source.

One (whom we won’t name, as we were unable to contact them today and their views may have evolved) wrote at the time: “[With regard to inputs] copying and reading code in an AI engine is specifically permitted by every single open source license – it is freedom 0: the freedom to use code for any purpose. So, for the training phase, unless there are further details I am unaware of, I would definitely think that there is no infringement of publicly available open source code. I have also seen articles by colleagues (Neil Brown, Andres Guadamuz) who indicate that GH’s terms of business may also allow GH to scan any code, not just open source code on GH.”

They added: “[When it comes to outputs] the AI-generated code (output) may not be covered by copyright at all. There have been many arguments about machine-generated works (or photographs taken by monkeys) and so far the general consensus, if not court decisions, is that for copyright protection the work must be the intellectual creation of a human person. The AI engine itself is not a person, and while for example the UK regime provides that for machine-generated works the copyright lies with the person that configured the machine, in this case the AI is self-configuring, so it is much harder to argue that the engineer running the AI has any input into the resulting output/work. That’s regarding protection of the generated code.”

