My social media timeline has started filling up with angry reactions to the news that several academic publishers have been licensing academic works to AI companies for training purposes without having consulted the authors of those works. I can understand the anger and frustration that this news has generated, so this could prove to be a good time to reflect on just how messed up academic publishing is, and why it is possible for companies to do this without any recourse from authors.
I should preface this by tackling the issue that I am an active participant in the publishing industry as a journal editor, peer-reviewer, and author. I’ve long been a critic of academic publishing and an advocate of open access, and yet I choose to participate in it. This is inevitable: our jobs are tied to the publishing industry, so there is practically no other choice, at least for now. I still re-post everything I’ve written on public repositories, and try to continue to advocate for open access. And anyway, I am reminded of this famous cartoon:
With that out of the way, let’s look at what’s wrong with academic publishing.
The industry
The global academic publishing industry is valued at approximately $19 billion USD annually. This industry is dominated by a few major players, with five companies accounting for over 50% of the market share (Elsevier, Wiley, Taylor & Francis, Springer, and SAGE). But the headline is not the income, it is the profit margins, which can go up to 40%, making the industry more profitable than companies like Google and Coca-Cola.
How is it possible for this industry to have such incredible revenue and profits? This is due to the uniqueness of the academic environment, which gives the industry an almost free source of content.
This is the part of the story that involves people like Yours Truly, those who work in academia. Our jobs are pretty much predicated on the principle of “publish or perish”: in order to get hired you have to have published, and to get promoted you have to continue getting published. Not all journal publications carry equal weight; there is a complicated hierarchy that ranks journals by metrics such as impact factor, as well as some unofficial ranking by prestige. Academics always know the top journals in their area. So in order to get hired at the top institutions, or to compete for funding money, it helps to have been published in these top journals, most of which are owned by the top publishers.
So an interesting game begins. People will do anything to get published in the top journals, and an industry has developed based on this fact: authors do not get any pay for their work, and under some circumstances they are even willing to pay for their works to be published. Add to that the fact that the operation of these journals is also undertaken by academics, not on full salary mind you, but often for free or for tiny annual honorariums. This is because being an editor is also prestigious and leads to hiring, funding, and promotion, hence adding another spoke to the already messed up wheel. And the icing on the cake is that peer-review is also conducted for free. This is because you want to help the community, as your own articles are being peer-reviewed by other academics, and you want to participate as a good citizen to allow the entire edifice to remain intact.
And the result is an industry with incredible profits and eye-watering profit margins, with practically no economic recompense going to the content creators and the people who make the system work.
In the UK, the system is even more messed up because of the REF (Research Excellence Framework), a system so incredibly inefficient that it could have only been devised by the type of bureaucrats that would make the Vogons turn green with envy (or greener-ish, but I digress). Imagine a place where the government has tied the funding of higher education to the demonstration of excellence in research, which sounds good in principle, until you look at the effects. This is a process that takes place every few years, and academics have to demonstrate the impact of their research. While the place where papers and books are published is not supposed to matter, of course it always does, so more money goes to the institutions with better publications and impact. This creates a system where everything you do has to be directed at demonstrating those research credentials, which keeps the machine spinning. The REF helps to entrench a system by which we are obligated to publish at whatever cost, or our institutions will suffer. The current crisis in UK higher education is just another symptom of a broken system that spends incredible amounts of internal resources on an inefficient funding framework that only benefits the status quo.
And here is where copyright comes in. As the journals have all of the advantage, and academics are desperate to get published, when the paper is accepted they are presented with a transfer of copyright agreement. This varies, but for the most part you give away all copyright to the publisher in a form that is worded like this: “The copyright to this article is transferred to the owner of [journal], effective if and when the article is accepted for publication.” With some journals, authors will retain copyright over their works, but they will have given a broad licence to the publisher to use, reuse, publish, and sub-license their work.
The resulting system is one in which journal articles are locked behind paywalls and are incredibly expensive. Academic books are also very expensive because, for the most part, nobody will ever read them, so the cost is high because the number of copies is low. The target audience of these works is university libraries, used by the same people who wrote the tomes that are filling them, and who do not see any monetary benefit from them. We are being sold our own works at exorbitant prices so that we can cite them to apply for funding so that the university can support the library to pay for more books and journal subscriptions.
You have to admire the sheer insanity of it all.
Open access
People who were unfamiliar with the academic publishing system and have made it this far may be wondering “why do you put up with this?” Good question! The answer is a combination of inertia and the fact that our jobs are tied to the system as it is; a change would require a complete overhaul of how we conduct and assess the quality of research.
At some point, some academics did start a rebellion. As described above, the proprietary publishing model results in journal articles and books that are expensive and locked behind paywalls. So roughly 20 years ago, some academics started pushing towards more open publishing models that would allow research outputs to be available for free online, a system that is known as open access. The ideal of an open access journal is one that is published independently, completely detached from the commercial academic publishing apparatus. As most of the work is performed for free, why not do the same process independently? This was the idea behind a journal that I helped to found at the University of Edinburgh called SCRIPTed, which still exists. This model, however, is difficult to replicate. The journal existed for years only by the goodwill of volunteers and was funded in large part by an AHRC grant that funded the SCRIPT Centre, so all of the admin expenses were covered. Not everyone is able to replicate this model, which explains why the number of pure open access journals remains relatively low.
Then there is the problem of prestige. One of the problems we had at SCRIPTed was that it was not an established prestige journal, so publishing in it would not help your career. If you were an open access advocate like myself, publishing in open access journals was actively discouraged. When I was hired at my current employer, I received feedback that my publication record was “weird”. At another job application, I received feedback that I was not publishing in the top journals despite the fact that my topics were very interesting and could have higher impact.
However, open access prevailed in the end as it became an integral part of funding and the REF. If you get public funding, you have to make your works available in an open access repository for free to the public, and all REF outputs must also have been posted as open access. This requirement is usually met through institutional repositories where anyone can download copies of the works. Another factor in favour of open access is that, unsurprisingly, if your work is available online it will have more reach and impact than a work that is hiding behind a paywall. Who knew.
It is at this point that you will probably be thinking “wait a second Andres, this doesn’t make sense! If open access won and is so prevalent, how can the publishing industry’s profits remain?” Good question again, astute reader! To quote Galadriel, “but they were all of them deceived…”
The reality is that the academic publishing industry did not abandon the proprietary system, but rather adopted what is called “gold open access”. In this model, the author has to pay to have their work published as open access and available to the public. So the publisher gets a lump sum of money, which is likely more than they would make from an article that nobody was going to read anyway. And here is the kicker: this money is often paid by public funding bodies and/or the universities themselves, as many of the top publishers have agreements with university libraries so that they will cover the cost of the article. This is nothing more than an extraction of public funds directed towards private profits.
And if you decide not to go for gold open access and upload your work to an institutional repository, then you have to comply with embargo periods. Most commercial publishers now take this into account, and they know that after a period of 18 months or so the article will not have commercial value anyway, so any open access version will not be competing with the commercial one.
The result is that profits remain high and money keeps coming in regardless.
AI
This is where the licensing of academic works for AI training comes in. At best, authors will have given a very broad licence to the publishers; at worst, the authors will have no copyright over their own works, and therefore cannot do anything if the publishers want to license those works.
Can authors do anything? That will depend entirely on specific agreements and individual circumstances, but my guess is that most people opposing this development will not have any legal recourse. If your work has been included in AI training and you’re absolutely opposed to this, you may need to look at the correspondence and documentation and examine the licensing and any other grants. Most people do not look at the fine print when publishing, and will therefore give away their copyright. One thing is certain: this is not going to stop, and I would wager that most academic publishers will license their corpus for AI training in the next few months; this is big business.
If you’ve been following the blog, you may already guess that my own stance diverges from some of the angrier reactions from my colleagues. Some of my own articles have been included in these deals, and personally I have no issue with my work being used in training. I’m already not getting paid, so not getting paid by another exploitative industry is just par for the course in the academic publishing environment. I’m also very pragmatic when it comes to AI training of my own work. I believe that as we move towards further integration of AI tools at all levels of the IT infrastructure, including office tools and search engines, not being in the training data will amount to the same as not being on Google; for all practical purposes, you will be invisible.
However, people should be able to make decisions about their own work and should be able to opt out of being included in training data if that is their wish. The fact that this is not an option for most of the authors included in these datasets is an indictment of the academic publishing industry.
Concluding
I don’t see much of a solution here other than fixing academic publishing, and don’t ask me how to do that. The problem runs deep, and it’s present at every level of the academic job market and funding structures. The elite institutions that remain elite by attracting funding have every interest in keeping the system as it is, as it creates an environment where money comes in, and a workforce that does not get enough of it. The journals want to keep the system as it is for evident reasons. And academics remain underpaid and underappreciated, both a captive audience and a provider of free labour in a multi-billion-pound industry.
Goodness me, that was a long rant. I’d like to finish with a snarky one-liner, but in this instance I’m afraid that the certainty that the system will not change in my lifetime makes me feel quite bleak.
4 Comments
Kathy Bowrey · July 25, 2024 at 4:55 am
There has been a great change though in institutional management of copyright and rights retention policies will seed a different path. Have you looked at Stephen Eglen’s map of this lately:
https://sje30.github.io/rrs/rrs.html
Andres Guadamuz · July 25, 2024 at 8:59 am
Thanks Kathy, I hadn’t seen the map lately, glad to see more universities with rights retention policies.
Anonymous · September 22, 2024 at 6:48 pm
Excellent Read, thank you! As a first-year PhD student, you have shown me a different perspective! much appreciated. Sonia
Andres Guadamuz · September 23, 2024 at 4:49 am
Thanks!