I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.
If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.
Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)
Thanks, but I don't use e-readers as they are not available here.
I've been using MoonReader for many years now and settled on pretty good parameters that make the reading experience very comfortable on both my phone and my tablet.
I’ve noticed that people today often bristle at any suggestion that one connect a device to a phone or computer with a cable – on Reddit, one will often get downvoted for this. Apparently, a lot of younger people are hardly aware this is possible and it strikes them as overly complicated or for old people. People want to wirelessly transfer stuff, and what the OP linked to is a popular way to do that with Kobo.
I'm old enough to have used computers before having any Internet, and in 2026 the idea of plugging in devices to transfer files to them does feel like a fiddly relic of the past to me.
I agree that a USB cable is the most practical option. However, the aforementioned site is useful in a specific scenario: if your Kobo is very old, macOS won't recognize it.
On my Kindle I use the KOReader Z-Library plugin, which allows you to download books from within the reader. It's more convenient than any send-to-kindle workflow.
Look, fair enough from your perspective. But a lot of those books probably wouldn't exist if the author couldn't make some money from their work.
I can't find the post, but years ago on Reddit an author posted stats showing that when her book turned up pirated online, real sales for it collapsed.
Because of this I make a point of buying books, programming books especially. Yes I download pdfs, I use them as previews. This has led to buying way more than I would have.
Anyway, I appreciate this doesn't apply if you live somewhere that these books can't be purchased. But everyone praising these sorts of sites tends to look at them from only a positive perspective.
> But a lot of those books probably wouldn't exist if the author couldn't make some money from their work.
I think that's at least a bit debatable. People thought that about (normal) libraries back in the day, but it ended up having the opposite effect.
Not to mention out of print books or academic books which is a big usage of sites like these, since lots of people prefer physical books and only reach for pdfs as a last resort.
Can you imagine if we didn’t have libraries and someone tried to create them today? From publishers to right wingers, they would be painted as communist plots to destroy creativity.
The Internet Archive tried, at great cost and peril, to defend its ability to lend books as an online library due to format shift (physical books get first sale doctrine, ebooks are licensed, you cannot own them), and were told no by the system, so “pirating” it is until copyright changes and becomes more reasonable. Disk is cheap, and the Internet global. Global distributed storage system durability and availability is the path to success until laws change imho.
(Archiving culture alone is not the same as also enabling universal access to the culture and knowledge one is acting as custodian for and serving to global citizens)
I think I agree, the FAR bigger impact on my book's sales was Google search deciding not to surface it in search results. Presence on pirate websites had no effect, and eventually I switched to the PDF as "pay what you want."
Libraries spend like $2B / year buying books https://www.imls.gov/sites/default/files/2021-08/fy19-pls-re..., which is like 10% of the total book market. So even if no one ever bought a book because they first encountered the book, author, or genre in the library, that's already a significant difference.
I live in Iran, and the administrative hurdles the OP mentioned are not an issue here, because you can't buy internationally to begin with, so there is no hurdle to circumvent. The few English books I have are largely illegal reprints of a pirated version, or some ancient printed edition that somehow got imported (no clue how, or whether there is an actual legal way).
I remember opening "thinking fast and slow" and noticing the weird paging. After checking the official version's page count and seeing how the version in my hand doesn't match, my best guess was that someone had printed a pirated epub version.
I rambled so much that I forgot to say what I laid all that introduction for.
I'm not part of the market for these products. I don't have access to them, and even if I buy an imported copy (probably brought in illegally by an individual) or a locally printed version, I'm not benefiting the authors, since I'm monetarily disconnected from them.
Me reading pirated versions of these books has no negative effects on the earnings of the authors.
> Knowledge should be free. It was never created in a vacuum.
This is a common perspective on HN, but it's so jarring. Someone violates an open-source license and we grab our pitchforks. Someone pirates books and it's fine - really, the authors should be thanking us.
Good books are incredibly challenging to write, more so than good software. It's not like you grab Harry Potter and say "I'm just gonna change character names and rephrase some of the text". Most authors recognize that not everyone can afford books and then contend that some amount of sharing is healthy, no different from borrowing books from a local library. If you ask nicely, they will probably send you a PDF for free. But the scale of online book piracy is absolutely staggering and demoralizing, and most of it has nothing to do with taking any serious moral stance. It's just "lol, why pay when you can download for free".
Taken out of context, you're right. But the parent comment couldn't buy these books even if they wanted to. I'd say there's a consensus that the primary motivation for piracy is hurdles to access having nothing to do with payment.
I think iTunes Store and Netflix both showed 15 or so years back that if you give people an easy and convenient way to pay a reasonable price for music/movies, a huge number of people will willingly choose to pay and support the artists/creators, instead of hanging out on BitTorrent trackers and paying for seedboxes and sharing with friends.
And the siloing of movies by Netflix and all the other streaming services, and the introduction of advertising into the "reasonable price" tiers, shows that people can and do remember that piracy is still an option for when corporations and copyright holding groups enshittify things. Lately amongst a lot of my friends I've seen more USB sticks with movies being shared than even in the heyday of BitTorrent.
BitTorrent is one form of collective archive for things that either have restricted or no commercial availability. I wish more cultural artifacts were archived this way.
There’s also the argument that copyright has been extended to the point of absurdity.
I respect copyright, but I can’t respect a lockup period that can push to 130 years or more. For example: if JK Rowling is alive in 4 years the first Harry Potter book will have a valid copyright extending from the 20th, 21st and into the 22nd century. Is it really defensible to say that your great, great (great?) grandchildren should benefit from a government mandated monopoly on your work?
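To put rough numbers on that, here is a quick back-of-the-envelope sketch. It assumes a life-plus-70-years term and treats the expiry year as simply death year plus 70; the death year is a purely hypothetical input, and the function names are made up for illustration. The first Harry Potter book came out in 1997.

```python
def century(year: int) -> int:
    """Return the century a calendar year falls in (2000 -> 20, 2001 -> 21)."""
    return (year - 1) // 100 + 1


def copyright_span(publication_year: int, death_year: int, post_mortem_years: int = 70):
    """Return (expiry_year, total_years, centuries_touched) for a life-plus-N term.

    Assumption: the term runs from publication until death_year + post_mortem_years.
    Real statutes differ in details (for example, running to the end of that year).
    """
    expiry_year = death_year + post_mortem_years
    total_years = expiry_year - publication_year
    centuries = list(range(century(publication_year), century(expiry_year) + 1))
    return expiry_year, total_years, centuries


# Hypothetical inputs: published 1997, author dies in 2031.
print(copyright_span(1997, 2031))
```

With those hypothetical inputs the term runs 104 years and touches three centuries. A book published at 25 by an author who lives to 85 (say `copyright_span(1960, 2020)`) gets the 130-year figure.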
JK Rowling is the exception. The duration is one thing, but so are her proceeds. Most authors are never going to see that much money, if any, from writing. Maybe copyright would expire once the original author has made enough money to pay for lavish living expenses for themselves for the rest of their life, inflation adjusted. So their family wouldn't automatically be taken care of, but they are.
Kind of weird to think about it that way, but food for thought.
I'm not here to defend publishers or the insane current copyright terms. However, in the traditional model of publishing, publishers subsidize production of new books from unproven authors--through book advances, copyediting services, and printing and distribution costs--via the money they make on the few successful and very few ultra-successful books.
If you take away copyright, you reduce the revenue of publishers, which reduces the number of unproven authors they can take chances on. (They're not very good at picking winners from the pool of new author manuscripts, not nearly as good as "agents" like to think they are, but that doesn't matter; they still wouldn't be able to take as many chances.)
Sure, but for the lower volume authors the royalties at year 20 are effectively 0. Most books don’t even sell out their first printing. It's a money grab to defend/prosecute copyright on a book that you couldn't even sell.
100+ year copyrights only help the descendants of the people that have already become extraordinarily wealthy. Cutting off at 2 decades would have close to 0 effect on the huge majority of creatives, while benefitting society immensely. JK Rowling would still be a billionaire, and small authors would be fine too.
I think that as you said, she's the exception. Why are people using superstars as the rule, instead of the exception? Authorship covers everything, from software to how-to books. Comic book authors. It covers authors which are barely successful. Most authors struggle to write all sorts of little pieces which used to end up in magazines, thousands, just struggling to get by.
While the current copyright length indeed seems excessive, looking at the most successful and determining fairness seems very untoward. As it stands now, an author could build up a library of their writings, and perhaps even sell it for their retirement, as it has value.
Is that what is happening often for indie novelists? I don't know, but do others in this thread know?
I know you were just musing, but I really dislike this "let's create a hyper-invasive tax framework, so that anyone who ekes out a little wealth has it capped" concept. There's no wealth to redistribute, not in the way people think. Money isn't real. It's just an indicator of things. In some cases, it's an indicator that someone owns a lot of stock in a company that could be worth 1/20th of its value tomorrow, after a crash, and further is valued at hundreds or thousands of times the actual value of the company today.
In such cases, it's often an indicator of capacity to steer the economy, not of any tangible wealth.
But people need something to strive for. Wealth is one of those things. In a free market economy, wealth is the reward for a job well done. It does make people hustle. It's not perfect, but we've seen how poor and incapable any centrally planned economy seems to be.
As a Canadian, I do believe the government should be involved in certain things. I have a post office, schools, police stations, fire department, food inspections, and so on that the government is involved in, so I think health care makes sense too. Yet I absolutely do not believe the government should be planning most sectors of the economy, nor should it be meddling too deeply in the wealth ratios of its citizens.
To put this in that context, would you want to have the current US administration determine how every sector of the economy works, and how people are paid, etc, etc, eg central planning? Can you imagine?
Of course there is nuance. Of course there are exceptions. But overall management of individual wealth seems very invasive to me.
> Someone violates an open-source license and we grab our pitchforks.
If you look back through the annals of Free Software, one often encounters the claim that the GPL was a way to use copyright against itself, and if there were no more copyright, there would be little need for these licenses.
The issue with copyright law is that it is all or nothing. Rights to a work are either tightly held by an author/publisher, and even downloading a small excerpt can get you in trouble, or it is fully public domain and open for any and all use.
There needs to be a middle ground, such as: after 15 years of publication any private individual can access and read the work for free, but the rightsholder still controls commercial sales, merchandising, licensing, character rights, movie rights etc.
If this is not a bad faith argument, then I don't know what is. When someone is violating an OSS licence, they are doing it for commercial gain and monetary profit. Nobody is angry at someone using FOSS software for himself with no money getting involved.
As opposed to that, books, movies are pirated for personal consumption. Not monetary gains. If someone bought a $30 book, and then ran a BaaS with millions of VC money in his pocket, people in HN would be angry at him, too.
If knowledge is to be free, that means there should be no restrictions on how it is used. Even an open source license misses the point, because the implication is still that one person can dictate how another person can make use of knowledge. It’s still premised on the same dystopian view that a person can own an idea.
Except it’s not. That poster is just doing an IP law version of the “paradox of tolerance”. Their argument is just: you say you want information to be free yet you believe in licenses that keep information from being made un-free.
Knowledge is free as in *free beer once in a while because you genuinely can’t pay*, not free as in *scale up the freemium model, keep grabbing free stuff daily, weekly, monthly, and then start running your own pub with the free beers you took from the neighboring pub.*
This discussion is intellectually dishonest. Either some people here genuinely don't understand the concepts of kindness and gratitude, or they do understand them and are just choosing to spread falsehoods anyway.
Just because my beer pub isn't going out of business because you took some free beers doesn't make it ethical for you to exploit my kindness and use those free beers to build your own competing beer pub.
If people are still confused: that setup is not sharing knowledge. It is stealing with nicer branding to help you and your friends sleep at night.
But then we step back a little further and ask what this thing called property is, and why any human should be granted any beyond what actually constitutes them as an entity of their own.
What matters at the end of the day is not what the documents claim about who possesses what, but how people feel in their lives, what they can access, and what they are barred from accessing, and for what actual reason.
It can't even be narrowed purely to what humans feel. We all know our species depends on many physical phenomena and on other species which owe nothing to us.
Property may be a social construct, but the costs of living are not.
You can question ownership in the abstract, and I am not even against that conversation. But that does not answer the actual point here. We still live in a world where food, rent, healthcare, clothing, hygiene, servers, tools, and time all cost money.
So if someone gives something away out of kindness, access, public benefit, or community spirit, that does not automatically mean everyone else is entitled to industrialize that kindness for their own business.
Open source is not a mystical anti-property pact.
Open source is not a contract where people are expected to provide endless unpaid labour for others to build businesses on top of. At some point this stops being a discussion about sharing knowledge and becomes a way to justify taking advantage of people’s work.
I just don't like these replies, if this was sarcasm, it might not have worked for me. I just find these social comparisons deeply unserious when the discussions are against theft or harm done to actual people with actual human needs.
Not so. Stallman created copyleft licenses as a defense against the current implementation of copyright. Copyleft uses the existing system of copyright to protect authors of free software from people who want to use copyright to restrict distribution. It wouldn't be necessary if copyright didn't exist.
Copyleft was created to protect users of free software from authors/distributors who tried to use copyright to control the software running on the users' computers.
Sure, but an idea is not a (physical) good, nor is it a service. Coming up with an idea or writing a book is a service and should be paid for (probably by commission), but (and Stallman would agree) the idea or book itself should be free.
If knowledge is to be free, then any corporate/commercial interest that locks up modified knowledge (code) to run their own services should have that locked-up knowledge freed from their commercial silo as well.
Knowledge should be free, but that can't be treated too literally. Not a unique case of this kind of phrase. If we're doing capitalism people have to be paid somehow, and when people say "free" they don't mean "absolutely". I mean, speaking of open source, consider "free" software.
Open source licenses are almost entirely unrelated. They're strictly a hack around the copyright system, and not only that, they literally do nothing other than grant you rights you wouldn't otherwise have. Talking about open source is mostly a distraction. When people say knowledge is free they almost always mean access to knowledge. Open source grants people access and more.
People are not mad that they can't just steal things, they're mad that access to things is tied behind massive gatekeepers (essentially indefinitely...) that essentially exist to continue to enrich themselves while somehow almost none of the money makes it back to the authors, and is sometimes completely untethered from where the money comes from that funds the works to begin with. You can't just freely navigate, search through and consume information, it's all tied up behind various pay walls and monetization schemes while authors starve anyways.
We could have a more equitable and reasonable system that allows broad access to knowledge while providing some approach to monetization that is reasonable for both people seeking it out and people consuming it. There's little point in trying to enumerate the number of ways it could be done. We already have a system for taxes, we already have seen commercial schemes like Spotify, you could slice it thousands of different ways. Plenty of pros and cons. I'm just saying it could be done and we know it could be done.
But it can never work if all media and knowledge is dominated by rent seeking gatekeepers standing in the middle whose primary purpose will always be to enrich themselves first and foremost. They will always want to get more and give less, because that is more or less their fiduciary duty.
> Even an open source license misses the point, because the implication is still that one person can dictate how another person can make use of knowledge
To be considered open source software, the license cannot impose any restrictions on how the software is used. You are free to use the software for whatever purpose you want.
Now that AI is decimating a lot of bullshit jobs we need a basic income of some kind. A universal one.
That would enable the authors, activists and hackers to pursue what's meaningful, instead of the profit of the multinational leeches that do not need to adhere to laws, borders or taxes.
If zillion-dollar corporations (Meta and the like) can torrent Anna's Archive and decimate copyright, I see it as a moral imperative for the people to do the same, and to spare the pennies that would profit the publishing / media / ... industries in order to support the authors directly, instead of the trickle-up method, where the majority goes into the hands of the dystopian narcissist zillionaire.
I too think Basic Income is a necessity. I don't think it can happen in the US (for cultural reasons) but I think elsewhere it can work. (And indeed in many places it kinda, sorta, already does).
>> That would enable the authors, activists and hackers to pursue what's meaningful
I would counsel not using the word "meaningful" in the context of BI. We already have a way of evaluating "meaningful", it's called "money". If society gets to judge what is meaningful or not, well, that's the system we currently have.
BI is about letting people do whatever they like especially if it is meaningless. BI implies an economy (you have to spend the Income on something) so meaningful will always be richly rewarded.
Incidentally publishers exist to act as curators and filters. The value they add is real. There's no shortage of self-published stuff on say Amazon, but 99% of it is drivel. I go into a bookshop to find the 1% that at least someone thinks is worth reading.
The reason I think it is hard in the US is because there is a very strong "work or die" ethic in the US. Everything is driven by money. Even basic things like healthcare are driven by money. Your life after retirement is determined by how much money you accumulated. The word-association between "poor" and "lazy" is strong. Taxation should be light. Each man should keep what he accumulates.
BI by contrast values people over money. It recognises not just the social responsibility of the rich to the poor, but also the dignity of being human.
Some countries are further along that path than others. Health care, education, unemployment benefits, are all steps towards BI. The wealthy are taxed to pay for the poor. Ultimately this suppresses the excess, while raising the floor.
From a cultural point of view, the US has many steps to take before society is ready for (real) BI.
I think I get your point, but few PhD candidates pay tuition in the US and most actually receive a living stipend. In exchange they act as teaching and/or research assistants.
Another way of looking at that is that the people who have been able to accumulate excessive gains from the capitalist system are forced to pay some of that back, to maintain the system that enriched them and those whose labor they profited from.
That seems like a screwy but ultimately more than fair deal for the top 10%.
Especially when the alternative is pitchforks and torches.
Before you go to the capitalist in their home and torch it or beat them in front of their family, try striking for a while. Usually they soften rather quickly when labour is collectively withheld.
I disagree with your definition of meaningful here. Society's willingness to pay is certainly a signal for meaning in an output but it seems quite inaccurate. Think of the number of artists and thinkers that weren't recognised in their lifetimes, their work was still meaningful but society hadn't discovered it yet.
Similarly, there are a number of things that would be incredibly meaningful to all of society (eradication of disease, nuclear fusion, etc) that we choose to deprioritise to instead eat foie gras and fight.
Sorry to jump down your throat on this, we're on the same side, I think BI is inevitable and worthwhile. But it's worth pointing out that BI enables more than just the meaningless things.
BI does not stop people doing meaningful things. Society will (mostly) reward things which add value. We have a very efficient system for that, and it doesn't go away under BI.
We are already spending massive amounts of money on disease, fusion and so on. There's no issue there, and BI doesn't move that needle.
At the moment society (especially in the US) operates on an "add value or starve" basis. (That's an oversimplification, but the underlying "morality" is strong in that direction.)
BI moves the needle for those who are not "adding value" (in a materialistic sense.) Artists and Authors are free to spend their time creating works, of which a rounding error will have any value. Sure there's some unappreciated author out there cranking out literature, but there's also everyone else cranking out rubbish.
BI doesn't make 'big things' easier to do. Arguably it makes them harder. Rather it allows individuals to gain satisfaction from little things. Budding poets can write all day long. But if (great) poetry is currently ignored, do not expect much on that front.
I say this not to denigrate BI but rather because allowing the meaningless is precisely its goal. To miss that is to miss the point. It allows people to find worth and dignity without having to add value to society.
I think you’re getting pushback because of this choice of language. It’s not the only goal, but it is a key feature. BI supports choices of how to spend your time and enables freedoms.
> I would counsel not using the word "meaningful" in the context of BI. We already have a way of evaluating "meaningful", it's called "money". If society gets to judge what is meaningful or not, well, that's the system we currently have.
I’m also going to take issue with this interpretation of “meaningful” - I’ve known several amazing crafters and artists who have had an incredibly hard time doing their craft simply because the demands of capitalism prevent them from putting the time and effort into honing those skills, finding a market, creating a portfolio, etc.
If anything, I think BI is just as likely to add meaningfully to society as it is to give people the option to do meaningless things.
Universal basic income is dystopian. It's a way of finally making everyone completely dependent on the government. It's a Brave New World kind of dystopia.
I am dependent on the government not giving subsidies to my competition. I am dependent on them not to raise taxes. I even depend on them for keeping the infrastructure alive.
If I can't trust the state I live in, it is time to move to a different state.
To me it makes you more dependent on the other people in your society - only if you don't earn more than the BI of course.
That is true under Western Capitalism too, you're already dependent on your society, unless you filter and pump your own water; grow your own food; generate your own power; make your own devices; school your own children; do your own healthcare, etc.
For those reliant on BI, instead of greed being the moderator through which your income is determined, it is determined through the democratic processes of state.
Now, if you have an oligarchic dictatorship like USA and Russia, sure you'd need to be particularly worried if you were relying on BI. If you've got a strong democracy (probably true of any country with state-run services that are competitively priced) then BI should work if there's enough total income that it can be redistributed.
It’s worth pointing out that violating a FOSS license is ALWAYS about denying users rights the upstream already gave them. Violating a FOSS license is stealing from your users.
Just because something is challenging, that doesn't mean you should get paid for it. Art and business are two distinct endeavors. All the copyright and IP issues come up because of this one confusion.
Capitalism works great for shoes. You pay the proletariat for their labor and they labor away in your shoe factory. The shoe economy ticks over on scarcity.
The knowledge economy is different. It’s hard to see how the system works in a world where everyone has the equivalent of a shoe replicator in their pocket.
Ironically the “free market” only survives by having arbitrary regulations on shoe duplication enforced in the interests of shoe-rights holders.
The shoes would be free and not a market product, as air to breathe currently is.
Free market is just the best way we know of dealing with scarcity, every other alternative that was attempted seems to fail sooner or later. If you invent a way to get an infinite amount of something, that something goes out of the free market rationing.
As for knowledge, a lot of the more radical free market defenders don't believe at all in the current state of "intellectual property", much less if it has to be enforced by the state. They are for "industrial secrets" (if they leak, you lose them), NDAs (if they leak, you enforce the contract) and similar formulas.
> This is a common perspective on HN, but it's so jarring. Someone violates an open-source license and we grab our pitchforks. Someone pirates books and it's fine - really, the authors should be thanking us.
If the books were released under an open source license, there would be no problem here?
>This is a common perspective on HN, but it's so jarring. Someone violates an open-source license and we grab our pitchforks. Someone pirates books and it's fine - really, the authors should be thanking us.
This is only an apparent contradiction. The underlying issue is of course the social concentration of power.
It's not the same thing to deprive an author of the virtual money the copyright system entices them to extract from readers, regardless of whether those readers have money or not, and to deprive authors of a negotiation mechanism against corporations that swim in money. In both cases, under current legal systems, the most obvious legally enforceable mechanism is copyright. That doesn't mean the underlying issues at stake are the same.
What's really staggering and demoralizing is that humans have all that it takes to feed all mouths, make equal incomes, and live in peace in all kinds of diversity, which includes reciprocal acceptance of differences, and yet we end up with people dying from war and starvation while others accumulate a toxic level of wealth in a system that tries to make everyone uniform and harshly cuts anything that doesn't fit the standard box.
It's almost as if we are in a comment section where most people's jobs exist due to "content" either being stolen (AI most recently) or crowd sourced under EULAs that effectively steal it.
Not that I think a back door to the shit Disney and other corporate content owners pull is necessarily a bad thing. But it is funny to see people here gaslight themselves into believing they have some moral right to just take what others have created.
> most of it has nothing to do with taking any serious moral stance
If many people are immoral, curb the roots of immorality then, fight that one battle - do not shift it to others' sound behaviour exercised with proportion and judgement.
The pitchforks aren't being raised for licence violations, they are raised because software should be free and unencumbered and now somebody is distributing it in a way which doesn't afford the same freedoms to the users.
Free/copyleft licences like the GPL are just the way we've been able to effectively make software free in a world in which copyright applies to software.
> Someone violates an open-source license and we grab our pitchforks.
Where do we do that? All LLMs are still doing it. I've not seen any MIT license or BSD license of all the repos LLMs have gone through. That's excluding copy-left licences. Zero pitch-forks in sight.
> Someone pirates books and it's fine - really, the authors should be thanking us.
I agree here with you. And I'll give a better reason not to pirate for most people. In the seas of endless content, if you want to read/learn from a book, try paying for it. You won't read most books. You'll curate/research better before buying. The books you buy will be worth it in your own mind. You'll re-consume them more, which is very important IMO. For most pirates, I think the issue is not unavailability of resources/content, but the need to consume in a better way. Buying slows things down. You'll consume better. Quality over quantity. Even then I have a ton of Humble Bundle books that I've not touched at all.
> Knowledge should be free. It was never created in a vacuum. It belongs to us all
Imagine you're a professional writer and it's your main source of income. How would you feel if someone said this to you? Would you still want to write books?
Things have changed a lot since the late twentieth century. The kind of people you imagine, who can live full-time off writing, are responsible for a vanishingly small amount of the books that appear today. Piracy has little to do with it; this is primarily due to the fierce competition from other books, the glut of content available today, but especially from mobile phones as fewer and fewer people read books. Even for those who make appreciable income off books, the books are nevertheless usually a side gig alongside other hustles.
Imagine that terrible time before Copyright existed, and there was no motivation for anyone to make art, literature, or music.
"What an inestimable advantage it would be, if, in every branch of literature, there existed only a few but excellent books! This can never come to pass so long as money is to be made by writing." --Arthur Schopenhauer
Keep in mind "copyright" explicitly does not cover "knowledge" or "ideas".
The reason I buy books is rarely for knowledge or ideas; it's either for a good story in the case of fiction (which the author definitely should have the right to exclusively commercialise), or for the author's explanation of an idea or some knowledge which goes beyond the raw information I could find in the scientific papers or higher level descriptions.
Good storytelling and teaching are valuable and should come with some sort of exclusive rights for the author to control and profit from them. And even bad storytelling and teaching should have that same protection from other people distributing it in ways that restrict the author's rights.
Clearly 130 years of protection is insane, and all it does is keep Mickey Mouse's lawyers able to buy new yachts. But as others in this discussion have pointed out, after 20 years almost all of the authors who are still earning money off their works are already rich beyond most authors' realistic hopes. I'm not sure 20 years is "the right length" for protection; you sometimes hear stories of works being rediscovered and becoming wildly popular more than 20 years past the original publication date (Kate Bush's Running Up That Hill getting back into the charts on the back of it being used in Stranger Things, for example).
Obviously yes. While I have the privilege of earning money from services instead of products, I still think that producing creative works is important and should be done whether the motive is profit or not. Many things are not profitable. Should we leave them unwritten? Leave them to those who have the time to spend. For those who have chosen the life of producing products that are easily copied, it is part of _reality_ that those things will be copied when copying benefits the copier. That doesn't mean I think all books should be free. So many books I buy because I don't have another choice or because I want to support the author. But expecting everyone to be in the same situation as me is nonsense.
This is such a ridiculous statement. The people who spent their time building up this body of work deserve to be compensated. Take whatever job you do and imagine people confidently stating that you should work for free.
But do their grandchildren deserve to live off the proceeds for the rest of their lives as well? Say I'm a carpenter and I make chairs. How many chairs do I need to make before I get to retire? If I make a really really good one and get it put in the right place, one should be enough. Just sit back and collect $1 for every time that someone sits in it. I don't have to make the chair particularly good or comfortable, just get it into the right place where people will pay. And then I don't have to work for the rest of my life. Nor do my children. Or their children either. Framing the question at the extreme, that one should be expected to work for free, is just as absurd as framing it as some people should just never have to work at all, ever. No one put me in charge, but I believe people need to do work of some sort. Who gets to decide what counts as work or not isn't for me to decide though, so the system we've got is just this whole unorganized unplanned economy.
Constant payment into perpetuity for the replication of digital information is just a form of rent-seeking. Except, unlike a landlord, you're not obligated to correct defects.
I'm glad your username specifies your location. My biggest pet peeve online these days is someone telling a story about "my country" but never specifying which country that is.
We sell ebooks in every country, including Tunisia - https://www.ebooks.com/en-tn/. I understand that the price of books is sometimes prohibitive, but it's largely outside our control.
The issue is not the amount; it's the currency. Ours is not convertible.
To put it into perspective: even *free* services can be completely inaccessible here if they require payment info for verification purposes.
Acquiring hard currency to spend online is truly a bureaucratic nightmare, and even if you manage it, the annual limit is strictly capped (at around $300).
Some of my published work is pirated heavily. That's not my main income source, so I just shrug and let it go. If anything, I'm probably happy that people are reading my work. Especially if it's people that can't afford it, I'm glad they enjoy my work -- those books got pretty high reader ratings, and it seems to me many readers are actually reading the pirated version.
But I do have friends that depend on this income source, and fighting piracy has become a part of their day job. It's not a fun thing to do, they'd rather spend time working on their next story, but they still have to do this everyday. I feel for them.
That's almost every country these days. At least in the EU. Amazon doesn't carry nearly the selection you'd want, and certainly not in the non-fiction department.
https://SourceLibrary.org has about 16,000 rare books translated — most for the first time. 50,000 books archived (will be translated when we have $$ for it). More tokens than English Wikipedia and about .75 petabytes.
Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…
I can't quickly tell what all you have archived^, but I have some friends who are academic historians who might be interested in certain categories of work (and could help verify some esoteric languages) - is it possible to search by region or language?
Have you reached out to any types of historians WRT the project? It seems like some PhD students might be able to find some projects in this work etc
Yes, this is designed with historians and librarians from the Embassy of the Free Mind (https://embassyofthefreemind.com) in Amsterdam, stewards of the collection of the Biblioteca Philosophica Hermetica
Please share with historian friends. I’m not great at socials or fundraising but this was really designed to support humanists. It can give DOIs for the versions of the translated books, which means they can be quoted and cited in academic papers.
Tip: Try it in Claude or Claude code (even better)! Just point it towards the source library. It can find quotes and evidence on any topic of interest. Or try the librarian — our source-grounded research agent https://sourcelibrary.org/librarian
Interesting site. I picked a random topic to listen to — flying chariots or something like that — and the conversation of one person talking and the other whispering was definitely not to my preference. I’ll have to take another look when I have more time.
How do you handle the more densely written pages in script? I did a very similar exercise OCRing works from this exact collection, but I stuck with the English books for the first pass.
I don't see raw token counts, just a list of steps and page counts. For example, what is the rough average token count per page in the ocr and in the translation steps for a Greek book?
I have seen Gemini costs change quite a bit when processing very similar books from the same series lately, mainly because thinking tokens have increased about 5x. Has that happened to you as well?
Edit: for ocr I am using about 15k-25k tokens per page, but I have a complex prompt.
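For a rough sense of scale, the 15k-25k tokens-per-page range above can be turned into a per-book estimate. This is only a back-of-the-envelope sketch: the 300-page book and the price per million tokens are made-up placeholders, not anyone's actual pricing, and the function names are mine.

```python
def book_token_range(pages, low_per_page=15_000, high_per_page=25_000):
    """Return (low, high) total OCR tokens for a book with `pages` pages.

    The per-page defaults are the 15k-25k range quoted in the comment above.
    """
    return pages * low_per_page, pages * high_per_page


def cost_usd(tokens, usd_per_million_tokens):
    """Convert a token count into dollars at a given (hypothetical) price."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical inputs: a 300-page book and a placeholder price.
    low, high = book_token_range(pages=300)
    placeholder_price = 1.0  # USD per million tokens; an assumption, not a real quote
    print(f"tokens: {low:,} to {high:,}")
    print(f"cost:   ${cost_usd(low, placeholder_price):.2f} "
          f"to ${cost_usd(high, placeholder_price):.2f}")
```

Swap in your own page count and the real price for whatever model you use; a 5x jump in thinking tokens would scale the totals accordingly.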
Anna’s came clutch for me yesterday. I spent a few days trying to find a zip file of a CD that came with an old book from early 2000s on programming. One of those Thomson Publishing slap jobs that I actually enjoyed. I checked used copies all of them said does not come with CD. I tried googling around, nothing. LLMs couldn’t find it. ChatGPT kept saying it is on the archive (no it isn’t you useless piece of shit). Anyway, on a whim I went to AA, lo and behold, zip files for both first and second edition. Godsend.
I wonder how long it will be before they offer bounties for internet scrapes.
Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.
Someone on your network is likely playing one of the games that are monetized by bright data proxies, it was a thread on here a few days ago. It could be your smart TV. If you find the culprit and remove it there's a decent chance your ip reputation will improve enough to not see those captchas
I live in Canada but was born in Italy.
I often want to buy books in Italian (digital) and it's incredibly complicated because licensing deals are never for people speaking Italian in Canada.
"If you work at Google and have access to this data, then we realize that $200,000 means little to you, but you'd be hailed a legendary archivist if you're able to sneak out this data."
Yeah, but still, I think I'd prefer to do it anonymously than be the legendary archivist rotting away in prison.
I think the main source may be in Russia; or that was with libgen.
But I could be wrong.
I am more surprised to see that there are so few alternatives to it. Or perhaps I am unaware of them but after Facebook and co declared war on libgen, and libgen going down, there were surprisingly few alternatives. Anna was one of the few. I still don't know what happened with libgen, but since the attack it really is kind of semi-gone.
Libgen and similar are more alive than ever, with an extended botnet growing weekly. The "googlers" indexed framework is shrinking every day, so users won't find it in those search engines easily. It is also hard to keep up with good storage considering the price trend of the last 5 years, so the botnet and torrents are some kind of solution, I guess. (We, for instance, are considering using the old tape system, because it is at least a viable alternative.)
I went to that link and almost ran a malicious script in my terminal for a supposed reCAPTCHA verification. Just before pressing enter, I verified it with Gemini and it said it was a ClickFix script designed to steal passwords and other sensitive information. Because of all the weird things we have to do for CAPTCHA verification these days, I almost believed it was legitimate and went with the steps. It is really frightening.
Lying about your assets to avoid paying a lawful fine is criminal. Just because they can’t see your money doesn’t mean they can’t prove that you have it, and can’t jail you for hiding it to get out of paying a fine.
i'd strongly caution anybody foolish enough to go down this path
financial watchdogs and international treaties make it impossible unless you are perhaps a multi billionaire who can afford to buy people at the political level
It’s the “ Copy data into extra large capacity micro sdcard” step that gets you caught. Nobody is stopping you from leaving with an SD card or USB stick at Google.
Importantly, this would work way better on a speedcube than the Rubik's brand cubes. Never use the Rubik's brand cubes. I use the Moyu RS3m V5 maglev, and I think that it would work well for hiding uSD cards.
I wonder how hard Google could press an estate. For a living person the main consequences would be criminal, not civil, but you can't go after a dead person criminally. Civilly it's not really clear to me that Google would have been harmed enough to create noteworthy damages.
The book publishers might actually be the bigger problem. They'd have civil copyright infringement claims with giant statutory damages.
You could look it up and see for yourself. It's applicable to all sorts of collections such as map data and I'd presume also a book database but IANAL so best if you see for yourself - my intention was just to point out (what I hope is) the applicable legal principle for anyone curious about this
Doubtful that random employees just have access to the full archive. And among those few that do have access there are probably automated systems that will catch you once you start downloading even a small percent of the content
Quite a few textbook authors I know are paid well to be part of the whole scheme (kickbacks, forced yearly repurchase for the 'online' component of books, etc). So I think it varies a lot.
Up to 500k for OPSEC failures is interesting, as well. It gives me hope that there are wealthy individuals contributing to sharing books, or many small donations.
Micropayments (charged in mills, not cents) is the solution. Downloading books remains essentially "free" for the individual but the Internet scale is such that authors would receive compensation for their work. The Spotify model is better than downright piracy. It is very difficult to compete with free.
I would argue precisely that downright piracy is better than the Spotify model. It is based in micro-micropayments, so much so that even at internet scale very few artists outside of the uber-macro-Taylor-Swift size get proper compensation. Sending a single dollar through Bandcamp amounts to hundreds of listens on Spotify.
It really sucks, but I'd rather pirate and know I'm at fault with the artist – maybe I'll buy some tracks off Bandcamp to make up for it – rather than let Spotify cover the transaction with a legal blanket, while the artists get almost nothing in return.
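To put rough numbers on both sides of this exchange, here is a small sketch. A mill is one thousandth of a dollar. The download count, the mills-per-download rate, and the per-stream payout are all assumptions picked for illustration, not figures from Spotify, Bandcamp, or anyone in this thread.

```python
import math
from fractions import Fraction

MILL = Fraction(1, 1000)  # one mill = one thousandth of a dollar


def author_revenue(downloads, mills_per_download):
    """Total dollars an author would receive under a per-download micropayment."""
    return downloads * mills_per_download * MILL


def streams_to_match(amount_usd, payout_per_stream_usd):
    """Number of streams needed to match a direct payment of `amount_usd`.

    Both arguments are decimal strings so the arithmetic stays exact.
    """
    return math.ceil(Fraction(amount_usd) / Fraction(payout_per_stream_usd))


if __name__ == "__main__":
    # Hypothetical: one million downloads at 5 mills each.
    print(f"micropayment revenue: ${float(author_revenue(1_000_000, 5)):,.2f}")
    # Hypothetical per-stream payout of $0.004.
    print(f"streams to match $1:  {streams_to_match('1', '0.004')}")
```

With those placeholder rates a single direct dollar equals a few hundred streams, which is the shape of the argument above; the exact ratio depends entirely on the payout you plug in.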
AA compiles from everywhere; LibGen and Z-Lib served as the major sources of books. This has unfortunately led to search results for a particular book containing multiple versions of that book, and it is not readily clear which one is the highest-quality version. A real library would have librarian staff who carefully curate everything, but in the pirate world this isn’t realistic so it just gets all thrown together.
LibGen is now more or less a dead project. The servers of the original version were reportedly seized a couple of years ago already, and other sites under the LibGen name were notorious for piggybacking the original collection and just plastering it with ads. If one wants to upload stuff, better now to upload it to Z-Lib (not a perfect site, but still) and it will then get picked up by AA in a few months.
Surely this is realistic now (or soon) in the form of LLM curation. A few auto-librarians reading everything, looking at different versions side by side, making choices, etc.
LOL, not realistic at all. The differences between book versions lie in more than the raw text. Moreover, an archival project would be loath to favour or disfavour a version unless an actual human made the call.
Oh, I have no expectation that Page or Brin would do something like this, let alone do it openly. But Page did seem to care about Books access at one point, and I find the image of him secretly leaking the Google Books corpus amusing.
Gemini should be trained on those books already, so in theory it could regurgitate some verbatim fragments (as NYT lawsuit against OpenAI showed some time ago).
Gemini, GPT and fable are actually very good compressions of internet content. But it is lossless compression, as in they kept the most important part (for them to fulfill the next-token task) and found a way to mimic the rest.
I think you meant lossy compression and not lossless. I'm not suggesting this as a method to extract those books from the models, which by their nature are not databases. Just commenting on the somewhat surprising fact that the bigger the model the more likely it is to produce some (short) excerpts of the original training material
> Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.
> We have given high-speed access to about 30 companies. Most of them are LLM companies, and some are data brokers, who will resell our collection. Most are Chinese, though we’ve also worked with companies from the US, Europe, Russia, South Korea, and Japan. DeepSeek admitted that an earlier version was trained on part of our collection, though they’re tight-lipped about their latest model (probably also trained on our data though).
It's at least 30 companies, each of which paid hundreds of thousands of dollars.
How is Anna's Archive funded? I see they have memberships, but it's hard to believe that can fund all these bounties - some going into six figures. Ask any FOSS project about funding by that method.
It seems like there are some deep pockets funding them.
The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".
Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.
We're killing the goose that lays the eggs, for selfish gain.
Possibly, but this act of governmental self-harm is useful to The People. We live in a world where if your valuation is ~1T you can more or less just do what you like. And the work of The People is stolen from you and laundered.
In such a world, isn't it useful that governments are stupid enough to give adversaries reasons to undermine it? When the government props up a corporate tyranny domestically, and racketeering, should we make a temporary alliance with all its enemies?
(E.g., the provision to AI companies of all corporate secrets and competitive practices via prompts, eventually to be used against their capital interests and their labour interests.)
This ship has sailed for academic publications, and academics define that term very liberally because we want to read everything, fiction included. The shadow libraries started off as a way for scholars in ex-Soviet countries in particular (but also India, SE Asia, etc.) to access literature that simply wasn’t available in their country. But the shadow libraries proved so successful and convenient that academics in all countries are using them now, even if they have access to official subscription services. I use AA several times a day and so do the researchers around me in my office; at conferences, if the presenter mentions an interesting publication, the whole room immediately opens AA on their laptops, etc.
Even if projects like AA didn’t have nation-level support, academics would find a way to keep as much of it as possible going. After all, we’re the ones who compiled the bulk of pre-2020 material, and we’re the ones who do all the hard work of scanning from our institutional libraries stuff that doesn’t exist anywhere in digital form.
>We're killing the goose that lays the eggs, for selfish gain
We already did that when the internet collectively agreed decades ago that everything digital should be free for anyone.
We're now 20 years downstream of ad-blocking being a virtuous good, and piracy being the ultimate show of liberty, and now suddenly everyone cares about the creator's revenue stream.
The mask slipped and unsurprisingly the internet is a bunch of selfish morally stunted children. Some of them even pushing 50 years old.
Yes, I am talking to you with the 4TB of pirated content, proud of not loading any ads in the last 15 years, and getting enraged over LLM training.
> Yes, I am talking to you with the 4TB of pirated content, proud of not loading any ads in the last 15 years, and getting enraged over LLM training.
That's oddly-specific :-)
In any case, I have no pirated content that I know of, and I'm neither proud nor ashamed of blocking ads[1], but I still get annoyed that a bunch of VCs can use their invested-into companies to launder all the world's IP, then sell it back to them.
[1] Who feels proud of blocking ads? It's like feeling proud of tying your shoelaces: "Good job, well done, but that's the expectation, son".
>the practice of writing and publishing genuinely good work is being wiped out.
Most of the best literature in the English language was written before modern IP law was even a thing. There's very little good literature written by authors primarily motivated by money.
That's just cultural elitism. I hope you meet someone in your life who finds absolute joy in reading young adult romance novels or D&D fantasy books so you can understand how irrelevant "good" literature is. I love Dostoevsky and Verne (and D&D novels, especially those written by R.A. Salvatore), but I would never judge the modern "IPs" that got my daughter into reading.
Everyone has their own opinion as to what the best literature is, just like what the best music is.
But there is also some consensus. For music it would be Beethoven, Mozart, the Beatles, etc.
For literature it would include Tolstoy, Shakespeare, Cervantes, Dickens, Austen, Tolkien, and many more. I would bet it will eventually include Stephen King but it's too early to make that call now.
There is no such consensus. Those are just popular answers coming from people with eurocentric tastes over the last couple of centuries. Millions in the global south would swap The Beatles for Celia Cruz or Fela Kuti. People in Asia from the 1700s to this day would tell you Tyagaraja is a way better composer than Beethoven and Mozart put together... he is a literal saint in India, actually.
Not to mention, many people would disregard your answer for putting Stephen King (and Tolkien, to a lesser extent) in the same sentence as Tolstoy and Shakespeare.
There is no "better" anything when we talk about culture. No consensus, etc. There are just opinions.
Well, you needed the means to get an education, since most of the poor in those days were illiterate, which is something of an impediment to becoming a successful writer.
I can only think of one writer off hand who wasn’t a wealthy landowner, although it is a particularly notable example; that of William Shakespeare.
Shakespeare wasn’t poor (his parents seem to be of upper middle class standing), he was able to get a basic (but not a university) education and then pursue an acting career (with perhaps a side hustle as a teacher). Whatever the case he certainly wasn’t independently wealthy before he started writing, he needed to earn a living.
He did seem to be in it for the money (and fame) since he wasn’t just a writer he was an actor, theatre owner, and something of a celebrity, and he did make enough money to become a wealthy landowner by the time he died.
I’m not sure piracy or AI training are really affecting book publishing dramatically. But if you have data, I’d be curious to see it. AI scraper bots are a total pain for online publishers and FOSS sites, but AFAIK they’re not really harming book publishing directly.
The consolidation of publishers and Amazon’s own practices are probably worse for authors than “piracy”.
Russians will just share it back (I’m saying that as a Russian). And if not Russians, then somebody else will.
What you can do is make sure people can pay you easily, and not put (a lot of) hurdles in your readers' way. And when people can’t afford to pay... maybe let them enjoy your work still, and you’ll get a couple more loyal fans who would pay you when they’re able to.
At least this was my world view before AI has arrived and ruined^W disrupted everything. Now I’m not so sure.
Another explanation might be a general dislike of big establishments like AI companies and publishers (which glosses over individual authors, but they probably make up a negligible portion of total sales anyway).
The problem is that it is quite difficult to access published papers if you are not in academia or at some company that pays for access, so AA sort of serves that niche to transfer the knowledge. Training, on the other hand, is a commercial activity to later rent out the model; if this were purely for open weights, I suspect everyone would be cool with it.
It's not backwards. Which of the two makes a profit? Which of the two comes away richer? Which of the two actually takes business away from the original copyright holder?
If it works as AA seems to theorize, you'd need to:
(a) work out how Google books exposes fragments of books, and see if there's a systematic way of using this to get whole books. For example, a naive approach might be to find any fragment of the book by searching some exact phrase. Then, you can search for an exact phrase from the start or end of the fragment it gave you, hoping it will show you the previous or next part of the book. You can then just loop that to get the whole book.
(b) once you have (a), you need a way of bypassing Google's bot detection/rate limiting. I don't know what current state of the art is, but there may be a solution for sale out there. E.g. you pay to receive a cookie or browser state, and use that to fetch the URLs from (a). Or if you're good/already in the scene, you could do this part yourself.
Yeah, it's similar to how I would value a Google Books scraping job, difficulty-wise and data-wise. I've done my probing on my own; it was watertight as far as my probing went.
On another note, if Google's cybersecurity were always one rogue employee away from a massive leak, then it wouldn't be Google. What was the last Google leak you remember? Defense in depth, people.
Indeed, took a closer look, and the court order that took down their .org domain included "Computer fraud and abuse" claims.
Surprisingly, the order is very specific about DNS registrars and authoritative domains not advertising the AA servers, so sharing IP addresses or alternative domain names through other means, like Wikipedia, is not against the order. Which means that nowadays the Wikipedia page works as a pseudo-authoritative DNS.
Chinese companies giving away expensive models for free is a symptom of the AI bubble, too. It's not a law of nature that they'll always be able to scrounge up the money for yet another training run.
Shaping the tool that does the thinking is quite valuable when you're in the business of changing how people think - I think we can expect propaganda agencies to be subsidizing model creation forever.
This doesn't strike me as a symptom of a bubble - except in so far as the bubble pushes the competitors models forwards and thus they need to invest more to stay competitive.
If you think Chinese companies always act as a bloc, your mental model needs to get about a billion times more detailed. But in this case just a few details may be enough: There are Chinese AI companies that have released LLMs without publishing the weights.
ByteDance is going the direct-to-consumer route with their Doubao chatbot (the most popular in China, probably thanks to their social media prowess). iFlyTek seems to be angling for enterprise and government use cases, where they already have an in.
The companies that have released weights have in common that they didn't have a monetization channel lined up and their models weren't good enough to make people pay attention with just API access. (You can see with Qwen Max that the calculus can change towards not releasing weights for better models.)
And who exactly among the investors is having their complement commoditized? When Nvidia releases Nemotron, the story is clear, but it's less obvious for say Z.ai's GLM.
That’s misunderstanding why these models are behind. A large part of why they’re behind is they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.
Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.
[edit, after thinking about it I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]
That’s not remotely true. They did distillation as a cheap solution to the cold start problem. You need data/trajectories to hill climb to higher capabilities. All large Chinese labs do RLAIF.
Oh yes, not remotely true. Which is why the frontier labs all have invested heavily in trying to identify and thwart distillers, using known company names / domains to drive their exclusion lists.
It's cheaper to distill than to do reinforcement learning, so of course they prefer that, but if it wasn't an option they could just pay up and spend more GPU time on RL.
>"they aren’t able to do the reinforcement learning post-training steps"
Not yet.
If there is a need someone will come and fulfill. Personally for me now I do not even want to use top models. Professionally I use AI to help with the coding using Junie agent that comes with IDEs from JetBrains. Junie is told to use Gemini Flash and works fine for what I ("I" being an emphasis here) ask it to do. I tried more advanced models and different vendors only to discover credits going down the toilet without any extra benefit.
Internet was a bubble, so was telecom etc. at some point. Being a bubble does not mean that when 90% of investments go down the drain the remains are not useful.
"The Internet" was not a bubble. Companies with no long-term business model / sufficient product-market fit that were riding hype were the "dotcom bubble". But when those companies crashed, nobody said "I really want to get my hands on their IP", because it wasn't valuable – an important pre-requisite to the bubble popping. Seems to be a different case here if people actually want the SOTA models.
A bubble just means it is overvalued beyond its true fundamentals due to speculation on speculation. The underlying asset can still have value, just less than the market price.
Consider Cisco. On the 31st of March, 2000, it was valued at US$77.31 / share, which in inflation adjusted terms is $150.46 (above the current price over 26 years later). This valuation was on the basis of speculation that the price would continue to go up and Cisco would get a large cut of the industry profits. Cisco's business is still valuable, it was just treated as overvalued by the market.
Similarly, if we go back to one of the classic examples of a bubble - consider Dutch tulip prices in 1636; speculation drove future contracts high. But tulips still have value to people today, it's just the price was higher than was sustainable.
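The Cisco numbers above imply a specific inflation multiplier, which is easy to check. A minimal sketch using only the two prices and the 26-year span quoted in the comment; it does not look up real CPI data, and the function names are mine.

```python
def implied_factor(nominal_price, adjusted_price):
    """Cumulative inflation multiplier implied by a nominal/adjusted price pair."""
    return adjusted_price / nominal_price


def annualized_rate(factor, years):
    """Constant yearly rate that compounds to `factor` over `years` years."""
    return factor ** (1 / years) - 1


if __name__ == "__main__":
    # Figures quoted in the comment: $77.31 in March 2000, $150.46 adjusted, ~26 years.
    factor = implied_factor(77.31, 150.46)
    rate = annualized_rate(factor, 26)
    print(f"implied cumulative factor: {factor:.3f}")
    print(f"implied average inflation: {rate:.2%} per year")
```

So the quoted adjustment amounts to prices roughly doubling over the period, i.e. an average of a bit over 2.5% a year.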
And why's that? I don't see how either of those two things relate to one another. A grape is a naturally occurring fruit, while the creation of a book is wholly a human creative endeavor.
When I buy grapes, yeah, I get that someone worked for me to get 'em. But I don't get all metaphysical and spooky about it. I'm buying the grapes as objects. Like a book. Once I take ownership of the object, yes, thanks for your work. I paid you and I wish you well.
As far as what I ought to be able to do with the thing? Beat it. Of course there are exceptions. You say, "Don't scan the book and upload it to object storage and place it behind a CDN so Ahmed from Tounis can access it". I say, "Don't make wine".
There are a lot of crops that are very intensively bred to create new, specific cultivars. Take a look at all the new apple varieties popping up. There’s nothing natural about their creation: https://applerankings.com
I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.
If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.
Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)
https://send.djazz.se/
This is key for getting epubs to your Kobo.
Calibre? https://calibre-ebook.com/
Thanks, but I don't use e-readers as they are not available here.
I've been using MoonReader for many years now and settled on pretty good parameters that make the reading experience very comfortable on both my phone and my tablet.
Moon reader is amazing. I love mine so much I don't see a point of having a separate book reader.
I don't understand what this is doing. Can't you sideload any ebook onto a kobo anyway? Never had an issue on my Clara
I’ve noticed that people today often bristle at any suggestion that one connect a device to a phone or computer with a cable – on Reddit, one will often get downvoted for this. Apparently, a lot of younger people are hardly aware this is possible and it strikes them as overly complicated or for old people. People want to wirelessly transfer stuff, and what the OP linked to is a popular way to do that with Kobo.
I'm old enough to have used computers before having any Internet, and in 2026 the idea of plugging in devices to transfer files to them does feel like a fiddly relic of the past to me.
Yes. You can literally ssh into a kobo. I usually just put my books on a WebDAV share that is mounted on the kobo.
Sideload without cables is challenging.
I download epubs from zlib but then they're on my phone and transferring them to my Kobo is arduous. This makes it easy.
Handy, but a book lover with an ereader probably already uses Calibre :)
I don't recall ever needing anything special on my Aura H2O. It's one of the reasons I chose Kobo in the first place. Just copy any file onto it.
If you mean stripping drm I used Calibre for that but mostly I just avoid buying books with drm where possible.
This is a genius way to farm ebooks while providing a useful service. I personally just use Google drive though.
Lol it never occurred to me that they might just save every single upload
I’d hope they do file or block deduping.
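For what it's worth, file-level deduping is cheap to do. A minimal sketch, assuming uploads arrive as raw bytes and are stored under their SHA-256 digest. The `DedupStore` class and its in-memory dict are hypothetical and purely for illustration; nothing here reflects how the linked service actually works:

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical uploads are kept only once."""

    def __init__(self):
        self._blobs = {}  # hex digest -> file bytes

    def put(self, data: bytes) -> str:
        """Store data under its SHA-256 digest and return that digest."""
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy and ignores later identical uploads.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)
```

Uploading the same epub twice returns the same digest and takes up space once. Block-level deduping works the same way, except the hashing is applied to fixed-size or content-defined chunks instead of whole files.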
Or... just a USB cable?
I agree that a USB cable is the most practical option. However, the aforementioned site is useful in a specific scenario: if your Kobo is very old, macOS won't recognize it.
If other operating systems interact with the Kobo well, the problem clearly lies with MacOS.
I use the zlib app on my phone. There's no clean way to get books from my phone onto my Kobo.
On my Kindle I use the KOReader Zlibrary plugin, which allows you to download books from within the reader. It's more convenient than any send-to-kindle workflow.
https://github.com/ZlibraryKO/zlibrary.koplugin
Look, fair enough from your perspective. But a lot of those books probably wouldn't exist if the author couldn't make some money from their work.
I can't find the post, but years ago on Reddit an author posted stats showing that when her book turned up pirated online, real sales for it collapsed.
Because of this I make a point of buying books, programming books especially. Yes I download pdfs, I use them as previews. This has led to buying way more than I would have.
Anyway, I appreciate this doesn't apply if you live somewhere that these books can't be purchased. But everyone praising these sorts of sites tends to look at them from only a positive perspective.
> But a lot of those books probably wouldn't exist if the author couldn't make some money from their work.
I think that's at least a bit debatable. People thought that about (normal) libraries back in the day, but it ended up having the opposite effect.
Not to mention out of print books or academic books which is a big usage of sites like these, since lots of people prefer physical books and only reach for pdfs as a last resort.
Can you imagine if we didn’t have libraries and someone tried to create them today? From publishers to right wingers, they would be painted as communist plots to destroy creativity.
The Internet Archive tried, at great cost and peril, to defend its ability to lend books as an online library due to format shift (physical books get first sale doctrine, ebooks are licensed, you cannot own them), and were told no by the system, so “pirating” it is until copyright changes and becomes more reasonable. Disk is cheap, and the Internet global. Global distributed storage system durability and availability is the path to success until laws change imho.
(Archiving culture alone is not the same as also enabling universal access to the culture and knowledge one is acting as custodian for and serving to global citizens)
The Internet Archive has lost its appeal in Hachette vs. Internet Archive - https://news.ycombinator.com/item?id=41447758 - September 2024 (793 comments)
https://archive.org/details/brewsterkahlelongnowfoundation
Totally unrelated: Dweb camp 2026 is coming up for those interested: https://dwebcamp.org/
(no affiliation with any person or entity mentioned in this comment)
As a wise man once said, if “buying” isn’t owning, piracy isn’t stealing. Words to live by, arrrrrrrr.
I think I agree, the FAR bigger impact on my book's sales was Google search deciding not to surface it in search results. Presence on pirate websites had no effect, and eventually I switched to the PDF as "pay what you want."
Libraries spend like $2B / year buying books https://www.imls.gov/sites/default/files/2021-08/fy19-pls-re..., which is like 10% of the total book market. So even if no one ever bought a book because they first encountered the book, author, or genre in the library, that's already a significant difference.
I live in Iran, and the administrative hurdles the OP mentioned are not an issue here because you can't buy internationally to begin with, so there is no hurdle to circumvent. The few English books I have are largely illegal reprints of a pirated version or some ancient printed version that has somehow gotten imported (no clue how, or whether there is an actual legal way).
I remember opening "Thinking, Fast and Slow" and noticing the weird paging. After checking the official version's page count and seeing that the version in my hand didn't match, my best guess was that someone had printed a pirated epub version.
I rambled so much that I forgot to say what I laid all that introduction for.
I'm not part of the market for these products. I don't have access to them, and even if I buy an imported copy (probably brought in illegally by individuals) or a printed version, I'm not going to benefit the authors, since I'm disconnected monetarily from them.
Me reading pirated versions of these books has no negative effects on the earnings of the authors.
You can definitely justify using pirated versions more than most people.
Please say more about accessing HN from Iran, I’m very curious.
Not gonna argue with the point that we need to support authors, but “can be purchased” is a relative concept.
So you’re saying your entire current life is because of the proceeds of crime?!
I’m kidding. Knowledge should be free. It was never created in a vacuum. It belongs to us all
> Knowledge should be free. It was never created in a vacuum.
This is a common perspective on HN, but it's so jarring. Someone violates an open-source license and we grab our pitchforks. Someone pirates books and it's fine - really, the authors should be thanking us.
Good books are incredibly challenging to write, more so than good software. It's not like you grab Harry Potter and say "I'm just gonna change character names and rephrase some of the text". Most authors recognize that not everyone can afford books and then contend that some amount of sharing is healthy, no different from borrowing books from a local library. If you ask nicely, they will probably send you a PDF for free. But the scale of online book piracy is absolutely staggering and demoralizing, and most of it has nothing to do with taking any serious moral stance. It's just "lol, why pay when you can download for free".
Taken out of context, you're right. But the parent comment couldn't buy these books even if they wanted to. I'd say there's a consensus that the primary motivation for piracy is hurdles to access having nothing to do with payment.
I think iTunes Store and Netflix both showed 15 or so years back that if you give people an easy and convenient way to pay a reasonable price for music/movies - a huge number of people will willingly choose to pay and support the artists/creators, instead of hanging out on Bittorrent trackers and paying for seedboxes and sharing with friends.
And the siloing of movies by Netflix and all the other streaming services, and the introduction of advertising into the "reasonable price" tiers, shows that people can and do remember piracy is still an option, for when corporations and copyright holding groups enshittify things. Lately amongst a lot of my friends I've seen more USB sticks with movies being shared than even in the heyday of Bittorrent.
BitTorrent is one form of collective archive for things that either have restricted or no commercial availability. I wish more cultural artifacts were archived this way.
There’s also the argument that copyright has been extended to the point of absurdity.
I respect copyright, but I can’t respect a lockup period that can push to 130 years or more. For example: if JK Rowling is alive in 4 years the first Harry Potter book will have a valid copyright extending from the 20th, 21st and into the 22nd century. Is it really defensible to say that your great, great (great?) grandchildren should benefit from a government mandated monopoly on your work?
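The arithmetic behind that, as a rough sketch. It assumes a life-plus-70 term that runs to the end of the calendar year, the 1997 first publication of Harry Potter, and that "in 4 years" means 2030 with the author dying the year after. The function names and the death year are illustrative only, not a prediction:

```python
def copyright_expiry_year(death_year: int, term_years: int = 70) -> int:
    """Last calendar year of protection under a life-plus-term rule."""
    return death_year + term_years


def century(year: int) -> int:
    """Century a year falls in, e.g. 2000 -> 20, 2001 -> 21."""
    return (year - 1) // 100 + 1


PUBLISHED = 1997           # first Harry Potter book
ASSUMED_DEATH_YEAR = 2031  # hypothetical, chosen only to illustrate the claim

expiry = copyright_expiry_year(ASSUMED_DEATH_YEAR)
years_locked_up = expiry - PUBLISHED  # approximate span of protection

print(expiry, century(PUBLISHED), century(expiry), years_locked_up)
```

Under those assumptions the term starts in the 20th century and ends in the 22nd, a bit over a century in total. A younger or longer-lived author pushes the span toward the 130-year figure.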
JK Rowling is the exception. The duration is one thing, but so are her proceeds. Most authors are never going to see that much money, if any, from writing. Maybe copyright would expire once the original author has made enough money to pay for lavish living expenses for themselves for the rest of their life, inflation adjusted. So their family wouldn't automatically be taken care of, but they are.
Kind of weird to think about it that way, but food for thought.
If most authors don't make much money, if any, then what is the point of copyright? It only really seems to benefit publishers.
I'm not here to defend publishers or the insane current copyright terms. However, in the traditional model of publishing, publishers subsidize production of new books from unproven authors--through book advances, copyediting services, and printing and distribution costs--via the money they make on the few successful and very few ultra-successful books.
If you take away copyright, you reduce the revenue of publishers, which reduces the number of unproven authors they can take chances on. (They're not very good at picking winners from the pool of new author manuscripts, not nearly as good as "agents" like to think they are, but that doesn't matter; they still wouldn't be able to take as many chances.)
Sure, but for the lower volume authors the royalties at year 20 are effectively 0. Most books don’t even sell out their first printing. It's a money grab to defend/prosecute copyright on a book that you couldn't even sell.
100+ year copyrights only help the descendants of the people that have already become extraordinarily wealthy. Cutting off at 2 decades would have close to 0 effect on the huge majority of creatives, while benefitting society immensely. JK Rowling would still be a billionaire, and small authors would be fine too.
Individuals did not push for these laws.
JK Rowling is irrelevant to the question of whether "lifetime plus 70 years" is an absolutely batshit fucking insane copyright term.
I think that as you said, she's the exception. Why are people using superstars as the rule, instead of the exception? Authorship covers everything, from software to how-to books. Comic book authors. It covers authors which are barely successful. Most authors struggle to write all sorts of little pieces which used to end up in magazines, thousands, just struggling to get by.
While the current copyright length indeed seems excessive, looking at the most successful and determining fairness seems very untoward. As it stands now, an author could build up a library of their writings, and perhaps even sell it for their retirement, as it has value.
Is that what is happening often for indie novelists? I don't know, but do others in this thread know?
I know you were just musing, but I really dislike this "let's create a hyper-invasive tax framework, so that anyone who ekes out a little wealth has it capped" concept. There's no wealth to redistribute, not in the way people think. Money isn't real. It's just an indicator of things. In some cases, it's an indicator that someone owns a lot of stock in a company that could be worth 1/20th of its value tomorrow, after a crash, and further is valued at hundreds or thousands of times the actual value of the company today.
In such cases, it's often an indicator of capacity to steer the economy, not of any tangible wealth.
But people need something to strive for. Wealth is one of those things. In a free market economy, wealth is the reward for a job well done. It does make people hustle. It's not perfect, but we've seen how poorly any centrally planned economy seems to perform.
As a Canadian, I do believe the government should be involved in certain things. I have a post office, schools, police stations, fire department, food inspections, and so on that the government is involved in, so I think health care makes sense too. Yet I absolutely do not believe the government should be planning most sectors of the economy, nor should it be meddling too deeply in the wealth ratios of its citizens.
To put this in that context, would you want to have the current US administration determine how every sector of the economy works, and how people are paid, etc., i.e. central planning? Can you imagine?
Of course there is nuance. Of course there are exceptions. But overall management of individual wealth seems very invasive to me.
> Someone violates an open-source license and we grab our pitchforks.
If you look back through the annals of Free Software, one often encounters the claim that the GPL was a way to use copyright against itself, and if there were no more copyright, there would be little need for these licenses.
> there would be little need for these licenses.
The GPL ones explicitly disallow the removal of user rights from work derived from GPL software. That wouldn’t be possible without copyright.
The issue with copyright law is that it is all or nothing. Rights to a work are either tightly held by an author/publisher, and even downloading a small excerpt can get you in trouble, or it is fully public domain and open for any and all use.
There needs to be a middle ground, such as: after 15 years of publication any private individual can access and read the work for free, but the rightsholder still controls commercial sales, merchandising, licensing, character rights, movie rights etc.
If this is not a bad faith argument, then I don't know what is. When someone is violating an OSS licence, they are doing it for commercial gains and monetary profit. Nobody is angry at someone using FOSS software for himself with no money getting involved.
As opposed to that, books, movies are pirated for personal consumption. Not monetary gains. If someone bought a $30 book, and then ran a BaaS with millions of VC money in his pocket, people in HN would be angry at him, too.
If knowledge is to be free, that means there should be no restrictions on how it is used. Even an open source license misses the point, because the implication is still that one person can dictate how another person can make use of knowledge. It’s still premised on the same dystopian view that a person can own an idea.
Wow very well put.
Except it’s not. That poster is just doing an IP law version of the “paradox of tolerance”. Their argument is just: you say you want information to be free yet you believe in licenses that keep information from being made un-free.
Knowledge is free as in *free beer once in a while because you genuinely can’t pay*, not free as in *scale up the freemium model, keep grabbing free stuff daily, weekly, monthly, and then start running your own pub with the free beers you took from the neighboring pub.*
This discussion is intellectually dishonest. Either some people here genuinely don't understand the concepts of kindness and gratitude, or they do understand them and are just choosing to spread falsehoods anyway.
Just because my beer pub isn't going out of business because you took some free beers doesn't make it ethical for you to exploit my kindness and use those free beers to build your own competing beer pub.
If people are still confused: that setup is not sharing knowledge. It is stealing with nicer branding to help you and your friends sleep at night.
But then we step back a little further and ask what this thing called property is, and why any human should be granted any beyond what actually constitutes them as an entity of their own.
What matters at the end of the day is not what the documents claim about who possesses what, but how people feel in their lives, what they can access, and what they are barred from accessing and for what actual reason.
It can't even be narrowed purely to what humans feel. We all know our species is dependent on many physical phenomena and other species which owe nothing to us.
Property may be a social construct, but the costs of living are not.
You can question ownership in the abstract, and I am not even against that conversation. But that does not answer the actual point here. We still live in a world where food, rent, healthcare, clothing, hygiene, servers, tools, and time all cost money.
So if someone gives something away out of kindness, access, public benefit, or community spirit, that does not automatically mean everyone else is entitled to industrialize that kindness for their own business.
Open source is not a mystical anti-property pact.
Open source is not a contract where people are expected to provide endless unpaid labour for others to build businesses on top of. At some point this stops being a discussion about sharing knowledge and becomes a way to justify taking advantage of people’s work.
I just don't like these replies, if this was sarcasm, it might not have worked for me. I just find these social comparisons deeply unserious when the discussions are against theft or harm done to actual people with actual human needs.
Not so. Stallman created copyleft licenses as a defense against the current implementation of copyright. Copyleft uses the existing system of copyright to protect authors of free software from people who want to use copyright to restrict distribution. It wouldn't be necessary if copyright didn't exist.
Important nit-pick:
Copyleft was created to protect users of free software from authors/distributors who tried to use copyright to control the software running on the users' computers.
Stallman wanted to protect the right to fix bugs, he was not against paying for goods and services.
Sure, but an idea is not a (physical) good, nor is it a service. Coming up with an idea or writing a book is a service and should be paid for (probably by commission), but (and Stallman would agree) the idea or book itself should be free.
If knowledge is to be free, then any corporate/commercial interest that locks up modified knowledge (code) to run their own services should have that locked-up knowledge freed from their commercial silo as well.
Knowledge should be free, but that can't be treated too literally. Not a unique case of this kind of phrase. If we're doing capitalism people have to be paid somehow, and when people say "free" they don't mean "absolutely". I mean, speaking of open source, consider "free" software.
Open source licenses are almost entirely unrelated. They're strictly a hack around the copyright system, and not only that, they literally do nothing other than grant you rights you wouldn't otherwise have. Talking about open source is mostly a distraction. When people say knowledge is free they almost always mean access to knowledge. Open source grants people access and more.
People are not mad that they can't just steal things, they're mad that access to things is tied behind massive gatekeepers (essentially indefinitely...) that essentially exist to continue to enrich themselves while somehow almost none of the money makes it back to the authors, and is sometimes completely untethered from where the money comes from that funds the works to begin with. You can't just freely navigate, search through and consume information, it's all tied up behind various pay walls and monetization schemes while authors starve anyways.
We could have a more equitable and reasonable system that allows broad access to knowledge while providing some approach to monetization that is reasonable for both people seeking it out and people consuming it. There's little point in trying to enumerate the number of ways it could be done. We already have a system for taxes, we already have seen commercial schemes like Spotify, you could slice it thousands of different ways. Plenty of pros and cons. I'm just saying it could be done and we know it could be done.
But it can never work if all media and knowledge is dominated by rent seeking gatekeepers standing in the middle whose primary purpose will always be to enrich themselves first and foremost. They will always want to get more and give less, because that is more or less their fiduciary duty.
> Even an open source license misses the point, because the implication is still that one person can dictate how another person can make use of knowledge
To be considered open source software, the license cannot impose any restrictions on how the software is used. You are free to use the software for whatever purpose you want.
Absolutely not. Go read e.g. GPL license before spreading such falsehoods.
This is the same “beer not speech” argument that has been going on for decades. A quick search will show you why your claim is incomplete.
This is ultimately just a reframing of "a tolerant society cannot tolerate intolerance".
Now that AI is decimating a lot of bullshit jobs we need a basic income of some kind. A universal one.
That would enable the authors, activists and hackers to pursue what's meaningful, instead of the profit of the multinational leeches that do not need to adhere to laws, borders or taxes.
If zillion-dollar corporations (Meta and the like) can torrent Anna's Archive and decimate copyright, I see it as a moral imperative for the people to do the same and spare the pennies that would profit the publishing / media / ... industries to support the authors directly, instead of the trickle-up method, where the majority goes into the hands of the dystopian narcissist zillionaire.
I too think Basic Income is a necessity. I don't think it can happen in the US (for cultural reasons) but I think elsewhere it can work. (And indeed in many places it kinda, sorta, already does).
>> That would enable the authors, activists and hackers to pursue what's meaningful
I would counsel not using the word "meaningful" in the context of BI. We already have a way of evaluating "meaningful", it's called "money". If society gets to judge what is meaningful or not, well, that's the system we currently have.
BI is about letting people do whatever they like especially if it is meaningless. BI implies an economy (you have to spend the Income on something) so meaningful will always be richly rewarded.
Incidentally publishers exist to act as curators and filters. The value they add is real. There's no shortage of self-published stuff on say Amazon, but 99% of it is drivel. I go into a bookshop to find the 1% that at least someone thinks is worth reading.
> I don't think it can happen in the US (for cultural reasons)
Cultural reasons are just a matter of spindoctoring/propaganda.
Nixon almost implemented a form of UBI: https://en.wikipedia.org/wiki/Family_Assistance_Plan
In Alaska there's already a form of UBI: https://en.wikipedia.org/wiki/Alaska_Permanent_Fund
The reason I think it is hard in the US is because there is a very strong "work or die" ethic in the US. Everything is driven by money. Even basic things like healthcare are driven by money. Your life after retirement is determined by how much money you accumulated. The word-association between "poor" and "lazy" is strong. Taxation should be light. Each man should keep what he accumulates.
BI by contrast values people over money. It recognises not just the social responsibility of the rich to the poor, but also the dignity of being human.
Some countries are further along that path than others. Health care, education, unemployment benefits, are all steps towards BI. The wealthy are taxed to pay for the poor. Ultimately this suppresses the excess, while raising the floor.
From a cultural point of view, the US has many steps to take before society is really ready for (real) BI.
I have an Irish friend that got a Ph.D, while on the dole.
That would never happen, in the US. We have a strong “Why do they get special treatment?” thing going on.
I think I get your point, but few PhD candidates pay tuition in the US and most actually receive a living stipend. In exchange they act as teaching and/or research assistants.
That would be perceived differently.
I guess we know that Ph.D Research/Teaching Assistants tend to be worked to death, and paid peanuts.
The whiners would be OK with that.
Top 10% already pays 70.5% of all federal income taxes. US high income payers are already taxed to pay the poor.
Almost half of US federal tax payers pay 0 income tax.
Another way of looking at that is that the people who have been able to accumulate excessive gains from the capitalist system are forced to pay some of that back, to maintain the system that enriched them and those whose labor they profited from.
That seems like a screwy but ultimately more than fair deal for the top 10%.
Especially when the alternative is pitchforks and torches.
Before you go to the capitalist in their home and torch it or beat them in front of their family, try striking for a while. Usually they soften rather quickly when labour is collectively withheld.
Hence why corporate leadership is salivating over AI.
I disagree with your definition of meaningful here. Society's willingness to pay is certainly a signal for meaning in an output but it seems quite inaccurate. Think of the number of artists and thinkers that weren't recognised in their lifetimes, their work was still meaningful but society hadn't discovered it yet.
Similarly, there are a number of things that would be incredibly meaningful to all of society (eradication of disease, nuclear fusion, etc) that we choose to deprioritise to instead eat foie gras and fight.
Sorry to jump down your throat on this, we're on the same side, I think BI is inevitable and worthwhile. But it's worth pointing out that BI enables more than just the meaningless things.
BI does not stop people doing meaningful things. Society will (mostly) reward things which add value. We have a very efficient system for that, and it doesn't go away under BI.
We are already spending massive amounts of money on disease, fusion and so on. There's no issue there, and BI doesn't move that needle.
At the moment society (especially in the US) operates on an "add value or starve" basis. (That's an oversimplification, but the underlying "morality" is strong in that direction.)
BI moves the needle for those who are not "adding value" (in a materialistic sense.) Artists and Authors are free to spend their time creating works, of which a rounding error will have any value. Sure there's some unappreciated author out there cranking out literature, but there's also everyone else cranking out rubbish.
BI doesn't make 'big things' easier to do. Arguably it makes them harder. Rather it allows individuals to gain satisfaction from little things. Budding poets can write all day long. But if (great) poetry is currently ignored, do not expect much on that front.
I say this not to denigrate BI but rather because allowing the meaningless is precisely its goal. To miss that is to miss the point. It allows people to find worth and dignity without having to add value to society.
> allowing the meaningless is precisely its goal
I think you’re getting pushback because of this choice of language. It’s not the only goal, but it is a key feature. BI supports choices of how to spend your time and enables freedoms.
"add value or starve"
I think this is wrong. It's not about value, it's about being submissive.
> We already have a way of evaluating "meaningful", it's called "money".
This is a mechanism that’s very narrow and creates notable distortions between financial and artistic value, just to keep the example specific enough.
> I would counsel not using the word "meaningful" in the context of BI. We already have a way of evaluating "meaningful", it's called "money". If society gets to judge what is meaningful or not, well, that's the system we currently have.
I’m also going to take issue with this interpretation of “meaningful” - I’ve known several amazing crafters and artists who have had an incredibly hard time doing their craft simply because the demands of capitalism prevent them from putting the time and effort into honing those skills, finding a market, creating a portfolio, etc.
If anything, I think BI is just as likely to add meaningfully to society as it is to give people the option to do meaningless things.
Universal basic income is dystopian. It's a way of finally making everyone completely dependent on the government. It's a Brave New World kind of dystopia.
Are you independent of your government yet?
I am dependent on the government not giving subsidies to my competition. I am dependent on them not raising taxes. I even depend on them for keeping the infrastructure alive. If I can't trust the state I live in, it is time to move to a different state.
Interesting perspective.
To me it makes you more dependent on the other people in your society - only if you don't earn more than the BI of course.
That is true under Western Capitalism too, you're already dependent on your society, unless you filter and pump your own water; grow your own food; generate your own power; make your own devices; school your own children; do your own healthcare, etc.
For those reliant on BI, instead of greed being the moderator through which your income is determined, it is determined through the democratic processes of state.
Now, if you have an oligarchic dictatorship like USA and Russia, sure you'd need to be particularly worried if you were relying on BI. If you've got a strong democracy (probably true of any country with state-run services that are competitively priced) then BI should work if there's enough total income that it can be redistributed.
I think there are problems with BI though.
Anthropic, OpenAI and Meta: yeah totally, personal consumption only
Altman et al. take knowledge and lock it away. They add restraints, borders, and limitations to things they got for free. They stole freedoms.
> When someone is violating an OSS licence
It’s worth pointing out that violating a FOSS license is ALWAYS about denying users rights the upstream already gave them. Violating a FOSS license is stealing from your users.
Just because something is challenging, that doesn't mean you should get paid for it. Art and business are two distinct endeavors. All the copyright and IP issues come up because of this one confusion.
Copyright is not an unalloyed good.
In small doses, and short terms, I might agree with your classification.
But when copyright is 150 years, it no longer has anything to do with rewarding the author or encouraging creativity; it’s just a cartel.
Capitalism works great for shoes. You pay the proletariat for their labor and they labor away in your shoe factory. The shoe economy ticks over on scarcity.
The knowledge economy is different. It’s hard to see how the system works in a world where everyone has the equivalent of a shoe replicator in their pocket.
Ironically the “free market” only survives by having arbitrary regulations on shoe duplication enforced in the interests of shoe-rights holders.
The shoes would be free and not a market product, as air to breathe currently is.
Free market is just the best way we know of dealing with scarcity, every other alternative that was attempted seems to fail sooner or later. If you invent a way to get an infinite amount of something, that something goes out of the free market rationing.
As for knowledge, a lot of the more radical defenders of the free market don't believe at all in the current state of "intellectual property", much less if it has to be enforced by the state. They are for "industrial secrets" (if they leak, you lose them), NDAs (if they leak, you enforce the contract) and similar formulas.
> This is a common perspective on HN, but it's so jarring. Someone violates an open-source license and we grab our pitchforks. Someone pirates books and it's fine - really, the authors should be thanking us.
If the books were released under an open source license, there would be no problem here?
> Someone violates an open-source license and we grab our pitchforks
we grab pitchforks because you violate the licence by making knowledge NOT free?
what exactly is the contradiction?
> It's just "lol, why pay when you can download for free".
Well for starters because the research shows that reading on a screen literally makes you retarded.
https://www.nature.com/articles/s41598-023-36256-4
https://www.journals.uchicago.edu/doi/10.1086/691462
https://www.nature.com/articles/s41598-022-05605-0
>This is a common perspective on HN, but it's so jarring. Someone violates an open-source license and we grab our pitchforks. Someone pirates books and it's fine - really, the authors should be thanking us.
This is only an apparent contradiction. The underlying issue is of course the social concentration of power.
Depriving an author of the virtual money that the copyright system entices them to extract from readers, regardless of whether those readers have money or not, is not the same as depriving authors of a negotiation mechanism against corporations that swim in money. In both cases, with current legal systems, the most obvious legally enforceable mechanism is copyright. That doesn't mean the underlying issues at stake are the same.
What's really staggering and demoralizing is that humans have all that it takes to feed all mouths, make incomes equal, and live in peace in all kinds of diversity, with reciprocal acceptance of differences, and yet we end up with people dying from war and starvation while others accumulate a toxic level of wealth in a system that tries to make everyone uniform and harshly cuts anything that doesn't fit the standard box.
It's almost as if we are in a comment section where most people's jobs exist due to "content" either being stolen (AI most recently) or crowdsourced under EULAs that effectively steal it.
Not that I think a back door to the shit Disney and other corporate content owners pull is necessarily a bad thing. But it is funny to see people here gaslight themselves into believing they have some moral right to just take what others have created.
> most of it has nothing to do with taking any serious moral stance
If many people are immoral, curb the roots of immorality then, fight that one battle - do not shift it to others' sound behaviour exercised with proportion and judgement.
The pitchforks aren't being raised for licence violations, they are raised because software should be free and unencumbered and now somebody is distributing it in a way which doesn't afford the same freedoms to the users.
Free/copyleft licences like the GPL are just the way we've been able to effectively make software free in a world in which copyright applies to software.
And yet JK Rowling is a billionaire and movies make hundreds of millions of dollars in profits.
So I guess we can have a world in which knowledge can be free without starving knowledge creators?
Not if you account for just how many authors never reach the pinnacle of breaking even.
Rowling is a rare exception and very much not the norm.
> Someone violates an open-source license and we grab our pitchforks.
Where do we do that? All LLMs are still doing it. I've not seen any MIT license or BSD license of all the repos LLMs have gone through. That's excluding copy-left licences. Zero pitch-forks in sight.
> Someone pirates books and it's fine - really, the authors should be thanking us.
I agree with you here. And I'll give a better reason not to pirate, for most people. In the seas of endless content, if you want to read or learn from a book, try paying for it. You won't read most books. You'll curate and research better before buying. The books you buy will be worth it in your own mind. You'll re-consume them more, which is very important IMO. For most pirates, I think the issue is not the unavailability of resources or content, but the need to consume in a better way. Buying slows things down. You'll consume better. Quality over quantity. Even then, I have a ton of Humble Bundle books that I've not touched at all.
> Knowledge should be free. It was never created in a vacuum. It belongs to us all
Imagine you're a professional writer and it's your main source of income. How would you feel if someone said this to you? Would you still want to write books?
Things have changed a lot since the late twentieth century. The kind of people you imagine, who can live full-time off writing, are responsible for a vanishingly small amount of the books that appear today. Piracy has little to do with it; this is primarily due to the fierce competition from other books, the glut of content available today, but especially from mobile phones as fewer and fewer people read books. Even for those who make appreciable income off books, the books are nevertheless usually a side gig alongside other hustles.
> The kind of people you imagine, who can live full-time off writing, are responsible for a vanishingly small amount of the books that appear today
So it’s ok to take what they produced without paying for it? What a weird non sequitur.
Someday, some pro-piracy lunatic will come up with something vaguely coherent that isn’t just “me want, me get.”
GP is arguing that piracy is not the cause. That part is directly after your quote.
You forgot to stipulate, “…living in an ultracapitalist country with no meaningful social safety net,” i.e. the USA.
Imagine that terrible time before Copyright existed, and there was no motivation for anyone to make art, literature, or music.
"What an inestimable advantage it would be, if, in every branch of literature, there existed only a few but excellent books! This can never come to pass so long as money is to be made by writing." --Arthur Schopenhauer
We certainly wouldn't want to return to the pre-1976 era, where, as we all know, no books were written.
Keep in mind "copyright" explicitly does not cover "knowledge" or "ideas".
The reason I buy books is rarely for knowledge or ideas; it's either for a good story in the case of fiction (which the author definitely should have the right to exclusively commercialise), or for the author's explanation of an idea or some knowledge which goes beyond the raw information I could find in the scientific papers or higher-level descriptions.
Good storytelling and teaching are valuable and should come with some sort of exclusive rights for the author to control and profit from. And even bad storytelling and teaching should have that same protection from other people distributing it in ways that restrict the author's rights.
Clearly 130 years of protection is insane, and all it does is keep Mickey Mouse's lawyers able to buy new yachts. But as others in this discussion have pointed out, after 20 years almost all of the authors who are still earning money off their works are already rich beyond most authors' realistic hopes. I'm not sure 20 years is "the right length" for protection; you sometimes hear stories of works being rediscovered and becoming wildly popular more than 20 years past the original publication date (Kate Bush's Running Up That Hill getting back into the charts on the back of it being used in Stranger Things, for example).
> Imagine you're a professional writer and it's your main source of income.
There are probably a few thousand of these in the entire United States. The market system heavily under-values writing as is, independent of piracy
Obviously yes. While I have the privilege of earning money from services instead of products, I still think that producing creative works is important and should be done whether the motive is profit or not. Many things are not profitable. Should we leave them unwritten? Leave them to those who have the time to spend. For those who have chosen the life of producing products that are easily copied, it is part of _reality_ that those things will be copied when copying benefits the copier. That doesn't mean I think all books should be free. So many books I buy because I don't have another choice or because I want to support the author. But expecting everyone to be in the same situation as me is nonsense.
So you provide your services for free I presume.
This is such a ridiculous statement. The people who spent their time building up this body of work deserve to be compensated. Take whatever job you do and imagine people confidently stating that you should work for free.
Yes, but do their publishers need to be compensated for a century after their death?
But do their grandchildren deserve to live off the proceeds for the rest of their lives as well? Say I'm a carpenter and I make chairs. How many chairs do I need to make before I get to retire? If I make a really really good one and get it put in the right place, one should be enough. Just sit back and collect $1 for every time that someone sits in it. I don't have to make the chair particularly good or comfortable, just get it into the right place where people will pay. And then I don't have to work for the rest of my life. Nor do my children. Or their children either. Framing the question at the extreme, that one should be expected to work for free, is just as absurd as framing it as some people should just never have to work at all, ever. No one put me in charge, but I believe people need to do work of some sort. Who gets to decide what counts as work or not isn't for me to decide though, so the system we've got is just this whole unorganized unplanned economy.
This is a disingenuous argument.
Constant payment into perpetuity for the replication of digital information is just a form of rent-seeking. Except, unlike a landlord, you're not obligated to correct defects.
A better argument here is that the work you do should earn you income for a century every time someone else benefits from it.
https://xkcd.com/1228/
The entire LLM chatbot craze is only possible because of the proceeds of crime. I'm reluctant to pick on the little guy here.
I'm glad your username specifies your location. My biggest pet peeve online these days is someone telling a story about "my country" but never specifying which country that is.
We sell ebooks in every country, including Tunisia - https://www.ebooks.com/en-tn/. I understand that the price of books is sometimes prohibitive, but it's largely outside our control.
The issue is not the amount; it's the currency. Ours is not convertible.
To put it into perspective: even *free* services can be completely inaccessible here if they require payment info for verification purposes.
Acquiring hard currency to spend online is truly a bureaucratic nightmare, and even if you manage it, the annual limit is strictly capped (at around $300).
I can feel for both camps.
Some of my published work is pirated heavily. That's not my main income source, so I just shrug and let it go. If anything, I'm probably happy that people are reading my work. Especially if it's people that can't afford it, I'm glad they enjoy my work -- those books got pretty high reader ratings, and it seems to me many readers are actually reading the pirated version.
But I do have friends that depend on this income source, and fighting piracy has become a part of their day job. It's not a fun thing to do, they'd rather spend time working on their next story, but they still have to do this everyday. I feel for them.
That's almost every country these days. At least in the EU. Amazon doesn't carry nearly the selection you'd want, and certainly not in the non-fiction department.
https://SourceLibrary.org has about 16,000 rare books translated — most for the first time. 50,000 books archived (will be translated when we have $$ for it). More tokens than English Wikipedia and about .75 petabytes.
Not sure if we will qualify for a bounty, but happy to share! Btw, we are looking for funding from small or large donors who want to help us translate the Renaissance…
Hey, this looks fascinating!
I can't quickly tell what all you have archived^, but I have some friends who are academic historians who might be interested in certain categories of work (and could help verify some esoteric languages) - is it possible to search by region or language?
Have you reached out to any types of historians WRT the project? It seems like some PhD students might be able to find some projects in this work etc
^ when I looked at the timeline https://sourcelibrary.org/timeline, I got an error
Yes, this is designed with historians and librarians from the Embassy of the Free Mind (https://embassyofthefreemind.com) in Amsterdam, stewards of the collection of the Biblioteca Philosophica Hermetica
Please share with historian friends. I’m not great at socials or fundraising but this was really designed to support humanists. It can give DOIs for the versions of the translated books, which means they can be quoted and cited in academic papers.
Tip: Try it in Claude or Claude code (even better)! Just point it towards the source library. It can find quotes and evidence on any topic of interest. Or try the librarian — our source-grounded research agent https://sourcelibrary.org/librarian
Thanks for the feedback, I’ll fix the timeline.
Interesting site. I picked a random topic to listen to — flying chariots or something like that — and the conversation of one person talking and the other whispering was definitely not to my preference. I’ll have to take another look when I have more time.
Curious as to what your budget was to get where you are today? That's a lot of tokens. I presume you are using gemini flash?
All the models used are shown with each page of translation and each book has a whole data provenance treatment.
You can add it up!
How do you handle the more densely written pages in script ? I did a very similar exercise OCRing works from this exact collection, but I stuck with the English books for the first pass.
I don't see raw token counts, just a list of steps and page counts. For example, what is the rough average token count per page in the ocr and in the translation steps for a Greek book?
I have seen Gemini costs change quite a bit when processing very similar books from the same series lately, mainly because thinking tokens have increased about 5x. Has that happened to you as well?
Edit: for ocr I am using about 15k-25k tokens per page, but I have a complex prompt.
Can't you just tell him?
Wow this is amazing!
TL;DR: AI-translated books on the occult and occult-adjacent themes :/
beautiful work! the answers are relevant and poignant. thank you for building this. For funding, paid research api maybe?
Anna’s came clutch for me yesterday. I spent a few days trying to find a zip file of a CD that came with an old book from early 2000s on programming. One of those Thomson Publishing slap jobs that I actually enjoyed. I checked used copies all of them said does not come with CD. I tried googling around, nothing. LLMs couldn’t find it. ChatGPT kept saying it is on the archive (no it isn’t you useless piece of shit). Anyway, on a whim I went to AA, lo and behold, zip files for both first and second edition. Godsend.
What for?
Honestly? Nostalgia. I ran it in a sandbox just to check it out and play with it.
Consider putting them on archive.org then, they have a section for that and it helps spread sources
Good idea. I’ll see when one of the used copies I got comes with a CD so I can provide the actual .iso image.
I wonder how long it will be before they offer bounties for internet scrapes.
Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.
https://x.com/CloudflareDev/status/2031488099725754821
Well, there is this little conflict of interest
https://xcancel.com/CloudflareDev/status/2031488099725754821
Someone on your network is likely playing one of the games that are monetized by bright data proxies, it was a thread on here a few days ago. It could be your smart TV. If you find the culprit and remove it there's a decent chance your ip reputation will improve enough to not see those captchas
When everyone's a bot, noöne is.
I live in Canada but was born in Italy. I often want to buy books in Italian (digital) and it's incredibly complicated because licensing deals never cover people who speak Italian in Canada.
You often need an Italian credit card to pay.
The digital world is crazy.
"If you work at Google and have access to this data, then we realize that $200,000 means little to you, but you'd be hailed a legendary archivist if you're able to sneak out this data."
Yeah, but still, I think I'd prefer to do it anonymously than be the legendary archivist rotting away in prison.
Who is behind Anna's Archive? There are a lot of English speakers involved in the team and forums! Anyway, as long as buying isn't owning, no issues here.
I think Anna is behind it.
https://redlib.catsarch.com/r/Annas_Archive/comments/1f6h74r...
https://reddit.com/r/Annas_Archive/comments/1f6h74r/im_curio...
> ANother aNonymous Address.
True or not I rather like that one.
>Stands up "I am Anna!!"
This one's good as well. This is an example of a rare good Reddit thread.
Apparently this is the reference: https://www.youtube.com/watch?v=FKCmyiljKo0
But this is what I thought of: https://youtu.be/hdsB99HQ2S8?t=194
All are examples of people with less power protecting each other from a more powerful force.
I think the main source may be in Russia; or that was with libgen.
But I could be wrong.
I am more surprised to see that there are so few alternatives to it. Or perhaps I am unaware of them but after Facebook and co declared war on libgen, and libgen going down, there were surprisingly few alternatives. Anna was one of the few. I still don't know what happened with libgen, but since the attack it really is kind of semi-gone.
Libgen and similar are more alive than ever, with an extended botnet growing weekly. The "googlers" indexed framework is shrinking every day, so users won't find it in those search engines easily. It is also hard to keep up with good storage considering the price trend of the last 5 years, so the botnet and torrents are some kind of solution, I guess. (We, for instance, are considering using the old tape system, because it is at least a viable alternative.)
If no issue there, then why would you ask who is behind it in a public forum?
I’d reckon many books available on there are otherwise available DRM-free, you’d be surprised really how many authors don’t bother with DRM.
And then you could obviously just buy it physically where buying is definitely owning, so I find that sentence a bit inappropriate for books
> Please read [this] carefully before working on a bounty.
[this] appears as a link to a .li address, and that goes bad places.
Should be https://annas-archive.gl/volunteering#bounties
I went to that link and almost ran a malicious script in my terminal for a supposed reCAPTCHA verification. Just before pressing enter, I verified it with gemini and it said it was a ClickFix script designed to steal passwords and other sensitive information. Because of all the weird things we have to do for CAPTCHA verification these days, I almost believed it was legitimate and went with the steps. It is really frightening.
Trying to follow that link I have to go past multiple security warnings in my browser and the final destination blocks me for using a VPN.
It won't even open in Brave.
What's frightening is that you almost ran a script in your terminal for a captcha.
Good thing you have Gemini though.
So how did this happen? Is https://software.annas-archive.gl/AnnaArchivist a legitimate staff account? Is this a scam?
Or did they lose the domain to scammers and never update the link in the bounty?
Anna's Archive regularly rotates TLDs due to domains getting seized. They did have .li last year.
Anyone afraid of being laid off at google right now? Perhaps this is a backup :)
I think if you get caught exfiltrating data they'll sue you for much more than $200K.
If your money is in private crypto or offshore you have nothing to worry about.
Except perhaps jail time.
Lying about your assets to avoid paying a lawful fine is criminal. Just because they can’t see your money doesn’t mean they can’t prove that you have it, and can’t jail you for hiding it to get out of paying a fine.
Google, Amazon, and FB: It's not me, right
So is stealing
i'd strongly caution anybody foolish enough to go down this path
financial watchdogs and international treaties make it impossible unless you are perhaps a multi billionaire who can afford to buy people at the political level
Copy data into extra large capacity micro sdcard and hide it in your rubiks cube, nobody will suspect a thing
I wish an extra-capacity SD card was enough; Google Books holds (probably) an insane number of books
Comments on the source mention dataset sizes ranging between 1.5PB and 200PB
my guess would be the 7PB mark
For 200PB one would need 25kg worth of 2TB microSD cards... that would be lots of Rubik's cubes =P
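For what it's worth, the 25 kg figure checks out as a back-of-envelope estimate if you assume roughly 0.25 g per microSD card and decimal units (1 PB = 1000 TB). The per-card mass is my assumption, not something stated in the comment, and the helper names below are made up for illustration:

```python
import math

# Assumptions (not from the thread): decimal units and ~0.25 g per microSD card.
TB_PER_PB = 1000
CARD_CAPACITY_TB = 2
CARD_MASS_G = 0.25


def cards_needed(dataset_pb, card_tb=CARD_CAPACITY_TB):
    """Number of cards needed to hold dataset_pb petabytes."""
    return math.ceil(dataset_pb * TB_PER_PB / card_tb)


def total_mass_kg(dataset_pb, card_tb=CARD_CAPACITY_TB, card_mass_g=CARD_MASS_G):
    """Total mass of those cards in kilograms."""
    return cards_needed(dataset_pb, card_tb) * card_mass_g / 1000


if __name__ == "__main__":
    # The three sizes mentioned in this subthread: 1.5 PB, 7 PB, 200 PB.
    for size_pb in (1.5, 7, 200):
        print(size_pb, "PB:", cards_needed(size_pb), "cards,",
              total_mass_kg(size_pb), "kg")
```

Under the same assumptions, the 1.5 PB and 7 PB guesses come to 750 and 3,500 cards, so both stay under 1 kg; only the 200 PB estimate gets into "lots of Rubik's cubes" territory.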
It’s the “Copy data into extra large capacity micro sdcard” step that gets you caught. Nobody is stopping you from leaving with an SD card or USB stick at Google.
Importantly, this would work way better on a speedcube than the Rubik's brand cubes. Never use the Rubik's brand Rubik's cubes. I use the Moyu RS3m V5 maglev, and I think that it would work well for hiding uSD cards.
I don't think anybody would do it purely for money. I would rather see someone who is terminally ill and decides to do some "good".
There are not too many mentally-sharp, fully-employed, terminally-ill people that I have met. Even fewer at tech companies.
And even fewer who are single and childless. (Google would likely go after the estate of anyone who did this.)
I wonder how hard they would press an estate. It’s bad PR to go after widows and surviving children, and the data has already escaped.
This is something they’d want to settle quietly, so the family would have leverage.
They’ve made so many terrible decisions already. Going after widows wouldn’t change anything.
I wonder how hard Google could press an estate. For a living person the main consequences would be criminal, not civil, but you can't go after a dead person criminally. Civilly it's not really clear to me that Google would have been harmed enough to create noteworthy damages.
The book publishers might actually be the bigger problem. They'd have civil copyright infringement claims with giant statutory damages.
But one would be enough, especially in a large organization. Surely they would need access to the exact data too.
I'm sure they'd go after you, but hypothetically: What damages would they claim? They still have the data, which isn't their IP to begin with.
Good point. But it would most likely still be a breach of Google policy, and they signed a pact with the devil, so ...
Sui generis database rights
Is that some sort of real thing, some significance in law?
Yes
You could look it up and see for yourself. It's applicable to all sorts of collections such as map data and I'd presume also a book database but IANAL so best if you see for yourself - my intention was just to point out (what I hope is) the applicable legal principle for anyone curious about this
I think the problem is more that financial damage would result from this. So people would need to be prepared to relocate to another country probably.
Doubtful that random employees just have access to the full archive. And among those few that do have access there are probably automated systems that will catch you once you start downloading even a small percent of the content
If you want to get fired / sued for leaking internal info, you should at least aim for 1 million (https://www.cnbc.com/2026/05/27/google-employee-polymarket-i...)
Piracy / copyright predictions?
The current situation feels untenable with renting. So many regular people I know have learned about VPN, NAS, etc.
It was never sustainable, just regulatory capture by large IP owners.
Spotify, Netflix, Amazon etc. provided OK value for a while, but now that enshittification is biting, piracy is due a massive comeback.
Hopefully the guillotines. Look up how much the authors and artists who create the actual work get paid.
Quite a few textbook authors I know are paid well to be part of the whole scheme (kickbacks, forced yearly repurchase for the 'online' component of books, etc). So I think it varies a lot.
All authors should have a pay + linktree type thing so pirates can pay them directly.
Or something like thanks.dev
Some more interesting bounties they offer: https://software.annas-archive.gl/AnnaArchivist/annas-archiv...
> Purchase all Library of Congress MARC datasets — $3,000 bounty
> English Wikipedia pages about relevant institutions — up to $100 per new page
> Internet Archive Digital Lending — $5000 per 1 million pdf files
> Text version of our full library — $20,000
...
Up to 500k for OPSEC failures is interesting, as well. It gives me hope that there are wealthy individuals contributing to sharing books, or many small donations.
https://software.annas-archive.gl/AnnaArchivist/annas-archiv...
The only legal hurdle keeping Anna’s Archive away from its noble goal (piracy laws) has been shown to mean zilch in the age of AI.
The link sort of reads like people who have very easy access to the requested material. Almost like they're Google employees.
No it doesn't?
Anna’s archive rocks
Micropayments (charged in mills, not cents) are the solution. Downloading books remains essentially "free" for the individual but the Internet scale is such that authors would receive compensation for their work. The Spotify model is better than downright piracy. It is very difficult to compete with free.
I would argue precisely that downright piracy is better than the Spotify model. It is based on micro-micropayments, so much so that even at internet scale very few artists outside of the uber-macro-Taylor-Swift size get proper compensation. Sending a single dollar through Bandcamp amounts to hundreds of listens on Spotify.
It really sucks, but I'd rather pirate and know I'm at fault with the artist – maybe I'll buy some tracks off Bandcamp to make up for it – rather than let Spotify cover the transaction with a legal blanket, while the artists get almost nothing in return.
Does Anna's Archive use a completely different "source repository" from LibGen?
annas archive is practically a compilation from all sources possible (including libgen afaik)
AA compiles from everywhere; LibGen and Z-Lib served as the major sources of books. This has unfortunately led to search results for a particular book containing multiple versions of that book, and it is not readily clear which one is the highest-quality version. A real library would have librarian staff who carefully curate everything, but in the pirate world this isn’t realistic so it just gets all thrown together.
LibGen is now more or less a dead project. The servers of the original version were reportedly seized a couple of years ago already, and other sites under the LibGen name were notorious for piggybacking the original collection and just plastering it with ads. If one wants to upload stuff, better now to upload it to Z-Lib (not a perfect site, but still) and it will then get picked up by AA in a few months.
Surely this is realistic now (or soon) in the form of LLM curation. A few auto-librarians reading everything, looking at different versions side by side, making choices, etc.
Donate your tokens then.
LLMs are many things, but one thing they definitely are not is cheap/free to run at scale.
LOL, not realistic at all. The differences between book versions lie in more than the raw text. Moreover, an archival project would be loath to favour or disfavour a version unless an actual human made the call.
Just do it and be legends, Larry. ;)
Apple won't even help Asahi linux even though it would help hardware sales and give them a ton of goodwill.
Oh, I have no expectation that Page or Brin would do something like this, let alone do it openly. But Page did seem to care about Books access at one point, and I find the image of him secretly leaking the Google Books corpus amusing.
Gemini should be trained on those books already, so in theory it could regurgitate some verbatim fragments (as NYT lawsuit against OpenAI showed some time ago).
Gemini, GPT and fable are actually very good compressions of internet content. But it is lossless compression, as in they kept the most important part (for them to fulfill the next-token task) and found a way to mimic the rest.
I think you meant lossy compression and not lossless. I'm not suggesting this as a method to extract those books from the models, which by their nature are not databases. Just commenting on the somewhat surprising fact that the bigger the model the more likely it is to produce some (short) excerpts of the original training material
So AA is a front for openai?
the bounty would be a bit higher with openAI money behind it
How did you come to that conclusion?
No, but they openly make a lot of money from selling their library to AI companies. Fast enterprise access to Anna's Archive starts at $100,000.
Interesting. But AI companies drive the RAM prices, which costs me more. So someone makes me pay more here ... :(
A lot? I would be kind of interested if there were any known figures. Do companies want to be implicated in AA-cooperation in any capacity?
They likely use intermediary companies, but NVIDIA might have purchased from them directly, I don't remember the full story.
No specific figures, but see, for example:
https://annas-archive.gl/blog/ai-copyright.html
> Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.
> We have given high-speed access to about 30 companies. Most of them are LLM companies, and some are data brokers, who will resell our collection. Most are Chinese, though we’ve also worked with companies from the US, Europe, Russia, South Korea, and Japan. DeepSeek admitted that an earlier version was trained on part of our collection, though they’re tight-lipped about their latest model (probably also trained on our data though).
It's at least 30 companies, each of which paid hundreds of thousands of dollars.
There was a time when you would get a random page preview; some artists found a way to extract full books that way (F.A.T. Lab?).
How is Anna's Archive funded? I see they have memberships, but it's hard to believe that can fund all these bounties - some going into six figures. Ask any FOSS project about funding by that method.
It seems like there are some deep pockets funding them.
Chinese (and some other) AI companies buying fast access to their dataset.
So Anna's Archive is in some ways a front for AI companies, gathering the sources they can't get themselves?
They could get access for free through torrents.
Not really, AA is the source.
By 'sources', I mean the original books, papers, etc. AA didn't author or publish them; AA collected them after they were published.
They have a fast downloads tier starting at a few dollars a month
The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".
Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.
We're killing the goose that lays the eggs, for selfish gain.
Possibly, but this act of governmental self-harm is useful to The People. We live in a world where, if your valuation is ~1T, you can more or less just do what you like, and the work of The People is stolen from you and laundered.
In such a world, isn't it useful that governments are stupid enough to give adversaries reasons to undermine it? When the government props up a corporate tyranny and racketeering domestically, should we make a temporary alliance with all its enemies?
(E.g., the provision to AI companies of all corporate secrets and competitive practices via prompts, eventually to be used against their capital interests and their labour interests.)
So when will the American people form an "Incorporation" to lobby against business for them?
This ship has sailed for academic publications, and academics define that term very liberally because we want to read everything, fiction included. The shadow libraries started off as a way for scholars in ex-Soviet countries in particular (but also India, SE Asia, etc.) to access literature that simply wasn’t available in their country. But the shadow libraries proved so successful and convenient that academics in all countries are using them now, even if they have access to official subscription services. I use AA several times a day and so do the researchers around me in my office; at conferences, if the presenter mentions an interesting publication, the whole room immediately opens AA on their laptops, etc.
Even if projects like AA didn’t have nation-level support, academics would find a way to keep as much of it as possible going. After all, we’re the ones who compiled the bulk of pre-2020 material, and we’re the ones who do all the hard work of scanning from our institutional libraries stuff that doesn’t exist anywhere in digital form.
>We're killing the goose that lays the eggs, for selfish gain
We already did that when the internet collectively agreed decades ago that everything digital should be free for anyone.
We're now 20 years downstream of ad-blocking being a virtuous good, and piracy being the ultimate show of liberty, and now suddenly everyone cares about the creator's revenue stream.
The mask slipped and unsurprisingly the internet is a bunch of selfish morally stunted children. Some of them even pushing 50 years old.
Yes, I am talking to you with the 4TB of pirated content, proud of not loading any ads in the last 15 years, and getting enraged over LLM training.
> Yes, I am talking to you with the 4TB of pirated content, proud of not loading any ads in the last 15 years, and getting enraged over LLM training.
That's oddly specific :-)
In any case, I have no pirated content that I know of, and I'm neither proud nor ashamed of blocking ads[1], but I still get annoyed that a bunch of VCs can use the companies they've invested in to launder all the world's IP, then sell it back to the world.
[1] Who feels proud of blocking ads? It's like feeling proud of tying your shoelaces: "Good job, well done, but that's the expectation, son".
>the practice of writing and publishing genuinely good work is being wiped out.
Most of the best literature in the English language was written before modern IP law was even a thing. There's very little good literature written by authors primarily motivated by money.
That's just cultural elitism. I hope you meet someone in your life who finds absolute joy in reading young adult romance novels or D&D fantasy books so you can understand how irrelevant "good" literature is. I love Dostoevsky and Verne (and D&D novels, especially those written by R.A. Salvatore), but I would never judge the modern "IPs" that got my daughter into reading.
> best literature
What does that even mean?
Everyone has their own opinion as to what the best literature is, just like what the best music is.
But there is also some consensus. For music it would be Beethoven, Mozart, the Beatles, etc.
For literature it would include Tolstoy, Shakespeare, Cervantes, Dickens, Austen, Tolkien, and many more. I would bet it will eventually include Stephen King but it's too early to make that call now.
I know many (including myself) who would never agree to this "consensus"
The best music is Goa trance. What you're referring to as "best" is actually classic music (or as I like to call it, classic plus music :P).
There is no such consensus. Those are just popular answers coming from people with eurocentric tastes over the last couple of centuries. Millions in the global south would swap The Beatles for Celia Cruz or Fela Kuti. People in Asia from the 1700s to this day would tell you Tyagaraja is a way better composer than Beethoven and Mozart put together... he is a literal saint in India, actually.
Not to mention, many people would disregard your answer for putting Stephen King (and Tolkien, to a lesser extent) in the same sentence as Tolstoy and Shakespeare.
There is no "better" anything when we talk about culture. No consensus, etc. There are just opinions.
How much of that literature was written by wealthy landowners who already had little need for money?
Well, you needed the means to get an education, since most of the poor in those days were illiterate, which is something of an impediment to becoming a successful writer.
I can only think of one writer offhand who wasn’t a wealthy landowner, although it is a particularly notable example: William Shakespeare.
Shakespeare wasn’t poor (his parents seem to have been of upper-middle-class standing); he was able to get a basic (but not a university) education and then pursue an acting career (with perhaps a side hustle as a teacher). Whatever the case, he certainly wasn’t independently wealthy before he started writing; he needed to earn a living.
He did seem to be in it for the money (and fame), since he wasn’t just a writer: he was an actor, a theatre owner, and something of a celebrity, and he made enough money to become a wealthy landowner by the time he died.
Do you think that in "the old times" authors didn't need money and wrote books just out of goodwill?
Do you have stats on that?
I’m not sure piracy or AI training are really affecting book publishing dramatically. But if you have data, I’d be curious to see it. AI scraper bots are a total pain for online publishers and FOSS sites, but AFAIK they’re not really harming book publishing directly.
The consolidation of publishers and Amazon’s own practices are probably worse for authors than “piracy”.
Russians will just share it back (I’m saying that as a Russian). And if not Russians, then somebody else will.
What you can do is make sure people can pay you easily, and not put (a lot of) hurdles in your readers’ way. And when people can’t afford to pay... maybe let them enjoy your work still, and you’ll get a couple more loyal fans who would pay you when they’re able to.
At least this was my world view before AI arrived and ruined^W disrupted everything. Now I’m not so sure.
AI publishing is just email spam, but for books. When the cost of creating worthless text is low, people do it.
HN logic:
Training on copyrighted material
--> bad
Actually distributing copyrighted material
--> good
Needless to say, this is backwards. Any copyright holder will be much more worried about the latter.
This could be another fine example of the Goomba fallacy: https://en.wiktionary.org/wiki/Goomba_fallacy
Another explanation might be a general dislike of big establishments like AI companies and publishers (which glosses over individual authors, but they probably make up a negligible portion of total sales anyway).
At least Anna's archive is consistent.
Copyright reform is necessary for national security
https://annas-archive.gl/blog/ai-copyright.html
The problem is that it is quite difficult to access published papers if you are not in academia or at some company that pays for access, so AA sort of fills that niche for transferring knowledge. Training, on the other hand, is a commercial activity whose goal is to later rent out the model; if it were purely for open-weight models, I suspect everyone would be cool with it.
It's all about what you actually care about.
Piracy helps people who can't afford to pay or have no way of acquiring legally.
LLM training helps megacorps replace people.
If you side with common people against megacorps, you're okay with piracy and against LLM training on copyrighted works.
It's not backwards. Which of the two makes a profit? Which of the two comes away richer? Which of the two actually takes business away from the original copyright holder?
Another source I'd love to see scraped or opened up is the New York Times archive, along with other newspaper archives.
Curious as to how you would approach this. I have no experience in this area, anyone on this forum willing to share their expertise?
If it works as AA seems to theorize, you'd need to:
That way will definitely work with the current access Google provides; however, it's an extremely inconvenient way to scrape Google Books.
That's why they're paying you $200k for it.
Yeah, that's similar to how I would value a Google Books scraping job, difficulty-wise and data-wise. I've done some probing on my own, and it was watertight as far as my probing has gone.
(c) avoid being hounded to death by a zealous district attorney.
RIP Aaron Swartz
I think this would cross the line from civil copyright claims into criminal activity
https://chatgpt.com/share/6a4970e8-7fe8-83e9-8f81-3aefd76b6b...
On another note, if Google's cybersecurity were always one rogue employee away from a massive leak, then it wouldn't be Google. What was the last Google leak you remember? Defense in depth, people.
AA is an openly criminal organisation. Their attitude to prosecution is "you'll never catch us lol"
Indeed, took a closer look, and the court order that took down their .org domain included "Computer fraud and abuse" claims.
Surprisingly, the order is very specific about DNS registrars and authoritative domains not advertising the AA servers, so sharing IP addresses or alternative domain names through other means, like Wikipedia, is not against the order. Which means that nowadays the Wikipedia page works as a pseudo-authoritative DNS.
One of my hopes is that when the AI bubble bursts, some brave person will sneak out a copy of the last frontier model.
Not worried about that; you will only have to wait 3-6 months to get a Chinese model that is just as good.
Chinese companies giving away expensive models for free is a symptom of the AI bubble, too. It's not a law of nature that they'll always be able to scrounge up the money for yet another training run.
Shaping the tool that does the thinking is quite valuable when you're in the business of changing how people think - I think we can expect propaganda agencies to be subsidizing model creation forever.
This doesn't strike me as a symptom of a bubble, except insofar as the bubble pushes the competitors' models forwards and thus they need to invest more to stay competitive.
All the models have to respect their local laws and, most of all, pressure from users and employees.
They all carry political weights, because the humans behind them defend their interests and promote certain social values.
https://pastebin.com/hjhvsBFg
This answer from Claude is so biased that it is ridiculous
I think it's a deliberate business strategy of commoditization of their complement.
China acts like an entire bloc, not as single companies, and they want to monetize hardware.
If you think Chinese companies always act as a bloc, your mental model needs to get about a billion times more detailed. But in this case just a few details may be enough: There are Chinese AI companies that have released LLMs without publishing the weights.
ByteDance is going the direct-to-consumer route with their Doubao chatbot (the most popular in China, probably thanks to their social media prowess). iFlyTek seems to be angling for enterprise and government use cases, where they already have an in.
The companies that have released weights have in common that they didn't have a monetization channel lined up and their models weren't good enough to make people pay attention with just API access. (You can see with Qwen Max that the calculus can change towards not releasing weights for better models.)
And who exactly among the investors is having their complement commoditized? When Nvidia releases Nemotron, the story is clear, but it's less obvious for say Z.ai's GLM.
I have never said they always act as a bloc, but their industry has a strong component of long-term strategic government planning behind them.
> China acts like an entire bloc
> I have never said they always act as a bloc
Pick one
As long as it is in the CCP's national interest to have a frontier model, Chinese companies will have the resources for another training run.
That’s misunderstanding why these models are behind. A large part of why they’re behind is that they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.
Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.
[Edit: after thinking about it, I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]
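For readers unfamiliar with the term: distillation here means training a student model to match a teacher model's output distribution. Below is a minimal sketch of the textbook soft-label loss, using made-up toy logits and a hypothetical `distillation_loss` helper. It is an illustration of the general technique only, not a description of what any particular lab actually does.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy next-token logits over a three-token vocabulary (made-up numbers).
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]

print(distillation_loss(teacher, student))  # positive: the student disagrees
print(distillation_loss(teacher, teacher))  # zero: the student matches the teacher
```

The loss bottoms out at zero once the student reproduces the teacher's distribution, which is the intuition behind the "can't distill your way past the teacher" claim. The sketch says nothing about what further RL on top of a distilled model could add.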
That’s not remotely true. They did distillation as a cheap solution to the cold start problem. You need data/trajectories to hill climb to higher capabilities. All large Chinese labs do RLAIF.
Oh yes, not remotely true. Which is why the frontier labs all have invested heavily in trying to identify and thwart distillers, using known company names / domains to drive their exclusion lists.
/s
It's cheaper to distill than to do reinforcement learning, so of course they prefer that, but if it wasn't an option they could just pay up and spend more GPU time on RL.
>"they aren’t able to do the reinforcement learning post-training steps"
Not yet.
If there is a need, someone will come and fulfill it. Personally, I don't even want to use top models now. Professionally, I use AI to help with coding via the Junie agent that comes with JetBrains IDEs. Junie is told to use Gemini Flash and works fine for what I (emphasis on "I") ask it to do. I tried more advanced models and different vendors, only to discover credits going down the toilet without any extra benefit.
I'll agree I guess and clarify that the better phrasing is probably something like "haven't yet shown the capability to."
> you can never distill your way to being the teacher
Are you sure?
What if you distill from 10 teachers?
In this case all teachers have also learned from each other.
I think GLM 5.2 having a higher Elo than Opus 4.8 shows that they did it [https://gptbased.com]
Prediction markets can solve this.
If it's a bubble, why do you care about frontier models?
The Internet was a bubble, and so was telecom, etc., at some point. Being a bubble does not mean that, when 90% of investments go down the drain, what remains is not useful.
"The Internet" was not a bubble. Companies with no long-term business model / sufficient product-market fit that were riding hype were the "dotcom bubble". But when those companies crashed, nobody said "I really want to get my hands on their IP", because it wasn't valuable – an important prerequisite to the bubble popping. This seems to be a different case if people actually want the SOTA models.
A bubble just means it is overvalued beyond its true fundamentals due to speculation on speculation. The underlying asset can still have value, just less than the market price.
Consider Cisco. On the 31st of March, 2000, it was valued at US$77.31 / share, which in inflation-adjusted terms is $150.46 (above the current price over 26 years later). This valuation was on the basis of speculation that the price would continue to go up and Cisco would get a large cut of the industry profits. Cisco's business is still valuable; it was just overvalued by the market.
Similarly, if we go back to one of the classic examples of a bubble, consider Dutch tulip prices in 1636: speculation drove futures contracts high. But tulips still have value to people today; it's just that the price was higher than was sustainable.
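As a quick sanity check on the Cisco numbers, here is a small sketch of the inflation arithmetic they imply. It uses only the two share prices and the 26-year span quoted above; the variable names are hypothetical and no real CPI series is consulted.

```python
# Figures quoted in the comment above; nothing here comes from real CPI data.
nominal_price_2000 = 77.31   # US$ per share on 31 March 2000
adjusted_price = 150.46      # the same price in inflation-adjusted terms
years_elapsed = 26           # "over 26 years later", per the comment

# Cumulative inflation factor implied by the two quoted prices.
inflation_factor = adjusted_price / nominal_price_2000

# Average annual inflation rate that would compound to that factor.
annual_rate = inflation_factor ** (1 / years_elapsed) - 1

print(f"implied cumulative factor: {inflation_factor:.3f}")
print(f"implied average annual rate: {annual_rate:.2%}")
```

The quoted prices imply that the dollar lost a bit under half its purchasing power over the period, or roughly 2.6% a year on average, which is in a plausible range for that span.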
Railways were both a bubble and, eventually, one of the most significant technological innovations in history.
If we had the dotcom bubble, why are you still on the Internet?
I never wanted the IP from dotcom bubble companies.
Because they're useful
which will be very difficult to run unless you have a large budget to operate your own mini datacenter
In a crash the hardware will go for pennies on the dollar, if not for fractions of pennies on the dollar.
Lots of companies will pick them up for scrap metal prices and host them for fractions of what we are paying today.
That's the nature of bubbles.
Can't LLMs now just write books that aren't publicly available?
Comment by Borja is a great example of eternal September.
And why's that? I don't see how either of those two things relate to one another. A grape is a naturally occurring fruit, while the creation of a book is wholly a human creative endeavor.
When I buy grapes, yeah, I get that someone worked for me to get 'em. But I don't get all metaphysical and spooky about it. I'm buying the grapes as objects. Like a book. Once I take ownership of the object, yes, thanks for your work. I paid you and I wish you well.
As far as what I ought to be able to do with the thing? Beat it. Of course there are exceptions. You say, "Don't scan the book and upload it to object storage and place it behind a CDN so Ahmed from Tounis can access it". I say, "Don't make wine".
There are a lot of crops that are very intensively bred to create new, specific cultivars. Take a look at all the new apple varieties popping up. There’s nothing natural about their creation: https://applerankings.com
Monsanto here, tell us more about copyrighting grapes!
https://www.monsantotechnology.com/Vegetables/main-page-mons...
Poetic license, pal!
That's why they're grapes of wrath. Look for them in the grapes section.