I think you may have misunderstood the criticism. Yes, of course Lexis and Westlaw provide digital copies of case reporters. But they also do a ton of other stuff. One of their extremely expensive services allows you to watch state and federal court dockets and get a notification about new entries.
Published cases themselves are in relatively good shape -- most of what you can find in the West National Reporter system over the last 50 years is available for free on Google Scholar.
In response to your, "These services do not charge you for access to the case law, which are in the public domain." Sure they do. It costs money to buy the West Federal Supplement or Tennessee Reports in print. It also costs money to access those reporters online, via Westlaw or Lexis-Nexis. Many of these old cases (despite being uncopyrighted) are only available by visiting a law library or a paid online service. You can't just download them for free.
> Yes, of course Lexis and Westlaw provide digital copies of case reporters.
My point is that you don't have to pay Lexis and West just to get copies of cases that are published digitally. You may have to do more legwork, but you don't have to pay Lexis or West.
I think Mr. Foster's use of the term "rent seeking" makes it clear that he is the one who misunderstands the situation. It implies that through some collusion with the court system, Lexis and Westlaw have made themselves the sole source of getting public domain information. This is the sense in which I was using the "charge for access to cases" because this is the sense implied by the term "rent seeking."
But charging for their copy of a public document is not "rent seeking," nor is it "rent seeking" to build a value-added service on top of publicly available information. Indeed, that's pretty much Google's business model--to make money categorizing information they don't own. I don't think anyone would accuse Google of rent-seeking if they e.g. showed ads associated with cached copies of a page.
In the context of pre-digital cases, why should "freely available" be coincident with "download[able] for free?" Back then, "freely available" meant putting a paper copy in a public law library (of which there are many). Going back and digitizing those reports, then charging for access to the digital copies is a value-added service, not "rent seeking."
> It's not "rent seeking" to build a value-added service on top of publicly available information, or even to charge for making obscure publicly available information more easily available.
I'm inclined to agree. But perhaps "rent seeking" or not is not the right debate.
It's hard to tease out how much of the value of WL/Lexis/BL is in comprehensive access to cases and documents vs. how much comes from the kind of value added products/services that you mention.
An interesting question is just how valuable these value added services actually are. Whatever value they provide today might reasonably be discounted by their lack of defensibility going forward. As you noted, there is much room for technological disruption here. One could imagine dramatically better products (my thesis is that the right Silicon Valley startup can build them). If WL/Lexis/BL want to survive in the next 5-10 years, they are going to have to get away from publishing and push technology farther, either internally or by acquiring the very few startups who are doing interesting things in this space. What do you make of that thesis? It sounds to me like your thinking about their value add might be clarified by distinguishing between publishing and technology.
> It's hard to tease out how much of the value of WL/Lexis/BL is in comprehensive access to cases and documents vs. how much comes from the kind of value added products/services that you mention.
Providing comprehensive access to disparate publicly available information is itself a value-add. Should courts submit their opinions to some sort of centralized system instead of posting a link to the PDF on their website? Sure, you can make that argument. But it's hard to say that courts are remiss in not doing so, considering that they are dozens of independent entities. Moreover, someone has to pay to build that system. Categorizing information is not free--Google just makes us think it's free because it charges us by selling our privacy rather than simply asking for a fee. Now it's certainly something that could be done at the public expense, and maybe it should be. But until then, providing comprehensive access to these documents is a value-add.
> One could imagine dramatically better products (my thesis is that the right Silicon Valley startup can build them). If WL/Lexis/BL want to survive in the next 5-10 years, they are going to have to get away from publishing and push technology farther
I can imagine dramatically better products, but not in the next 5-10 years. If history tells us anything,[1] it's that AI is really hard and any technology that relies on advances in AI is further off than we imagine. Google's search engine, at the moment, can't even tell the difference between a real product review site and SEO-optimized, auto-generated garbage. It'll be a lot longer than 5-10 years before I trust some algorithm to tell me whether one case cites another in a positive or negative light.
There are other business models. Westlaw's and LexisNexis's customers and competitors who do not have this curated information can get together (via an intermediary perhaps) to curate it better.
You're correct that Rayiner's statement that Westlaw/Lexis don't charge for access to case law is wrong. But he is right about the broader point that they do provide some value-add, and that value-add is why people continue to use their service.
With the entry of Google Scholar, the access and value-add businesses are divorced. The cost of the underlying access has reached zero. But the reason Westlaw and Lexis are still huge is because they do provide value-adds.
So this talk of opening up PACER is really missing the broader access point - the opinion access problem is largely solved via Google Scholar. PACER documents may be opened up, but they are not useful to lawyers in anything other than some basic docket scenarios.
> So this talk of opening up PACER is really missing the broader access point - the opinion access problem is largely solved via Google Scholar.
Yes and no. Suppose I want to take a stab at automating Westlaw. Now I need to download all of the court opinions ever so that I can put them in my database and use them to train my machine learning algorithm. Can I actually get them for any reasonable price? All of them? And without doing something that will get me charged with unauthorized access to a computer like Aaron? Because if not, fixing that is a legitimate goal. (And if so, do post the link.)
"Google Scholar provides a US Federal case law database which includes US Supreme Court opinions since 1 US 1 (pre - 1776), Federal Appeals opinions since 1 F 2d 1 (1924+), and many Federal District Court opinions from the Federal Supplement. Opinions from all 50 states are included since 1950. Internal page numbers are included as well, and cases are hyperlinked to other cases within each case. Click on the "Cited by" link to see citations of a particular opinion in other case law." (http://virtualchase.justia.com/wiki/united-states-federal-ca...)
So very close to "everything."[1] Why don't you ask Google to put up a torrent?
[1] Anything that isn't in there is probably on a library shelf somewhere, and you're welcome to do the same thing Lexis/Westlaw did and scan it in yourself.
The distinction here is between you as the public trying to understand the law, and you as a businessperson trying to make money off of the law. The push to open up PACER is based on ideals about the former, not the latter. (And the former is Google Scholar Legal's Raison d'être.)
Certainly it will be great to have free access to the data in bulk, and that day will come. But from a practical perspective, the barrier to entry as a businessperson in this space is not data. There is enough data at https://bulk.resource.org/courts.gov/ to experiment with. And if your experiments work, there are companies that will sell you access to all the data.
I don't know if you can distinguish the two so easily. You can't really expect the government to provide a slick interface or make the data easy to use. It's well outside of the court system's core competency. But if you make the raw data easily accessible then it creates the possibility for entrepreneurs to do something innovative with it, which leads to the public benefit.
Heck, that's pretty much where Lexis and West came from so long ago, but now technology is offering the possibility (at least in theory) to democratize the process so that someone can screw around with the data over a weekend or three and see if they can create something worth following up before they go through the trouble of looking for funding in order to buy data.
I don't know if we're even particularly arguing about anything. Is there really an argument to be made that this data should not be freely available? If "that day will come" then I guess all I'm saying is that sooner is better than later.
There may not be a disagreement: I'm not arguing that this data should be closed. I believe it should be freely available, and should be freely available in bulk form.
My only point was that providing open access to cases so that individuals can learn the law is a solved problem via Google Scholar. Bulk access is not a problem for individuals. I say this as someone who has worked to create better legal research tools for nearly 5 years: the first 2 on a startup that failed, the next 1 at Scholar, and the last 1.5 on a new startup that I believe will finally succeed.
So while the desire to open up PACER - and the attention given to it - is great, I don't think it will have any effect on the future of individual access to law, or to the future of innovation and business in the legal space. I don't think lack of innovation today has anything to do with lack of bulk access today. I think the deficiencies are more a reflection of the fact that innovation in this space is really hard.
Scholar has problems in this area because of exclusive contracts with Westlaw/Lexis in the state arena, and for other reasons in the federal arena.
Basically: The opinion access problem is not "largely solved".
If it was, Google wouldn't be one of the folks fighting to get access to opinions, and I wouldn't be funding folks like Carl Malamud to do things like scan federal reporters (https://yeswescan.org/index.court.html, #22).
I'm not certain I understand what problems you are alluding to. Scholar has cases, and for you as an individual finding a case is largely trivial.
If your interest is in getting and providing bulk access to cases, then yes, you have obstacles. But that is only a problem for a select few individuals/companies who care about this as a business or hobby. The end users you may be trying to service by offering access to the cases can already get access to the cases from Google for free.
EDIT: To be clear, I'm not intending to downplay or diminish the efforts of you and others to open up access to those older cases. I think that is great. All I am saying is that for the vast majority of people, the vast majority of the time, there really isn't a problem in finding cases online.
I'm saying Scholar has a problem actually getting the data in order to give it to you, and that is the problem that opening up PACER is meant to solve.
And I'm saying that is not true. Scholar has nearly all of the cases anyone cares about - that's how they can index and serve them. (I say this as a former Google Scholar engineer.) If you are an individual, you can read the cases on Scholar. If you are a business and want to buy the cases in bulk - then you can find someone to sell them to you.
Opening up PACER really doesn't help in practice because the data is ridiculously dirty - crappy scanned PDFs which people then pay to have triple keyed.
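For anyone unfamiliar with "triple keyed": as I understand the term, three people independently type out the same scanned page and the copies are compared, so that a typo by one typist gets outvoted by the other two. Here is a minimal Python sketch of that reconciliation step. The function name `reconcile_triple_keyed` is made up for illustration, and the sketch assumes all three transcriptions have the same number of words; real pipelines would need proper alignment and would differ in many details.

```python
from collections import Counter


def reconcile_triple_keyed(a: str, b: str, c: str):
    """Merge three independent transcriptions by word-level majority vote.

    Returns (merged_text, flagged), where flagged lists the word positions
    at which all three typists disagreed and a human should take a look.
    Assumption: the three inputs split into the same number of words.
    """
    rows = [a.split(), b.split(), c.split()]
    if len({len(r) for r in rows}) != 1:
        raise ValueError("transcriptions differ in word count; alignment needed")

    merged = []
    flagged = []
    for i, words in enumerate(zip(*rows)):
        word, votes = Counter(words).most_common(1)[0]
        if votes >= 2:
            # At least two typists agree, so accept their reading.
            merged.append(word)
        else:
            # No majority: keep the first typist's word and flag it.
            merged.append(words[0])
            flagged.append(i)
    return " ".join(merged), flagged


if __name__ == "__main__":
    text, flagged = reconcile_triple_keyed(
        "The court affirmed the judgment",
        "The court affirmed the judgrnent",
        "The court affirmed the judgment",
    )
    print(text, flagged)
```

The point is only that each page costs three rounds of human typing plus a review pass for the flagged words, which is why cleaning this kind of data is expensive.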
EDIT: Regardless, this has been an interesting discussion and I'd enjoy taking it out of the context of HN if you are interested: itai |at| judicata |.com|