Recreating a Bankruptcy Dataset #4641

firmai · 2024-11-01T16:07:24Z


There used to be a good research dataset for public company bankruptcies: https://lopucki.law.ufl.edu/index.php

It has disappeared, and I want to create an updated academic dataset using your data, perhaps with LLM agents extracting the appropriate data points.

There are a few data attributes I am interested in as highlighted in the file:

Case Number, Company Name, CityDisposed, Chapter (7/11), ClaimsAgent (Yes/No), Date365Sale (if it happened), DateConfirmed, TortCause, etc.

I have been able to grab docket information very nicely using your API: https://www.courtlistener.com/api/rest/v4/dockets/

I used these case types: bankruptcy_case_types = ['Chapter 7', 'Chapter 11', 'Chapter 13', 'Chapter 12']

I have since switched to your bulk dockets file instead: https://storage.courtlistener.com/bulk-data/dockets-2024-10-31.csv.bz2

I have also found complementary data at https://storage.courtlistener.com/bulk-data/fjc-integrated-database-2024-10-31.csv.bz2
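
For anyone following along, here is a minimal sketch of streaming one of these bulk files without unpacking it to disk. The helper name `iter_bulk_rows` and the `court_id` column used in the usage note are assumptions for illustration, and the real export may need different CSV quoting or escape settings than Python's defaults.

```python
import bz2
import csv
from typing import Callable, Dict, Iterator


def iter_bulk_rows(
    path: str, keep: Callable[[Dict[str, str]], bool]
) -> Iterator[Dict[str, str]]:
    """Yield rows from a bz2-compressed CSV for which keep(row) is true.

    The file is decompressed on the fly, so the full CSV never has to
    fit in memory or be written to disk.
    """
    # newline="" lets the csv module handle line endings inside quoted fields.
    with bz2.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # Default csv dialect is an assumption; adjust if the export differs.
        for row in csv.DictReader(handle):
            if keep(row):
                yield row
```

A call such as `iter_bulk_rows("dockets-2024-10-31.csv.bz2", lambda row: row["court_id"].endswith("b"))` would keep only rows whose court ID ends in "b". Treating that suffix as a marker for bankruptcy courts is a rough, unverified heuristic.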

As you can see, these datasets do not give me enough information to recover all the facts. I believe access to the RECAP Archive is crucial for that, but I am running into challenges using the API to download the main (free) PDF documents needed to automate the extraction.

Perhaps I have got it all wrong:

Maybe the dockets and FJC bulk files already have all the data and I just don't understand the codes. Am I correct in saying that this is the data dictionary? https://github.com/freelawproject/courtlistener/blob/main/cl/search/models.py
Maybe there is no RECAP Archive API that can automatically download the main documents associated with a docket.
If there is one, I think I would be able to create a very nice structured dataset from the documents that I can share with the broader community.

Thank you so much for your time and for the remarkable work you continue to do in supporting research and data accessibility. Any bit of advice here would be so greatly appreciated.

mlissner · 2024-11-01T22:45:36Z

Maintainer

Sorry, can you clarify exactly which data points you're having trouble getting?

2 replies

firmai Nov 2, 2024
Author

The trouble is with using the RECAP API to get documents. First, is there a programmatic way to pull the docs if they exist in the RECAP Archive? Second, is there any endpoint that gives the RECAP data as TXT or JSON, where the documents have already been lightly parsed or structured?

mlissner Nov 4, 2024
Maintainer

Sorry for the slow reply here. I've got a clearer head and can help more now:

Am I correct in saying that this is the data dictionary?

Yes, that's a good place to begin. I'd also recommend the API documentation, which touches on how to get these kinds of field descriptions.

is [there a] RECAP Archive API that can automatically download the main documents associated with a docket?

There's an API you can use to look up the docket entries for a docket, and from there you can get the links to the PDFs. Start here: https://www.courtlistener.com/help/api/rest/pacer/
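
A rough sketch of that lookup in Python, using only the standard library. The `docket-entries` path, the `docket` filter, the `Token` authorization header, and the `results`/`next` pagination keys are assumptions here and should be checked against the help page above; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

API_ROOT = "https://www.courtlistener.com/api/rest/v4"


def docket_entries_url(docket_id: int) -> str:
    """Build the first-page URL for one docket's entries (assumed filter name)."""
    query = urllib.parse.urlencode({"docket": docket_id})
    return f"{API_ROOT}/docket-entries/?{query}"


def fetch_json(url: str, token: str) -> Dict:
    """GET a URL with token authentication and decode the JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Token {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_results(first_url: str, get_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Follow 'next' links page by page, yielding each item in 'results'."""
    url = first_url
    while url:
        page = get_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # None (or missing) ends the loop
```

With a real token this would be driven as `iter_results(docket_entries_url(docket_id), lambda url: fetch_json(url, token))`. Passing the page fetcher in as a parameter keeps the pagination logic easy to test without network access.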

is there any endpoint that gives the recap data as TXT or JSON, where the documents have already been lightly parsed or structured?

The recap-document API documented above will give you the plaintext of every doc, but we don't extract much from it at present.
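
As a sketch of what could be pulled from each record that API returns, assuming the JSON carries `plain_text` and `filepath_local` fields and that PDFs are served under `https://storage.courtlistener.com/` (both are assumptions to verify against the docs; `summarize_document` is a hypothetical helper):

```python
from typing import Dict, Optional
from urllib.parse import urljoin

# Assumed base URL for stored PDFs; confirm against the API documentation.
STORAGE_ROOT = "https://storage.courtlistener.com/"


def summarize_document(record: Dict) -> Dict[str, Optional[str]]:
    """Reduce one recap-document record to a PDF URL and its plain text.

    Empty or missing fields become None, so callers can tell which
    documents still need to be fetched or have no extracted text.
    """
    path = record.get("filepath_local") or None
    text = record.get("plain_text") or None
    return {
        "pdf_url": urljoin(STORAGE_ROOT, path) if path else None,
        "text": text,
    }
```

Records where `pdf_url` comes back as `None` would be documents not yet in the archive, under these assumptions.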

Some of these APIs take special access. If you send your CL username by email to [email protected], I can grant it for this purpose.

One thing to consider: We've wanted to do something like this too. If you're game for a larger project, building it into our system might be an option.
