Exporting forums for future historical researchers

ZippySLC

Member
Hi -

I've been running my forums for nearly 20 years (phpBB to vB to XenForo) and over that time there has been a lot of stuff posted that would be a treasure trove to future historians. (My site deals with the history and ecology of a fairly famous area of my state, and there has been a lot of research and fieldwork to locate old ghost towns and other interesting stuff out in the woods.)

If something were to happen to me, eventually the credit card that keeps the VPS alive would stop working. I would hate to have all of that data disappear.

It's probably not a XenForo question, but does anybody know a way to scrape a forum and generate a pile of static HTML that would allow someone to read every thread? I feel like static HTML is simple and future-proof enough to be an archival format.
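
For illustration, here is a minimal sketch of the "pile of static HTML" idea using only the Python standard library. It does not scrape anything; it assumes the posts have already been pulled out of the forum somehow, and the function name and the (username, timestamp, message) tuple shape are made up for the example.

```python
import html
from datetime import datetime, timezone


def render_thread_html(title, posts):
    """Return one self-contained HTML page for a thread.

    `posts` is a list of (username, unix_timestamp, message) tuples.
    That shape is an assumption made for this example.
    """
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en"><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for username, timestamp, message in posts:
        when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        parts.append("<article>")
        parts.append(f"<h2>{html.escape(username)} on {when:%Y-%m-%d %H:%M} UTC</h2>")
        # Escape the message and wrap it in <pre> so line breaks survive
        # without needing any CSS or JavaScript.
        parts.append(f"<pre>{html.escape(message)}</pre>")
        parts.append("</article>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Each page has no external stylesheets, scripts, or images, so a single file can be opened on its own. Writing the returned string to one file per thread would give the kind of browsable directory described above.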
 

Brogan

XenForo moderator
Staff member
A lot of site owners have a will or contingency plan should something happen to them, to ensure the site can be taken over by another party.
 

ZippySLC

Member
I've thought about that too, but there's no guarantee that whoever inherits the site will pass it on. For sure I want to find a way to keep it running after I go, but I also want to figure out a way to donate an archive to my state library or one of the local universities.
 

Mendalla

Well-known member
Someone did that with the site we replaced (scraped it to create a read-only archive). Works somewhat better than the Wayback Machine for sure even if it has odd issues. Alas, I cannot ask him how. We are on the outs with each other right now for various reasons and if I could take that archive down somehow, I would. It has caused me some grief in the past few years.
 

CivilWarTalk

Active member
Not exactly sure XenForo can meet these standards for archiving, but this might be something you would be interested in, assuming the US Library of Congress deems your content worthy of preservation...

 

ZippySLC

Member

Thanks for the link. It's much appreciated.

I think I may end up writing some Python to export threads and messages from the database into plain text, which would be easy for some future search engine to index.
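
A rough sketch of what that export script might look like, one text file per thread. The table and column names (`xf_thread`, `xf_post`, `username`, `post_date`, `message`) are assumptions about the XenForo schema and should be checked against the actual database; the function names are invented for the example. It takes any DB-API connection, so it can be tried against a throwaway SQLite database before pointing it at a copy of the real one.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(title):
    """Reduce a thread title to a short, filesystem-safe name."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:60] or "untitled"


def export_threads(conn, out_dir):
    """Write one plain-text file per thread and return the paths written.

    Table and column names are assumptions; verify them against your schema.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    cur.execute("SELECT thread_id, title FROM xf_thread ORDER BY thread_id")
    written = []
    for thread_id, title in cur.fetchall():
        # "?" is sqlite3's placeholder; MySQL drivers generally use "%s".
        cur.execute(
            "SELECT username, post_date, message FROM xf_post "
            "WHERE thread_id = ? ORDER BY post_date, post_id",
            (thread_id,),
        )
        lines = [title, "=" * len(title), ""]
        for username, post_date, message in cur.fetchall():
            # post_date is assumed to be a Unix timestamp in seconds.
            when = datetime.fromtimestamp(post_date, tz=timezone.utc)
            lines += [f"{username} wrote on {when:%Y-%m-%d %H:%M} UTC:", message, ""]
        path = out / f"{thread_id:06d}-{slugify(title)}.txt"
        path.write_text("\n".join(lines), encoding="utf-8")
        written.append(path)
    return written
```

The message text is written out exactly as stored, so any forum markup in it would come through untouched. Prefixing each filename with the zero-padded thread ID keeps the files in a stable order and avoids collisions between threads with the same title.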
 

briansol

Well-known member
Where do these text files end up though? On a thumb drive no one can access?

My contingency plan is to pre-pay a lot and automate everything I can. But it's not foolproof.

Domains are always on auto-renew with a 2-year package. I've considered going to 10 years with the .com.
Hosting hits PayPal, so there's no credit card to expire.
PayPal is paid by Google AdSense, which covers the bill.

In theory, this never runs out and is fully automated. The unknown is whether any of those services go away, my hosting company gets bought out, or PayPal no longer takes payments the way I have it set up. Or AdSense stops performing well enough to cover the bill. There's some buffer in there, but only a few months' worth.

I'm more worried about a table getting marked as crashed, or something else that requires sysadmin help to resolve. No amount of automation can account for everything without causing overhead (e.g., running an optimize and rebuild every day).

Ultimately, I need to name a beneficiary, write detailed instructions, and put it all in my estate/will. My wife will get it by default, but she wouldn't know what to do with any of it. My plan for her is 'sell to X' for whatever they will offer.
But if I'm dead, she's sad, and the last thing I want is for her to deal with my 'stupid website' when there are more important things to worry about, like the kids, life insurance, etc.
 

ZippySLC

Member
My site deals with the history of the southern part of New Jersey, so ideally I'd ship a copy of the text files to the NJ State Library, Rutgers, or Princeton. I'm trying to plan out something that will still be useful to people 100-200+ years in the future, when there might not even be a World Wide Web anymore but people would still want to know about X, Y, and Z town or certain people who did certain things.
 

briansol

Well-known member
You assume a USB stick or CD will be able to be read that far out :D

Most of us don't even have a VCR anymore... or a record player, or MiniDisc, or LaserDisc, or Zip drive, or floppy drive, or.... And that's just in our lifetime! 200 years from now?
 

ZippySLC

Member
I trust that a research institution will know what to do with the data. For sure I don't think that a USB stick from 2022 would work 200 years from now. But a directory full of plain text ASCII files? I'm confident that's basic enough to still be usable in the future.
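
One wrinkle with strict ASCII is that forum posts usually contain accented letters, curly quotes, and the like. A possible way to force text down to ASCII is sketched below; the function name is made up, and the approach is lossy (anything without an ASCII equivalent becomes "?"), so keeping a UTF-8 copy alongside might be worth considering.

```python
import unicodedata


def to_ascii(text):
    """Return a best-effort ASCII-only version of `text` (lossy)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is replaced with "?".
    return stripped.encode("ascii", errors="replace").decode("ascii")
```

For example, `to_ascii("café")` gives `"cafe"`.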
 