
23 Dec 2013

Getting old messages from Genes Reunited

It bothered me for years that I had tonnes of valuable information tied up in Genes Reunited and in Ancestry messages. I haven't yet solved what to do with the Ancestry messages, and I hope I shan't have to copy and paste them all. Some websites might, for example, delete my records if I stopped being a member. I realised that the Genes Reunited problem was simple: each message, or thread of messages, had a unique URL (web address). All I needed to do was capture these 1000 URLs, then load each webpage (ideally automatically) and save its contents from my web browser's cache.
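For anyone wanting to automate the "load each webpage" step, here is a minimal Python sketch. It is not the method described above: it fetches each URL directly instead of reading the browser cache. It assumes the URLs have been saved one per line in a text file (`urls.txt` is a hypothetical name) and that each page can be fetched with a plain HTTP request. Genes Reunited pages normally require you to be logged in, so the optional `cookie` argument is a placeholder for a session cookie copied from your own browser; whether the site accepts that is an untested assumption.

```python
import pathlib
import time
import urllib.request


def read_url_list(list_file):
    """Read URLs one per line, skipping blank lines and '#' comments."""
    lines = pathlib.Path(list_file).read_text(encoding="utf-8").splitlines()
    stripped = [line.strip() for line in lines]
    return [s for s in stripped if s and not s.startswith("#")]


def fetch(url, cookie=None):
    """Download one page and return its raw bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    if cookie:
        # Assumption: a session cookie copied from a logged-in browser
        # may be enough to reach members-only pages.
        req.add_header("Cookie", cookie)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def save_pages(urls, out_dir, fetcher=fetch, delay=1.0):
    """Save each URL to a numbered HTML file; return the file paths.

    Files that already exist are skipped, so an interrupted run can be
    restarted. This assumes the URL list keeps the same order between runs.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        path = out / f"message_{i:04d}.html"
        if not path.exists():
            path.write_bytes(fetcher(url))
            time.sleep(delay)  # pause between requests to be polite to the server
        saved.append(path)
    return saved
```

A typical call might be `save_pages(read_url_list("urls.txt"), "genes_reunited_messages")`. To send a cookie, pass something like `fetcher=lambda u: fetch(u, cookie="...")`. The saved files still need backing up like any others.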

I quickly ended up with 1000 copies of webpages. My first thought was to import them into a Word document, though I realise now that I could put all the HTML into one webpage, so that everything loads as a single page that is easier to scrape. In fact, I'm happy to keep them as a series of webpage files. I will of course need to back them up, as otherwise the whole process is pointless.
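The "one webpage" idea can be sketched in a few lines of Python. This is a rough illustration, assuming the saved pages sit in one folder with names matching `message_*.html` (a hypothetical naming scheme; change `pattern` to suit). It pulls out whatever is between each file's `<body>` tags with a simple regular expression and stacks the results, in filename order, under a heading per file. A regex is a crude way to handle HTML; a proper parser would be more robust on messy pages.

```python
import html
import pathlib
import re

# Crude match for the contents of the first <body>...</body> pair.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_of(page_html):
    """Return the inner body HTML, or the whole text if no <body> tag is found."""
    match = BODY_RE.search(page_html)
    return match.group(1) if match else page_html


def combine_pages(in_dir, out_file, pattern="message_*.html"):
    """Write one HTML page containing every matching file; return the file count."""
    files = sorted(pathlib.Path(in_dir).glob(pattern))
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        "<title>Genes Reunited messages</title></head><body>",
    ]
    for path in files:
        # errors="replace" stops a stray non-UTF-8 byte from aborting the run
        text = path.read_text(encoding="utf-8", errors="replace")
        parts.append(f"<hr><h2>{html.escape(path.name)}</h2>")
        parts.append(body_of(text))
    parts.append("</body></html>")
    pathlib.Path(out_file).write_text("\n".join(parts), encoding="utf-8")
    return len(files)
```

Running `combine_pages("genes_reunited_messages", "all_messages.html")` would give a single file that opens in a browser as one long page, while the original files stay untouched.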


Thanks for commenting on my blog! Your comment will be live once moderated. Sorry you have to log in. Not my choice. Tweet if preferred @fh_data_project