23 Dec 2013
Getting old messages from Genes Reunited
For years it bothered me that I had tonnes of valuable information tied up in messages on Genes Reunited and Ancestry. I haven't yet solved what to do with the Ancestry messages; I hope I shan't have to copy and paste them all. The risk of leaving them there is real: some websites might delete my records if I stopped being a member, for example. I realised that the Genes Reunited problem was simple: each message, or thread of messages, had a unique URL (web address). All I needed to do was capture these 1000 URLs, load each webpage (ideally automatically) and capture its contents from my web browser's cache.
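The "capture the URLs, then load each page" step can be sketched in Python. This is a minimal sketch under stated assumptions, not the method used here (my pages came from the browser cache). It assumes the message-list pages have been saved from the browser as HTML, that every message link contains some recognisable text (the `"/messages/"` filter in the example is a hypothetical placeholder, so check the real addresses first), and that the site accepts a session cookie copied from a logged-in browser. The names `extract_message_urls`, `fetch_with_cookie` and `save_pages` are all hypothetical.

```python
import time
import urllib.request
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_message_urls(page_html, base_url, must_contain):
    """Return unique absolute URLs containing `must_contain`, in page order."""
    parser = LinkCollector()
    parser.feed(page_html)
    seen, urls = set(), []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # turn relative links into full addresses
        if must_contain in url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def fetch_with_cookie(url, cookie):
    """Download one page, sending a session cookie copied from the browser.

    ASSUMPTION: the site authenticates with a cookie; this is not confirmed.
    """
    request = urllib.request.Request(url, headers={"Cookie": cookie})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def save_pages(urls, out_dir, fetch, delay=1.0):
    """Save fetch(url) for each URL as out_dir/message_0001.html, message_0002.html, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        path = out / f"message_{i:04d}.html"
        if not path.exists():  # skip pages already saved, so a rerun resumes
            path.write_text(fetch(url), encoding="utf-8")
            time.sleep(delay)  # pause between requests to be gentle with the server
        saved.append(path)
    return saved
```

Passing the fetcher in as a function keeps downloading separate from saving, so the same loop works whatever the pages come from. A call might look like `save_pages(urls, "messages", lambda u: fetch_with_cookie(u, my_cookie))`, where `my_cookie` is the session cookie string shown by your browser (again an assumption about how the site logs you in).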
I quickly ended up with 1000 copies of webpages. My first thought was to import these into a Word document (I realise now I could put the HTML into one webpage, so they all load as one page, and then scrape that more easily). In fact, I'm happy with them as a series of webpage files. I will of course need to back these up, as otherwise the process is pointless.
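Putting the saved pages into one webpage, and backing up the originals, might look like the following minimal sketch. It assumes the pages sit as `.html` files in a single folder and pulls out whatever lies between `<body>` and `</body>` with a simple regular expression: crude, but usually enough for pages saved from one site. It writes one combined page with a heading per file. The zip backup uses the standard library's `shutil.make_archive`; the function names and file names are hypothetical.

```python
import re
import shutil
from html import escape
from pathlib import Path

# Crude body extraction: good enough for consistent pages, not a full HTML parser.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_of(page_html):
    """Return the markup inside <body>...</body>, or the whole text if there is no body tag."""
    match = BODY_RE.search(page_html)
    return match.group(1) if match else page_html


def combine_pages(page_dir, out_file):
    """Join the bodies of every .html file in page_dir into one page, one section per file."""
    sections = []
    for path in sorted(Path(page_dir).glob("*.html")):  # sorted, so message_0001 comes first
        text = path.read_text(encoding="utf-8", errors="replace")
        sections.append(f"<section>\n<h2>{escape(path.name)}</h2>\n{body_of(text)}\n</section>")
    page = (
        '<!DOCTYPE html>\n<html>\n<head><meta charset="utf-8">'
        "<title>Saved messages</title></head>\n<body>\n"
        + "\n<hr>\n".join(sections)
        + "\n</body>\n</html>\n"
    )
    Path(out_file).write_text(page, encoding="utf-8")
    return len(sections)  # number of pages combined


def back_up(page_dir, archive_base):
    """Zip the contents of page_dir into archive_base + '.zip' and return the archive path."""
    return shutil.make_archive(archive_base, "zip", root_dir=page_dir)
```

Write the combined page outside the folder of saved pages, otherwise a second run will pick it up as one of its own inputs. The zip file only counts as a backup once it has been copied somewhere other than the computer it was made on.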