
The following is a review of "Finding and Archiving the Internet Footprint" by Simson Garfinkel and David Cox (2009). The article details the difficulties afoot (pun intended), and ahead, in the transition from personal computing to cloud computing. The authors lay out a scenario in the very near future in which an archivist enters the office of a deceased writer only to find that the writer left no physical or electronic files of significance to collect. It's not that she had no files or documents. On the contrary, they are just dispersed among a growing number of cloud storage services, apps, and platforms. Collecting material of this sort adds the difficulty of "finding and archiving her internet footprint," rather than simply copying a couple of hard drives or gathering a stack of notebooks and paper files.

An internet footprint, as the name suggests, consists of a person's activities and impressions online, but Garfinkel and Cox note that there are different types of internet footprints, each of which may need to be traced for different content and for different reasons. A public internet footprint consists of anything that a person puts online, or that is put online with their name attached. This could be an article, a review, a social media account, a letter to an editor, or one's own website or blog. Garfinkel and Cox note that because websites may or may not exist in the future, they cannot be counted on to archive their own content. While not always sustainable or stable in the long term, a person's public internet footprint would be some of the easiest data for an archivist to find in an increasingly vast sea of internet data.

But what about pseudo-public data? The authors remind us of the pseudonymous nature of our online selves. An archivist gathering materials from an individual's public internet footprint may pass right over other quality information because the name attached to it is a pseudonym. People don't always use their "real name" online, for all kinds of reasons, but this can be problematic for an archivist, who would not necessarily have a list of potential pseudonyms when starting to source and collect artifacts. A pseudonymous footprint is still a public one. The information is 'out there'; it just may not be obvious or readily accessible at first glance. While a footprint hidden in plain sight would be hard to trace, it is not completely anonymous or private.

An anonymous footprint presents even greater challenges for an archivist. Truly anonymous works, say Garfinkel and Cox, can really only be linked to the author through the content of the work itself. They note that while anonymous work would be difficult to locate, it is also difficult for anyone to be truly anonymous online today, and there are few outlets for doing so. In short, an anonymous footprint would be a tall task for an archivist, but an archivist is far less likely to face one than to face the challenges of an individual's private footprint.

A private footprint differs from the others. While the previous footprints differ in accessibility, they have in common that they can be assumed to be out there on the web, accessible in some way, and waiting for the archivist to find and collect them. The private footprint, in contrast, may consist of data and documents that may never have been available or accessible to anyone other than their creator. Garfinkel and Cox note that increasingly (in 2009, and maybe more so now), computer users are storing their information and data on remote servers or 'clouds' dedicated to private use rather than on their own personally owned computer systems. An example of this would be an individual's personal Google Drive account.

The idea of internet footprints brings me back to the beginning of my studies in curation. Early on, we were assigned to track our digital curatorial actions for a day. To attempt this for just one day (and only for myself!) was a challenge in thinking outside the box and past the surface perceptions of seemingly mundane actions. Mapping one day, or a week, or a month of another person's life, potentially one who is no longer living, as a contemporary or future archivist might do, seems overwhelming. Considering all the data we create almost all the time, the task seems nearly impossible for anyone not in a top-secret government position. Not all data is particularly salient in its own right, but I can imagine that an archivist would want to collect all of the data they possibly could, and only then determine what is of value or worth saving, once they feel they have everything of potential importance.

In an ideal world, the originator, their family, or others close to them would be able to provide an archivist with a nice tidy list of all their frequented online services, complete with an up-to-date list of usernames and passwords. But even this unlikely ideal could still be woefully incomplete. The closest an archivist is likely to get to this ideal would be interviews with the content originator or those close to them, which would give the archivist starting points. Per Garfinkel and Cox, it should be noted that even interviews would not reveal the entirety of a person's internet footprint. To go deeper, they suggest three approaches to finding internet footprints: forensic analysis of an originator's computer system, search, and social network analysis.

To perform a forensic analysis, an archivist would look at the originator's computers and electronic devices to glean traces of internet activity. This can be accomplished by copying all of the originator's computer and media files and scanning them with a forensic feature extractor such as 'bulk_extractor.' Beyond the originator's personal storage devices, an archivist could, and should, implement search and social network analyses. For a search analysis, an archivist can start by using common internet search engines such as Google and Yahoo to look for data tethered to the originator's name or alias(es). Secondarily, they could search specific social websites such as Facebook, Flickr, and Instagram. This is not perfect, note the authors. As previously mentioned, many people share the same or similar names and pseudonyms online, which can prove problematic for even the savviest archivist. Once these options have been attempted, finding the people that the originator interacts with, or has interacted with, online can open possible new avenues of information.
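
To make the idea of a forensic feature extractor more concrete, here is a minimal Python sketch. It is an illustration only and does not reflect how any real forensic tool is implemented: the function name `extract_features` and the two simplified regular expressions are assumptions made for this example. It scans raw bytes copied from a device for email addresses and web host names, the kinds of traces that might point an archivist toward accounts and services the originator used.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simplified, illustrative patterns; real forensic scanners are far more robust.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_features(raw: bytes) -> dict:
    """Tally email addresses and URL host names found in raw bytes."""
    # Copied files may contain binary data, so undecodable bytes are skipped.
    text = raw.decode("utf-8", errors="ignore")

    emails = Counter(match.lower() for match in EMAIL_RE.findall(text))

    hosts = Counter()
    for url in URL_RE.findall(text):
        host = urlparse(url).hostname
        if host:
            hosts[host] += 1

    return {"emails": emails, "hosts": hosts}


if __name__ == "__main__":
    # Hypothetical fragment of a copied file, with stray binary bytes around it.
    sample = (
        b"\x00\x01 Contact jane.doe@example.org or visit "
        b"https://drive.example.com/file/1 and "
        b"https://drive.example.com/file/2 \xff"
    )
    features = extract_features(sample)
    print(features["emails"].most_common())
    print(features["hosts"].most_common())
```

Run over copies of an originator's files, a tally like this could give the archivist a rough list of services and correspondents worth investigating further, though it would still need to be checked by hand.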

It is important to note that time is often of the essence. When someone has died, for instance, people close to them, and the platforms they used online, may do things with their data that do not align with the needs and accessibility ideals of an archivist. Content may be lost forever without any ill intent. Garfinkel and Cox state that it is not enough to find the data. That's just the first step. An archivist will need to copy the content in as close to its original state as possible. They recommend technology such as 'web crawlers' and information extractors known as 'wrappers,' such as W4F (World Wide Web Wrapper Factory) or RAPIER (Robust Automated Production of Information Extraction Rules), to separate the relevant portions of a webpage from the irrelevant elements. As with any tool, the authors add, different extraction tools often have specific purposes and should be chosen by the archivist based on needs, affordances, and constraints.
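
The idea of a wrapper can be sketched in a few lines of Python. This is neither W4F nor RAPIER, only a hand-written stand-in built on Python's standard `html.parser` module. The class name `PostWrapper`, the function `extract_posts`, and the assumption that the relevant content sits in elements marked `class="post"` are all hypothetical choices made for this illustration.

```python
from html.parser import HTMLParser


class PostWrapper(HTMLParser):
    """Collect the text inside elements that carry a target CSS class."""

    # Tags that never have a closing tag, so they must not affect nesting depth.
    VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.posts = []     # finished text extracts, one per matching element
        self._chunks = []   # text pieces of the element currently being read

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # nested tag inside a matching element
        elif self.target_class in classes:
            self.depth = 1              # entering a matching element
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Leaving the matching element: join its text and tidy whitespace.
            self.posts.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self._chunks.append(data)


def extract_posts(html: str, target_class: str = "post") -> list:
    """Return the text of every element marked with target_class."""
    parser = PostWrapper(target_class)
    parser.feed(html)
    parser.close()
    return parser.posts


if __name__ == "__main__":
    # Hypothetical page: two posts surrounded by navigation and advertising.
    page = (
        '<html><body><div id="nav">Home | Login</div>'
        '<div class="post"><p>First <b>entry</b></p></div>'
        '<div class="ad">Buy now</div>'
        '<div class="post featured">Second entry</div></body></html>'
    )
    print(extract_posts(page))
```

A hand-written rule like this only works for pages laid out the way it expects, which is one reason an archivist would want to match the extraction tool to the site being collected.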

Saving the content once it has been extracted presents further challenges to the archivist. How should such content be saved if it can appear different on different computers? Should a website such as an individual's Facebook page be saved as the originator sees it when logged in, or as the public would have seen it? Saving or printing elements of online footprints to paper may seem like a good idea, but Garfinkel and Cox say that this adds another layer of problems. Instead, they suggest that web pages should be 'printed' to a bitmap such as a TIFF or PNG file, which can produce an exact copy of what was seen on the screen.

A more contemporary resource for saving web data, not mentioned in the 2009 Garfinkel and Cox article, is Webrecorder. Webrecorder is a free online tool that allows an archivist or individual to create high-fidelity interactive recordings of any website they visit. Upon signing up with just a valid email address, users have 5 GB worth of storage to play with at webrecorder.io. Previous digital preservation tools were created with static content in mind. Webrecorder enables the preservation of sites like social media pages with dynamic elements, including embedded video and JavaScript. A brief test of this tool revealed it to be easy to use and intuitive. Anyone can try the tool and save webpages temporarily without signing in, but in order to record a page permanently, a user has to sign in to the service. Webrecorder is an example of the evolving technology that could enable current or future archivists to collect and save dynamic web elements in their original state.

When collecting the content of others, copyright and ethical considerations must also be taken into account. Under fair use, copyright law in the United States is generally open to archives made for scholarly, non-commercial purposes, provided that the original work is not compromised. Fair use standards do not mean, however, that web service providers and hosts will be receptive to having their content archived. Facebook, for instance, prohibits anyone other than the originator from archiving the originator's content. This is not a dead end for Garfinkel and Cox, but something that they suggest could prove difficult in the future. In addition, they warn that accessing an originator's content online with the originator's usernames and passwords can expose an archivist to computer crime charges for unauthorized access and to violations of a website's 'Terms of Service.'

As this article attests, communication and personal documentation are increasingly migrating online, and this opens many opportunities for archivists, as well as a new horizon of challenges and issues. The article delineates a scenario that could well occur as documents physically stored in the same location as their originator become less common. It has helped me to further conceptualize what an archivist of digital content might encounter going forward. There are many tools at the disposal of an archivist, but the most important tool might be a mindful and dynamic archivist who is able to adjust, adapt, and innovate.