Sunday, November 30, 2008

Muddiest Point

Would it be a viable option for the government to make the personal data it collects available and secure to the person it describes, so that security is maintained and privacy issues are eliminated?

Wk 13 blog post

The YouTube video was no longer available due to copyright claims from Viacom.

TIA and Data Mining

I thought this was interesting. I had no idea how many programs the government has put in place to track its citizens. I have nothing to hide and couldn't care less if they know, yet it is still an invasion of privacy. I like how many newsworthy stories are on this website that I don't hear about on mainstream news.

Notes:
A database by TIA would be populated by transaction data contained in current databases such as financial records, medical records, communication records and travel records as well as new sources of information.

A key component of the TIA project was to develop data mining or knowledge discovery tools that would sort through the massive amounts of information to find patterns and associations.
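
Just to picture what "finding patterns and associations" means, here is a minimal sketch (my own toy example, not TIA's actual tools) that counts which items co-occur across made-up transaction records:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction records; the fields and values are made up for illustration.
records = [
    {"wire_transfer", "one_way_ticket", "hotel_booking"},
    {"wire_transfer", "one_way_ticket"},
    {"pharmacy_purchase", "hotel_booking"},
    {"wire_transfer", "one_way_ticket", "pharmacy_purchase"},
]

# Count how often each pair of items appears together in a record.
pair_counts = Counter()
for record in records:
    for pair in combinations(sorted(record), 2):
        pair_counts[pair] += 1

# Report pairs that co-occur in at least half the records -- a crude "association".
threshold = len(records) / 2
for pair, count in pair_counts.most_common():
    if count >= threshold:
        print(pair, count)
```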

In September 2003, Congress eliminated funding for TIA; however, other similar programs are still being implemented, including Novel Intelligence from Massive Data and programs at the Transportation Security Administration.

Friday, November 21, 2008

Muddiest Point

Are there certain qualifications required to be a part of the "Wikipedia community"? Do they go through some sort of hiring process to make sure they know what they are talking about?

Week 12 post

Using a wiki to manage a library instruction program: Sharing knowledge to better serve patrons
-creates better information sharing
-facilitates collaboration in the creation of resources
-efficiently divides workloads
-two uses: sharing knowledge and cooperating in the creation of resources
-commercial sites abound to help you build your own wiki, including seedwiki, pbwiki, jotspot, twiki, and phpwiki.
-the creator of the wiki decides who has editing rights to the wiki.
-wikis are used to manage public services information, collaborate on and keep track of reference questions and assess databases.


Creating the academic library folksonomy: Put social tagging to work at your institution

-Social tagging is a relatively new phenomenon that allows an individual to create bookmarks for websites and save them online.
-Tags include subject keywords chosen by the user and brief descriptions of sites.
-A folksonomy is a taxonomy created by ordinary folks.
-U of Penn adopted PennTags, where Penn students, faculty and staff can bookmark useful websites.
-Drupal is an example of open source content management software.
-Connotea is an academic social tagging site.
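
To picture how a folksonomy might be stored, here is a minimal sketch assuming a simple in-memory index (PennTags and Connotea are of course far more elaborate than this):

```python
from collections import defaultdict

# tag -> set of bookmarked URLs; user -> list of saved bookmarks
tag_index = defaultdict(set)
user_bookmarks = defaultdict(list)

def add_bookmark(user, url, tags, description=""):
    """Save a bookmark and index it under each user-chosen tag."""
    user_bookmarks[user].append({"url": url, "tags": tags, "description": description})
    for tag in tags:
        tag_index[tag].add(url)

# Hypothetical example data.
add_bookmark("student1", "http://www.example.edu/guide", ["chemistry", "citation"], "Citation guide")
add_bookmark("prof2", "http://www.example.org/database", ["chemistry", "databases"])

# The "folksonomy" is simply whatever tags the community has chosen to use.
print(sorted(tag_index))               # ['chemistry', 'citation', 'databases']
print(sorted(tag_index["chemistry"]))  # both bookmarked URLs
```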

Jimmy Wales- Wikipedia
Neutrality on issues is required; if problems occur and contributors push their own opinions, they will be asked to leave.
The Wikipedia core community meets offline too.
Whenever changes are made, a copy is sent to a Wikipedia community member to double-check the information and delete what needs to be removed.
There is a "votes for deletion" page to decide whether something needs to be deleted.
The next step is to create textbooks on Wikipedia. It should take at least 20 years.

Friday, November 14, 2008

Week 11 Post

Dewey Meets Turing

-Librarians, computer scientists and publishers all had an interest in the Digital Libraries Initiative, begun in 1994 and funded by the National Science Foundation.
-Computer scientists saw the DLI as a chance to impact society.
-Librarians saw the DLI as a means to get funding and to ensure libraries' continued impact on scholarly work.
-When the web came along, it changed the DLI's plans, but the need for better and more complete holdings remains a focus.
-With the web, deals with publishers and copyright restrictions made computer scientists change how they publish their work.
-Libraries were also forced to change their ideas because many journal publishers made the business decision to charge a premium for the digital content that computer scientists have named information hubs.
-Opportunities now arise for direct connections between librarians and scholarly authors.

Digital Libraries

-The mantra has been: aggregate, virtually collocate and federate. The goal of seamless federation across distributed, heterogeneous resources remains the holy grail of digital library work (a minimal sketch of the federation idea appears after this list).
-DLI-1 funded six university led projects to develop and implement computing and networking technologies that could make large scale electronic test collections accessible and interoperable.
The schools were: U of MI, Stanford, U of CA-Berkeley, U of CA-Santa Barbara, Carnegie Mellon and U of IL-Urbana-Champaign.
-Probably the most significant contribution of the IL project was the transfer of technology to our publishing partners and other publishers.
-A large number of significant digital library standards and technologies have been developed by entities outside of the federally funded projects
publishers
publisher consortium
Bibliographic utilities
W3C
Academic consortium
NISO
LOC
Library integrated system vendors
web search engines
Computer companies
Open Source community
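
As noted in the first bullet above, federation across heterogeneous resources is the holy grail. Here is a minimal sketch of the idea, with made-up sources and record formats (real federation relies on standards like Z39.50 or OAI-PMH): each source gets wrapped in a common search interface, and the federator merges the results into one virtual collection.

```python
# Each "source" simulates a heterogeneous resource with its own native record format.
catalog = [{"ti": "Digital Libraries", "yr": 1999}]
repository = [{"title": "Metadata Harvesting", "year": 2002}]

def search_catalog(query):
    """Wrap the catalog's native records in a common result format."""
    return [{"title": r["ti"], "year": r["yr"], "source": "catalog"}
            for r in catalog if query.lower() in r["ti"].lower()]

def search_repository(query):
    """Wrap the repository's native records in the same common format."""
    return [{"title": r["title"], "year": r["year"], "source": "repository"}
            for r in repository if query.lower() in r["title"].lower()]

def federated_search(query):
    """Query every source and virtually collocate the results in one list."""
    results = []
    for source in (search_catalog, search_repository):
        results.extend(source(query))
    return sorted(results, key=lambda r: r["year"])

print(federated_search("digital"))
print(federated_search("metadata"))
```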

Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age

-The development of institutional repositories emerged as a new strategy that allows universities to apply serious, systematic leverage to accelerate changes taking place in scholarship.
-Online storage costs have dropped significantly; repositories are now affordable.
-Operational responsibility for these services may reasonably be situated in different organizational units at different universities, promoting collaboration among librarians, IT staff, archives and records managers, faculty, and university administrators and policymakers.
-A mature and fully realized institutional repository will contain intellectual works of faculty and students.
-Cautions
administration might try to gain more control over faculty intellectual work
overloading the repository
creating repositories too rapidly
Repositories need to be sure to preserve formats and to provide persistent identifiers, documentation, and management of rights.
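
A minimal sketch of what that bookkeeping might look like, with hypothetical field names rather than any real repository schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RepositoryItem:
    """One deposited work; the fields are illustrative, not a real repository schema."""
    identifier: str          # persistent identifier, e.g. a handle-style string
    title: str
    creator: str
    file_format: str         # format to preserve, e.g. "application/pdf"
    rights: str              # rights/licensing statement
    deposited: date = field(default_factory=date.today)

item = RepositoryItem(
    identifier="hdl:12345/678",
    title="Working Paper on Institutional Repositories",
    creator="A. Faculty Member",
    file_format="application/pdf",
    rights="Author retains copyright; non-exclusive distribution license granted.",
)
print(item.identifier, item.file_format)
```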

Monday, November 10, 2008

Now working

I made a couple of adjustments and my web page now works. I discovered that it works in Internet Explorer but not Firefox.

www.pitt.edu/~lar68

Sunday, November 9, 2008

Assignment : web page

Here is my link to my web page. I emailed Lucie because I couldn't get the links to work. I was able to use them within my Publisher document, but not once I uploaded them to Pitt's server. I am placing the link to the web page here, however I am emailing a copy of the Publisher document to Lucie so she can see that it works in Publisher.

www.pitt.edu/~lar68

Thursday, November 6, 2008

muddiest point

Is it possible to give the writers of this deep web data the ability to tag their own documents? Wouldn't this also help them tag data in a way that helps web searchers with interests the same as their own to find accurate information?

Nov 7th notes

The Deep Web: Surfacing Hidden Value

--Most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.
--Deep Web sources store their content in searchable databases that only produce results dynamically in response to a direct request.
--Search engines obtain their listings in two ways: authors may submit their own Web pages, or the search engines "crawl" or "spider" documents by following one hypertext link to another. The latter returns the bulk of the listings.
--Cross-referencing websites gives better results (e.g., Google).
--BrightPlanet's technology is a "directed-query engine" (a minimal sketch appears after this list).
--The deep Web is about 500 times larger than the surface Web, with, on average, about three times higher quality based on the authors' per-document scoring methods.
--Serious information seekers can no longer avoid the importance or quality of deep Web information. But deep Web information is only a component of total information available. Searching must evolve to encompass the complete Web.
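
A minimal sketch of a directed query, with a made-up search form URL and parameter names (this is not BrightPlanet's actual technology): instead of following links, the client posts a query straight to a searchable database and reads the dynamically generated result page.

```python
import requests  # third-party HTTP client

# Hypothetical deep-web search form; the URL and field names are placeholders.
SEARCH_FORM_URL = "http://deepweb.example.org/search"

def directed_query(term):
    """Send the query directly to the database's search form instead of crawling links."""
    response = requests.post(SEARCH_FORM_URL, data={"q": term, "max_results": 20}, timeout=10)
    response.raise_for_status()
    # This page exists only as a response to the query; a crawler would never find it.
    return response.text

if __name__ == "__main__":
    print(directed_query("watershed data")[:500])
```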

How Things Work--Part one

--Within a data center, clusters or individual servers can be dedicated to specialized functions, such as crawling, indexing, query processing, snippet generation, link-graph computations, result caching, and insertion of advertising content.
--Currently, the amount of Web data that search engines crawl and index is on the order of 400 TB, placing heavy loads on server and network infrastructure.
--The crawler initializes the queue with one or more seed URLs. A good seed URL will link to many high-quality Web sites.
--Crawling proceeds by making an HTTP request to fetch the page at the first URL in the queue. When the crawler fetches the page, it scans the contents for links to other URLs and adds each previously unseen URL to the queue. Finally, the crawler saves the page content for indexing. Crawling continues until the queue is empty (a minimal sketch of this loop appears after this list).
--The simple crawling algorithm must be extended to address the following issues:
----Speed
----Politeness
----Excluded content: the robots.txt file determines whether the webmaster has specified that some or all of the site should not be crawled.
----Duplicate Content
----Continuous crawling: carrying out full crawls at fixed intervals would imply slow response to important changes in the Web.
----Spam rejection: primitive spamming techniques include inserting misleading keywords into pages that are invisible to the viewer.
-------Spammers also engage in cloaking, the process of delivering different content to crawlers than to site visitors.
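
A minimal sketch of the crawl loop described above, ignoring the speed, politeness, and spam issues just listed; the seed URL is hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collect href values from anchor tags on a fetched page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    queue = deque([seed_url])      # initialize the queue with a seed URL
    seen = {seed_url}
    saved = {}                     # url -> page content, kept for the indexer
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        try:
            page = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
        except Exception:
            continue               # skip pages that fail to fetch
        saved[url] = page          # save the page content for indexing
        parser = LinkParser()
        parser.feed(page)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:   # add each previously unseen URL to the queue
                seen.add(absolute)
                queue.append(absolute)
    return saved

# Example (hypothetical seed): crawl("http://www.example.com/", max_pages=5)
```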

Web Search Engines: Part 2
--Search engines use an inverted file to rapidly identify indexing terms.
--An indexer can create an inverted file in two phases: scanning and inversion (a minimal sketch appears at the end of these notes).
--Scaling up: document partitioning.
--Term lookup: the Web's vocabulary is unexpectedly large, containing hundreds of millions of distinct terms.
--Compression: indexers can reduce demands on disk space and memory by using compression algorithms for key data structures.
--Phrases: special indexing tricks permit a more rapid response.
--Anchor text: Web browsers highlight words in a web page to indicate the presence of a link that users can click on.
--Link popularity score: frequency of incoming links.
--Query-independent score: a ranking of websites that does not depend on the query.
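
A minimal sketch of the two-phase inverted file build (scanning, then inversion) on toy documents; real indexers work at vastly larger scale and compress these structures.

```python
from collections import defaultdict

documents = {
    1: "web search engines crawl the web",
    2: "an inverted file maps terms to documents",
    3: "search engines use an inverted file",
}

# Phase 1: scanning -- emit (term, doc_id) postings in document order.
postings = []
for doc_id, text in documents.items():
    for term in text.lower().split():
        postings.append((term, doc_id))

# Phase 2: inversion -- group the postings by term to build the inverted file.
inverted_file = defaultdict(list)
for term, doc_id in sorted(postings):
    if not inverted_file[term] or inverted_file[term][-1] != doc_id:
        inverted_file[term].append(doc_id)

print(inverted_file["inverted"])   # [2, 3]
print(inverted_file["web"])        # [1]
```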