2007 Meeting of the Classification Society of North America (CSNA) -- Exploring ways of extracting useful information from large and complex machine-readable data sets, including machine learning, pattern recognition, or data mining problems, or with reference to the application area such as information retrieval, molecular bioinformatics, authorship attribution, market segmentation, psychometrics, social networks, and so on.

Greenstone question/answer archives
Order of Ecumenical Franciscans
Events of AWARE Presents
SDD : TDWG Working Group (Taxonomic Databases Working Group). Structure of Descriptive Data (SDD). If you are interested in Biology, XML, and life in general, this will interest you. The group is interested in interoperability and standardizing biological data.

TeleNature: citizen scientists, research, information sharing, wireless technology, nature.

"Software is often not what users want, but something they buy and try to use anyway." - Charles T. Meadow, 1989
"A painting is not thought out and settled in advance," said Picasso. "While it is being done, it changes as one's thoughts change. And when it's finished, it goes on changing, according to the state of mind of whoever is looking at it." -- PBS
"Libraries are a haven where people should be able to seek whatever information they want to pursue without any threat of government intervention" - Joan Airoldi

Who is Karen?

I am a graduate student in Library and Information Science at the University of Illinois at Urbana-Champaign. I am interested in Digital Libraries. Now that we have so much of our information in digital format,

  • How do we make sure we have continued access to information while file formats, software, hardware, and the Internet are constantly changing?
  • How do we improve our retrieval systems?
  • What do we really want content management systems to do?
  • How are content management systems like digital libraries? How are they different?

Teaching Stuff
  • Spring 2005: TA for "Document Processing" for The Incredible Dave Dubin (Instructor)
    If you want to schedule an appointment, then email me. Thanks.
  • Fall 2004: TA for "Foundations for Information Processing in Library and Information Science" for The Incredible Dave Dubin (Instructor).
  • Spring 2004: taught Web Structures and Information Architecture (LIS350W2A).
  • Fall 2003: Teaching Assistant for 2 classes for Bryan Heidorn:
    1. LIS329A
    2. LIS370FO.
Current Research Interests: Representation of Digital Documents (dissertation topic)
Representation of Digital Documents

Brief Summary:
As more of the information available is in digital format, we are presented with new problems. Perhaps some of the problems are not really that new, but new twists on problems we (as librarians and information scientists) have seen before. How do we represent digital resources? How do we preserve them for future generations? How do we provide access to these resources as computer technologies (software, hardware, and file formats) continue to change at a very fast rate? How do we define a single digital resource: where does it begin and end, and does an electronic article in a journal have the same relationship to the journal as the print articles did? How do we characterize the internal structure of a digital resource? Do we characterize all digital resources in the same way? Are there commonalities among all these resources? What about all the different versions of digital documents? How do we track the intellectual content of these documents? How do we track their use and changes? What does all this mean for information storage, information retrieval, archiving, harvesting, exporting, sharing, etc.?

For instance, the recorded stories of the Holocaust survivors should never be lost. Yet this is a real possibility if we store the information in digital formats that become obsolete in a few years. Vital to the retrieval and preservation of, and therefore access to, digital resources is the representation of these documents. Without proper representation we cannot locate, retrieve, or preserve information being produced today.

Areas of Interest Prior to This

The following traces the evolution of my interests.

Spring 2002
A System to Aid Users in Making Relevance Judgements

Brief summary: Information Retrieval systems have reached something of a plateau in the precision and recall they deliver to users. While there is still room for advances, costs tend to rise dramatically for diminishing returns. One method for improving systems cheaply is to collect relevance judgements made by the system users. In the retrieval community, there is also a trend to move away from binary relevance judgements: while "relevant" and "not relevant" have been helpful categories in the development of information retrieval systems, these categories are not absolute, nor are they sufficient. Collecting real relevance judgements from users may be the best way to improve Information Retrieval systems.

There are two ways of collecting user relevance judgements: collection of implicit judgements, and collection of explicit judgements. A system that would let a user create categories of relevance may be better than the implicit user relevance judgements now being used by some systems. Currently, implicit judgements are measured through interpreted actions such as how much time the user spent with the document open, or whether the user printed or saved the document. While there is some merit to these implicit rankings, they are used as measurements of relevance mostly because they are easier to collect than explicit ratings. However, if a system were created in which explicit rankings were easy to collect without burdening the user, then explicit rankings would be better measurements of relevance than implicit rankings.
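As a rough illustration only, the two kinds of judgement could be recorded side by side as sketched below. The record fields, the 60-second threshold, and the weights are all assumptions made up for this example, not values from any real system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One user's interaction with one retrieved document (hypothetical record)."""
    doc_id: str
    seconds_open: float
    printed: bool = False
    saved: bool = False
    explicit_rating: Optional[int] = None  # filled in only if the user rated the item


def implicit_score(event, long_read_seconds=60.0):
    """Guess relevance from behaviour alone; threshold and weights are arbitrary."""
    score = min(event.seconds_open / long_read_seconds, 1.0)  # reading time, capped at 1
    if event.printed:
        score += 0.5
    if event.saved:
        score += 0.5
    return score


def relevance_signal(event):
    """Prefer the explicit rating when one exists; otherwise fall back to the guess."""
    if event.explicit_rating is not None:
        return ("explicit", float(event.explicit_rating))
    return ("implicit", implicit_score(event))
```

The fallback order reflects the argument above: an explicit rating, when it costs the user nothing extra, is taken over an interpretation of their behaviour.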

The idea is to build systems which support actions that the user does naturally, which greatly benefit the user immediately, and that would also serve as feedback mechanisms about explicit relevance judgements. The normal part of information gathering that is missing from current systems is support for side-by-side comparisons, or better yet side-by-n-side comparisons. I propose a system which allows users to create their own system of sorting things out.

My proposed system would allow the users to do digitally what they naturally do in the physical world. In a physical library setting, people collect a large number of possibilities, bring them all to a table, and critically examine each item several times. Comparing and creating multiple stacks is a normal part of information seeking. The criteria for a stack vary widely over a very short period of time. The final selection of just a few items from a vast heap is best done with multiple piles. These multiple piles must all be visible at the same time, and the top items work as glanceable reminders to the searcher. This ability to constantly re-form criteria and rankings actually aids the user's understanding of the topic. Allow the human users to make relevance judgements for their current task, and the system can collect those judgements and use them to improve itself.

Another reason this system might be helpful is that people tend to select items that present several sides of the story rather than a repeat of the same information. This type of selection would be extremely difficult to program a computer to do, while humans do it very well. So enable the humans to do what humans do well, and collect the information.
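The pile-sorting idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the proposed system itself: the class name PileTable, the pile labels, and the log format are all hypothetical.

```python
from collections import defaultdict


class PileTable:
    """A virtual table of user-named piles; every move is logged as feedback."""

    def __init__(self):
        self.piles = defaultdict(list)  # pile label -> document ids, top of pile last
        self.log = []                   # (doc_id, from_pile, to_pile) for each move

    def pile_of(self, doc_id):
        """Return the label of the pile currently holding doc_id, or None."""
        for label, docs in self.piles.items():
            if doc_id in docs:
                return label
        return None

    def place(self, doc_id, pile):
        """Put a document on top of a pile, moving it from its old pile if needed."""
        previous = self.pile_of(doc_id)
        if previous is not None:
            self.piles[previous].remove(doc_id)
        self.piles[pile].append(doc_id)
        self.log.append((doc_id, previous, pile))

    def tops(self):
        """Top item of each non-empty pile: the 'glanceable reminders'."""
        return {label: docs[-1] for label, docs in self.piles.items() if docs}
```

For example, placing "a" and "b" on a "maybe" pile and then moving "a" to "keep" leaves "b" on top of "maybe" and records the move ("a", "maybe", "keep") in the log. That log is the explicit relevance feedback: the user only sorted items for their own task, and the system collected the judgements as a side effect.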

Fall 2002
Information Seeking in Context

Brief summary:
Much of the research done in Information Seeking deals with formal situations of information seeking in a single information source. For instance, Kuhlthau studied information seeking in the library setting, and Bates looks at query terms typed into the search box of an Internet search engine. Brashers looks at information seeking on a listserv (Cunningham and Downie also looked at listservs). More recently, researchers have begun studying information seeking in special work situations (such as those of scientists). Very few people have looked at information seeking during a common task. Kelly (under Belkin) recently studied the information seeking of graduate students over a semester. She did this by giving the students laptops, recording their Internet searches, and looking at the documents the students stored on the computers over time. "[Kelly] discovered that people spend more time reading documents when they know less about the subject. According to [Vakkari]'s empirical studies, novices to a subject rate most things as relevant. Novices are on the first leg of the learning curve, so spending a long time reading a document probably does mean that the document is relevant to a novice, but may not be relevant to an expert using the same information need statement or query." -- I wrote this in a final paper (for a class). The paper was entitled "Exactly What I Was Looking for: Adding Context to Internet Information Seeking." The paper discussed a study in which I captured the activities of people on their personal computers during a two-hour session. The study looked at the context surrounding activities to see if automatically adding contextual information might improve the quality of the Internet searches people do. After this study, I decided it would be interesting to study the information seeking of people who were all doing the same task.
After considering a wide variety of activities, I finally decided to study the information seeking of people during the task of filling out an Institutional Review Board (IRB) form. I had also looked into the study of interruptions during the semester: when are people interruptible during a task, so that a computer could hold off on reminders and alerts until less annoying times (and possibly increase people's ability to be productive)? Both topics could be examined in my proposed study. I wrote up an IRB form and got approval for the study. I have collected some data, but the Advanced Studies Committee suggested that I not focus on this until I work on other aspects of the program.

Spring 2003
Information Sharing, Data Sharing - Botanists and Citizen Scientists

Brief Summary: Continuing with my interest in Information Seeking in Context, I became interested in how botanists share information and data in their everyday jobs and how citizen botanists seek out the same information. There are two very good reasons for studying these groups of people: 1) citizen scientists have long been involved in collecting botanical information, and 2) botany in general has been slow in the adoption of computer-dependent activities. Citizen scientists were some of the first botanists. Many of the classic works in botany were written by men and women who were not specifically paid for their botanical research. Today, citizen scientists still play a role in botanical data collection. These citizen scientists are very dedicated to their data collection activities, but rarely actively consult the manuals written to aid them in identifying species. Bryan Heidorn is looking into the information technologies that could help in the correct identification of plant species. During my work with him, I discovered similarities and differences between the information seeking activities of the two groups. With regard to my second point above (that botany in general has been slow in the adoption of computer-dependent activities), a large number of botanists are now putting their information on the Internet and sharing large portions of their data with other botanists. This sharing of data is exciting. The data sharing that I observed between botanists is very interesting because each botanist seems to have their own way of collecting and storing data. There is a strong interest in creating a standard for data sharing, yet a reluctance to rely on computers as a technology, especially among field botanists. There is a third reason that botanists are very interesting. Botanists are experts at classification, and I see very strong similarities between botanists and librarians, especially in their management of information in physical formats.

Fall 2003
Uncertainty and Information Seeking, Personal Information Systems

Brief Summary:
For a class paper, I looked at uncertainty in information seeking in three very different situations. During this work, I noticed a similarity between all the situations: there seems to be a distinct line between resources that an information seeker will use and those they are reluctant to use. There seems to be a "personal information system" for each person. This personal information system is a set of information sources that are readily consulted without much consideration or concern. Other, less familiar sources are rarely consulted, and when they are, there is a barrier of uncertainty about approaching the source and uncertainty about the information received from it. Information retrieval systems do very little to reduce uncertainty. In face-to-face information seeking, other human beings readily display their expertise, give clues as to how to converse with them, and share context, all of which reduce uncertainty. In computer-mediated reference services, these are hidden, and the uncertainty reduction tasks seem to be the sole responsibility of the information seeker. I became interested in looking deeper into this, especially with the botanists and citizen scientists from my previous interests.

My definition:
Computer Supported Cooperative Work is the study of how computers can mediate, support, and improve the work we do in groups. - Karen Medina

Really Important Stuff and Things I Value:

Parker Evans : my nephew.

Karen Medina
... is a displaced librarian.
... believes that computers are just tools... tools that should work.
... actually hates computers.
... thinks that libraries should study how real people acquire and use information.
... to see how poorly we understand the information need statement.
So she joined the PhD program.

Old classes I enjoyed
cs497 - Special topics in Human-Computer Interaction
readings from cs491 - Human Computer Interactions.
reading schedule
prosem talks
lis450cw - computer supported cooperative work (seminar)
lis450kn - knowledge networks
Grand Prairie Friends website -

"This is a very open forum for brainstorming.

"What can Library and Information Science students do to contribute to a healthy environment? Many of the information organization skills learned in LIS would be of great value to many environmental organizations.

"We can work through independent study, class projects, internships and practicum. We might do web site construction for local organizations, organization of their libraries, perform reference work, help author reports and proposals, help a school gather water quality data, and maybe a little working in the prairies, forests or rivers if you are so inclined. Also your LIS education is much more enjoyable if you are doing something interesting and constructive as you learn." - Bryan Heidorn

Old Projects

Tracking Bird Migration : A working bibliography with Bill Cochran and Martin Wikelski.

My favorite instructors (alphabetical):
Instructor; Course Number; Topics
Cole; LIS450ds; Distributed Systems, Active Server Pages, my first exposure to Microsoft SQL
Dubin; LIS 317 (best course I have ever taken), 329, 353, 415, 429, 450da, 450dp, 450dsi, 450opr; I learned PERL and so much more. He also teaches Information Retrieval, Document Processing, Cluster Analysis, Standards, Visualization, Measurement and Evaluation, XML. Dubin is brilliant.
Heidorn; LIS 315, 329, 370, 415, 429; IR, Biological Information Browsing, Vibe, Cognition, Natural Language, faculty advisor for ASIS&T
Nichols; LIS250W1A, LIS45LW; Web Design, Technologies, and Techniques
Schlipf; LIS 410, 428; Library Buildings - a fantastic class!
Wolske; LIS 315, 353; LANs, Internet, Prairienet
Zych; CS110c++, CS125, CS225; Java, C++, Data Structures

Major Events in the History of Information Retrieval
My resume
Information Needs in Computer Science (for Science Reference class)
A Research Review of five Online Public Access Catalog User Studies - Karen Medina
A building program (example) - by Fred Schlipf for LIS428, not made for html.
Classes that I have taken and When and links to leep pages
LIS590I Indexing and Abstracting Fall 2004 LIS590I class list

