Thursday, June 12, 2008

Mining communities and their relationships in blogs: A study of online hate groups

1. Introduction
Blogs, or Weblogs, have become increasingly popular in recent years. A blog is a Web-based publication that allows users to add content periodically, normally in reverse chronological order, in a relatively easy way. As a result, many communities have emerged in the blogosphere. These could be support communities such as those for technical support or educational support. In addition, there are also hate groups in blogs that are formed by bloggers who are racists or extremists. The consequences of the formation of such groups on the Internet should not be underestimated. Because young people are the major group of bloggers, they are more likely to be affected and even "brainwashed" by ideas propagated through the Web as a global medium.

Facing this new trend in cyberspace, our study has two objectives. First, we propose a semi-automated approach that combines blog spidering and social network analysis techniques to facilitate the monitoring, study, and research on the networks of bloggers, especially those in hate groups. Second, our study seeks insights into the organization and movement of online hate groups.

2. Web mining and social network analysis
Techniques based on both Web mining and social network analysis have been used in intelligence- and security-related applications and achieved considerable success. Web mining techniques can be categorized into three types: content mining, structure mining, and usage mining (Kosala and Blockeel, 2000).

  1. Web content mining refers to the discovery of useful information from Web contents, including text, images, audio, video, etc.
  2. Web structure mining studies the model underlying the hyperlink structures of the Web. It usually involves the analysis of in-links and out-links information of a Web page, and has been used for search engine result ranking and other Web applications.
  3. Web usage mining employs data mining techniques to analyze search logs or other activity logs to find interesting patterns.

3. Proposed approach
We propose a semi-automated approach for identifying groups and analyzing their relationships in blogs. The approach is diagrammed in Fig. 1. It consists of four main modules: (a) Blog Spider, (b) Information Extraction, (c) Network Analysis, and (d) Visualization. The Blog Spider module downloads blog pages from the Web. These pages are then processed by the Information Extraction module. Data about these blogs and their relationships are extracted and passed to the Network Analysis module for further analysis. Finally, the Visualization module presents the analysis results to users in a graphical display. In the following, we describe each module in more detail.

3.1. Blog spider
A blog spider program is first needed to download the relevant pages from the blogs of interest. This is similar to general Web fetching. Asynchronous I/O can be used for parallel fetching (Brin and Page, 1998). After a page is downloaded, it can be stored in a relational database or as a flat file. In addition, the spider can use RSS (Really Simple Syndication) to get a notification when the blog is updated.
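
The RSS-based update check can be sketched roughly as follows. This is only an illustration, not the spider used in the study: the feed text is an inline sample, the URLs are made up, and `latest_items` and `new_links` are hypothetical helper names. A real spider would first fetch the feed over HTTP.

```python
import xml.etree.ElementTree as ET

# Inline sample feed (hypothetical); a real spider would download this text.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>Second post</title><link>http://example.org/2</link></item>
    <item><title>First post</title><link>http://example.org/1</link></item>
  </channel>
</rss>"""


def latest_items(feed_xml):
    """Return the item links of an RSS 2.0 feed, in feed order."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("link") for item in root.iter("item")]


def new_links(feed_xml, seen):
    """Return links in the feed that the spider has not downloaded yet."""
    return [link for link in latest_items(feed_xml) if link not in seen]


seen_links = {"http://example.org/1"}  # pages already stored
to_fetch = new_links(SAMPLE_FEED, seen_links)
print(to_fetch)
```

Only the links in `to_fetch` would need to be downloaded on the next spider run.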

3.2. Information extraction
After a blog page has been downloaded, it is necessary to extract useful information from the page. This includes information related to the blog or the blogger, such as user profiles and date of creation. This can also include linkage information between two bloggers, such as linkage, commenting, or subscription.
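
As a rough sketch of link extraction, the snippet below pulls blogger-to-blogger links out of one downloaded page using Python's standard `html.parser`. The page content, the `BLOG_PREFIX` URL scheme, and the helper names are assumptions made for illustration; a real blog host would need its own extraction rules.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical downloaded page belonging to blogger "alice".
SAMPLE_PAGE = """
<html><body>
  <h1>Blog of alice</h1>
  <a href="http://blogs.example.org/bob">bob</a>
  <a href="http://blogs.example.org/carol">carol</a>
  <a href="/about">about</a>
</body></html>
"""

BLOG_PREFIX = "http://blogs.example.org/"  # assumed URL scheme of the blog host


def linked_bloggers(page_html):
    """Return blogger names linked from a page, based on the assumed URL scheme."""
    parser = LinkExtractor()
    parser.feed(page_html)
    return [url[len(BLOG_PREFIX):] for url in parser.links
            if url.startswith(BLOG_PREFIX)]


# Each pair is a directed link from the page owner to another blogger.
edges = [("alice", target) for target in linked_bloggers(SAMPLE_PAGE)]
print(edges)
```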

3.3. Network analysis
Network analysis is a major component in our approach. In this module we propose three types of analysis: topological analysis, centrality analysis, and community analysis.

  1. The goal of topological analysis is to ensure that the network extracted based on links between bloggers is not random and that it is meaningful to perform the centrality and community analysis. We use three statistics that are widely used in topological studies to categorize the extracted network (Albert and Barabási, 2002): average shortest path length, clustering coefficient, and degree distribution.
  2. The goal of centrality analysis is to identify the key nodes in a network. Three traditional centrality measures can be used: degree, betweenness, and closeness.
  3. The goal of community analysis is to identify social groups in a network. In SNA, a subset of nodes is considered a community or a social group if nodes in this group have stronger or denser links with nodes within the group than with nodes outside of the group (Wasserman and Faust, 1994).
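
The topological statistics and two of the centrality measures named above can be sketched in plain Python as follows. This is a minimal illustration on a made-up four-node undirected network, assuming a connected graph; betweenness centrality and community detection are left out to keep the sketch short, and all names are hypothetical.

```python
from collections import Counter, deque

# Toy undirected network as an adjacency map (hypothetical data).
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}


def bfs_distances(graph, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_shortest_path_length(graph):
    """Mean distance over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for node in graph:
        for other, d in bfs_distances(graph, node).items():
            if other != node:
                total += d
                pairs += 1
    return total / pairs


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2 * links / (k * (k - 1))


def degree_distribution(graph):
    """Map each degree value to the number of nodes having it."""
    return Counter(len(nbrs) for nbrs in graph.values())


def degree_centrality(graph, node):
    """Degree normalized by the maximum possible degree, n - 1."""
    return len(graph[node]) / (len(graph) - 1)


def closeness_centrality(graph, node):
    """Inverse of the mean distance to the other nodes (connected graph assumed)."""
    dist = bfs_distances(graph, node)
    return (len(graph) - 1) / sum(dist.values())


avg_path = average_shortest_path_length(GRAPH)
print(avg_path, degree_distribution(GRAPH))
print(degree_centrality(GRAPH, "c"), closeness_centrality(GRAPH, "d"))
```

In this toy network, node `c` touches every other node, so it scores highest on both degree and closeness.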

3.4. Visualization

The extracted network and analysis results can be visualized using various types of network layout methods.
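
One simple way to hand the extracted network to a layout tool is to write it out as Graphviz DOT text. The text above does not say which tool was used, so Graphviz is an assumption here, and `to_dot` is a hypothetical helper.

```python
def to_dot(edges, directed=True):
    """Render an edge list as Graphviz DOT text."""
    kind, arrow = ("digraph", "->") if directed else ("graph", "--")
    lines = [f"{kind} bloggers {{"]
    for src, dst in edges:
        lines.append(f'  "{src}" {arrow} "{dst}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical subscription edges between three bloggers.
dot_text = to_dot([("alice", "bob"), ("bob", "carol")])
print(dot_text)
```

The resulting text can be saved to a file and rendered with any tool that reads DOT.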

4. Case study
4.1. Focus and Methods
We applied our approach to conduct a case study of hate groups in blogs. We chose to study the hate groups against Blacks. There are two reasons for this focus. First, the nature of hate groups and hate crimes is often dependent on the target "hated" group. By focusing on one type of hate group, it is possible to identify relationships that are more prominent. Second, among different hate crimes, anti-Black hate crimes have been one of the most widely studied (e.g., Burris et al., 2000; Glaser et al., 2002). The case study followed the four modules of our approach:

  1. Spiders were used to automatically download the description page and member list of each of these groups. A total of 820 bloggers were identified from these 28 groups. The spiders further downloaded the blogs of each of these bloggers.
  2. The extraction program also analyzed the relationship between these bloggers. In this study, two types of relationships were extracted:
    (1). Group co-membership: two bloggers belong to the same group (blogring).
    (2). Subscription: blogger A subscribes to blogger B. This is a directed, binary relationship.
  3. After collecting the blogs and extracting information from them, we performed demographic and network analysis on the data set in order to reveal the characteristics of these groups and ascertain whether any patterns exist.
  4. Visualization was then applied to present the results. We discuss the details of our analysis in the following sections.
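
The two relationship types from step 2 can be represented as edge sets, sketched below. Group names, blogger names, and helper names are invented for illustration and do not come from the study's data.

```python
from itertools import combinations

# Hypothetical extraction output: group -> members, and (subscriber, target) pairs.
GROUP_MEMBERS = {
    "ring1": ["ann", "ben", "cat"],
    "ring2": ["cat", "dan"],
}
SUBSCRIPTIONS = [("ann", "ben"), ("dan", "cat")]


def co_membership_edges(group_members):
    """Undirected edges between every pair of bloggers sharing a group."""
    edges = set()
    for members in group_members.values():
        # Sorting gives each undirected pair one canonical form.
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))
    return edges


def subscription_edges(subscriptions):
    """Directed, binary edges: (A, B) means A subscribes to B."""
    return set(subscriptions)


co_edges = co_membership_edges(GROUP_MEMBERS)
sub_edges = subscription_edges(SUBSCRIPTIONS)
print(sorted(co_edges))
print(sorted(sub_edges))
```

Co-membership pairs are stored once in sorted order because the relation is symmetric, while subscription pairs keep their direction.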

4.2. Discussion
a. What are the structural properties of the social networks of bloggers in the hate groups?
Ans : Similar to the network of white supremacist Web sites (Burris et al., 2000), the network of bloggers in hate groups is decentralized.
b. Are there bloggers who stand out as leaders of influence in these groups?
Ans : Burris et al. (2000) found that the decentralized white supremacist groups had different centers of influence.
c. What is the community structure in these groups?
Ans : The communities here are composed not of Web sites but of individual bloggers. A community provides an environment for its members to exchange their ideas and opinions and to reinforce the shared ideology.
d. What do the structural properties suggest about the organization of the hate groups?
Ans : As mentioned in point (a), the structure of the network suggests that the hate groups in the blogosphere have not formed into centralized organizations.
e. What are the social and political implications of these properties?
Ans : Burris et al. (2000) commented that extremist groups are a type of social movement which has profound social and political implications.
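
One common way to put a number on how "decentralized" a network is, as discussed in points (a) and (d), is Freeman's degree centralization. The sketch below is an illustration on toy graphs and is not necessarily the measure used in the study. A score of 1 means a perfect star around one hub, and 0 means every node has the same degree.

```python
def degree_centralization(graph):
    """Freeman degree centralization for an undirected simple graph (n >= 3)."""
    n = len(graph)
    degrees = [len(nbrs) for nbrs in graph.values()]
    top = max(degrees)
    # Sum of gaps to the best-connected node, divided by the star-graph maximum.
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Toy networks (hypothetical): a star with one hub, and a ring with no hub.
STAR = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
RING = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}

star_score = degree_centralization(STAR)
ring_score = degree_centralization(RING)
print(star_score, ring_score)
```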

5. Conclusion and future directions
In this paper, we have discussed the problems of the emergence of hate groups and racism in blogs. Our contributions are twofold. First, we have proposed a semi-automated approach for blog analysis. Our approach consists of a set of Web mining and network analysis techniques, such as network topology analysis, that can be applied to the study of the blogosphere. We believe that the approach can also be applied to other domains that involve virtual community analysis and mining, which we expect to be an increasingly important field for various applications.


Second, we applied this approach to investigate the characteristics and structural relationships of the hate groups in blogs in our case study. Our study has not only provided an approach that could help law enforcement and social workers study and monitor such activities, but has also brought insights into the structural properties of online hate groups and helped broaden and deepen our understanding of such a social movement.

Monday, June 9, 2008

Advanced Internet Service Systems -- Homework 6 -- 2008-05-03

Read Papers 12, 13, and 14. Write a brief summary within 200 words for each paper.

Paper 12 -- Empirical analysis of online social networks in the age of Web 2.0
Today the World Wide Web is undergoing a subtle but profound shift to Web 2.0, to become more of a social web. The use of collaborative technologies such as blogs and social networking sites (SNS) leads to instant online communities in which people communicate rapidly and conveniently with each other. Moreover, there is growing interest and concern regarding the topological structure of these new online social networks.

In this paper, we present empirical analysis of statistical properties of two important Chinese online social networks—a blogging network and an SNS open to college students. They are both emerging in the age of Web 2.0. We demonstrate that both networks possess small-world and scale-free features already observed in real-world and artificial networks. In addition, we investigate the distribution of topological distance. Furthermore, we study the correlations between degree (in/out) and degree (in/out), clustering coefficient and degree, popularity (in terms of number of page views) and in-degree (for the blogging network), respectively.

We studied the frequency of shortest path length, demonstrating that the famous law of "six degrees of separation" is present in both networks. We confirmed that for both networks, the clustering coefficient's dependence on degree is nontrivial, further suggesting some level of hierarchy in topological organization. Finally, we examined the mixing pattern. We found that the blogging network shows a disassortative mixing pattern in general, while the Xiaonei network is assortative. Our case study might help us to understand the topological features of online social networks in the age of Web 2.0.
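
The mixing pattern can be summarized by a degree assortativity coefficient, which is the Pearson correlation between the degrees at the two ends of each edge. The sketch below handles the undirected case only and uses made-up toy graphs; the paper's blogging network is directed and would need the in/out variants. Negative values indicate disassortative mixing, and positive values indicate assortative mixing.

```python
def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count every edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var  # raises ZeroDivisionError if all nodes have equal degree


# Toy graphs (hypothetical): a star, and a triangle plus a separate linked pair.
STAR_EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c")]
MIXED_EDGES = [("x", "y"), ("y", "z"), ("z", "x"), ("p", "q")]

star_r = degree_assortativity(STAR_EDGES)
mixed_r = degree_assortativity(MIXED_EDGES)
print(star_r, mixed_r)
```

In the star, the high-degree hub links only to low-degree leaves, which is the extreme disassortative case. In the second graph, every node links only to nodes of its own degree.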

Paper 13 -- A short walk in the Blogistan
This paper explains how blogs differ from traditional Web pages in both characteristics and potential applications. It explores three aspects of the Blogistan: its overall scope and size, the identification of emerging hot topics of discussion and link patterns, and the implications both for blogs and for applications such as search. We develop a general methodology of mining evolving networks and connections. The first part of our study is longitudinal, based on a five-week continuous fetch of a seed collection of nearly 10,000 blog URLs. The second part is based on a successive crawl of pages suspected to be blogs, leading to a larger collection of several million URLs. The collection is examined for a variety of properties. We characterize blogs and study different facets of the link structure in blogs and its evolution over time, attributes of servers and domains that host many of the blogs including their IP addresses, and how blogs behave with respect to various HTTP/1.1 protocol issues. Inferences from our in-depth exploration are relevant to applications ranging from mining to hosting of blogs and other issues of relevance to the measurement community.

An important contribution of our work is the methodology we developed to identify emerging interests by mining hyperlinks in blogs and their change over time. The methodology constitutes a general approach to mine evolving interconnection networks that we believe can have applications well beyond the Blogistan. By canceling out “repeated patterns” we are able to identify emerging ones.
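
The idea of canceling out repeated patterns can be illustrated with a toy filter over weekly snapshots of link targets. This is a simplified stand-in and not the paper's actual methodology, whose details are not given in this summary. The snapshot data, the `min_count` threshold, and the function name are all assumptions.

```python
from collections import Counter

# Hypothetical weekly snapshots: the link targets found in blogs each week.
WEEKLY_LINKS = [
    ["news.example.org", "portal.example.org", "news.example.org"],
    ["news.example.org", "portal.example.org"],
    ["news.example.org", "portal.example.org", "hot.example.org", "hot.example.org"],
]


def emerging_targets(snapshots, min_count=2):
    """Targets in the latest snapshot that never appeared in earlier ones."""
    *history, latest = snapshots
    repeated = set().union(*history)  # everything seen before the latest week
    counts = Counter(latest)
    return sorted(t for t, c in counts.items()
                  if t not in repeated and c >= min_count)


emerging = emerging_targets(WEEKLY_LINKS)
print(emerging)
```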

Paper 14 -- Analysis of User Relations and Reading Activity in Weblogs
This paper focuses on the relationships among blogs and analyzes how great an effect blog relationships have on the reading behavior of the user. First, it is examined whether there is a correlation indicating that users often visit blogs with strong relationships. Various definitions of the relationship are considered, based on factors such as comments and trackbacks, in order to analyze which relationship is the most effective for this purpose. Second, the authors analyze whether blogs that are read frequently by the users can be identified from blog relationships. If such identification is possible, it will be possible to construct effective recommendation services based on blog relationships.

This paper has analyzed blog networks focusing on unique relationships. The range of 2-hop connections from a blog is considered, and an attempt is made to reveal the reading behavior of users by using indices such as the strength and kind of relationship and the number of routes. It is evident that bookmarks have a strong effect, and that users circulate around the bookmarks in a blog network. A tendency is found that users who repeatedly read a blog with a given interest also tend to repeatedly read other blogs that are targets of action by the owner of that blog. This tendency will provide a basis for information recommendations.
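
Counting routes within the 2-hop range of a blog can be sketched as follows. The relationship data and the function name are invented for illustration. Here every kind of action (comment, trackback, bookmark) is treated as the same kind of edge, which is a simplification of the paper's analysis.

```python
from collections import Counter

# Hypothetical relationship edges: blog -> blogs its owner acts on.
RELATIONS = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": [],
    "e": [],
}


def two_hop_routes(relations, start):
    """Count distinct 2-hop routes from start to each other blog."""
    routes = Counter()
    for mid in relations.get(start, []):
        for end in relations.get(mid, []):
            if end != start:
                routes[end] += 1
    return routes


routes_from_a = two_hop_routes(RELATIONS, "a")
print(routes_from_a)
```

A blog reached by more routes, such as `d` in this toy data, would be a stronger candidate for recommendation to readers of `a`.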

Action Science Approach to Experimenting Nonprofit Web 2.0 Services for Employment of Individuals with Mental Impairments

Introduction
This research is interested in applying Web 2.0 to employment services for people with mental disabilities. Web 2.0 gives Internet users a chance to easily publish their work on the Web and introduce it to the world. In this paper, we study how to unleash the power of Web 2.0 to assist people with mental disabilities and their caregivers.

Methodology
Action research is an established research method in use in the social and medical sciences since the mid-twentieth century. The method produces highly relevant research results, because it is grounded in practical action, aimed at solving an immediate problem situation while carefully informing theory. Action researchers are among those who assume that complex social systems cannot be reduced for meaningful study. They believe that human organizations, as a context that interacts with information technologies, can only be understood as whole entities.

WEB 2.0 Solutions
We built a Web 2.0 based architecture for nonprofit organizations (Fig. 1). It includes: (1) a multimedia database, (2) a discussion forum, (3) an introduction website, and (4) an anywhere portal. In the domain we study, the mission-specific database is an employment service database. The multimedia database gives the caregivers a space to manage all kinds of information about the mentally disabled persons. The data is kept private with technologies such as identity management, end-to-end encryption, and public key infrastructure. A discussion forum platform enables the caregivers to share knowledge and feelings, which extends their connections to other caregivers. The introduction website aims to extend public relations to people who may care about the organization. The main components are aggregated with an "anywhere portal."

Conclusions
We used Web 2.0 technology in an employment services system for individuals with mental impairments to make the organization's operation more efficient, which we call Nonprofit 2.0. In the beginning, we devoted ourselves to understanding how a job coach accomplishes her helping tasks.
Guided by principles of action science in 4 months of organized participant observations, in-depth interviews, field work, and focus group studies, a working prototype has been built and tested by the job coaches with significant success.

Sunday, June 8, 2008

Mobile Computing for Indoor Wayfinding Based on Bluetooth Sensors for Individuals with Cognitive Impairments

In this paper, the authors propose a novel personal guidance system based on Bluetooth for individuals with cognitive impairments.

An adult with a mental disorder may want to lead a more independent life and be capable of getting trained and staying employed, but may experience difficulty in using public transportation to and from the workplace. The growing recognition that assistive technology can be developed for cognitive as well as physical impairments has led several research groups to prototype way finding systems.

Bluetooth is an industrial specification for wireless personal area networks (PANs). Bluetooth provides a way to connect and exchange information between devices such as mobile phones, laptops, PCs, and printers. In this paper, Bluetooth is used for personal way finding, based on Bluetooth beacons and ID scanning. Operating Bluetooth in this discovery mode saves power, eliminates manual passkey challenges, and reduces privacy and security concerns because the user does not expose her ID. Based on the Bluetooth beacon received, the user's position can be identified at the remote server, which then enables the way finding sequence.

Prototype Design:

The Bluetooth beacons trigger downloading of photos with directional instructions, thus eliminating the need for a shadow support team behind the user. Route personalization is accomplished by the system identifying the user and the destination set ahead of time. Therefore, even when sensing the same beacon at the same spot, different users may receive different directional instructions. It works indoors where GPS signals cannot reach. The design draws upon the psychological models of spatial navigation, usability studies of interfaces by people with cognitive impairments, and the requirements based on interviews with nurses and job coaches at rehabilitation hospitals and institutes.
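
The route personalization described above amounts to a lookup keyed by both the sensed beacon and the user's preset destination. The sketch below uses invented beacon IDs, user IDs, and photo names, and is not the prototype's actual data model.

```python
# Hypothetical route table: (beacon ID, destination) -> photo with a directional cue.
ROUTE_TABLE = {
    ("beacon-lobby", "cafeteria"): "lobby_turn_left.jpg",
    ("beacon-lobby", "workshop"): "lobby_go_straight.jpg",
    ("beacon-hall", "cafeteria"): "hall_enter_door.jpg",
}

# Hypothetical per-user destinations, set ahead of time.
USER_DESTINATION = {"user1": "cafeteria", "user2": "workshop"}


def next_instruction(user_id, beacon_id):
    """Return the photo to show when a user's device senses a beacon, or None."""
    destination = USER_DESTINATION.get(user_id)
    return ROUTE_TABLE.get((beacon_id, destination))


print(next_instruction("user1", "beacon-lobby"))
print(next_instruction("user2", "beacon-lobby"))
```

Both users sense the same lobby beacon but receive different photos because their destinations differ.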

A PDA is carried by the individual who has difficulty in indoor way finding or taking public transit to and from work. The PDA shows the just-in-time directions and instructions by displaying photos, triggered by Bluetooth beacons sensed by the PDA’s built-in reader. The photos have to be prepared ahead of time.

Although the routes are preset, very few patients can hook up the PDA to a networked PC so that photos can be stored on the PDA and invoked immediately when needed. Alternatively, downloaded photos are cached locally for future use. This could potentially save communication energy and cost, while reducing response time.
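
Local caching of downloaded photos can be sketched with a small wrapper around the download step. The class and the stand-in `fake_download` function are hypothetical. A real device would fetch the photo from the remote server and would also need a rule for evicting old photos.

```python
class PhotoCache:
    """Keep downloaded photos locally so repeated requests skip the network."""

    def __init__(self, download):
        self._download = download  # callable: photo name -> bytes
        self._store = {}
        self.downloads = 0  # number of actual downloads performed

    def get(self, name):
        """Return the photo, downloading it only on first use."""
        if name not in self._store:
            self._store[name] = self._download(name)
            self.downloads += 1
        return self._store[name]


def fake_download(name):
    """Stand-in for a network fetch; returns placeholder bytes."""
    return f"<photo {name}>".encode()


cache = PhotoCache(fake_download)
first = cache.get("lobby_turn_left.jpg")
second = cache.get("lobby_turn_left.jpg")  # served from the local cache
print(cache.downloads)
```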

Conclusions
This paper presents a wayfinding prototype system based on Bluetooth sensors for individuals with cognitive impairments. The design draws upon the cognitive models of spatial navigation and consists of wayfinding devices and a navigation system. The prototype is implemented and tested with routes on the campus. The results show the prototype is user friendly and promising, with high reliability. The success ratio can depend on the extent to which participants suffer from mental disabilities, the complexity of routes, the degree of received training and self-practice, and the distractions the participants may encounter.