Saturday, October 25, 2014

Link Analysis--Relative importance between you and me



When analyzing a network, we are often interested in the relative importance of each vertex: for example, identifying the most influential people, schools, countries, or the scientific publications with the greatest impact. When we search for a topic with a search engine, we usually focus on the page that is ranked first, because we want to find the most authoritative Web page. In addition, the following factors also matter:
- Pages that contain the largest number of occurrences of the keywords
- Pages that are written by people/organizations that are trustworthy, or are experts in the topic
- Pages that are read by many people
- Pages that are linked to by many other pages (having a lot of in-links)

There are two popular Link Analysis techniques:

• HITS (Hyperlink-Induced Topic Search): there are two types of vertices in a network, hubs and authorities. Hubs contain lists of links to other pages, while authorities contain useful information on a given topic.
• PageRank: an endorsement from a more important vertex should be considered more valuable or credible.
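To make these two ideas concrete for myself, here is a minimal sketch in plain Python. The four-page graph, the damping factor of 0.85, and the fixed 50 iterations are my own assumptions for illustration, and the PageRank part assumes every page has at least one out-link. It is a simplified power iteration, not how any real search engine is implemented.

```python
# Toy directed graph (hypothetical data): each page links to the pages in its list.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank; assumes every page has at least one out-link."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            # A page splits its own rank evenly among the pages it endorses.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

def hits(links, iterations=50):
    """Simplified HITS; returns (hub scores, authority scores)."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links[p]) for p in pages}
        # Normalize so the scores do not grow without bound.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5
        hub_norm = sum(v * v for v in hub.values()) ** 0.5
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth

rank = pagerank(links)
hub, auth = hits(links)
print("Top PageRank:", max(rank, key=rank.get))
print("Top hub:", max(hub, key=hub.get))
print("Top authority:", max(auth, key=auth.get))
```

In this toy graph, page C receives the most in-links, so it comes out on top for both PageRank and authority, while page A, which points to two pages, is the strongest hub.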
HITS is used in Twitter’s “Who to follow”, page ranking, and recommendation systems, while PageRank is used as part of Google’s search engine. Both algorithms rank pages by analyzing their incoming and outgoing links. Another algorithm, named EigenRumor, can rank blog posts in a blog community. The following chart shows the different application fields and algorithms among HITS, PageRank and EigenRumor.

In my last post, I used the "NodeXL Excel Template" software to draw a sociogram of my blog network and see who is more active in the subgroups of the whole class. This time, I used it to import the data of a YouTube users' network and used its Graph Metrics function to analyze various properties of this network. For the graph, we chose the layout named Harel-Koren Fast Multiscale to see the followers clearly.

The users named UNC-Chapel Hill and themrfinneth follow many users. We can also see that tedtalksdirector follows tededucation, tedtalks, tedyouth, tedpartners, and tedfellowstalks. On the YouTube page of “tedtalksdirector”, we can indeed find these five users. This indicates that tedtalksdirector acts like a hub, and from this hub we can find the related users that tedtalksdirector follows.

There is one more thing I have to mention. One of my friends shared a wonderful tool named "AlchemyAPI" with us, and I think it is really powerful and useful, so I couldn't wait to use it to find more interesting things and add them to this blog. First, I tested the AlchemyLanguage API demo on its webpage using this blog. I chose one of the functions, which extracts keywords from my blog and calculates a relevance and a sentiment value for what I've mentioned. The page uses colorful squares to show the positive, neutral, negative, and mixed words.
Then I tried the AlchemyVision Face Detection function, which can automatically detect faces and identify people within images. I selected a photo taken after our first group meeting and uploaded it to the webpage, and the result was quite funny: it showed a different estimated age for each group member. Actually, Wenwen is older than me, so I think the system may have picked up her victory gesture. I guess this gesture can make people look very young? And the same goes for the rest of you: your estimated ages are all under 18. Are you high school students? Haha...



Tuesday, October 14, 2014

SNA tells you: Who is the most popular in your social network



Social Network Analysis (SNA) is a technique first employed by sociologists to study various social phenomena. For example, two people in the visual display may interact socially and be related to each other through some formal or informal relationship, such as teacher and student, classmates, or a manager and his or her staff. So there can be many different kinds of social relationships as well as social interactions.

Social network analysis provides both a visual and a mathematical approach to analyzing the structure of social relationships. SNA is not limited to the analysis of individuals; the unit can also be a group, a company, or even a nation. So we can use SNA to better understand what’s going on.

The characteristics of network analysis include a focus on relationships between actors and on the effect of the structure on the outcome. Perhaps one of the most interesting features of network analysis is its visual display, called a network diagram.


SNA is related to graph theory, and I've developed a clearer understanding of some basic graph terms such as path, degree, weight, order/size, and density. The nodes in the network are the people and groups, while the links show relationships or flows between the nodes. For example, weighted graphs are graphs whose edges are associated with weights, from which we can tell how strongly two nodes interact with each other. In the figure below, the weight from C to A is 0.9 and from C to D is 0.1, which shows that C interacts with A more frequently than with D.
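As a small illustration, a weighted graph like this can be stored as a nested dictionary. This is only a sketch: the two weights are the ones from the figure, while the variable names and the dictionary layout are my own choices.

```python
# Weighted, directed ties: weights[source][target] = strength of interaction.
# Only the two weights mentioned in the text are included.
weights = {
    "C": {"A": 0.9, "D": 0.1},
}

# The neighbor C interacts with most is the one with the largest weight.
strongest = max(weights["C"], key=weights["C"].get)
print("C interacts most with:", strongest)
```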

To understand networks and their participants, we evaluate the location of actors in the network. Measuring the network location is finding the centrality of a node. These measures give us insight into the various roles and groupings in a network. It tells us who is the most influential and the most prestigious, who are the connectors, mavens, leaders, bridges, isolates, where are the clusters and who is in them, who is in the core of the network, and who is on the periphery.

Three measures help identify the central nodes in a network.

DEGREE: the number of people a person is connected to
- In-degree or out-degree is the number of links that lead into or out of the node.
- It might reflect the importance of this actor among other people.

CLOSENESS: the extent to which an actor is close to all other people in the social network
- It is based on the mean length of the shortest paths from a node to all other nodes in the network.
- If the node is close to all the others (a short mean path length), its closeness is high.

BETWEENNESS CENTRALITY: the number of shortest paths that pass through a node divided by the number of all shortest paths in the network
- It measures the importance of an actor as an intermediary in a social network.
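To check my understanding of the three measures, here is a minimal sketch in plain Python on a made-up, undirected five-node network. The edge list and function names are my own assumptions. Closeness is computed as the inverse of the mean shortest-path length, so a higher value means closer, and betweenness is left unnormalized: for every pair of other nodes, it adds up the fraction of their shortest paths that pass through the node. The sketch assumes the network is connected.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network: A is tied to B, C and D; D is tied to E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def bfs(graph, source):
    """Return (distances, shortest-path counts) from source to every node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma

def degree(graph):
    return {node: len(neighbors) for node, neighbors in graph.items()}

def closeness(graph):
    """Inverse of the mean shortest-path length to all other nodes."""
    result = {}
    for node in graph:
        dist, _ = bfs(graph, node)
        result[node] = (len(graph) - 1) / sum(dist.values())
    return result

def betweenness(graph):
    """Unnormalized betweenness: summed share of shortest paths through each node."""
    info = {node: bfs(graph, node) for node in graph}
    score = {node: 0.0 for node in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        for v in graph:
            if v in (s, t):
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path if the two legs add up exactly.
            if dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

print("degree:", degree(graph))
print("closeness:", closeness(graph))
print("betweenness:", betweenness(graph))
```

In this toy network, A has the highest value on all three measures, while D, the only bridge to E, has the second-highest betweenness even though its degree is just 2.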

To further understand and review the graph theory related to SNA, I also drew a sociogram of my blog network (see Figure, as of 21 Sep 2014, 20:00). The people in the diagram are the ones who commented on my blog or who received my comments. I got comments from 4 different classmates and gave one-way comments to 2 others. Lan Shishi (the central node) got comments from the most people, while the blue node has only one tie, which means it has little communication with other nodes in this network. However, this doesn't necessarily mean that the central node is more active than the blue node in other subgroups of the whole class.





At the end of the class, Professor Chan showed us an application of SNA from their project [1]. They performed a social network analysis to study the relationships among the blog comments and calculated centrality. The work covers the sociograms of 52 classmates and the development of their social network. We can see that at the very beginning only a few classmates commented on each other's blogs, but in the end almost everyone was connected to each other, so the density increased.
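Density here is the share of all possible ties that actually exist. A minimal sketch of the calculation follows. The 52 nodes match the class size, but the tie counts are made up by me purely for illustration, and I treat comments as directed ties.

```python
def density(num_nodes, num_ties, directed=True):
    """Share of possible ties that exist (0 = no ties, 1 = everyone linked)."""
    possible = num_nodes * (num_nodes - 1)  # ordered pairs, no self-comments
    if not directed:
        possible //= 2  # each unordered pair counts once
    return num_ties / possible

# 52 classmates; both tie counts below are hypothetical, not the project's data.
early = density(52, 60)
late = density(52, 1500)
print(f"early density: {early:.3f}, late density: {late:.3f}")
```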


A quite interesting finding comes from calculating centrality separately for male and female students. The female students consistently have a higher in-degree than the male students, and likewise a higher out-degree. That may show that female students are more outgoing in the online social network. Similarly, female students also outperform male students on closeness. The professor showed the social graph to one group of students in week 5; when they found that they had not actually commented on other people's blogs, this group changed their behavior. Out-degree measures how many comments you made on other students' blogs, so you can see a large increase there.


One thing is special, though: betweenness. While female students lead in every other aspect, male students generally have a higher betweenness, so male students are somehow the controllers of information. They are situated on the paths in between, while the female students post more messages.

SNA methods provide some useful tools for addressing one of the most important aspects of social structure: power. The network perspective suggests that the power of individual actors is not an individual attribute, but arises from their relations with others. I think there are a lot of interesting concepts and theories in the field of SNA. Let's continue to find more and learn more.

[1] Chan R Y Y, Huang J, Hui D, et al. Gender differences in collaborative learning over online social networks: Epistemological beliefs and behaviors. Knowledge Management & E-Learning: An International Journal (KM&EL), 2013, 5(3): 234-250.