ScienceNet.cn (科学网)


Tag: automated


Related blog posts

Did Trump's Gettysburg speech enable support rate to soar?
liwei999 2016-10-29 08:58
The last few days have seen tons of reports on Trump's Gettysburg speech and its impact on his support rate, which some of his campaign media claim has soared thanks to this powerful speech. We would love to verify this and uncover the true picture through big data mining of social media.

First, here is one link on his speech: DONALD J. TRUMP DELIVERS GROUNDBREAKING CONTRACT FOR THE AMERICAN VOTER IN GETTYSBURG. (The most widely circulated related post in Chinese social media seems to be this: Trump's heavyweight speech enables the soaring of the support rate and possible stock market crash.)

Believed to be a historic speech in the last dash of his campaign, it basically said: I am willing to sign a contract with the American people on reforming politics and making America great again; with this outline of my administration's plan, in the time frame I promise, I will make things happen once I am in office, believe me.

Trump made the speech on the 22nd of this month, so to mine true public opinion on the speech's impact, we can run an automated social media analysis on the data around the 22nd. We believe that automated polling based on big data and language understanding technology is much more revealing and dependable than traditional manual polls, which phone something like 500 to 1,000 people. The latter laughably lack sufficient data to be trustworthy.

What does the above trend graph tell us?

1. Trump was indeed on the rise in this time interval. The soaring claim does not come entirely out of nowhere, but there is a big BUT.

2. BUT a careful look at public opinion as represented by net sentiment (a measure reflecting the ratio of positive mentions over negative mentions in social media) shows that Trump has basically stayed below the freezing point (i.e.
more negative than positive) in this time interval, with only a brief rise above the zero point near the speech on the 22nd, before sinking underwater again.

3. The soaring claim cannot withstand scrutiny at all: soaring implies a sharp rise in support after the speech compared with before it, which is not the case.

4. The fact is, Uncle Trump's social media image hit bottom on the 18th of this month (with a net sentiment of -20%). From the 18th to the 22nd, when he delivered the speech, his net sentiment rose steadily from -20% to 0. From the 22nd to the 25th it no longer went up but fell back down, so there is no ground at all for the claim that his support soared as an effect of the speech.

5. Although his rating did not soar, it did not drop sharply after the speech either. In terms of the buzz generated, the speech can be said to have been fairly well delivered. After the speech, net sentiment dropped slightly, basically staying close to zero.

6. This big data investigation shows that campaign media can be very misleading when set against objective evidence and real-life data. It is all propaganda that cannot be taken at face value, from the so-called soaring support rate to the possible stock market crash. It is basically campaign nonsense or noise and cannot be taken seriously.

The following figure is a summary of the surveyed interval:

As seen, the average net sentiment for this interval is -9%, with 2.7 million positive mentions and 3.2 million negative mentions.

How do we interpret -9% as an indicator of public opinion and sentiment? According to our numerous previous automated surveys of political figures, this is certainly not a good rating, but not particularly bad either, as we have seen worse. Basically, -9% is below the average line for politicians' public image in people's minds on social media.
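To make the -9% figure concrete, here is a minimal Python sketch of a net sentiment calculation. The post only describes net sentiment as reflecting the ratio of positive to negative mentions; the exact formula below, (positive - negative) / (positive + negative), is an assumption, chosen because it roughly reproduces the reported number from the rounded counts. The function name `net_sentiment` is hypothetical.

```python
def net_sentiment(positive: int, negative: int) -> float:
    """Return net sentiment in [-1, 1].

    Assumed formula (not confirmed by the post):
    (positive - negative) / (positive + negative).
    """
    total = positive + negative
    if total == 0:
        raise ValueError("no sentiment-bearing mentions")
    return (positive - negative) / total


# Rounded counts reported for the surveyed interval.
score = net_sentiment(2_700_000, 3_200_000)
print(f"{score:.1%}")  # -8.5%
```

With the rounded counts this gives about -8.5%, which is consistent with the reported -9% once rounding of the mention counts is taken into account. A value below zero means negative mentions outnumber positive ones, the "below freezing point" case described above.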
Nevertheless, compared with Trump's own earlier ratings, there is a recorded 13-point jump in this interval, which is pretty good for him and his campaign. But the progress is clearly not the effect of his speech.

These are the social media statistics on the data sources of this investigation:

In terms of share, Twitter ranks No. 1. It is surely the most dynamic social media platform for politics, with the largest volume of tweets generated every minute. Of a total of 34.5 million mentions of Trump, Twitter accounted for 23.9 million. In comparison, Facebook had 1.7 million mentions.

Now let's zoom out to the last 30 days instead of only the days around the speech, to provide a bigger background for uncovering the overall trends of the fight between Trump and Clinton in the 2016 US presidential campaign.

The 30 days range from 9/28 to 10/28, and the two lines in the comparison trends chart contrast Trump's and Clinton's daily ups and downs in net sentiment (reflecting their social rating trends). The general impression is that the fight is fairly tight. Both are scandal-ridden, both are tough and belligerent, and both have fairly poor social ratings.

The trends might look a bit clearer if we visualize the data by week instead of by day:

No matter how much I dislike Trump, and regardless of my dislike of Clinton, for whom I have decided to vote anyway to make sure the annoying Trump is out of the race, as a data scientist I have to rely on the data, which says that Hillary's recent situation is not too optimistic: Trump has at times actually gone a little ahead of Clinton (a troubling fact to recognize).

The graph above shows a comparison of mentions (buzz, so to speak). In terms of buzz, Trump is a natural topic king, having generated the most noise and comments, good or bad. Clinton is no match in this regard.
The above is a comparison of passion intensity in public opinion: like/love or dislike/hate? The passion intensity for Trump is really high, showing that he has some crazy fans and/or deep haters among the people. Hillary Clinton has also been controversial, and it is not rare to come across people with very intense sentiments towards her too. Still, Trump is something of a political anomaly, and he is more likely to provoke fanaticism or controversy than his opponent.

In his recent Gettysburg speech, Trump highlighted the so-called danger of the election being manipulated. He clearly exaggerated the procedural risks, more than past candidates who ran under the same election protocol and mechanism. By doing so, he paved the way for refusing to recognize the election results later. He was even fooling the entire nation by publicly saying nonsense such as that he would totally accept the election results if he wins. This is not humor; it depicts a dangerous political figure with unchecked ambition. To my mind it is a very troubling sign, and he is playing fairly dirty political tricks, or playing with fire.

Now the situation is this: if Clinton leads substantially and beats him by a large margin, old Uncle Trump will have no excuse or room for instigating incidents after the election. But if it is closer to a see-saw, which is not unlikely given the trends analysis shown above, then our country might be in some trouble: Uncle Trump and his die-hard fans will most certainly make some trouble.

Given the seriousness of the situation and the pressing risk of political turmoil that may follow, quite a few people, including some conservative minds, have begun to call for electing Hillary in order to prevent Trump from making trouble. I am of that mindset too, even though I do not like Hillary either.
If not for Trump, in an ordinary election where I do not like the candidates of either major party, I would most likely vote for a third party or abstain from voting. But this election is different; it is too dangerous as it stands. It is like a time bomb hidden somewhere in Trump's house, totally unpredictable. To keep it from going off, it is safer to vote for Clinton.

In comparison with my earlier automated sentiment analysis, blogged about a week ago (Big data mining shows clear social rating decline of Trump last month), this updated, more recent BPI brand comparison chart looks more like a see-saw:

Clinton's recent campaign seems to be stuck somewhere. Over the last 30 days, Clinton's net sentiment rating is -17%, while Trump's is -19%. Clinton is only slightly ahead of Trump. Fortunately, Trump's speech did not really reverse the gap between the two, which can be seen fairly clearly from the following historical trends, represented by three different circles per candidate in the brand comparison (a darker circle represents more recent data):

Clinton's general trend is still there: she started out lagging behind, then improved, and is now a bit stuck but still leading. Yes, Clinton's most recent campaign activities are not making significant progress, despite more resources being put to use, as shown by the bigger, darker circle in the graph. Among Clinton's three circles, the smallest and lightest stands for the first 10 days of the past 30 days, starting clearly behind Trump. The other two circles cover the last 20 days and seem to stay in place, although the circle becomes larger, indicating more campaign input and more buzz generated. The benefits, however, are not so obvious.

On the other side, Trump's trend shows a zigzag, with the overall trend actually declining over the past 30 days. In the middle ten days there was a clear rise in his social rating, but in the last ten days it has been going back down.
Let us have a look at Trump's 30-day social media sentiment word clouds. The first is more about comments on his pros and cons, and the second is about more direct, emotional expressions towards him:

One friend took a glance at the expression "fuck" in red font and asked: what are the subjects and objects of "fuck" here? In fact, the subject generally does not appear in the social posts; by default it is the poster himself, reflecting part of the general public. The object of "fuck" is, of course, Trump, for otherwise our deep-linguistics-based system would not count it as a negative mention of Trump in the graph. Let us show some random samples side by side with the graph:

My goodness, the "fuck" mentions accounted for 5% of the emotional data. Poor old Uncle Trump was "fucked" 40 million times in social media within only one month, showing how much this guy is hated by some of the people he is supposed to represent and govern if he takes office. See how they actually express their strong dislike of Trump: fucking moron, fucking idiot, asshole, shithead, you name it, to the point that even some Republicans curse him like crazy: "Trump is a fucking idiot. Thank you for ruining the Republican Party you shithead."

Looking at the following figure of popular media, it seems that the most widely circulated political posts in social media involve quite a few political videos:

The domains figure below shows that Tumblr posts on politics contribute more than Facebook:

In terms of the demographic background of social media posters, there is a fair balance between male and female: 52% male and 48% female (in contrast to Chinese social media, where only 25% of those posting political comments on the US presidential campaign are female).
The figure below shows the ethnic background of the posters, with 70% Caucasians, 13% African Americans, 8% Hispanics and 6% Asians. It looks like Hispanic Americans and Asian Americans are under-represented in English social media relative to their population shares, so this study may have missed some of their voice. There might be language or cultural reasons for this under-representation. (We have another similar study using Chinese social media, which shows a clear and big lead of Clinton over Trump; given time, we should do another automated survey of Spanish social media using our multilingual engine. Another suggestion from friends is to do a similar study on swing states, because after all these are the key states that will decide the outcome of this election; we can filter the data by the locations the posts come from to simulate that study.)

This last table involves some fun facts from the investigation. In social media, people tend to talk most about the campaign on Wednesday and Sunday evenings, with 9 o'clock as the peak. For example, on the topic of Trump, nine o'clock on Sunday evening generated 1,357,766 messages within one hour.

No wonder there is no shortage of big data from social media on politics. It is all about big data. In contrast, with the traditional manual poll, no matter how the sampling is done, the limited number of data points is a real challenge: with typically 500 to 1,000 phone calls, how can we trust that the poll represents the public opinion of 200 million voters? The data are laughably sparse. Of course, in the pre-big-data age, there were simply no alternatives for collecting public opinion in a timely manner on a limited budget. This is the beauty of the automated survey, which is bound to outperform the manual survey and become the mainstream of polls.
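The day-and-hour tally mentioned above (which evening slot sees the most posts) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `busiest_slots` and the sample timestamps are hypothetical, and the timestamps are assumed to be already converted to a single local time zone. The post does not describe how its own system computes this table.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def busiest_slots(timestamps, top=3):
    """Count posts per (weekday, hour) slot and return the busiest slots."""
    slots = Counter((DAYS[ts.weekday()], ts.hour) for ts in timestamps)
    return slots.most_common(top)


# Hypothetical post times (not real data).
sample = [
    datetime(2016, 10, 23, 21, 5),   # Sunday, 9 pm
    datetime(2016, 10, 23, 21, 40),  # Sunday, 9 pm
    datetime(2016, 10, 26, 21, 15),  # Wednesday, 9 pm
    datetime(2016, 10, 24, 8, 30),   # Monday, 8 am
]
print(busiest_slots(sample, top=1))  # [(('Sunday', 21), 2)]
```

On real data the same tally would simply run over millions of timestamps; the peak slot is whichever (weekday, hour) pair has the highest count.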
The following figure shows the most influential authors:

Authors with the most followers are:

The most mentioned authors are listed below:

Tell me, when in history did we ever have this much data and information, with such powerful capabilities for fully automated mining of public opinions and sentiments at scale?

Big data mining shows clear social rating decline of Trump last month
Clinton, 5 years ago. How time flies …
Automated Survey
Dr Li’s NLP Blog in English
Category: Social media mining | 4274 views | 0 comments
Automated survey based on social media
liwei999 2015-6-14 22:35
It is now an evident trend that automated surveys using social media as sources complement manual surveys and will eventually largely replace them. That is an unstoppable direction, as social media are becoming the major outlets of public opinion. The technology is ready too.

An automated survey, or auto-poll, refers to the use of computers to collect public opinions and sentiments on a topic. The data sources are social media big data, where people are discussing almost every topic all the time. The technology is a parser that reads social media posts and mines salient information (facts, evaluations and emotions) about any topic. More specifically, deep information extraction and sentiment analysis are the required, mature text mining technologies, and they can be enabled by an underlying parser. This is the part of Artificial Intelligence that is proven to work and has been serving clients in the business world (e.g. our customer insight products).

Polls provide quantitative information for decision-making in government, businesses and the general public, and have enjoyed an extremely wide range of applications for many years. The presidential election is a prominent example: polls are conducted from time to time during the election to inform the voters as well as the presidential candidates how the public feels about the race, so that voters can make an educated choice and the candidates' teams can adjust their policies and campaign strategy to enhance their public image. A product launch is an example from the enterprise world: feedback collected from customer surveys can help businesses detect issues and address them. The auto-poll does the same, only much faster, more comprehensively, on a larger scale and at lower cost.

Compared with traditional manual questionnaires or polls, the auto-poll has the following salient features.

Real time.
There is no need to go through the traditional survey process of designing the questionnaire, distributing it or conducting telephone or street interviews, and collecting and summarizing the results, with all steps carried out manually. It often takes days or even weeks to complete a serious survey. Auto-polls are instant: you get results as soon as you enter your topic. As long as there are people discussing it, the insights will be mined out of the sea of text. For any topic, using an automated survey is as easy as using a search engine, with the same response time but much more accurate results. Our deep parser reads social media day and night to feed our storage, just as a search engine indexes the Internet into its storage.

Low cost. Manual surveys constantly struggle between the required costs and the scale of the survey (a bigger scale reduces the error margin, making the survey more reliable and convincing). They often have to compromise on sample size given the budget. An auto-poll is done fully automatically by the system, and the same system can serve a variety of customers on different topics, so each poll is inexpensive, costing just a fraction of a traditional poll. The sample size can easily be orders of magnitude larger than that of manual surveys (often millions of data points vs. several thousand), way beyond the reach of most traditional polls.

Objectivity. Traditional polls or surveys need a designed questionnaire, which may intentionally or unintentionally introduce subjective bias or implied suggestions. An auto-poll is bottom-up data analysis and mining, hence more objective by nature. Public opinions are collected from the natural comments people make on topics, not as responses to a specifically designed question.
Moreover, in order to collect a sufficient number of survey responses, the investigators administering the surveys sometimes need to offer incentives. This introduces a possible bias, because some respondents answer the surveys too quickly to be honest, mainly to gain the rewards rather than to really air their opinions, which returns low-quality or polluted results.

Multi-topic comparison. This is particularly important, because for almost any topic we need a competitor or the industry as background to figure out the real image in the public mind. For example, a poll on the effectiveness of Obama's presidential campaign makes little sense if it is not contrasted with his rival Romney. Likewise, customer surveys on AT&T's cellular network service are inseparable from comparisons with competitors like Verizon. Ideally, a full picture of one brand becomes clear once it is compared with all the leading brands in the same industry. In theory, manual surveys can also perform multi-brand comparisons, but in practice the cost and time required to investigate many brands at the same time are often beyond what is feasible. Investigators have had to reduce or sacrifice the investigation of competitors and use their limited resources on their own brand. The automated survey is different: multi-topic survey and comparison is designed as a feature of these systems, and it is just as easy as surveying one brand in this fully automated environment. BPI (Brand Passion Index) in our products is one such feature. It instantly surveys multiple brands in one industry and compares them in three dimensions: buzz (size of the bubble), popularity (up or down in the graph), and passion intensity (right or left in the graph: the further right, the more intense). For example, the illustration of BPI for US retail stores gives a clear picture of the landscape and where each brand stands in its space.
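The three BPI dimensions described above can be illustrated with a small Python sketch. The post names the dimensions but does not give their formulas, so everything here is an assumption for illustration: `BrandStats`, `bpi_points`, the field names, the use of net sentiment for "popularity", and the definition of passion intensity as the share of strongly emotional mentions are all hypothetical, as are the sample numbers.

```python
from dataclasses import dataclass


@dataclass
class BrandStats:
    name: str
    mentions: int   # all mentions of the brand (drives bubble size)
    positive: int   # mentions classified as positive
    negative: int   # mentions classified as negative
    intense: int    # sentiment mentions with strong emotion (love/hate)

    @property
    def popularity(self) -> float:
        """Assumed: net sentiment, (pos - neg) / (pos + neg)."""
        return (self.positive - self.negative) / (self.positive + self.negative)

    @property
    def passion_intensity(self) -> float:
        """Assumed: share of sentiment mentions that are strongly emotional."""
        return self.intense / (self.positive + self.negative)


def bpi_points(brands):
    """Return (name, x, y, size) per brand: x = passion intensity,
    y = popularity, size = buzz."""
    return [(b.name, b.passion_intensity, b.popularity, b.mentions)
            for b in brands]


# Hypothetical brands with made-up counts.
brands = [
    BrandStats("Brand A", mentions=1000, positive=300, negative=200, intense=100),
    BrandStats("Brand B", mentions=400, positive=100, negative=150, intense=125),
]
for point in bpi_points(brands):
    print(point)
```

Each tuple maps directly onto one bubble: horizontal position, vertical position and bubble size. Plotting many brands this way is what makes the side-by-side comparison cheap once the mention counts are available.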
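In short, we are entering a big data age with no shortage of information on any topic you may need to study. With the mobile web and social media in everyone's hands, public opinions and sentiments are buried in the big data, calling for deep technology to mine them. Thus, there is no doubt that the automated survey will become the mainstream direction of polls. Its supporting technologies are mature, and a large-scale multilingual text mining system that parses and reads big data around the world is just around the corner.

Related posts in my original Chinese blog:

[Liwei's popular science: automated polling] (【立委科普:自动民调】)
Did Obama win last night's debate? Automated public-opinion detection tells you (奥巴马赢了昨晚辩论吗?舆情自动检测告诉你)
Automated social media public-opinion analysis: Ma Ying-jeou vs. Chen Shui-bian (社会媒体舆情自动分析:马英九 vs 陈水扁)
Automated public-opinion analysis shows Google's social rating is twice that of Baidu (舆情自动分析表明,谷歌的社会评价度高出百度一倍)
[Pinned: overview of NLP posts on Liwei's ScienceNet blog (regularly updated)] (【置顶:立委科学网博客NLP博文一览(定期更新版)】)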
Category: Social media mining | 3211 views | 0 comments
