What are your views on Latent Semantic Analysis (LSA)? LSA is an elegant machine learning technique that uses lexical evidence of co-occurrence to decode the underlying semantic categories (by clustering or classification) of a given text (Deerwester et al. 1990). Typically, the first step of LSA is to construct a word-by-document co-occurrence matrix. Then singular value decomposition (SVD) is performed on this co-occurrence matrix. The key idea of LSA is to reduce noise, or insignificant association patterns, by filtering out the minor components uncovered by SVD. Given that there is no parsing, no structure, and hence no understanding involved in LSA, it is amazingly successful in some areas that are supposed to require Natural Language Understanding (NLU) or Artificial Intelligence (AI). For example, it is a dominant approach in the area of automatic grading of high school reading comprehension tests; at least it was dominant eight years ago, when I was collaborating with education researchers on a new parsing-based approach to this task to compete with the popular LSA approach. The reason for its (partial) success in uncovering some natural language semantics lies in the fact that sentences have two sides: structures (trees) and words (nodes). Putting structures aside, the words used in a natural language document (discourse) are not a random collection; they have an inherent lexical coherence that holds them together to make sense. In addition, the lexical coherence evidence and the structural evidence often overlap in reflecting the underlying semantics to a certain extent. Therefore, for some coarse-grained semantic tasks, there is a possibility of maximizing the use of the lexical side of the evidence to do the job, ignoring the structural side of language. But there is a fundamental defect in LSA that limits how far it can go in decoding semantics, due to the lack of structures.
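The two steps just described (building the word-by-document count matrix, then truncating its SVD) can be sketched in a few lines of NumPy. This is a minimal illustration on an invented toy corpus; the variable names and documents are made up for the example, not taken from any system discussed here:

```python
import numpy as np

# Toy corpus: each "document" is a short list of tokens.
docs = [
    "human interface computer".split(),
    "survey user computer system response time".split(),
    "graph trees minors".split(),
    "graph minors survey".split(),
]

# Step 1: build the word-by-document co-occurrence (count) matrix.
vocab = sorted({w for d in docs for w in d})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d:
        A[index[w], j] += 1

# Step 2: SVD, keeping only the top-k components to filter out noise.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k "denoised" approximation

# Documents (and words) can now be compared in the k-dimensional latent space.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T     # one k-dim vector per document
```

Dropping the smaller singular components is exactly the "filtering" step: the rank-k matrix A_k keeps the dominant association patterns and discards the rest.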
In my past research, we used LSA in our Word Sense Disambiguation (WSD) project as an auxiliary method for synonym expansion, in order to generalize our parsing evidence from literal nodes to cluster nodes. It seemed effective to a certain extent, but could not be claimed better than a synonym lexicon encoded by linguists, had we had the human resources. It does have the benefit of clustering synonyms automatically from the data, and hence of adapting automatically to the domain we are interested in. The weakness of LSA is the same as that of most other so-called bag-of-words (BOW) learning approaches based on keyword density or co-occurrence. Since LSA involves no structures and no understanding, it is at best an approximation to the effect of parsing-based (or understanding-based) approaches for almost all tasks involving natural language text. In other words, its quality in theory (and in practice as well, as long as the parser is not built by inexperienced linguists) can hardly beat a parsing-based rule system. Another weakness of LSA is that it is much more difficult to debug a learned system for a given error or error type in the results. Either you tolerate the error, or you re-train LSA with new or expanded data, in which case there is no guarantee that the bulk results will have that error corrected. In a rule-based system of multiple levels, it is much easier to localize the error source and fix it. My own experience with using LSA for synonym clustering is that when I examine the results, they seem to make sense on the whole, but there are numerous cases that are beyond comprehension: it was difficult to determine whether the incomprehensible part of the results was due to the noise of imperfect data and/or bugs in the algorithm, and hence difficult to come up with effective corrective methods. When we talk about a rule-based semantic approach, we do not mean that the approach relies only on parsing structure in decoding semantics.
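As a rough illustration of how synonym candidates fall out of the latent space, here is a hypothetical sketch: words whose latent vectors point in similar directions (high cosine similarity) are treated as cluster mates. The word list and counts are invented for the example and do not come from our WSD project:

```python
import numpy as np

# Toy word-by-document count matrix: rows = words, columns = documents.
words = ["buy", "purchase", "sell", "graph", "tree"]
A = np.array([
    [3, 2, 0, 0],   # buy
    [2, 3, 1, 0],   # purchase
    [0, 1, 3, 0],   # sell
    [0, 0, 1, 3],   # graph
    [0, 0, 0, 2],   # tree
], dtype=float)

# Reduce to a k-dimensional latent space via SVD.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]          # one k-dim latent vector per word

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The nearest latent-space neighbor of "buy" is its candidate synonym.
target = words.index("buy")
sims = [(cosine(word_vecs[target], word_vecs[i]), w)
        for i, w in enumerate(words) if i != target]
best = max(sims)[1]                   # "purchase" on this toy data
```

Because the clusters are induced from whatever corpus you feed in, the groupings adapt to the domain automatically, which is the strength noted above; the flip side is that an odd grouping gives you nothing to inspect except the numbers.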
When we do semantics, whether extracting sentiments or factual events, we always bring lexical evidence and structural evidence together to accomplish the task. For example, in order to extract the emotional sentiment an agent expresses towards an object or brand, our sentiment rule will involve trigger words like love/like/favor/prefer and then check their logical/grammatical subjects and objects for certain lexical types (e.g. human type versus non-human type), to ensure we decode the semantics of the underlying text precisely. As you can see, the rule approach thus has the advantage of two types of supporting evidence, where LSA has only one. This is a fundamental difference when we compare rules with the BOW class of techniques, no matter which new approaches or techniques are hot in the community. Admittedly, BOW learning in general, and LSA in particular, does have the benefit of being robust to noisy data, and it can be built up quickly once data are available. The automatic adaptation to a domain based on the training data is also a strength, as it narrows down the semantic space to start with. The approximation of treating language as a black box, rather than analyzing it as a decomposable hierarchy of structures, is sometimes good enough in certain semantic use cases. LSA is often cited as an alternative to the grammar approach partially because it got a good, eye-catching name, I guess. It suddenly shortens the distance between sentence meaning and its building blocks, the words, without the trouble of having to use structures as a bridge. (But language is structured! As surely as the earth revolves.)
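A rule of this kind can be sketched schematically. Everything below is a hypothetical simplification for illustration: the dictionary-based clause representation, the type labels, and the function are invented stand-ins for what a real system would do over full parse trees:

```python
# Hypothetical sketch of a lexical+structural sentiment rule.
# A parsed clause is modeled as a dict with subject/verb/object slots,
# each slot carrying the surface text and a lexical type.

TRIGGERS = {"love", "like", "favor", "prefer"}    # lexical evidence: trigger words
HUMAN_TYPES = {"person", "organization"}          # lexical types treated as agents

def extract_sentiment(clause):
    """Fire only when a trigger verb has a human-type logical subject
    and a non-human object (e.g. a product or brand)."""
    if clause["verb"] not in TRIGGERS:            # lexical check
        return None
    if clause["subject"]["type"] not in HUMAN_TYPES:   # structural + type check
        return None
    if clause["object"]["type"] in HUMAN_TYPES:        # exclude person-to-person uses
        return None
    return {
        "agent": clause["subject"]["text"],
        "sentiment": "positive",
        "target": clause["object"]["text"],
    }

clause = {
    "verb": "love",
    "subject": {"text": "I", "type": "person"},
    "object": {"text": "this camera", "type": "product"},
}
result = extract_sentiment(clause)
# result: {'agent': 'I', 'sentiment': 'positive', 'target': 'this camera'}
```

The point of the sketch is the double gate: the rule fires only when the lexical evidence (the trigger word) and the structural evidence (who is the subject, what is the object, of what types) agree, which is exactly what a BOW method cannot check.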
http://personality-project.org/r/r.anova.html

Mixed (between and within) designs

Now it's time to get serious. Appendix V contains the data of an experiment with 18 subjects, 9 males and 9 females. Each subject is given one of three possible dosages of a drug. All subjects are then tested on recall of three types of words (positive, negative and neutral) using two types of memory tasks (cued and free recall). There are thus 2 between-subjects variables: Gender (2 levels) and Dosage (3 levels); and 2 within-subjects variables: Task (2 levels) and Valence (3 levels). Get the data from the file and run the following analysis:

aov.ex5 = aov(Recall~(Task*Valence*Gender*Dosage)+Error(Subject/(Task*Valence))+(Gender*Dosage),ex5)

Notice that you must segregate between- and within-subjects variables in your command. In the above example, I have put the within-subjects factors first with the within-subjects error term, followed by the between-subjects factors.

datafilename="http://personality-project.org/r/datasets/R.appendix5.data"
data.ex5=read.table(datafilename,header=T)  #read the data into a table
data.ex5  #show the data
aov.ex5 = aov(Recall~(Task*Valence*Gender*Dosage)+Error(Subject/(Task*Valence))+(Gender*Dosage),data.ex5)
summary(aov.ex5)
print(model.tables(aov.ex5,"means"),digits=3)  #report the means and the number of subjects/cell
boxplot(Recall~Task*Valence*Gender*Dosage,data=data.ex5)  #graphical summary of means of the 36 cells
boxplot(Recall~Task*Valence*Dosage,data=data.ex5)  #graphical summary of means of 18 cells

Should result in the following (extensive) output:

datafilename="http://personality-project.org/r/datasets/R.appendix5.data"
data.example5=read.table(datafilename,header=T)  #read the data into a table
data.example5  #show the data

    Obs Subject Gender Dosage Task Valence Recall
1     1       A      M      A    F     Neg      8
2     2       A      M      A    F     Neu      9
3     3       A      M      A    F     Pos      5
4     4       A      M      A    C     Neg      7
5     5       A      M      A    C     Neu      9
6     6       A      M      A    C     Pos     10
7     7       B      M      A    F     Neg     12
8     8       B      M      A    F     Neu     13
9     9       B      M      A    F     Pos     14
10   10       B      M      A    C     Neg     16
... SNIP ...
100 100       Q      F      C    C     Neg     17
101 101       Q      F      C    C     Neu     19
102 102       Q      F      C    C     Pos     19
103 103       R      F      C    F     Neg     19
104 104       R      F      C    F     Neu     17
105 105       R      F      C    F     Pos     19
106 106       R      F      C    C     Neg     22
107 107       R      F      C    C     Neu     21
108 108       R      F      C    C     Pos     20

aov.ex5 = aov(Recall~(Task*Valence*Gender*Dosage)+Error(Subject/(Task*Valence))+(Gender*Dosage),data.example5)
summary(aov.ex5)

Error: Subject
              Df  Sum Sq Mean Sq F value   Pr(F)
Gender         1  542.26  542.26  5.6853 0.03449 *
Dosage         2  694.91  347.45  3.6429 0.05803 .
Gender:Dosage  2   70.80   35.40  0.3711 0.69760
Residuals     12 1144.56   95.38
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Error: Subject:Task
                   Df Sum Sq Mean Sq F value     Pr(F)
Task                1 96.333  96.333 39.8621 3.868e-05 ***
Task:Gender         1  1.333   1.333  0.5517    0.4719
Task:Dosage         2  8.167   4.083  1.6897    0.2257
Task:Gender:Dosage  2  3.167   1.583  0.6552    0.5370
Residuals          12 29.000   2.417
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Error: Subject:Valence
                      Df Sum Sq Mean Sq F value   Pr(F)
Valence                2 14.685   7.343  2.9981 0.06882 .
Valence:Gender         2  3.907   1.954  0.7977 0.46193
Valence:Dosage         4 20.259   5.065  2.0681 0.11663
Valence:Gender:Dosage  4  1.037   0.259  0.1059 0.97935
Residuals             24 58.778   2.449
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Error: Subject:Task:Valence
                           Df Sum Sq Mean Sq F value  Pr(F)
Task:Valence                2  5.389   2.694  1.3197 0.2859
Task:Valence:Gender         2  2.167   1.083  0.5306 0.5950
Task:Valence:Dosage         4  2.778   0.694  0.3401 0.8482
Task:Valence:Gender:Dosage  4  2.667   0.667  0.3265 0.8574
Residuals                  24 49.000   2.042

print(model.tables(aov.ex5,"means"),digits=3)  #report the means and the number of subjects/cell

Tables of means
Grand mean 15.62963

 Task
       C    F
    16.6 14.7
rep 54.0 54.0

 Valence
     Neg  Neu  Pos
    15.3 15.5 16.1
rep 36.0 36.0 36.0

 Gender
       F    M
    17.9 13.4
rep 54.0 54.0

 Dosage
       A    B    C
    14.2 13.5 19.2
rep 36.0 36.0 36.0

 Task:Valence
      Valence
Task    Neg   Neu   Pos
  C   16.00 16.72 17.00
  rep 18.00 18.00 18.00
  F   14.56 14.22 15.28
  rep 18.00 18.00 18.00

 Task:Gender
      Gender
Task      F     M
  C   18.93 14.22
  rep 27.00 27.00
  F   16.81 12.56
  rep 27.00 27.00

 Valence:Gender
        Gender
Valence     F     M
  Neg   17.67 12.89
  rep   18.00 18.00
  Neu   17.44 13.50
  rep   18.00 18.00
  Pos   18.50 13.78
  rep   18.00 18.00

... snip ...

, , Gender = M, Dosage = B
      Valence
Task    Neg   Neu   Pos
  C   10.00 11.67 12.33
  rep  3.00  3.00  3.00
  F    8.33  8.67 11.00
  rep  3.00  3.00  3.00

, , Gender = F, Dosage = C
      Valence
Task    Neg   Neu   Pos
  C   20.67 21.67 21.33
  rep  3.00  3.00  3.00
  F   19.67 18.67 20.33
  rep  3.00  3.00  3.00

, , Gender = M, Dosage = C
      Valence
Task    Neg   Neu   Pos
  C   18.00 19.00 19.00
  rep  3.00  3.00  3.00
  F   17.33 17.33 17.33
  rep  3.00  3.00  3.00
http://personality-project.org/r/r.anova.html (Repeated measures ANOVA)

One way ANOVA - within subjects

Example 3. One-Way Within-Subjects ANOVA

Five subjects are asked to memorize a list of words. The words on this list are of three types: positive words, negative words and neutral words. Their recall data by word type is displayed in Appendix III. Note that there is a single factor (Valence) with three levels (negative, neutral and positive). In addition, there is also a random factor, Subject. Create a data file ex3 that contains this data. Again, it is important that each observation appears on an individual line! Note that this is not the standard way of thinking about data. Example 6 will show how to transform data from the standard data table into this form.

#Run the analysis:
datafilename="http://personality-project.org/r/datasets/R.appendix3.data"
data.ex3=read.table(datafilename,header=T)  #read the data into a table
data.ex3  #show the data
aov.ex3 = aov(Recall~Valence+Error(Subject/Valence),data.ex3)
summary(aov.ex3)
print(model.tables(aov.ex3,"means"),digits=3)  #report the means and the number of subjects/cell
boxplot(Recall~Valence,data=data.ex3)  #graphical output

Because Valence is crossed with the random factor Subject (i.e., every subject sees all three types of words), you must specify the error term for Valence, which in this case is Subject by Valence. Do this by adding the term Error(Subject/Valence) to the factor Valence, as shown above.
The output will look like:

datafilename="http://personality-project.org/r/datasets/R.appendix3.data"
data.ex3=read.table(datafilename,header=T)  #read the data into a table
data.ex3  #show the data

   Observation Subject Valence Recall
1            1     Jim     Neg     32
2            2     Jim     Neu     15
3            3     Jim     Pos     45
4            4  Victor     Neg     30
5            5  Victor     Neu     13
6            6  Victor     Pos     40
7            7    Faye     Neg     26
8            8    Faye     Neu     12
9            9    Faye     Pos     42
10          10     Ron     Neg     22
11          11     Ron     Neu     10
12          12     Ron     Pos     38
13          13   Jason     Neg     29
14          14   Jason     Neu      8
15          15   Jason     Pos     35

aov.ex3 = aov(Recall~Valence+Error(Subject/Valence),data.ex3)
summary(aov.ex3)

Error: Subject
          Df  Sum Sq Mean Sq F value Pr(F)
Residuals  4 105.067  26.267

Error: Subject:Valence
          Df  Sum Sq Mean Sq F value     Pr(F)
Valence    2 2029.73 1014.87  189.11 1.841e-07 ***
Residuals  8   42.93    5.37
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

print(model.tables(aov.ex3,"means"),digits=3)  #report the means and the number of subjects/cell

Tables of means
Grand mean 26.46667

 Valence
 Neg  Neu  Pos
27.8 11.6 40.0

The analysis of between-subjects factors will appear first (there are none in this case), followed by the within-subjects factors. Note that the p value for Valence is displayed in exponential notation; this occurs when the p value is extremely low, as it is in this case (approximately .00000018).

Two-way Within Subjects ANOVA

Example 4. Two-Way Within-Subjects ANOVA

Appendix IV contains the data from an experiment where five subjects were tested on their recall of words of differing valences. There were two different memory tasks: free or cued recall. Thus, there were 2 independent factors: Valence (3 levels) and Task (2 levels). Again, Subject serves as a random factor. Enter the data into a file entitled ex4 and run the following analysis. In this example, Subject is crossed with both Task and Valence, so you must specify three different error terms: one for Task, one for Valence, and one for the interaction between the two.
Fortunately, R is smart enough to divide up the within-subjects error term properly as long as you specify it in your command. The commands are:

datafilename="http://personality-project.org/r/datasets/R.appendix4.data"
data.ex4=read.table(datafilename,header=T)  #read the data into a table
data.ex4  #show the data
aov.ex4=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),data.ex4)
summary(aov.ex4)
print(model.tables(aov.ex4,"means"),digits=3)  #report the means and the number of subjects/cell
boxplot(Recall~Task*Valence,data=data.ex4)  #graphical summary of means of the 6 cells

This results in the following output:

datafilename="http://personality-project.org/r/datasets/R.appendix4.data"
data.example4=read.table(datafilename,header=T)  #read the data into a table
data.example4  #show the data

   Observation Subject Task Valence Recall
1            1     Jim Free     Neg      8
2            2     Jim Free     Neu      9
3            3     Jim Free     Pos      5
4            4     Jim Cued     Neg      7
5            5     Jim Cued     Neu      9
6            6     Jim Cued     Pos     10
7            7  Victor Free     Neg     12
8            8  Victor Free     Neu     13
9            9  Victor Free     Pos     14
10          10  Victor Cued     Neg     16
11          11  Victor Cued     Neu     13
12          12  Victor Cued     Pos     14
13          13    Faye Free     Neg     13
14          14    Faye Free     Neu     13
15          15    Faye Free     Pos     12
16          16    Faye Cued     Neg     15
17          17    Faye Cued     Neu     16
18          18    Faye Cued     Pos     14
19          19     Ron Free     Neg     12
20          20     Ron Free     Neu     14
21          21     Ron Free     Pos     15
22          22     Ron Cued     Neg     17
23          23     Ron Cued     Neu     18
24          24     Ron Cued     Pos     20
25          25   Jason Free     Neg      6
26          26   Jason Free     Neu      7
27          27   Jason Free     Pos      9
28          28   Jason Cued     Neg      4
29          29   Jason Cued     Neu      9
30          30   Jason Cued     Pos     10

aov.ex4=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),data.example4)
summary(aov.ex4)

Error: Subject
          Df Sum Sq Mean Sq F value Pr(F)
Residuals  4 349.13   87.28

Error: Subject:Task
          Df  Sum Sq Mean Sq F value   Pr(F)
Task       1 30.0000 30.0000  7.3469 0.05351 .
Residuals  4 16.3333  4.0833
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Error: Subject:Valence
          Df  Sum Sq Mean Sq F value  Pr(F)
Valence    2  9.8000  4.9000  1.4591 0.2883
Residuals  8 26.8667  3.3583

Error: Subject:Task:Valence
              Df  Sum Sq Mean Sq F value  Pr(F)
Task:Valence   2  1.4000  0.7000  0.2907 0.7553
Residuals      8 19.2667  2.4083

print(model.tables(aov.ex4,"means"),digits=3)  #report the means and the number of subjects/cell

Tables of means
Grand mean 11.8

 Task
    Cued Free
    12.8 10.8
rep 15.0 15.0

 Valence
    Neg  Neu  Pos
     11 12.1 12.3
rep  10 10.0 10.0

 Task:Valence
       Valence
Task    Neg  Neu  Pos
  Cued 11.8 13.0 13.6
  rep   5.0  5.0  5.0
  Free 10.2 11.2 11.0
  rep   5.0  5.0  5.0

-------------------------------------------------------------
# Note: the model used here is:
aov.ex4=aov(Recall~(Task*Valence)+Error(Subject/(Task*Valence)),data.example4)
while the one used in the previous post was:
aov.out = aov(SSS ~ diet * test + Error(subject/test), data=hill)
http://blog.sciencenet.cn/home.php?mod=space&uid=285393&do=blog&id=672361
The difference is that in the first model both variables are repeated measures: every subject goes through every combination of treatments. In the second model, some subjects get the chicken diet while other subjects get the pasta diet, so diet is not a within-subjects variable.
R and Analysis of Variance

A special case of the linear model is the situation where the predictor variables are categorical. In psychological research this usually reflects experimental design, where the independent variables are multiple levels of some experimental manipulation (e.g., drug administration, recall instructions, etc.). The first 5 examples are adapted from the guide to S+ developed by TAs for Roger Ratcliff. For more detail on data entry, consult that guide. The last three examples discuss how to reorganize the data from a standard data frame into one appropriate for within-subject analyses. For this discussion, I assume that appropriate data files have been created in a text editor and saved in a subjects x variables table.

One Way Analysis of Variance

Example 1: Three levels of drug were administered to 18 subjects. Do descriptive statistics on the groups, and then do a one way analysis of variance. The ANOVA command is aov:

aov.ex1 = aov(Alertness~Dosage,data=ex1)

It is important to note the order of the arguments. The first argument is always the dependent variable (Alertness). It is followed by the tilde symbol (~) and the independent variable(s). The final argument for aov is the name of the data structure being analyzed. aov.ex1 is the name of the structure in which you want the analysis stored. This general format will hold true for all ANOVAs you will conduct.
The results of the ANOVA can be seen with the summary command:

datafilename="http://personality-project.org/R/datasets/R.appendix1.data"  #tell where the data come from
data.ex1=read.table(datafilename,header=T)  #read the data into a table
aov.ex1 = aov(Alertness~Dosage,data=data.ex1)  #do the analysis of variance
summary(aov.ex1)  #show the summary table
print(model.tables(aov.ex1,"means"),digits=3)  #report the means and the number of subjects/cell
boxplot(Alertness~Dosage,data=data.ex1)  #graphical summary

This produces the following output:

datafilename="http://personality-project.org/r/datasets/R.appendix1.data"
data.ex1=read.table(datafilename,header=T)  #read the data into a table
aov.ex1 = aov(Alertness~Dosage,data=data.ex1)  #do the analysis of variance
summary(aov.ex1)  #show the summary table

          Df Sum Sq Mean Sq F value    Pr(F)
Dosage     2 426.25  213.12  8.7887 0.002977 **
Residuals 15 363.75   24.25
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

print(model.tables(aov.ex1,"means"),digits=3)  #report the means and the number of subjects/cell

Tables of means
Grand mean 27.66667

 Dosage
       a    b    c
    32.5 28.2 19.2
rep  6.0  8.0  4.0

Two way - between subject analysis of variance

Data are from an experiment in which the alertness level of male and female subjects was measured after they had been given one of two possible dosages of a drug. Thus, this is a 2x2 design with the factors Gender and Dosage. Read the data file containing this data. Notice that there are two independent variables in this example, separated by an asterisk (*). The asterisk indicates to R that the interaction between the two factors is interesting and should be analyzed. If interactions are not important, replace the asterisk with a plus sign (+).
Run the analysis:

datafilename="http://personality-project.org/r/datasets/R.appendix2.data"
data.ex2=read.table(datafilename,header=T)  #read the data into a table
data.ex2  #show the data
aov.ex2 = aov(Alertness~Gender*Dosage,data=data.ex2)  #do the analysis of variance
summary(aov.ex2)  #show the summary table
print(model.tables(aov.ex2,"means"),digits=3)  #report the means and the number of subjects/cell
boxplot(Alertness~Dosage*Gender,data=data.ex2)  #graphical summary of means of the 4 cells

The output should look like the following:

datafilename="http://personality-project.org/r/datasets/R.appendix2.data"
data.example2=read.table(datafilename,header=T)  #read the data into a table
data.example2  #show the data

   Observation Gender Dosage Alertness
1            1      m      a         8
2            2      m      a        12
3            3      m      a        13
4            4      m      a        12
5            5      m      b         6
6            6      m      b         7
7            7      m      b        23
8            8      m      b        14
9            9      f      a        15
10          10      f      a        12
11          11      f      a        22
12          12      f      a        14
13          13      f      b        15
14          14      f      b        12
15          15      f      b        18
16          16      f      b        22

aov.ex2 = aov(Alertness~Gender*Dosage,data=data.example2)  #do the analysis of variance
summary(aov.ex2)  #show the summary table

              Df  Sum Sq Mean Sq F value  Pr(F)
Gender         1  76.562  76.562  2.9518 0.1115
Dosage         1   5.062   5.062  0.1952 0.6665
Gender:Dosage  1   0.063   0.063  0.0024 0.9617
Residuals     12 311.250  25.938

print(model.tables(aov.ex2,"means"),digits=3)  #report the means and the number of subjects/cell

Tables of means
Grand mean 14.0625

 Gender
       f    m
    16.2 11.9
rep  8.0  8.0

 Dosage
       a    b
    13.5 14.6
rep  8.0  8.0

 Gender:Dosage
       Dosage
Gender     a     b
   f   15.75 16.75
   rep  4.00  4.00
   m   11.25 12.50
   rep  4.00  4.00

The generalization to n-way ANOVA is straightforward.
'Language gene' speeds learning

Mouse study suggests that a mutation to the FOXP2 gene may have helped humans learn the muscle movements for speech.

A mutation that appeared more than half a million years ago may have helped humans learn the complex muscle movements that are critical to speech and language. The claim stems from the finding that mice genetically engineered to produce the human form of the gene, called FOXP2, learn more quickly than their normal counterparts. The work was presented by Christiane Schreiweis, a neuroscientist at the Max Planck Institute (MPI) for Evolutionary Anthropology in Leipzig, Germany, at the Society for Neuroscience meeting this week in Washington DC.

Scientists discovered FOXP2 in the 1990s by studying a British family known as 'KE', in which three generations suffered from severe speech and language problems. Those with language problems were found to share an inherited mutation that inactivates one copy of FOXP2. Most vertebrates have nearly identical versions of the gene, which is involved in the development of brain circuits important for the learning of movement. The human version of FOXP2, the protein encoded by the gene, differs from that of chimpanzees at two amino acids, hinting that changes to the human form may have had a hand in the evolution of language.
A team led by Schreiweis's colleague Svante Pääbo discovered that the gene is identical in modern humans (Homo sapiens) and Neanderthals (Homo neanderthalensis), suggesting that the mutation appeared before these two human lineages diverged around 500,000 years ago.

Altered squeaks

A few years ago, researchers at the MPI in Leipzig engineered mice to make the human FOXP2 protein. The 'humanized' mice were less intrepid explorers and, when separated from their mothers, pups produced altered ultrasonic squeaks compared to pups with the mouse version of FOXP2. Their brains, compared with those of normal mice, contained neurons with more and longer dendrites, the tendrils that help neurons communicate with each other. Another difference was that cells in a brain region called the basal ganglia were quicker to become unresponsive after repeated electrical stimulation, a trait called 'long-term depression' that is implicated in learning and memory.

At the neuroscience meeting, Schreiweis reported that mice with the human form of FOXP2 learn more quickly than ordinary mice. She challenged mice to solve a maze that involved turning either left or right to find a water reward. A visual clue, such as a star, along with the texture of the maze's surface, showed the correct direction to turn.
After eight days of practice, mice with the human form of FOXP2 learnt to follow the clues to the water 70% of the time. Normal mice took an additional four days to reach this level. Schreiweis says that the human form of the gene allowed mice to integrate the visual and tactile clues more quickly when learning to solve the maze. In humans, she says, the mutation to FOXP2 might have helped our species learn the complex muscle movements needed to form basic sounds and then combine these sounds into words and sentences.

Another MPI team member, Ulrich Bornschein, presented work at the neuroscience meeting showing that the changes to brain circuitry that lead to quicker learning come about with just one of the two amino-acid changes in the human form of FOXP2. The second mutation may do nothing.

"That makes sense," says Genevieve Konopka, a neuroscientist at the University of Texas Southwestern Medical Center in Dallas, who also studies FOXP2. Carnivores, including dogs and wolves, independently evolved the other human FOXP2 mutation, with no obvious effect on their brains.

Faraneh Vargha-Khadem, a neuroscientist at University College London who has studied the KE family in which FOXP2 is mutated, thinks that the new findings could help explain the gene's role in perfecting the facial movements involved in speech.
But she does not see how changes in basic learning circuitry could explain how FOXP2 helps humans to automatically and effortlessly translate their thoughts into spoken language. "You are not deciding how you are going to move your muscles to form these sounds," she says.

http://blog.sina.com.cn/s/blog_70f7edbc0100ydq3.html

Scientists Identify a Language Gene
Bijal P. Trivedi for National Geographic Today
October 4, 2001

Researchers in England have identified the first gene to be linked to language and speech, suggesting that our human urge to babble and chat is innate, and that our linguistic abilities are at least partially hardwired. "It is important to realize that this is a gene associated with language, not the gene," said Anthony Monaco of the University of Oxford, England, who led the genetic aspects of the study. The gene is required during early embryonic development for the formation of brain regions associated with speech and language.

The gene, called FOXP2, was identified through studies of a severe speech and language disorder that affects almost half the members of a large family, identified only as "KE." Individuals with the disorder are unable to select and produce the fine movements with the tongue and lips that are necessary to speak clearly. "The most obvious feature is that they are unintelligible both to naive listeners and to other KE family members without the disorder," said neurologist Faraneh Vargha-Khadem of London's Institute for Child Health, who studied the family. The members of the family also have dyslexic tendencies, difficulty processing sentences, and poor spelling and grammar. FOXP2 is responsible for the rare disorder seen in the KE family, a unique mixture of motor and language impediments, said Monaco.
But, Monaco cautioned, "FOXP2 is unlikely to be the cause of less severe language deficits that affect approximately 4 percent of schoolchildren. FOXP2 will not be the major gene involved in most of these cases." Their findings are published in the October 4 issue of the journal Nature.

Using data from the KE family, researchers narrowed the location of the FOXP2 gene to a region of chromosome 7 that contained about 70 genes. Analyzing these genes one by one is a task that could easily have taken more than a year. But Monaco's team made a breakthrough when researcher Jane Hurst of Oxford Radcliffe Hospital identified a British boy, unrelated to the KE family, who had an almost identical language deficit. The boy, known as "CS," had a visible defect in chromosome 7 that specifically affected the FOXP2 gene. "The defect was like a signpost, precisely highlighting the gene responsible for the speech disorder," said Monaco.

The FOXP2 gene produces a protein called a transcription factor, which attaches itself to other regions of DNA and switches genes on and off. In the KE family, one of the 2,500 units of DNA that make up the FOXP2 gene is mutated. Monaco suggested that this mutation prevents FOXP2 from activating the normal sequence of genes required for early brain development. "It is extraordinary that such a minute change in the gene is sufficient to disrupt a faculty as vital as language," he said.

Although humans have two copies of every gene, just one mutated copy of FOXP2, as in the case of both CS and the KE family, can have devastating effects on brain development, said Vargha-Khadem. Brain imaging studies of the KE family revealed that affected members have abnormal basal ganglia, a region in the brain involved with movement, which could explain difficulty in moving the lips and tongue. Regions of the cortex involved in speech and language also appear aberrant.
The discovery of FOXP2 offers Monaco and other geneticists a probe to fish for other genes involved in development—specifically those directly controlled by FOXP2. Also in progress is a collaborative project to study the evolution of the human FOXP2 gene by comparing it with versions in chimps and other primates. Monaco speculates that differences between the FOXP2 gene in humans and chimps may reveal a genetic basis for differing abilities to communicate. http://news.nationalgeographic.com/news/2001/10/1004_TVlanguagegene_2.html