Come say hi!

I am keen for (helpful) feedback and suggestions. Leave a message :) Ashyanya (talk) 18:49, 6 November 2017 (UTC)

I saw somewhere (an article retrieved on my old university's systems) a table of the changing list of primes. Something like that would be sweet to see. — Preceding unsigned comment added by 67.2.165.142 (talk) 06:03, 2 May 2018 (UTC)

Greetings, earthling!

I see you are a new user here. Welcome!

I'm what you would call a long-term drive-by editor. I use Wikipedia a lot for my own purposes, not especially to edit, but I frequently stop to make small improvements (or larger ones on rare occasions), though I rarely hang around to collaborate.

Somehow, most of my edits seem to survive. Insofar as I know the ropes, my approach is mainly oriented toward staying under the radar.

I hope you are not soon buffeted by outrageous slings and arrows. (In some ways it was easier to start back in the wild west of 2006.)

Guess what's going around on geek social media today?

Long ago I dropped out of my computer science degree to implement the first Chinese/Japanese/Korean word processor for the IBM PC. The machine had 640 KB of memory. While we spent 75% of our development time tightening the corset strings (we contrived somehow to fit statistical input methods, Asian fonts, and a word processor into memory at the same time), I did manage to learn quite a bit of linguistics along the way, and it has been a semi-professional passion ever since.

Despite being a bit long in the tooth to pick up a hot new field like deep-learning machine translation (LSTMs and the like), that's actually what I've been doing lately.

This is a course I've mostly blitzed through to get my feet back on the ground:

I was mostly shocked that computational linguistics hasn't progressed that much since the mid 1990s. Slept through a few classes, didn't miss much.

This one is a lot more work:

These articles, though, are a different kettle of drive-by Wikipedia fish for me:

I actually care about this topic, long term. My stance toward linguistics is more philosophical than that of most people working in the deep learning field (where all the brilliant young 'uns can barely keep up with the roiling technicalities; so it goes).

I had been speculating to myself since the 1990s about whether something like these new results would ever be possible. I read a lot back then about the cryptographic methods used at Bletchley Park (an enterprise far more linguistic than most people realize) and convinced myself that it was possible, though I wasn't sure I'd live to see the day.

I like Pinker well enough, but The Language Instinct has always made my head explode. In my view there are enough practical constraints to keep people's minds operating in similar grooves without parsing all the way down to a grammar gene. I felt this entire sphere of speculation was premature. Bletchley was data oriented, and I've always suspected that some kind of shockingly effective gradient would one day be squeezed out of large corpora.

From that article above:

In the only directly comparable results between the two papers—translating between English and French text drawn from the same set of about 30 million sentences—both achieved a bilingual evaluation understudy score (used to measure the accuracy of translations) of about 15 in both directions. That's not as high as Google Translate, a supervised method that scores about 40, or humans, who can score more than 50, but it's better than word-for-word translation.

So I'm guessing this scale is out of 1000.
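
For anyone curious what that "bilingual evaluation understudy" (BLEU) number actually measures, here is a minimal sketch using NLTK's off-the-shelf implementation (the metric is conventionally reported on a 0 to 100 scale; NLTK returns a fraction between 0 and 1). The sentences below are invented purely for illustration.

```python
# Toy BLEU computation with NLTK (pip install nltk).
# BLEU compares n-gram overlap between a candidate translation
# and one or more reference translations.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]  # human reference(s)
candidate = ["the", "cat", "sat", "on", "a", "mat"]      # machine output

# Smoothing avoids zero scores when a higher-order n-gram never matches.
smooth = SmoothingFunction().method1
score = sentence_bleu(reference, candidate, smoothing_function=smooth)

print(f"BLEU: {score:.3f} (about {100 * score:.0f} on the usual 0-100 scale)")
```

Papers usually report the corpus-level variant (corpus_bleu), which pools n-gram counts over the whole test set rather than averaging per-sentence scores.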

But, however pedestrian, progress is progress. An established nucleus of bare comprehension is not to be sneezed at. (Gosh. Fifteen! I hate to begin to imagine. Surely closer to Flaubert's parrot than Flaubert's Parrot. And I'm allergic to feathers!)

There are already rumours from the trenches of successful automatic machine translation through a pivot language (usually English) or a synthetic distributed-representation vector space (a fancy term for a mathematical black box).
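
To make that "mathematical black box" slightly less black, here is a toy sketch (mine, not anything lifted from the actual papers) of one common trick: learn a linear map between the word-vector spaces of two languages from a small seed dictionary, then translate by nearest neighbour. The vectors and vocabulary are invented, and three-dimensional, purely for illustration.

```python
# Toy cross-lingual embedding alignment: learn a linear map W that sends
# English word vectors onto French ones, then translate by nearest neighbour.
# The vectors and vocabulary here are made up purely for illustration.
import numpy as np

en = {"dog": [0.9, 0.1, 0.0], "cat": [0.8, 0.3, 0.1], "house": [0.1, 0.9, 0.2]}
fr = {"chien": [0.1, 0.9, 0.0], "chat": [0.3, 0.8, 0.1], "maison": [0.9, 0.1, 0.2]}

seed_pairs = [("dog", "chien"), ("cat", "chat"), ("house", "maison")]

X = np.array([en[e] for e, _ in seed_pairs])  # source-language vectors
Y = np.array([fr[f] for _, f in seed_pairs])  # target-language vectors

# Least-squares solution to X @ W ~ Y (real systems add an orthogonality
# constraint and use tens of thousands of word pairs, and the seed and test
# vocabularies would of course be disjoint).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def translate(word):
    v = np.array(en[word]) @ W
    # Nearest neighbour in the target space by cosine similarity.
    return max(fr, key=lambda f: np.dot(v, fr[f]) /
               (np.linalg.norm(v) * np.linalg.norm(fr[f])))

print(translate("dog"))  # expected: "chien"
```

As I understand it, the genuinely unsupervised systems bootstrap even that seed dictionary (from digits, identical strings, or adversarial training), which is where things start to feel like magic.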

Surely this burgeoning black box contains something analogous to semantic primes. In Bletchley terminology they would call these "cribs", a crib being any linguistic feature that's stable enough to dig your fingernails into, en route to a larger crack. It doesn't really matter if your cribs are See Spot Run or Dr Seuss. There are basic constraints on the human condition that all languages must somehow countenance. Gradient will be found. It might only go skin deep. (Though bear in mind that skin is the body's largest organ.)

The kindergarten nucleus I've alluded to is probably pretty strong. At the other end of the spectrum, I suspect the Latin/Greek abstract terminology endowment also sticks out like a sore thumb (I think I've seen that result reported from fairly primitive automatic classifiers).

Back in my CJK days, the biggest hassle we had to deal with in Chinese was quantifiers. Chinese has a complex system of measure words. Detecting the measure words was crucial because otherwise the preceding digit strings could map onto scads of different syllabic homonyms (our input method was phonetic). This is another in that class of surface regularities you'd quickly notice if trapped forever in The Library of Babel with a language you didn't already know or recognize.
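
Just to make that concrete, here is a toy sketch of the disambiguation (the mini-lexicon is invented and wildly simplified; the real thing juggled statistics across the whole input line rather than one neighbouring token):

```python
# Toy disambiguation in a phonetic (pinyin) input method: a syllable like "yi"
# maps onto scads of homophonous characters, but if the next syllable is a
# measure word, the numeral reading becomes overwhelmingly likely.
# The mini-lexicon below is invented and wildly incomplete.

HOMOPHONES = {
    "yi": ["衣", "医", "以", "一"],  # clothes, doctor, by-means-of, the numeral one
    "ge": ["哥", "歌", "个"],        # elder brother, song, generic measure word
}
NUMERALS = {"一", "二", "三", "两", "五"}
MEASURE_WORDS = {"个", "张", "只", "本", "条"}

def is_measure_word(syllable):
    return any(ch in MEASURE_WORDS for ch in HOMOPHONES.get(syllable, []))

def pick_character(syllable, next_syllable):
    candidates = HOMOPHONES.get(syllable, [])
    if next_syllable is not None and is_measure_word(next_syllable):
        # A measure word coming up strongly suggests the numeral reading.
        for ch in candidates:
            if ch in NUMERALS:
                return ch
    # Otherwise fall back to the first listed reading (a stand-in for whatever
    # the surrounding context model would actually pick).
    return candidates[0] if candidates else None

print(pick_character("yi", "ge"))  # 一  (one of something)
print(pick_character("yi", "fu"))  # 衣  (as in 衣服, clothes)
```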

In any case, the dam is going to burst fairly soon (within about five years) on the entire field of human linguistic universals. The delay will mainly be due to a shortage of people who truly understand the new algorithms and are willing to sit patiently with those who wish to approach this from a more philosophical perspective.

I suspect some kind of universal primer black box will emerge, half of which can be introspected in a way that makes (human) intuitive sense, and the other half of which will be as inexplicable as AlphaGo and will occasion twenty years of hard thinking.

Perhaps this new primer will become the C-3PO of automated language acquisition, fluent in 3000 languages, awkward in 3001 (one cannot become fluent in Wookiee eyebrow sign language, but one had better learn). Maybe it can wow everyone and learn Navajo straight up (without any paired sentences, on a relatively small corpus) to a sufficient degree for rough, idiomatic translation.

Anyways, I see you've set out to weed and clarify some of the overlap among these articles. I just scrubbed a bunch of overlinking out of the semantic primes article (at one point in history, people linked every damn thing that was linkable, but the current policy is only to link what a reasonable person might wish to consult, and with good effect). I actually think that big duplicated table shouldn't be there at all.

That said, I don't really know this literature at all. I have more to say about the literature in this area that hasn't been written yet.

Is taking on these pages related to your PhD subject matter?

If so, hang on for a wild ride.

I'll leave this to you to do your thing, but do ping me if there's anything you'd like to discuss, about Wikipedia, or linguistics, or anything else. I'd probably have to work far too hard to restrain my WP:OR bent to edit seriously on these pages anyway.

Best wishes and good luck. — MaxEnt 02:04, 30 November 2017 (UTC)

Thanks MaxEnt. I am still learning the ropes, and the pages are in serious need of a big edit, so feel free to drive by as many times as you like. It would be good to get some experienced feedback on the work. There's a lot of information you have mentioned there, and I must confess, I have a burgeoning interest in NLP, but as yet have not had the time to pursue it in any detail. I will have to read through everything you have mentioned, though.
The pages are related to my PhD, which is about the only reason I feel I've done enough reading in and around the topic to be able to improve the pages. I commend you for the wide range of interests and topics you can work on. The duplicated table of primes is awful, I agree. There are better resources out there, and I hope to find good versions to improve the Wikipedia pages. — Ashyanya (talk) 05:14, 18 December 2017 (UTC)