Applications of Deep Machine Learning
Deep Learning / Machine Learning algorithms do in some way mimic human cognition, although the substrate (silicon) is different and the exact algorithms differ, even when the end functions/solutions are similar.
While my interest is in helping develop better A.I. algorithms, I am mainly looking into making small changes to, editing, and re-training existing A.I. systems, and expanding datasets. This section therefore focuses on the application of existing algorithms, already trained on extensive datasets. These are extremely powerful but have limitations, especially the limited label sets available for object processing.
GIF references: screenshot images saved by H Muzart; some show image stimuli made by H Muzart (a subset using other CC-BY images) being processed by DML algorithms (by other open-source authors, see below), modified and run by H Muzart.
Emotional face recognition using DCNNs
Much work in visual neuroscience (Hubel, Wiesel, et al., 1970s-1990s) and in human emotional behaviour (Ekman, Phelps, McGaugh, Cahill, Rolls, Feldman, et al., 1990s-2010s) has inspired a shift in computing machines from non-smart visual encoding to artificially intelligent computer vision (Hinton, et al., 2012-present) with deep convolutional neural networks (DCNNs). The subtle cues in facial features are particularly important here.
I used the Google Video/Vision API platforms on my HM developer console account, with Spyder and command-line SDKs, to run these DML algorithms in the Cloud.
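A minimal sketch of the kind of face-detection call involved is below; it assumes the google-cloud-vision Python client and credentials are already set up, and the image file name is a placeholder.

```python
# Minimal sketch: emotion-related likelihoods from Google Cloud Vision face detection.
# Assumes `pip install google-cloud-vision` and GOOGLE_APPLICATION_CREDENTIALS are set;
# "face.jpg" is a placeholder image path.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("face.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.face_detection(image=image)

for face in response.face_annotations:
    # The API returns categorical likelihoods (VERY_UNLIKELY ... VERY_LIKELY) per emotion.
    print("joy:", face.joy_likelihood,
          "sorrow:", face.sorrow_likelihood,
          "anger:", face.anger_likelihood,
          "surprise:", face.surprise_likelihood)
```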
If one could incorporate this into augmented reality ( [ooooo] [ooooo] ), one might find a way to tackle issues like autism spectrum conditions, alexithymia, prosopagnosia, etc., in real time and remotely.
Object detection and varying scene understanding
In the real world, as when viewing videos, things in one's field of view are not static; they move and change. A video can be treated as a set of static frames/images to process rapidly, but the correct label for a frame may actually depend on information in the frames immediately before and after it. For example, a zoomed-out photograph depicting what appears to be a mobile phone in a scene may be detected as any number of similar objects because the phone is too small to resolve; zoomed in too far, small micro-features may be detected as many different things; but at the right zoom level, and with some context (e.g. the phone being held by a hand, or having its screen turned on), one can identify it much more reliably. Another example: someone talking with a neutral facial expression may be interpreted as happy given a positive-valence spatial and temporal context (e.g. an award ceremony), but will be interpreted very differently given a negative-valence context (e.g. a funeral). This is why DCNNs, RNNs and LSTMs need to be used together. There is the further use of style transfer using GANs in DeepFake technology (good reviews at Two Minute Papers).
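To make the temporal-context point concrete, here is a minimal, illustrative PyTorch sketch (my own assumed toy model, not one of the specific tools discussed here) of feeding per-frame CNN features through a bidirectional LSTM so that each frame's label can draw on the frames before and after it:

```python
# Minimal sketch (illustrative toy model, not a specific tool from this page):
# per-frame CNN features fed through a bidirectional LSTM so each frame's label
# can use the frames before and after it.
import torch
import torch.nn as nn

class FrameContextClassifier(nn.Module):
    def __init__(self, num_classes, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Tiny per-frame CNN encoder (stand-in for a pretrained DCNN backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Bidirectional LSTM: each frame sees frames before AND after it.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, video):                      # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))      # (batch*time, feat_dim)
        feats = feats.view(b, t, -1)               # (batch, time, feat_dim)
        context, _ = self.lstm(feats)              # (batch, time, 2*hidden_dim)
        return self.head(context)                  # per-frame logits

logits = FrameContextClassifier(num_classes=5)(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 8, 5])
```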
Input can all be fed in beforehand, or it can be streamed live in real time. I have used ([-link-1-], [-link-2-]) tools (from Anaconda, darknet, OpenCV, RGB.v, Google AI Platforms, ImageNet x, etc.), both on my local hard drive (HDD/SSD) and in the Cloud, to help me with these; a minimal detection sketch follows the diagram below. I also used the Google Video/Vision API on my HM developer console account, using Spyder and command-line SDKs, to run some of these DML algorithms.
(see diagram)
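As a rough sketch of the darknet/OpenCV route mentioned above (the yolov3.cfg / yolov3.weights / coco.names files are assumed to have been downloaded separately, and the image name is a placeholder):

```python
# Minimal sketch of running a darknet YOLO model through OpenCV's DNN module.
# Assumes yolov3.cfg / yolov3.weights / coco.names have been downloaded separately
# (placeholder file names), and an input image "scene.jpg".
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
labels = open("coco.names").read().strip().split("\n")

img = cv2.imread("scene.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Each detection row: [cx, cy, w, h, objectness, class scores...]
for output in outputs:
    for det in output:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > 0.5:
            print(labels[class_id], round(confidence, 2))
```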
I am also working on getting more training data for better labels - there are simply so many things which the current programs do not get with their current training data, such as individual human body parts, uncommon specialised workspace items, objects rendered with non-photorealistic computer graphics, etc.
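One plausible way to fold such extra labels back into an existing system, sketched here purely under assumptions (a torchvision ResNet backbone and a placeholder folder "my_extra_labels/" arranged one sub-folder per class), is to freeze the pretrained backbone and re-train only a new classification head on the expanded labels:

```python
# Minimal transfer-learning sketch (illustrative, not a finished pipeline):
# reuse a pretrained ImageNet backbone and re-train only a new head on extra labels.
# "my_extra_labels/" is a placeholder folder of images arranged one sub-folder per class.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

num_new_classes = 12  # e.g. body parts, specialised workspace items, ...

model = models.resnet18(pretrained=True)
for p in model.parameters():
    p.requires_grad = False                                   # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, num_new_classes)   # new trainable head

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("my_extra_labels/", transform=tf)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for images, targets in loader:        # one pass shown; repeat over epochs in practice
    opt.zero_grad()
    loss = loss_fn(model(images), targets)
    loss.backward()
    opt.step()
```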
Computational neurolinguistics and language comprehension
My interest in language dates from my teens and comes from my multilingual background (fluent English & French, and some delving into German, Thai, Japanese, Latin/Greek) and my own personal clinical experience with visual & hearing impairments (visual/auditory neurology). I then got interested in linguistics (N Chomsky, S Pinker, et al.), modern and ancient English literature/novels, the philosophy of logical reasoning & communication (R Descartes, B Franklin, etc.), and how modern-day stage performers (actors, magicians, mentalists, lifestyle gurus, motivational speakers, etc.) use speech content and body language to re-program people's minds in social interactions. I then learnt more about body and sign language, and then about codified iconographic systems of language that use symbols, mathematics, and programming languages. In my two final years at UCL (2014-2016), while doing biopsychology and computational neuroscience, I became interested in how machines could implement all of this. In 2016, I discovered that, since 2012, Google Brain, DeepMind, FAIR, and others had been implementing these human language comprehension traits in machines, to great effect.
I am very interested in the use of the technologies below, as I have a lot of experience with them, and I am keen to find novel ways in which they can be developed and integrated with other tools. Here are my research interests since 2017:
Optical character recognition (OCR); fairly standard, essentially like a simple DCNN, see CognTech AR/VR.
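A minimal sketch of that standard route, assuming the Tesseract engine plus its pytesseract Python wrapper are installed (the image name is a placeholder):

```python
# Minimal OCR sketch using the Tesseract engine via the pytesseract wrapper
# (assumes `pip install pytesseract pillow` and a local Tesseract install;
# "page_photo.png" is a placeholder image).
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("page_photo.png"))
print(text)
```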
Speech-to-text using not just the syllable-dictionary bots we have had since the early 2000s, but recent RNNs that backtrack and differentiate domain-general syntax and semantics, for note-taking/dictation, closed captioning, search engines, etc., see CognTech AR/VR. Also see G Drive below for a review by me (a few demos) using Google tools.
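A minimal sketch of the Cloud speech-to-text call involved (assuming the google-cloud-speech Python client and credentials; the WAV file name is a placeholder):

```python
# Minimal speech-to-text sketch with the Google Cloud Speech client
# (assumes `pip install google-cloud-speech` and credentials are configured;
# "dictation.wav" is a placeholder 16 kHz mono WAV file).
from google.cloud import speech

client = speech.SpeechClient()

with open("dictation.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-GB",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```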
Language comprehension models - context-dependent generative language, for relational reasoning, understanding questions, writing novels from scratch that can pass the Turing test, etc.
-- (update 2022:) for Operational Productivity, I have forked OpenAI's GPT-3 repository (https://github.com/openai/) to allow for data-flow co-integrations with some of my work; see also my Discord and JSFiddle: https://discord.com/channels/974519864045756446/974542851167887360 ; https://jsfiddle.net/neuroman247/4h81Lmzv/#&togetherjs=vPQLzEmfDg ; https://jsfiddle.net/z2j8079c/ . This also covers the visual implementations with Microsoft Windows Publisher Dev Tools, Microsoft Azure, Bing Images, Visual Studio 2022, https://beta.openai.com/codex-javascript-sandbox ; https://github.com/microsoft/visual-chatgpt ; https://beta.openai.com/playground ; https://openai.com/blog/gpt-3-apps/ ; https://codeshare.io/ ; https://sway.office.com/kxiHviilNYFxa7Ck?ref=Link
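A minimal sketch of the GPT-3 API side of those data-flow co-integrations, assuming the 2022-era openai Python package and an API key in the environment (the engine name and prompt are placeholders):

```python
# Minimal GPT-3 completion sketch (openai Python package as of the 2022-era API;
# assumes OPENAI_API_KEY is set in the environment; engine/prompt are placeholders).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Summarise the following meeting notes in three bullet points:\n...",
    max_tokens=150,
    temperature=0.3,
)
print(response["choices"][0]["text"].strip())
```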
Qualitative and quantitative thematic topic-modelling analyses for the extraction of conceptual semantic gist at different levels of abstraction. This is helpful for summarising long portions of text. I have worked with NVivo/R/other programs for text-vector transcript analyses, and other standard programs for systematic sci-lit reviews.
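A minimal topic-modelling sketch, using scikit-learn's LDA as an illustrative stand-in for the NVivo/R transcript workflows (the documents are placeholders):

```python
# Minimal topic-modelling sketch with scikit-learn (an illustrative stand-in for
# the NVivo/R transcript workflows): extract a few latent topics and their top words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # placeholder transcripts
    "memory and emotion in the first interview transcript",
    "attention and memory in the second interview transcript",
    "emotion and language in the third interview transcript",
]

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

words = vec.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-5:][::-1]]   # five highest-weight words
    print(f"topic {i}: {top}")
```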
Translation between languages, with visual OCR in augmented reality and a large database of words and word combinations (culture-specific idiomatic phrases), see CognTech AR/VR. Also see G Drive below for a review by me (a few demos) using Google tools.
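A minimal sketch of the translation call itself, assuming the google-cloud-translate Python client (basic v2 edition) and credentials; the French idiom is just a placeholder input:

```python
# Minimal translation sketch with the Google Cloud Translation client (basic v2 edition);
# assumes `pip install google-cloud-translate` and credentials are configured.
from google.cloud import translate_v2 as translate

client = translate.Client()
result = client.translate("Il ne faut pas vendre la peau de l'ours avant de l'avoir tué.",
                          target_language="en")
print(result["translatedText"])  # idioms need phrase-level context, not word-by-word lookup
```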
Sentiment Analysis with Python and online SQL/PHP databases, see BNT IDMLA.
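A minimal sketch of the sentiment step itself, using NLTK's VADER analyser as an illustrative stand-in (writing the scores back to the online SQL/PHP database is a separate step not shown):

```python
# Minimal sentiment-analysis sketch with NLTK's VADER lexicon (an illustrative stand-in;
# storing the scores in an online SQL/PHP database is a separate step not shown here).
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The ceremony was wonderful, but the speech dragged on."))
# -> {'neg': ..., 'neu': ..., 'pos': ..., 'compound': ...}
```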
Realistic human-like voice synthesis: ML using small amounts of audio training data (i.e. minutes, as opposed to the tens of hours traditionally required), so that text-to-speech can be auto-generated in the author's own voice (e.g. Descript Lyrebird, DeepMind WaveNet, ...).