Could going backwards actually be the way forward for neural networks?

With all the talk about AGI, I thought it would be interesting to bring up the fact that we have to limit how neural networks generalize in order for them to perform as well as they do. I don’t know if this is something everyone already knows, but the multilayer perceptron (MLP), descended from Rosenblatt’s 1958 perceptron, is theoretically capable of doing almost anything today’s top architectures can. This is because of the universal approximation theorem, which says that a large enough, well-trained MLP can approximate any continuous function to arbitrary accuracy. The catch is that the theorem says nothing about how hard it is to find those representations through training. In practice, plain MLPs are still limited to smaller problems, because tasks like NLP and computer vision would demand far more data and compute than we can throw at them.
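To make the universal approximation point concrete, here is a minimal sketch (plain NumPy, with toy hyperparameters I picked arbitrarily, nothing tuned): a one-hidden-layer MLP trained with vanilla gradient descent fits sin(x) to low error. The theorem guarantees a good fit exists at sufficient width; whether gradient descent can find one at real-world scale is exactly the open question.

```python
import numpy as np

# Toy illustration of universal approximation: a one-hidden-layer MLP
# fit to sin(x) on [-pi, pi] with plain full-batch gradient descent.
# Width, learning rate, and step count are arbitrary choices, not tuned.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)
y = np.sin(x)

hidden = 64
W1 = rng.normal(0.0, 1.0, (1, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, 1))
b2 = np.zeros(1)

lr = 0.1
for step in range(20000):
    # forward pass with tanh hidden units
    h = np.tanh(x @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    loss = np.mean(err ** 2)

    # backward pass (mean-squared-error gradients)
    g_pred = 2 * err / len(x)
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T * (1 - h ** 2)
    g_W1 = x.T @ g_h
    g_b1 = g_h.sum(axis=0)

    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1

print(f"final MSE: {loss:.5f}")  # should end up small; sin is easy at this toy scale
```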

The deep learning models we use today (convolutional neural networks for images, recurrent or transformer-based models for sequences, graph neural networks for graphs) are all shaped by these computational limits. To make training tractable, we deliberately build structural priors into our models to reduce complexity, share parameters, and keep optimization manageable. Basically, we’ve put a lot of effort into figuring out which structures matter for specific tasks, and instead of letting a network discover them on its own, we tell it what to look for. From a model training perspective, we haven’t really made neural networks better at generalizing; we’ve narrowed the set of functions they can learn so that they work well on specific tasks.
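For a rough sense of what those built-in structures buy us, here is a back-of-the-envelope comparison (the sizes are just example choices I made up): one 3x3 convolution over a 224x224 RGB image versus a single fully connected layer producing 64 units from the same flattened image.

```python
# Rough illustration of how much a convolution's structural prior
# (local receptive fields + weight sharing) cuts the parameter count.
# Image size and channel counts are arbitrary example values.
H, W, C_in, C_out, k = 224, 224, 3, 64, 3

conv_params = C_out * (k * k * C_in + 1)        # 64 filters of shape 3x3x3, plus biases
dense_params = (H * W * C_in) * C_out + C_out   # flatten the image, one dense layer to 64 units

print(f"conv layer:  {conv_params:,} parameters")   # 1,792
print(f"dense layer: {dense_params:,} parameters")  # 9,633,856
```

The comparison is not perfectly apples-to-apples (the conv layer actually produces a whole 64-channel feature map, far more than 64 outputs), which only strengthens the point: weight sharing gets you more with several thousand times fewer parameters.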

If we could train large, fully connected networks (MLPs) at scale without these issues, we wouldn’t need carefully designed architectures anymore. We could just train an overparameterized, fully connected network that, given enough data, finds its own internal representations: no convolutions, no attention mechanisms, no hand-crafted structure. Models could get simpler in design, with their complexity coming from their size and the data they’re trained on.
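To give a sense of why this is currently off the table, here is what the “just flatten everything and go fully connected” approach costs in parameters for a single image model (the layer widths below are arbitrary placeholders, not a real design):

```python
# What the "no hand-crafted structure" approach looks like on paper:
# flatten the raw input and stack dense layers. The point is how fast
# the parameter count blows up even for modest widths.
def mlp_param_count(layer_sizes):
    """Total weights + biases for a fully connected stack."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

pixels = 224 * 224 * 3                  # raw RGB image, no convolutions
sizes = [pixels, 8192, 8192, 1000]      # two wide hidden layers, 1000 classes

print(f"{mlp_param_count(sizes):,} parameters")  # roughly 1.3 billion for one modest image model
```

And that is before asking whether gradient descent would even find good representations in a network with that little structure.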

In other words, we might be able to move forward by “moving backwards”: improving the efficiency and scalability of plain MLPs and removing the limits we currently design around. That could reduce the need for specialized architectures and for all the careful engineering it takes to get neural networks to do what they do now. But here’s the real question: is it even possible to train fully connected networks at large scale, or is this like warp drives in physics, where even if the exotic matter needed to power one exists, the energy required is far beyond anything we can handle?

And since we still have to customize architectures for specific domains or tasks, will there always be gaps in what neural networks can do unless we get past this computational barrier with MLPs? The issue is that when we design networks for specific tasks, we have no idea what surprising properties might emerge from structures we haven’t even thought of yet (and if that doesn’t make sense, it’s probably because I took an edible about 90 minutes ago, and it’s really kicking in).


I get your point, and I agree it’s worth revisiting ideas we previously rejected, but I don’t think fully connected networks are the way to go. My bet is that the answer lies in self-organizing systems that grow to fit what’s needed, rather than being designed from scratch.
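Just to make “grow to fit what’s needed” concrete, here is the crudest possible version of the idea (nothing like real growth methods such as cascade-correlation or NEAT, which add units to an existing network; this just retrains a scikit-learn MLP with more capacity whenever the current one underfits):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Crudest sketch of "grow until it fits": start tiny, only add capacity
# when the data demands it. Target error and widths are arbitrary.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (512, 1))
y = np.sin(X).ravel() + 0.05 * rng.normal(size=512)

width, target_mse = 2, 0.01
while True:
    model = MLPRegressor(hidden_layer_sizes=(width,), max_iter=5000,
                         random_state=0).fit(X, y)
    mse = np.mean((model.predict(X) - y) ** 2)
    print(f"width {width:3d}: train MSE {mse:.4f}")
    if mse < target_mse or width >= 64:
        break
    width *= 2   # grow only when the current network underfits
```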

@Torin

I bet the answer is in self-organizing systems that grow to fit what’s needed.

I think you’re onto something. Another possible direction could be a more Bayesian treatment of neural networks. If it ever becomes feasible to train an LLM as a Bayesian neural network, we could keep updating its posterior over the weights from user interactions. That kind of continual updating could bring a lot of improvements.
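To make “update it based on user interactions” concrete at toy scale, here is the update mechanic shrunk down to Bayesian linear regression with a Gaussian prior and known noise (doing anything like this for actual LLM weights is the hard, open part): each interaction tightens the posterior over the weights.

```python
import numpy as np

# Online Bayesian linear regression: one conjugate Gaussian update per
# "interaction". Dimensions and noise level are arbitrary placeholders.
rng = np.random.default_rng(0)
d = 4                               # feature dimension (placeholder)
noise_var = 0.25                    # assumed known observation noise

mean = np.zeros(d)                  # Gaussian prior over the weight vector
cov = np.eye(d)

true_w = rng.normal(size=d)         # stand-in for what users actually want

for step in range(100):
    # one "interaction": a feature vector x and an observed response y
    x = rng.normal(size=d)
    y = x @ true_w + rng.normal(0.0, np.sqrt(noise_var))

    # conjugate posterior update, one observation at a time
    prec_old = np.linalg.inv(cov)
    prec_new = prec_old + np.outer(x, x) / noise_var
    cov = np.linalg.inv(prec_new)
    mean = cov @ (prec_old @ mean + x * y / noise_var)

    if (step + 1) % 25 == 0:
        err = np.linalg.norm(mean - true_w)
        print(f"after {step + 1} interactions: |posterior mean - true w| = {err:.3f}")
```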

Maybe this is related? Check this out: