7

My friend and I are developing a web app in Python + Flask + PostgreSQL. We have been working on it for the past few months and have developed a lot of schema and use cases specific to that stack. Now, all of a sudden, we plan to move to a NoSQL database (Neo4j) because it is a better fit for what the core of our web app is going to be. Python supports Neo4j through embedded/REST API bindings, but the embedded bindings rely on JPype, which is rather unmaintained, to say the least.

So the question of using Scala came up today. Ours is going to be a next-to-realtime application, so we can't afford the lag/overhead of intermediate Python->Java requests (the Neo4j standalone server is based on embedded Neo4j). I am in favour of spending a month or two learning Scala/Lift, whereas he is in favour of carrying on with Python and porting to Scala when the need arises, even though we know that Python plus bindings will be a bit slower than the native Neo4j support for Java.

In the past few months, we have already done a lot of work in Python + Flask + PostgreSQL. If we switch to Scala, we will need to port all of it.

Would it be wise to port now? Do any of you have personal experience or advice to share? Or is this just premature optimization?

P.S.: I am aware of the learning curve of Scala/Lift.

c0da
  • 5
    "Or is this just premature optimization?" It's always premature optimization until you've measured where your bottlenecks are. Are you at a point where you can _measure_ whether issues related to Python are having a great enough impact to justify learning Scala? – Wilduck Mar 15 '12 at 20:17
  • As I said, we are working on a next-to-realtime app. Obviously, going A->B will take less time than going A->C->B. That's my point. We are aware of the bottlenecks, and that is why this issue of using Scala arose. We are aware of it all; it's just that I want to port now to save hassle later on, and maybe my partner wants to avoid the hassle now and port when the need arises. – c0da Mar 15 '12 at 20:21
  • 3
    That's not what I meant, obviously it will be faster, but can you measure specifically that you need speed in that part of your application more than anywhere else? If it is not 100% abundantly clear that you need to improve speed in a specific area more than any other, you're almost certainly optimizing prematurely. – Wilduck Mar 15 '12 at 20:25
  • The key here is to have an _objective measure_ of how much time you could save by switching to scala. Exactly how much more time does A->C->B take? Is it worth switching based on that _objective number_ that you determined. Until you have that number, you cannot reasonably make a decision. – Wilduck Mar 15 '12 at 20:27
  • I need the database retrieval to be as fast as possible. So, I need speed there. And Neo4j provides me that speed. Measured that already. – c0da Mar 15 '12 at 20:28
  • Fine, but you've already decided to switch to Neo4j. You need to measure the possible improvement in performance of the _specific change_ (data access in python/scala) as opposed to _all other possible changes_ (total execution time of a task in your web app). Only then can anyone answer the question of whether it would be a good idea to switch. Once you have this number, post it in your question. – Wilduck Mar 15 '12 at 20:34
  • 10
    My feeling is that you should carry on in Python, get a prototype up and running, put it live ASAP and see if it takes off. In all probability it won't (along with 99% of other web-apps), so you'll have saved yourself a lot of effort. Learn Scala in the meantime and use it for your *next* project! –  Mar 15 '12 at 21:03
  • Do consider using Neo4j as a standalone (instead of embedded) REST server; isn't it designed to work well this way? That way you can separate the frontend nodes. There are good Python-Neo4j REST bindings, including http://bulbflow.com/overview/ – Jesvin Jose Mar 16 '12 at 05:37
  • BTW, I am also investigating Scala+Lift and Neo4j. These are unintuitive and have a steep learning curve (and promise extreme performance), but I am considering them for clean development practices (like describing the problem domain well) instead of raw speed. Do you actually think that way? – Jesvin Jose Mar 16 '12 at 05:49
  • **In our case, we need real-time speed**. And going with Python + Neo4j isn't going to be a wise idea as there is an extra layer of JPype (which, as I mentioned in my question, is rather unmaintained). And also, mixing 2 languages for work is a bad idea. Isn't it? – c0da Mar 16 '12 at 06:14
  • How have you tested the speed difference between the two approaches? Do you have realistic test data showing a performance improvement large enough to justify throwing away a lot of hard work and delaying your launch by another 6 months? There are lots of open questions, and "we need realtime speed" is not enough of an argument! – codecool Mar 16 '12 at 09:05
  • Another thing, if you decide to migrate, is to calculate how much time it will take you to get back to where you are standing right now. You need to settle on a definite time period before taking such an extreme step. – codecool Mar 16 '12 at 09:10
  • @codecool Have a look at these statistics from Peter Neubauer, who is a part of the neo4j team. They will reveal some things to you. http://lists.neo4j.org/pipermail/user/2010-December/005812.html – c0da Mar 16 '12 at 09:24
  • Also, the delay will not be 6 months. My team has worked for the past 4-5 months. We spent more time on use-case development and designing the architecture than on coding. I believe that if my team puts in as much effort as they already have, we will be delayed by a month or so at most. – c0da Mar 16 '12 at 09:27
  • Have you designed the architecture with Neo4j or PostgreSQL in mind? If PostgreSQL, then you have to plan again from scratch. – codecool Mar 16 '12 at 09:40
  • My team already has the blueprints of graph algos needed... It would definitely require work, but not as much as it was required for postgresql... – c0da Mar 16 '12 at 09:44

2 Answers

5

Obviously you're not going to get a definite answer here, because it really depends on your individual case and, in particular, who's working on the project. I'm not sure what you intend to gain though, but it sounds like you would have to throw away a lot of work.

Scala is slightly more buzzword-compliant these days, but Python is no slouch when it comes to functional programming either. Still, it all depends on the specifics of your workload.

Just remember Knuth's advice about premature optimization; he's a pretty bright guy. I'd say if you're even remotely close to finishing, then don't change. It's far more important to get some working code as quickly as possible than to "do it right" the first time, because you simply don't know what you have done wrong until you can test it.

Then, once you know what your choke points are, design something that addresses your real concerns.
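To get the kind of objective number the comments ask for, a small timing harness is usually enough. The following is only a minimal sketch in plain Python: `time_call`, `summarize`, `query_via_bindings` and `query_direct` are hypothetical names, and the two query functions are placeholders standing in for whichever two access paths you can actually call from Python (for example, embedded bindings versus a REST client). A Scala path would need an equivalent harness on the JVM.

```python
import statistics
import time


def time_call(fn, repeats=200, warmup=20):
    """Return per-call latencies in milliseconds for a zero-argument callable."""
    for _ in range(warmup):
        fn()  # warm caches/connections so they do not skew the samples
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Median and approximate 95th-percentile latency in milliseconds."""
    ordered = sorted(samples)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {"median_ms": statistics.median(ordered), "p95_ms": ordered[p95_index]}


# Hypothetical placeholders: replace each body with the real query
# issued through the access path you want to measure.
def query_via_bindings():
    sum(range(1000))


def query_direct():
    sum(range(1000))


if __name__ == "__main__":
    for name, fn in [("bindings", query_via_bindings), ("direct", query_direct)]:
        print(name, summarize(time_call(fn)))
```

Comparing the median and tail latency of each path against the total time of a full request shows whether the binding layer is really the choke point or just a small fraction of the work.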

tylerl
1

Another option, if you don't use many Python 2.6+ features, might be to use Jython. Since it targets the same JVM as Scala, it should allow tighter, more efficient integration with Java libraries.

We provide a Jython console inside our Eclipse application so users can write their own scripts which have full access to the power of our Java back end. We can use Jython and Java classes interchangeably and seamlessly move between them. I find it a really flexible environment to work in.

The only real downside is that Jython is stuck at Python 2.5, so subsequent innovations in the language are not yet available.

Mark Booth