Anyone know Python or Anaconda?

Mierin

No Sir Mix a Lot references or Out of Context Quotes tomfoolery please.

(Ok, that was really the only reason I mentioned anaconda).

 
Last edited by a moderator:
By "know", is "just starting to use it" enough?


What's the question?

 
I've got about 1,200 different categories, which I've put into 1,200 dataframes (not sure if that's a good way to go about this, but it's what I did).

Example:

LastName    FirstName
CD          Teacher
Zoogs       Bob
Damodred    Moiraine
Damodred    Alastair
Zoogs       Larry
Zoogs       Bob

where LastName is the category and I want to do "stuff" to the first names. By stuff I mean lots of different operations. There's a different dataframe for each last name. I want to do the same "stuff" to the first names for every LastName. I just can't figure out how to make a loop that will go through multiple dataframes.

In basic English the loop would be:

For each last name:
    Do these 30 fancy things to the first names

Not sure why I just now thought of this but once I put them into dataframes I could delete the last name column. That might help uncomplicate things for me. I started learning Python a week ago. There are about 3 other columns in the data set.

 
Bob Zoogs, that's me.

Do you use R, by any chance? "Dataframes" sounds rather R-like.

It *sounds* like which category it is determines what exactly you'll be doing to them. I don't completely see why there's one dataframe per category, rather than simply having (for example) the LastName variable be a category.

I apologize, since if this is a Python syntax question I probably can't answer it. Generally speaking, I guess I'd keep a list of all the dataframes (in Python, everything's technically a pointer, right? So this shouldn't be too costly?) and iterate through the list. In R you'd probably use something from the lapply family. Pseudocode-wise:

for (i = 0; i < length(dfList); i++) {
  // pass each dataframe's category to the function
  doTheFancyThings(dfList[i].lastName);
}
Where:
doTheFancyThings(fancyThingType) {
  // if the thirty things depend on the category...
  switch (fancyThingType) {
    case 'Zoogs':
      return doAwesomeThings();
    case 'Damodred':
      return doDredfulThings();
    default:
      return banTeach();
  }
}
...although I think you basically have that part already, so I'm not sure this has helped
Sorry if that was super basic. I should say I barely know python. I looked up looping over dataframes and saw something about pandas. [Technically, I think I only got stuff about looping through *one* dataframe]. Heh. Seems like a fun language!
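For what it's worth, here is a rough Python sketch of the pseudocode above, assuming pandas and assuming each dataframe still carries a LastName column. The three handler functions and their bodies are made-up placeholders for the real "fancy things", and a dictionary lookup stands in for the switch statement.

```python
import pandas as pd

# Hypothetical stand-ins for the "fancy things"; swap in the real operations.
def do_awesome_things(df):
    return df.assign(FirstName=df["FirstName"].str.upper())

def do_dredful_things(df):
    return df.assign(FirstName=df["FirstName"].str.lower())

def ban_teach(df):
    return df.iloc[0:0]  # empty frame with the same columns

# Dispatch table playing the role of the switch statement.
HANDLERS = {"Zoogs": do_awesome_things, "Damodred": do_dredful_things}

def do_the_fancy_things(df):
    last_name = df["LastName"].iloc[0]  # assumes one last name per dataframe
    handler = HANDLERS.get(last_name, ban_teach)  # ban_teach is the default case
    return handler(df)

# One small dataframe per last name, kept together in a plain list.
df_list = [
    pd.DataFrame({"LastName": ["Zoogs", "Zoogs"], "FirstName": ["Bob", "Larry"]}),
    pd.DataFrame({"LastName": ["Damodred", "Damodred"], "FirstName": ["Moiraine", "Alastair"]}),
    pd.DataFrame({"LastName": ["CD"], "FirstName": ["Teacher"]}),
]

# The loop itself: one call per dataframe, no need to name each one.
results = [do_the_fancy_things(df) for df in df_list]
```

If the same operations apply to every last name, the dispatch table can be dropped and a single function called inside the loop.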

R should be pretty well suited to something like this. The methodology is Split/Apply/Combine but that kind of supposes you have everything (including the LastName field) in *one* dataframe. Have it as a factor, and then I think it's tapply your way through that. I'm not super familiar with dplyr, but I'm sure that would provide an even easier grammar for the operation.
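pandas has the same split/apply/combine idea through groupby. A minimal sketch, assuming everything fits in one dataframe with LastName as a column; `fancy_things` is a hypothetical placeholder that just counts distinct first names:

```python
import pandas as pd

# The example data from earlier in the thread, kept in a single dataframe.
df = pd.DataFrame({
    "LastName": ["CD", "Zoogs", "Damodred", "Damodred", "Zoogs", "Zoogs"],
    "FirstName": ["Teacher", "Bob", "Moiraine", "Alastair", "Larry", "Bob"],
})

def fancy_things(first_names):
    # Placeholder operation: number of distinct first names in the group.
    return first_names.nunique()

# Split by LastName, apply the function to each group's first names,
# and combine the results into one Series indexed by LastName.
summary = df.groupby("LastName")["FirstName"].apply(fancy_things)
print(summary)
```

This plays roughly the role tapply does in R: the grouping column does the splitting, so no separate dataframes are needed.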

 
Yes I know R. Not super well but better than Python. But the things I need to do need to be done in Python.

The last names need to be separate dataframes because of memory. From what I've read, the dataset I have will be too big to keep all together while I'm doing these things to it, so I'm going to merge the pieces back together at the end. Also, I know how to do the fancy things to the names individually; I need the loop to go through all of them instead of naming the 1,200 names. Anyhow... I will probably pester the people of Stack Overflow again.
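One way this could look, as a sketch only: keep the per-name dataframes in a dictionary keyed by last name, loop over the dictionary, and concatenate the results at the end. The dictionary `frames`, the function `fancy_things`, and the NameLength column are all invented for illustration.

```python
import pandas as pd

def fancy_things(df):
    # Hypothetical placeholder for the 30 operations on the first names.
    out = df.copy()
    out["NameLength"] = out["FirstName"].str.len()
    return out

# One dataframe per last name; the dictionary key carries the last name,
# so the frames themselves do not need a LastName column.
frames = {
    "Zoogs": pd.DataFrame({"FirstName": ["Bob", "Larry", "Bob"]}),
    "Damodred": pd.DataFrame({"FirstName": ["Moiraine", "Alastair"]}),
}

results = []
for last_name, df in frames.items():
    # Same operations for every last name; tag the rows before merging.
    results.append(fancy_things(df).assign(LastName=last_name))

# Merge everything back into a single dataframe at the end.
combined = pd.concat(results, ignore_index=True)
```

Only the dictionary is named in the code; the 1,200 last names live in its keys, so none of them has to be typed out.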

 
Oh, wow, that's interesting. I'm not sure how memory issues work in python. Please keep us posted, as I'll be curious to see the solution!

Also, does this help?

http://stackoverflow.com/questions/36601956/how-can-i-iterate-through-multiple-dataframes-to-select-a-column-in-each-in-pyth

They have

for name in dfList:
Or, with pandas groupby: http://pandas.pydata.org/pandas-docs/stable/groupby.html
(I can't tell if memory will play a factor there in your case. If you already have the data manually split out, is it not possible to have that in a list and loop over it?)
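On the memory question, pandas can report how much space a column takes, which might help decide whether splitting is needed at all. A small sketch with made-up data, comparing a LastName column stored as plain text with the same column stored as a pandas category:

```python
import pandas as pd

# Made-up data: three last names repeated many times.
df = pd.DataFrame({
    "LastName": ["Zoogs", "Damodred", "CD"] * 10_000,
    "FirstName": ["Bob", "Moiraine", "Teacher"] * 10_000,
})

# Bytes used by the column as text versus as a category.
bytes_as_text = df["LastName"].memory_usage(deep=True)
bytes_as_category = df["LastName"].astype("category").memory_usage(deep=True)

print(bytes_as_text, bytes_as_category)
```

A category column stores each distinct last name once plus a small integer code per row, so it tends to be much smaller when there are few distinct values relative to the number of rows.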

 
I've actually been to that post and since Python is so new to me it doesn't make a lot of sense and I can't really translate it to what I want to do. Pandas is what I'm using.

 