Conceptualization Discussion #2
Fair points. We could put the convincing/realness scale on the first viewing only in the fiction condition to avoid the double presentation, what do you think? That'd mitigate the overall suspicion issue.
I really like how you framed that and indeed I didn't think of it in terms of 1st/3rd person + aesthetic studies.
I feel like if we want this study to be a step forward from our previous work we should aim at being a bit more refined in terms of our exploration of the psychological mechanisms hypothesized to be at stake. Like in this case, having 2 variables would make the data more rich (we can average them - if they are very correlated - to get an even more robust general measure; and analyze them separately - which I'm optimistic would yield some interesting dissociation). We could potentially explore the idea that the effect of fiction on emotion is related to a lowering of "Self" engagement, and thus would be more marked in the subjective scale than in the more objective one.
It takes <1.5 seconds to answer a scale once participants are familiar with the instructions, so that's like +<2 min total, which I think is worth it.
Agreed, we need to work on the phrasing here; "Focus on the first/third person" is a bit abstract (at least to me ^^), maybe something like:
Let's continue refining :)
You totally convinced me about the 2 stimuli, thank you.
I'm still a bit concerned about asking realness together with the other 2 ratings, even if only for AI-generated stimuli. There are 2 reasons: (i) isn't that a bit hard to judge in just 2-3 secs? (ii) wouldn't asking that before / after the sexiness and arousal ratings influence (or be influenced by) the process of making explicit and conceptualising how arousing & how sexy we find the stimulus?
Yes, you're right that it potentially creates a confound... Mmh, okay then: maybe we slightly re-lower the number of stims (not 100% sure that's necessary; we can first try with the 80 and see how much time it adds) and, indeed, have a second run at the end where we show (all?) the pictures and say "In this second phase, we are evaluating the quality of our algorithm, and would like to ask you whether you noticed any issues or problems with the images presented to you in the previous phase", with a rating like "Obviously fake - Very realistic". My hope is that we would expect:
One issue is that one could argue that the ratings of arousal then influence the a posteriori ratings of reality, but that's fine as 1) the former is our primary target of interest, 2) we can statistically check the effect of arousal on reality beliefs and its interaction with the condition, and 3) if that proves to be a big issue we can always run a subsample in the future with the order of the tasks reversed and see.
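Point 2 above (checking the effect of arousal on reality beliefs and its interaction with condition) amounts to fitting a model with an interaction term. A self-contained sketch on simulated data (all names and effect sizes are made up for illustration; the real analysis would of course be a mixed model on the actual data):

```python
import numpy as np

# Simulated single-trial data: arousal rating, cue condition
# (0 = "real" cue, 1 = "AI-generated" cue), and the a posteriori
# realness rating (effect sizes are arbitrary).
rng = np.random.default_rng(42)
n = 400
arousal = rng.uniform(0, 6, n)
condition = rng.integers(0, 2, n)
realness = 3.0 + 0.2 * arousal - 0.5 * condition + rng.normal(0, 1, n)

# Design matrix with an arousal x condition interaction term
X = np.column_stack([np.ones(n), arousal, condition, arousal * condition])
beta, *_ = np.linalg.lstsq(X, realness, rcond=None)
coefs = dict(zip(["intercept", "arousal", "condition", "interaction"], beta))
```

The `interaction` coefficient is the quantity of interest: a non-zero estimate would mean the arousal-realness link differs between conditions.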
Despite having proposed the notion of "average viewer similar to you in terms of gender and sexual orientation" myself, I am not 100% sure we should keep it. I suspect that keeping it would improve accuracy. But by refraining from specifying who the average observer is (i.e., one with the same sexual orientation), we would probably get a stronger dissociation. I'll stop here for now to allow others to express their thoughts!
Hey guys,
1 - About the individual traits scales (GAAIS, etc.): I bumped into this very interesting preprint, whose name is encouraging: "Can we assess attitudes toward AI with single items?"
2 - I like the fact that the image remains on the screen for a couple of seconds. It's different from our previous work, but I find it crucial, especially if we have images of couples, which are less likely to be perceived as AI-generated. One thing I'd slightly change is the presentation; I'd say: 1: AI-generated/Real > 2: the photo appears (but AI-generated/Real remains on the screen with the photo) > 3: everything disappears.
3 - I see and appreciate Dominique's bottom-up logic: if the two measures are correlated, we average them; if they're not, even better: we have two potentially different measures. However, and I'd like to ask this to our philosophers here: let's assume we average the two measures: what are we assessing, exactly? Can we think of some underlying factor?
4 - As for the main measures: I like the 1st/3rd person phrasing and the feel/think differentiation. However, I would avoid lengthy prompts and especially the word "body". I suspect youngsters feel the body less, and/or are less prone to say that something was moving in their body (sounds a bit pervy?). This would lead to a potential floor effect in the arousal measure and thus a decrease in the correlation of the two measures. I'd go for something like: Arousal / Sexy. *"Other" has some issues too: who are the others? What are their sexual orientations?
5 - I love Dominique's idea about the "we are evaluating the quality of our algorithm" thing!
Hopefully, tomorrow I'll read everything again and come up with new suggestions. Bisous, besos, and baci =)
Revilla, M., & Höhne, J. K. (2020). How long do respondents think online surveys should be? New evidence from two online panels in Germany. International Journal of Market Research, 62(5), 538-545. https://doi.org/10.1177/1470785320943049
Thanks @AleAnsani, great comments! I made some changes:
Can you give it a try and let me know how long it currently takes, so that we have a better idea? I'd still push back on having the text + image overlap:
Hi guys, Concerning the "keep the cue" issue, I am conflicted. On the one hand, I can see Dominique's concern that keeping the cue alongside the image as proposed by Alessandro might distract people (are they looking at the image or at the cue?) and ultimately elicit different processes. On the other hand, having performed the task, which is rather repetitive, I noticed that I hardly paid the cue enough attention.
In any case, given that Dominique is putting a lot of effort and thought into it, I think it's good that we discuss, but when it comes down to deciding what to implement I trust his judgment -- and in the worst-case scenario we could switch the variables at some point (provided we stay sufficiently vague during the IRB/preregistration stage, that shouldn't be impossible, right?).
Hello everyone! Sorry for taking so long. I've tried the experiment; it took me 16 minutes, which is fine.
These are of course points of discussion. But @DominiqueMakowski, please feel free to ignore them if your time is running out.
All good, I'd rather take a few more days and have something solid than rush things out with major flaws. Thanks a lot for your time and input; I'll read it again tomorrow, propose some further changes, and let you know :)
Some quick comments on @AleAnsani's points:
That being said, I am happy to discuss any further issues, but equally happy to trust @DominiqueMakowski with whatever choice he makes based on this discussion!
**Duration**
Okay, so it seems like we're currently at ~15 min with a 2-phase experiment with 54 stims. I think that's very good; we can easily throw in 1-2 questionnaires.

**Cue**
If we observe a small effect (or an absence of one), we can always try to make the manipulation stronger by going to the next level and crafting custom implicitly-priming descriptions or watermarks. It could be a natural follow-up to also look at the difference between explicit dichotomous priming and something more "ecological" and realistic.
Good idea, will try that (the colors will be assigned randomly between, like, blue, green and red for each participant)
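For illustration, the per-participant random color assignment could look like this (Python sketch of the logic only; the experiment itself runs in the browser, and the cue labels/colors here mirror the ones mentioned in this thread):

```python
import random

CUE_COLORS = ["blue", "green", "red"]

def assign_cue_colors(participant_id):
    """Randomly map the two cue labels to two distinct colors,
    reproducibly per participant (seeded by the participant id)."""
    rng = random.Random(participant_id)  # deterministic per id
    return dict(zip(["AI-generated", "Real"], rng.sample(CUE_COLORS, 2)))
```

Seeding by participant id keeps the assignment stable within a session while still counterbalancing colors across participants.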
This makes it hard to counterbalance/randomly assign, opening us up to criticism about potential confounds
Okay will do.
Me no-likey block paradigms 😬
lmao

"Too many images are completely unrelated to sex"

Mmh, that's an interesting one.
"By an Average person"
Fair point...
Thanks so much for the comments, it's really cool to brainstorm this. I feel like we're basically doing the job of future reviewers ourselves and making our decisions fully justified and thought-through! Keep 'em thoughts coming! We still have some time.
Hi guys! I feel like I'm in uncharted territory, but "me quito el sombrero" (Spanish for "I take my hat off") at the level of this discussion. I just carried out the experiment (16 minutes, which is nice), so here are some first impressions:
- I wouldn't mention that the study takes 30 minutes on the first screen. In the age of TikTok, I'm afraid that 30 minutes could freak out more than one participant. If we do mention the duration, I would be inclined to say 15 minutes.
Un abrazo (a hug)!
@marmarini said:
I tried to complete the experiment using a mobile device and encountered significant difficulties, particularly with slider movement and image formatting. The images were often too large for the screen, making it challenging to use the slider without zooming in. I would probably ensure compatibility with mobile devices, considering the prevalence of their use in online studies.
@MarcoViola88 @marmarini @AleAnsani @AntonioOLR84

**1. Strength of Manipulation**
I hear the concerns that the manipulation is not strong enough, that people will "not read" / "forget" the prime, and that there won't be any effect, but I am fairly confident that even this weak form of manipulation will work (we have done it in the past with an even weaker type of cueing (Sperduti 2016) and it worked quite nicely). I think the sentiment might also be confounded by the fact that we as experimenters know the true nature of the stims, and thus we are not engaged in the task the same way participants would be. I think the only way to definitively answer that is to run the study on a "preliminary" sample and see. We can start with the UK sample while we translate the experiment, then quickly check: if it works well, we deploy at full scale; if it doesn't, we revise.

**2. Multiplatform**
To be honest, I would start by restricting the experiment to computers. I can see the value of going multiplatform to maximize the number of participants, but it introduces a ton of issues, among which:
Again, I'd advocate a step-by-step approach: we start with computers, and if we struggle to gather enough data to estimate reliable effects, then we adjust and open up.

**3. Analog/Likert scales**
Don't say that to a statistician 😁 Yes, analog scales are slightly more tedious, but IMO that's one of their strengths. Beyond being "true" continuous variables, they also avoid some response biases (in particular automated responding, where people just click on pre-determined numbers, as well as response clustering, where the distribution gets skewed towards certain response options). That said, I don't have a strong preference here (I just like true analog scales because they're nicer to statistically model and visualize); if the rest of you also think it's worth the change, we can give it a go :)

**4. Neutral images**
I updated the stim selection to increase the number of ero stims and decrease the neutral ones. We now have 60 stims (see here). Again, the reason for their inclusion is this:
The risk of further decreasing their number is the "inverse nice buttocks effect", whereby their salience would be inflated.

**5. Side questionnaires**
Thoughts on side measures? Please continue arguing for/against. This dialectical process is really good :)
Hi Dominique, RE: 1. Strength of Manipulation 2. Multiplatform Nothing to say about 3-4, I defer to the experts. In the meantime, I'll stay updated with the others' comments!
I made a few additions:

**Demographics**
**"Control" questionnaire(s)**
I checked again exactly what we used in our recent FakeFace study. The goal was to add questions about people's expectations about image-generating algorithms (since we also had a cover story that we were testing an AI image-generation algorithm). But to avoid having these questions alone (and raising suspicion), we intermixed them with items from the GAAIS, which contains general questions about attitudes towards AI. I re-checked the items in detail and also read Schepman's revalidation paper of the GAAIS. Based on their newest data, as well as on ours, I decided to revise/improve the original combo, now named for the occasion the Beliefs about Artificial Images Technology (BAIT) questionnaire. I took 6 (3 positive + 3 negative) items from the GAAIS, + 6 BAIT-proper items. These items are aimed at measuring people's expectations regarding CGI that could interfere with our experimental design. What do you think? With these additions, the duration should now hover close to 20 min. Unless there is something important to add, I think we're almost "feature-complete". Note that the link to the experiment has changed (but it's always available from the README).
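For illustration, scoring a mixed questionnaire like this (positively and negatively worded items on the same scale) just means reverse-coding the negative items before averaging. A sketch with hypothetical item names and an assumed 7-point response format (not the real BAIT items):

```python
def score_questionnaire(responses, negative_items, scale_max=7):
    """Mean score after reverse-coding negatively worded items.
    Assumes responses on a 1..scale_max Likert-type scale; item names
    and the 7-point format are placeholder assumptions."""
    recoded = [
        (scale_max + 1 - score) if item in negative_items else score
        for item, score in responses.items()
    ]
    return sum(recoded) / len(recoded)
```

A respondent answering 7 on positive items and 1 on negative items then gets a coherent (maximally positive) composite.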
Hey all! Sorry for the late reply. I took some time to participate in the new version of the study. Thanks, @DominiqueMakowski, for your incredible work! I agree that we're almost there! Here are my comments:
Here's the reference:
I think that's it. Thank you for your time and sorry again for my late reply.
But arguably yes, this is very exploratory and this item is probably not a very good proxy of such a bodily state; I just thought we could throw it in there as an optional question, out of curiosity.
Regarding your question, I think your justification using alpha/omega holds, as it shows that you can reduce these 6 items to one meaningful score. You can then say that it was justified based on the needs and hypotheses of your study. But then maybe the editor is a GAAIS author, you never know 😬
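For reference, the alpha part of that justification can be computed directly from item and total-score variances; a minimal stdlib-only sketch (illustrative, not the project's actual analysis code):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns
    (items[i] = scores of all respondents on item i)."""
    k = len(items)
    sum_item_vars = sum(pvariance(col) for col in items)
    total_scores = [sum(scores) for scores in zip(*items)]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - sum_item_vars / pvariance(total_scores))
```

Perfectly parallel items give alpha = 1; noisier items pull it down, which is the basis for the "can we reduce these 6 items to one score" argument.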
Hi guys, thank you for the splendid fine-tuning :-) Since I'm pretty convinced by your 'trimming' of the scales (& not authoritative enough when it comes to statistics, sadly), I won't add anything there -- I like the idea of adopting 'slimmer' versions of several scales. (BTW, I love the idea & acronym of the 'BAIT' scale!) But let me express my (partial) skepticism about horniness.
It is not a good proxy of horniness. I might be a very horny person who does not engage in sexual activity (incl. masturbation) because I have no time. I might be horny and HENCE have frequent sexual intercourse; OR, I might be horny because I have not had sexual intercourse / masturbated recently. And so on. I might have had sexual intercourse despite my lack of interest, just to please my partner (creepy as it sounds, it happens, I guess...). In sum, I am not convinced this is a good proxy. Perhaps a better way would be to ask it straight away: how "horny" do you take yourself to be in general? I see some drawbacks to this sort of question, of course, but it still seems preferable to using such a spurious 'behavioral' proxy... ANOTHER ISSUE JUST POPPED TO MIND: I guess that after this latest fine-tuning we're almost done, aren't we?
I implemented most changes, added a few screens here and there to fluidify the experience, finalized the consent form & debriefing, adjusted the order of things, grouped/conditionally displayed items (e.g., birth control) etc. I am quite happy with it :) Things of note:
Please give it a go: https://realitybending.github.io/FictionEro/experiment/english1.html
Note: after a brief survey with native speakers, we changed the "Picture" cue in favour of "Photograph", which seems to be more "reality-loaded".
Don't hate me, but after further thinking, I think it would be incomplete, especially given the presence of non-ero stims, not to have a question about emotional valence. It offers a new dimension of pleasantness/unpleasantness that could capture reactions to non-ero stims as well as to ero images that one could judge as disgusting, etc. We piloted this new version on a couple of people and the duration seems to be ~24 min (adding the 3rd scale probably adds ~2 min, but I think it's worth it?), but the feedback was that it's rather fun and not too long 🤷
Hi guys,
Whatever Dominique decides to do with the points above, I think we are ready to move to the next step. Dominique, have you already submitted the application to the IRB? In order to begin data collection we need to decide on a protocol (e.g. snowball sampling? only free participants or paid participants too? and so on). Moreover, we need to translate the experiment into Italian & Spanish -- I'm ready & willing to do that in the next few days! PS: happy new year, guys!
Nice decentering skills, we should EEG you while you adopt the two mindsets ^^
We did submit a first ethics application; we hope to hear back once people are back from the break :) Once it is cleared, I'll add the document to the repo and let you know. Here, as soon as we're good to go, the students will start convenience sampling to recruit as many participants as they can (online, free). We'll see how it goes. In the meantime, we can start the translation. For this, one needs to create a copy of this file, named e.g.,
Hi guys!
the ethics has been approved 🥳 (it took more time than expected, as the committee got reorganized just before Christmas). So we will start collecting some data asap (PS: do send me your OSF account names if you have one, so I can add you to the OSF data repo). I made a few improvements to facilitate tracking the "source" of participants (and avoid needless duplication of HTML files): so the new URL is now (note the question mark), and when we want to test the experiment, we can write TEST so we can then filter those runs out of the data. The link at the end of each experiment that invites participants to share it with others has. Within the next few days/weeks (I need to finish preparing a module first), I'll set up the Spanish & Italian versions so that we just need to then replace the text
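The "source" tagging described above (a value like TEST passed in the URL query string) can then be used to filter out our own test runs at analysis time. A sketch of the logic; note that the parameter name `exp` is a guess for illustration, not necessarily what the experiment actually uses:

```python
from urllib.parse import urlparse, parse_qs

def participant_source(url):
    """Extract the recruitment 'source' tag from an experiment URL.
    The query-parameter name 'exp' is a placeholder assumption."""
    params = parse_qs(urlparse(url).query)
    return params.get("exp", [None])[0]

# Filtering out our own test runs before analysis:
records = [
    {"id": 1, "url": "https://realitybending.github.io/FictionEro/experiment/english1.html?exp=TEST"},
    {"id": 2, "url": "https://realitybending.github.io/FictionEro/experiment/english1.html?exp=snowball"},
]
real_participants = [r for r in records if participant_source(r["url"]) != "TEST"]
```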
Hi guys, I'm finalizing the Italian translation, but I'd like to ask Marco &/or Alessandro to check it before I upload it. BTW, here is my OSF profile: osf.io/e6js2
Hi guys, Fantastic news about the ethics approval! Thanks for letting me know about the tweaks you made. Huge thanks for all your hard work. Can't wait for this data collection to kick off!
Good point, let me drop an email to Sperduti to see if he has some bandwidth at the moment to run a French arm :)
Hi guys, sorry I disappeared (again), I've just moved back to Jyväskylä.
What do you guys think? Anyway, these are minor things, and I wouldn't object to moving on to the next phase without making these tiny adjustments. (Sorry about any possible mistakes here; it's a bit late and I'm sleepy, but I wanted to give you my take on the exp.) P.S.: My OSF profile is: osf.io/47v9u
Thanks @AleAnsani
+ We started collecting and we already have a couple of participants ^^ so let's just roll for now and see how it goes
Hi,
Nice!!
Well, afaic the only limitation for Prolific is money 😁 Just for budget calculation, they say "We recommend you pay participants at least £9.00 / $12.00 per hour, while the minimum pay allowed is £6.00 / $8.00 per hour." So for a half-hour experiment it would be £3-4.50, + Prolific fees. I'd say £4/4.70€ (or £4.30/5€) per participant in total (including Prolific's cut) is fair. So we have to budget around 50€ per 10 participants.
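That back-of-the-envelope budget can be wrapped in a tiny helper. The £9/hour rate is the recommended figure from Prolific's guidance quoted above; the fee share is an assumption to be checked against their current terms:

```python
def prolific_budget(duration_min, n_participants=10,
                    rate_per_hour=9.0, fee_share=1/3):
    """Rough Prolific cost in GBP: base pay at rate_per_hour, plus the
    platform's cut. fee_share is an assumed placeholder, not an official
    figure; check Prolific's current pricing before budgeting."""
    base_pay = rate_per_hour * duration_min / 60
    per_participant = base_pay * (1 + fee_share)
    return round(per_participant, 2), round(per_participant * n_participants, 2)
```

For a 30-minute study this gives the £4.50 base pay mentioned above, plus whatever fee share one plugs in.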
Which makes me think that we kinda-sorta forgot to preregister 😱 Either we do it now (and say data collection is underway, but not processed), or we preregister the non-English versions "separately"? I don't think it's that big of a deal, since our hypotheses are quite obvious and clearly stated, but still, better to do it than not. As for the inclusion criteria, what do you have in mind?
Thanks for the detailed issue on the translation; I'll try to fix the remaining points.
Hi guys, this is definitely good news.
Of course, feel free to remove me if you prefer, I don't mind (it's just that I need to be there for the English version as per the ethics approval, but otherwise I really don't care).
Nice to meet you all, and thanks for the collaboration proposal. |
Hello Marco, happy to have you onboard :-) And BTW, since this brings the number of Marcos involved in this study to 3 (including myself and Marco Marini), I propose we include last names in further communications XD I see from OSF that data collection is going smoothly in the UK. Good to know!
It'd be nice to have a chat about power analysis with @DominiqueMakowski. I don't know if we're going to go for GLMMs or go Bayesian. In the first case, power calculations are a bit blurry (as far as I know, there's no consensus on the ultimate way to compute power (some indications here, here, and here); simr might be a good R tool for simulation, but again, there's no consensus). To be safe, we could just stick to what we did in Marini et al. (2024), but again, @DominiqueMakowski is the absolute master here =) Apart from that, I started drafting some R code to merge the CSVs and do some data cleaning, but I stopped when I realized that the response variable needed some JSON manipulation (i.e., all the DVs are within the same cell, whereas I would place their values in 3 different columns). I hope to continue in the next few days, although idk if I'll have enough time very soon. P.S.: Welcome aboard @marcosperduti !! We're so glad to have you here!! Best of luck with the ethics approval ;)
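The JSON-unpacking step described above (all DVs packed in one cell, to be spread into 3 columns) is straightforward with the standard library; a sketch in Python with made-up column names (the actual cleaning is planned in R, and the real DV names may differ):

```python
import json

# Hypothetical raw rows: the three DVs arrive packed in one JSON-encoded cell
raw_rows = [
    {"participant": "S01", "response": '{"arousal": 4.2, "enticement": 3.1, "valence": 5.0}'},
    {"participant": "S02", "response": '{"arousal": 1.0, "enticement": 0.5, "valence": 2.2}'},
]

def unpack_response(row, keys=("arousal", "enticement", "valence")):
    """Spread the JSON cell into separate columns, dropping the raw cell."""
    parsed = json.loads(row["response"])
    out = {k: v for k, v in row.items() if k != "response"}
    out.update({k: parsed.get(k) for k in keys})
    return out

tidy = [unpack_response(r) for r in raw_rows]
```

In R, `jsonlite::fromJSON` applied per cell achieves the same tidying.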
Thank you. BTW, did you ask for ethics approval at your university? Or are you running the study under Dominique's ethics approval?
@marcosperduti As for me, I haven't requested any approval from Jyväskylä. I don't think I'll recruit participants through their official channels, so I don't think I need it. But you raised a good point, maybe I'd better ask about this...
Need to translate these variables. Once that's done and we're ready to deploy, I will uncomment the saving of the data for the Italian version and we can proceed
You can copy and paste that
I don't like power analyses; I think they rely on absurd assumptions, tend to give unrealistic estimates, and simply don't scale well with the type of analysis that we want to do (i.e., going beyond t-tests and correlations). I tend to be from the school of "collect as much as you possibly can and then do the right stats to reliably estimate the uncertainty and incorporate that in the interpretation and discussion". (See this and this that just came out, as well as this.)
Bayesian GLMMs 🙃 (though TBF I expect Frequentist GLMMs will give the same evidence)
I see you're all hungry for some data eh ^^ I posted in
If possible, let's not post the analysis scripts publicly yet (since it's the job of the students I'm supervising, I don't want them to have everything already done :) We can discuss results and show graphs etc., but just don't reveal the code ^^ Note:
**Results spoiler alert**
It seems like our manipulation works, but not primarily on the scale that we would have hoped... and not as strongly as I expected
Weird that it doesn't let you (the mysteries of GitHub); anyway, I updated the file. Let's make sure we didn't forget any line or button.
I asked my local ethics committee for advice. They basically told me that I do not need to ask for ethics approval here if the data are completely anonymous (I think that is the case).
Hi guys, today the ethics committee of my university confirmed that I don't need local approval, since the study has already been approved (thanks @DominiqueMakowski !). Once the Spanish version is uploaded, we can proceed with the Colombian sample.
I'm closing this - we'll reopen a new conceptualization discussion for the follow-ups |
In the instructions, DM proposes the following 3 scales:
Let us start with the latter.
In the first implementation, Convincing is not asked, to avoid triggering distrust in our manipulation.
An idea might be to split the experiment in 2 tasks:
Now, let's consider Sexy / Arousing.
Do we want 2 separate measures rather than a single, undifferentiated "sexual arousal" measure?
On the one hand, having 2 ratings can help disentangle the 1st-person/3rd-person perspectives. They also somehow align with the literature on AI-generated art, where experimenters often ask both "how beautiful is it?" & "how much do YOU like it?". Moreover, it could be interesting to analyze whether the two correlate or not. Not to mention that arousal measures can be directly compared with normative ratings from the EroNAPS.
On the other hand, they might slow down the experiment a bit. This might not be a problem if we cut down the stimuli a bit (cf. the other issue). I am more concerned that, as currently presented, the instructions may under-constrain the construct we want to measure. To put it simply, I suspect that if we ask for a single self-report measure, people will somehow "put everything" in there, whereas if we give two measures they could interpret them arbitrarily, and each person would be reporting something different (e.g. how sexually appealing it is for me VS for people in general VS for people of your gender and sexual orientation, etc.). Of course, this worry can be mitigated by more specific phrasing. Two sketches below: